Free(dom) software, GNU/Linux, Fish, Family

Jul 282011

Repositories can be a little dry.  They can turn people off and take up can be constrained because repositories are, historically, a little drab.  JISC’s Kultur project looked in to this problem and discovered that researchers (especially from the Humanities) want some pizazz in the tools they use and so added lightboxes, thumbnails and slideshows to EPrints.  This part of EXPLORER aims to implement the same functionality.  There are two factors that make this a complex task.  The first is that EPrints is created using similar but fundamentally different technologies to DSpace.  The second is that research archives may contain many types of output from research.

A light box displaying thumbnails might look like this:

A lightbox with thumbnails

A lightbox with thumbnails of just images

Most examples of lightboxes showing thumbnails contain thumbnails of images.  Mixing thumbnails of images, videos, documents and datasets is hard when you consider that there are many types of document, many types of video, many types of dataset and that the first portion of an object, as a thumbnail, may misrepresent the research output.

The approach to take will be to expect the contributor to offer a thumbnail and put up a default thumbnail as an alternative.

As usual, we will try to use open source libraries for most of this portion of EXPLORER but from experience we expect most libraries will be an ill fit.  If you imagine a JavaScript library that displays thumbnails of images with a lightbox or slideshow of larger images, processing of the larger images to thumbnails must be carried out and the thumbnails must be stored or processed in real time.  The libraries will be unaware that their context is DSpace running as a Tomcat application.  As mentioned, most libraries will not be able to create thumbnails from all objects in the repository.

UPDATE 10th August

DSpace has a feature which creates thumb nails as a batch job.  This is the preferred route for thumbnail creation.  However, there is no support for video thumbnails and there are questions over the quality of thumbnails created.

Twitter’s @jukesie pointed me at:

Update 11th August
Discovered that you can upload thumbnails to an item but they are in a ‘bundle’ related to the item and are not associated with a bitstream.  I’m looking at a work around that allows this:
DSpace Thumbs And Bitstreams

DSpace Thumbs And Bitstreams

Without more code we can’t change the order of the bitstreams through this administration page.  The first thumbnail is the thumbnail that will be associated with the item but I might be able to use the description to imply the order.  ‘2 main’ would become the main thumbnail associated with the item.  This idea implies that the first 3 bitstreams of the item will have the 3 thumbnails.  If there are 4 bitstreams we need an extra thumbnail.  If ‘main’ appears by itself then it will be associated with the item but not displayed against a bitstream.  There might not be a bitstream bundled with the item.

Update 11th August 4.45pm

The above works for browsing.  Not too painful for administration.

 Update 26th August

This snippet of code allows the THUMBNAIL bundle to be available when processing the CONTENT bundle having sorted the THUMBNAILs by description:

      <xsl:variable name="thumbs">
        <xsl:for-each select="mets:fileSec/mets:fileGrp[@USE='THUMBNAIL']/mets:file/mets:FLocat[@LOCTYPE='URL']">
          <xsl:sort select="@xlink:label" />
          <xsl:copy-of select="."/>

The following effectively zips up the to bundles:

 <xsl:for-each select="mets:fileSec/mets:fileGrp[@USE='CONTENT']/mets:file/mets:FLocat[@LOCTYPE='URL']">
 <xsl:variable name="pos" select="position()" />
        <xsl:variable name="description" select="@xlink:label"/>
        <xsl:variable name="format" select="@xlink:label"/>
 <xsl:variable name="thumbhref" select="exsl:node-set($thumbs)/*[$pos]/@xlink:href"/>
        <xsl:variable name="ahref" select="@xlink:href"/>

There is a problem.  The THUMBNAIL bundle will more or less need to be the same length as the CONTENT bundle.  It may be shorter but it can’t have gaps.  Administratively, this mean that dummy place holders may be needed to achieve the effect.

Complicating matters more I want to extend the proof of concept to include more control over the THUMBNAIL and CONTENT bundles by adding more to the THUMNAIL bundle descriptions:

Bitstreams and Descriptions

Bitstreams and Descriptions


Hidden in the descriptions of the THUMBS there are 3 layouts; a column of thumbnails and their attachments, a video video_1 and a carousel which displays thumbs using a sliding lightbox.  With the proof of concept code, right now, the three thumbs of video_1 all displayed.  This needs to be changed so only one thumbnail is used because video_1 is made up of an MP4, WebM and OGV video.

There are lots of ways to make life easier and lots to make them harder.  Through culture we could expect users not to include DOCs, PDFs etc in the carousel and have them displayed separately.  I haven’t come across, yet, a library that deals with every media type.  Video and image might be reasonable.  Video, image and audio may be going a step to far and including other media types in the carousel where there aren’t any handlers native to the browser (on any platform big or small) seems much too far.

 Posted by at 9:43 am
Jul 272011

Video delivered over the web is difficult.  Really difficult.  Firstly, like the “Browser Wars” depicted here:

Firefox and Chrome duke it out with IE and Safari

Firefox and Chrome duke it out with IE and Safari

there is now a war, or skirmishes, between Google, Adobe, Apple, Mozilla and lastly Microsoft.  Before Real’s realplayer and later Adobe’s Flash video (FLV) there were as many video formats as there were software vendors in the Microsoft ecosystem.  At the height of Real and Adobe’s success there was a problem in that the players and the video codecs were restricted to two (popular) platforms.  They were not accessible to everyone.  Adobe went further than Real to make Flash work in more places and video on the web has gained huge traction.  Apple to compete has worked hard with Quicktime and lately has concentrated on MP4 with h.264 video encoding.  MP4 is supported by Adobe’s Flash but Flash breaks the user experience on the web and breaks accessibility rules.  The answer to this would appear to HTML5 with its support for the <video> mark up.  Currently, there are two battles: the battle between Adobe’s Flash and the HTML5 way of doing things.  HTML5 will win.  Adobe is already converting it product to produce HTML5 rather than Flash and Flex.  The other battle which is a bigger problem in this context is the format war.

Many devices are capable of playing video.  There are three ‘popular’ platforms for video playback: Apple, Android and Microsoft that exist on the desktop and as mobile devices.  There are also all of the Linux devices (desktop, mobile and consumer devices like TV.) Apple (especially), Android and Microsoft are picky in how video should be packaged for playback with little overlap.  Linux is of course more forgiving.  There are, at least, 3 ways to package video (the video and the audio) and, at least, 6 codecs to use.  Mozilla would have us use the Ogg family of codecs because they believe that that are licence free and that they don’t impinge on other software patents.  Patents are not a problem in every jurisdiction but commercial organisations tend to work with what works in the USA.  This work would be easier without patents on software but here we are.  Microsoft are currently siding with Apple.  Apple will gain from monies collected when h.264 (video) is no longer free and their own audio codec tends to be default in MP4.  Microsoft don’t have their own codec.  Google have been flexible but recently bought On2 and have developed VP8 (video) as WebM.  Google are dropping support for h.264 in their ChromeOS.  Flash is run on Android right now.  One of the other factors is that hardware support has favoured h.264 video.  Other formats must be easier to decode (and therefore bigger in file size) on, what I call, compromised devices.  Later we will see hardware support for WebM depending on how the format war plays out.

What does this little lot mean to this project?  Mostly, it means compromise.  At some point in the future it will be easier to code video and HTML to deliver video from a streaming server but right now you either have to code for a restricted set of clients or right ‘clever’ code that inspects the web browser and delivers the correct video format in the correct mark up.  I found a project that is free to use Video JS that combines Javascript, browser interrogation and some document re-writing to present the correct video and HTML5 or Flash HTML code.  With three versions of the same video encoded for the lowest denominator of mobile devices the video should play back on nearly all recent devices.

That’s great but unfortunately, for this project, there is a further wrinkle.  The above works but you need a web server that can stream video and that is sympathetic to the client.  That it will allow the client to jump to portions of the video using its chosen method.  This works with Apache (with h264_streaming_module and some work arounds for Apple mobile clients) but we’re dealing with Apache’s Tomcat which doesn’t have the h264_streaming_module available to it and the DSpace code doesn’t have anything similar (that I know of).

The problem we want to solve is the ability to jump in the video to a specific part.  I took two approaches to solving this:

1. I looked at code that supports byte streaming.  Byte streaming allows ‘clever’ browsers to ask a web server for a chunk of a file from anywhere within the file.  This is the way forward and it’s likely that all web clients will support this.  The function is supported in Tomcat (Cocoon) but is switched off in DSpace because it broke a popular PDF client.  I switched it back on and hoped that we’ll find a work around for broken PDF readers.  I created PHP that is called by Apache when the video ends in .webm, .mp4, .ogg.  The PHP supports Byte Streaming.  This method doesn’t support Flash and so, breaks support for Android and old web browsers that only support Flash and not HTML5 Video.

2. Inspiration hit.  I thought I’d cracked the problem.  I know that I can, broadly, support video using Apache.  One way would be to create a video streaming server that uses VideoJS (h264_streaming_module and my work arounds for Apple) and store the video objects there.  This in practice would break the paradigm of a repository.  Only the URL pointing to the video would be kept in DORA there would be an administrative overhead should we change DORA; we would have to remember to change the streaming server too.  Inspiration came in the form of redirects.  I thought that, if I can create a symbolic link to the object (which has a meaningless name) on the Tomcat file system with a file name that has meaning to Apache I could get Tomcat to ask the browser to redirect in the same transaction handing the streaming responsibility to Apache.  The correct HTML is delivered to the browser.  The symbolic link is created if it hasn’t been created before and the browser is told to get the video from Apache instead.  The solution was quick to code but, and it’s a big but, where all web clients support redirects in a general way not all browsers support redirects within their video playback functionality.  This includes all Flash playback.

As it stands, we are using method 1 because if we don’t come back to it this is the method that should be supported later by newer browsers and mobile devices.  I would, given the chance, see if there is a streaming solution for Tomcat that supports Byte Streaming and Flash type streaming.

Mar 082011

Project AIR was concerned with the automatic and hands off population of OAI-PMH compliant Open Research Archives with linked data taken from institutional web sites .  A high aim indeed which is recognised and mollerated.  The tool crawls a web site to find research output for academics local to the institution.  Machine learning combined with a web crawler make up the meat of the project.  It is hard to extract linked data from a content management system where the content is at best prescribed and at worst free prose but either way not compatible with any system linking to a repository.  Another way to do this is by developing a culture around the submission of research outputs to the group responsible for the repository.  Cultures are very fluid things, they fade in and out of existence in different parts of an organisation and therefore a tool like this helps even if it doesn’t complete the effort.  The application of this tool to DORA will provide another way of adding outputs to the repository without manual input and repeated effort which is always of benefit to all users.

 Posted by at 9:16 am
Mar 082011

The Kultur project was funded by JISC. It was undertaken by Universities of Southampton, Arts London, Creative Arts and the Visual Arts Data Service. The project hoped to address the need to manage non-text outputs effectively. The primary aim was to establish a shared practice across the sector which DMU hopes to benefit from.  The project effected this, essentially, by changing the way the popular EPrints Open Research Archive deals with different media; images and video, by looking at workflow, metadata, preservation, curation, IPR and by looking at the way different types of users approach typically cold or sterile user interfaces of research archives or repositories.  Repositories should go well beyond just being used for REF outputs!

The project worked with core standards; Dublin Core for metadata and the Joint Academic Coding System JACS for its subject classification  but the project also deviated from standards in line with the users’ needs.  Two profiles were looked adding the profiles; Images Application Profile and Moving Images Application Profile to EPrints.  The Moving Images Application Profile wasn’t finished before the end of the project but work continues.  It’s likely that the code will need to be re-visited as HTML5 matures.

In the implementation of the project:

  • the range of research activity was captured
  • metadata was added to to reflect the new media’s attributes
  • real examples were found early in the project
  • provision was made for digital capture of some research outputs
  • a user survey was performed
  • features added : lightbox, slideshow (and video viewer)
  • more features: tabbed abstract pages, visual search results

As part of the EXPLORER project we are looking to apply what was done and learnt during KULTUR to our repository.  De Montfort University’s Open Research Archive is built using DSpace which is a very different animal to EPrints.  On initial inspection, DSpace has the hooks for code to treat objects, in the repository, with differing media types accordingly.  This would probably mean a different HTML markup for each media type which will then be handled by the CSS or Javascript to perform the magic.  The university’s research archive officer and technical lead have said that DSpace will support differing workflows and styles which can be selected by the user.

We will work either with the latest DSpace stable code or with the SVN.  We will also look for DSpace plugins that match our needs or can be adapted.  We will keep the Blog updated with our progress.