Jul 28 2011

Repositories can be a little dry.  They can turn people off, and take-up can be constrained because repositories are, historically, a little drab.  JISC’s Kultur project looked into this problem and discovered that researchers (especially in the Humanities) want some pizzazz in the tools they use, so it added lightboxes, thumbnails and slideshows to EPrints.  This part of EXPLORER aims to implement the same functionality for DSpace.  There are two factors that make this a complex task.  The first is that EPrints is built with technologies that are similar to, but fundamentally different from, those used by DSpace.  The second is that research archives may contain many types of research output.

A lightbox displaying thumbnails might look like this:

A lightbox with thumbnails of just images

Most examples of lightboxes contain thumbnails of images only.  Mixing thumbnails of images, videos, documents and datasets is hard when you consider that there are many types of document, video and dataset, and that a thumbnail taken from the first portion of an object (its first page or first frame, say) may misrepresent the research output.

The approach we will take is to expect the contributor to supply a thumbnail, and to fall back to a default thumbnail when none is supplied.
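A minimal sketch of that rule, in Python purely for illustration: use the contributor’s thumbnail when there is one, otherwise fall back to a default chosen by broad media type.  The default image paths and the per-media-type split are assumptions made for the example, not part of DSpace or EXPLORER.

```python
from typing import Optional

# Hypothetical default thumbnails, keyed by the major part of the MIME type.
DEFAULT_THUMBNAILS = {
    "image": "/static/thumbs/default-image.png",
    "video": "/static/thumbs/default-video.png",
    "audio": "/static/thumbs/default-audio.png",
}
# Used for documents, datasets and anything else we don't recognise.
GENERIC_THUMBNAIL = "/static/thumbs/default-generic.png"


def choose_thumbnail(mime_type: str, contributed: Optional[str] = None) -> str:
    """Prefer the contributor's thumbnail; otherwise fall back to a default."""
    if contributed:
        return contributed
    # "video/mp4" -> "video"
    major_type = mime_type.split("/", 1)[0].lower()
    return DEFAULT_THUMBNAILS.get(major_type, GENERIC_THUMBNAIL)


if __name__ == "__main__":
    print(choose_thumbnail("image/jpeg", "/bitstream/123/thumb.jpg"))
    print(choose_thumbnail("video/webm"))
    print(choose_thumbnail("application/pdf"))
```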

As usual, we will try to use open source libraries for most of this portion of EXPLORER but, from experience, we expect most libraries to be an ill fit.  Consider a JavaScript library that displays thumbnails of images alongside a lightbox or slideshow of the larger images: the larger images must first be reduced to thumbnails, and those thumbnails must either be stored or generated in real time.  The libraries will be unaware that their context is DSpace running as a Tomcat application.  And, as mentioned, most libraries will not be able to create thumbnails from every type of object in the repository.

Update 10th August

DSpace has a feature that creates thumbnails as a batch job.  This is the preferred route for thumbnail creation.  However, there is no support for video thumbnails, and there are questions over the quality of the thumbnails it creates.
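As a rough illustration of what a batch thumbnailer does to each image, here is the usual fit-within-a-box arithmetic in Python.  The box size is a parameter; in DSpace 1.x we believe it comes from the `thumbnail.maxwidth` and `thumbnail.maxheight` settings in dspace.cfg, but treat that as an assumption.  This is a sketch of the technique, not DSpace’s own code.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside max_width x max_height.

    Aspect ratio is preserved and images are never scaled up.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(1600, 1200, 80, 80))  # (80, 60)
    print(fit_within(50, 40, 80, 80))      # (50, 40): small images are left alone
```

Squeezing a 1600×1200 photograph into an 80×80 box, for instance, leaves an 80×60 thumbnail, which hints at why the quality of batch-created thumbnails is a concern.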

Twitter’s @jukesie pointed me at:

Update 11th August

I discovered that you can upload thumbnails to an item, but they sit in a ‘bundle’ attached to the item and are not associated with any bitstream.  I’m looking at a workaround that allows this:
DSpace Thumbs And Bitstreams

Without more code we can’t change the order of the bitstreams through this administration page.  The first thumbnail is the one that will be associated with the item, but I might be able to use the description to imply the order: a thumbnail described as ‘2 main’ would become the main thumbnail associated with the item.  This idea implies that the first three bitstreams of the item will have the three thumbnails; if there are four bitstreams, we need an extra thumbnail.  If ‘main’ appears by itself, the thumbnail will be associated with the item but not displayed against a bitstream.  There might not be a bitstream bundled with the item at all.
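A sketch of how the description convention might be interpreted, assuming descriptions made up of an optional 1-based bitstream position and an optional ‘main’ flag (‘1’, ‘2 main’, ‘main’).  The function name and the exact parsing rules are hypothetical; the fallback to the first thumbnail mirrors the behaviour described above.

```python
from typing import Dict, List, Optional, Tuple


def assign_thumbnails(
    thumbs: List[Tuple[str, str]],
) -> Tuple[Optional[str], Dict[int, str]]:
    """Split (description, url) pairs into an item thumbnail and per-bitstream thumbnails.

    Hypothetical convention: a description holds an optional 1-based bitstream
    position and an optional 'main' flag, e.g. '1', '2 main' or 'main'.
    """
    item_thumb: Optional[str] = None
    by_position: Dict[int, str] = {}
    for description, url in thumbs:
        words = description.lower().split()
        if "main" in words:
            item_thumb = url  # shown for the item as a whole
        for word in words:
            if word.isdecimal():
                by_position[int(word)] = url  # shown against bitstream N
    if item_thumb is None and thumbs:
        # No 'main' flag anywhere: the first thumbnail represents the item.
        item_thumb = thumbs[0][1]
    return item_thumb, by_position


if __name__ == "__main__":
    item, per_bitstream = assign_thumbnails(
        [("1", "a.jpg"), ("2 main", "b.jpg"), ("3", "c.jpg")]
    )
    print(item, per_bitstream)
```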

Update 11th August 4.45pm

The above works for browsing, and it is not too painful to administer.

Update 26th August

This snippet makes the THUMBNAIL bundle, sorted by description, available while the CONTENT bundle is being processed:

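      <xsl:variable name="thumbs">
        <!-- Copy each thumbnail's FLocat into the variable, ordered by its description (xlink:label) -->
        <xsl:for-each select="mets:fileSec/mets:fileGrp[@USE='THUMBNAIL']/mets:file/mets:FLocat[@LOCTYPE='URL']">
          <xsl:sort select="@xlink:label" />
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </xsl:variable>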

The following effectively zips the two bundles together, pairing each CONTENT bitstream with the THUMBNAIL at the same position:

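      <xsl:for-each select="mets:fileSec/mets:fileGrp[@USE='CONTENT']/mets:file/mets:FLocat[@LOCTYPE='URL']">
        <xsl:variable name="pos" select="position()" />
        <xsl:variable name="description" select="@xlink:label"/>
        <!-- The MIME type is an attribute of the parent mets:file, not of the FLocat -->
        <xsl:variable name="format" select="../@MIMETYPE"/>
        <!-- Thumbnail at the same position in the sorted list (exsl = http://exslt.org/common) -->
        <xsl:variable name="thumbhref" select="exsl:node-set($thumbs)/*[$pos]/@xlink:href"/>
        <xsl:variable name="ahref" select="@xlink:href"/>
        <!-- ... output the link to $ahref with the $thumbhref image here ... -->
      </xsl:for-each>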

There is a problem.  Because thumbnails are matched to bitstreams purely by position, the THUMBNAIL bundle more or less needs to be the same length as the CONTENT bundle.  It may be shorter, but it can’t have gaps.  Administratively, this means that dummy placeholders may be needed to achieve the effect.
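To make the gap problem concrete, here is the same positional pairing expressed in Python.  Thumbnails are sorted by description (as the xsl:sort does) and the Nth thumbnail goes with the Nth CONTENT bitstream; `zip_bundles` and the sample file names are made up for illustration.

```python
from typing import List, Optional, Tuple


def zip_bundles(
    content: List[str], thumbnails: List[Tuple[str, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each CONTENT bitstream with the thumbnail at the same position.

    thumbnails are (description, url) pairs, sorted by description first,
    like the xsl:sort above. Missing thumbnails come out as None.
    """
    ordered = [url for _, url in sorted(thumbnails, key=lambda t: t[0])]
    return [
        (name, ordered[i] if i < len(ordered) else None)
        for i, name in enumerate(content)
    ]


if __name__ == "__main__":
    content = ["a.pdf", "b.mp4", "c.jpg"]
    # A gap: no thumbnail described '2', so everything after it shifts up.
    print(zip_bundles(content, [("1", "t1.jpg"), ("3", "t3.jpg")]))
    # A dummy placeholder restores the alignment.
    print(zip_bundles(content, [("1", "t1.jpg"), ("2", "blank.png"), ("3", "t3.jpg")]))
```

With the gap, b.mp4 is wrongly given t3.jpg and c.jpg gets nothing.  Note too that both this sketch and xsl:sort compare descriptions as text by default, so ‘10’ sorts before ‘2’ unless the sort is made numeric (data-type="number" in XSLT).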

To complicate matters further, I want to extend the proof of concept to give more control over the THUMBNAIL and CONTENT bundles by adding more to the THUMBNAIL bundle descriptions:

Bitstreams and Descriptions

Hidden in the descriptions of the THUMBNAILs are three layouts: a column of thumbnails and their attachments, a video (video_1), and a carousel that displays thumbnails in a sliding lightbox.  Right now, the proof-of-concept code displays all three thumbnails for video_1.  This needs to change so that only one thumbnail is used, because video_1 is a single video made up of an MP4, a WebM and an OGV encoding.
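One way the proof of concept could collapse video_1 to a single thumbnail, sketched in Python under an assumed description convention: the first word names the layout and an optional second word names a group of alternative encodings (for example ‘video video_1’).  The real descriptions are only shown in the screenshot, so treat this format and the function name as hypothetical.

```python
from typing import List, Optional, Set, Tuple


def one_thumbnail_per_group(thumbs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the first thumbnail of each named group, e.g. 'video video_1'.

    Hypothetical convention: the first word of a description is the layout
    ('column', 'video', 'carousel'); an optional second word names a group
    whose members are alternative encodings of one object.
    """
    seen: Set[str] = set()
    kept: List[Tuple[str, str]] = []
    for description, url in thumbs:
        words = description.split()
        group: Optional[str] = words[1] if len(words) > 1 else None
        if group is not None:
            if group in seen:
                continue  # MP4/WebM/OGV siblings share the first thumbnail
            seen.add(group)
        kept.append((description, url))
    return kept
```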

There are lots of ways to make life easier and lots of ways to make it harder.  By convention, we could expect users not to include DOCs, PDFs, etc. in the carousel, and have them displayed separately instead.  I haven’t yet come across a library that deals with every media type.  Video and image might be reasonable.  Video, image and audio may be going a step too far, and including other media types in the carousel where there aren’t any handlers native to the browser (on any platform, big or small) seems much too far.
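The ‘by convention’ split could also be enforced in code.  A minimal sketch, assuming each bitstream’s MIME type is available: images and video go to the carousel, audio only if we decide to go that step further, and everything else is listed separately.  The function name and the flag are hypothetical.

```python
from typing import List, Tuple


def split_for_display(
    bitstreams: List[Tuple[str, str]], include_audio: bool = False
) -> Tuple[List[str], List[str]]:
    """Split (name, mime_type) pairs into carousel items and separately listed files."""
    prefixes: Tuple[str, ...] = ("image/", "video/")
    if include_audio:
        prefixes += ("audio/",)  # the 'step too far' option
    carousel: List[str] = []
    separate: List[str] = []
    for name, mime_type in bitstreams:
        # DOCs, PDFs, datasets and so on fall through to the separate list.
        target = carousel if mime_type.lower().startswith(prefixes) else separate
        target.append(name)
    return carousel, separate
```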

 Posted by at 9:43 am
Jul 27 2011

Video delivered over the web is difficult.  Really difficult.  Firstly, like the “Browser Wars” depicted here:

Firefox and Chrome duke it out with IE and Safari

there is now a war, or at least skirmishes, between Google, Adobe, Apple, Mozilla and, lastly, Microsoft.  Before Real’s RealPlayer and, later, Adobe’s Flash Video (FLV), there were as many video formats as there were software vendors in the Microsoft ecosystem.  At the height of Real’s and Adobe’s success, the problem was that the players and the video codecs were restricted to two (popular) platforms, so they were not accessible to everyone.  Adobe went further than Real to make Flash work in more places, and video on the web gained huge traction.  To compete, Apple has worked hard on QuickTime and lately has concentrated on MP4 with H.264 video encoding.  MP4 is supported by Adobe’s Flash, but Flash breaks the user experience on the web and breaks accessibility rules.  The answer to this would appear to be HTML5, with its support for the <video> element.  Currently, there are two battles.  The first is between Adobe’s Flash and the HTML5 way of doing things.  HTML5 will win: Adobe is already converting its products to produce HTML5 rather than Flash and Flex.  The other battle, which is a bigger problem in this context, is the format war.

Many devices are capable of playing video.  There are three ‘popular’ platforms for video playback, Apple, Android and Microsoft, each of which exists on the desktop and on mobile devices.  There are also all of the Linux devices (desktop, mobile and consumer devices such as TVs).  Apple (especially), Android and Microsoft are picky about how video should be packaged for playback, with little overlap between them.  Linux is, of course, more forgiving.  There are at least three ways to package video (the video and the audio) and at least six codecs to use.  Mozilla would have us use the Ogg family of codecs because they believe that these are licence-free and don’t infringe other software patents.  Patents are not a problem in every jurisdiction, but commercial organisations tend to work with what works in the USA.  This work would be easier without software patents, but here we are.  Microsoft are currently siding with Apple.  Apple will gain from the monies collected when H.264 (video) is no longer free, and AAC, the audio codec Apple favours, tends to be the default in MP4.  Microsoft don’t have their own codec.  Google have been flexible, but recently bought On2 and have released its VP8 video codec as part of WebM.  Google are dropping support for H.264 in Chrome, and so in ChromeOS.  Flash runs on Android right now.  Another factor is that hardware decoding support has favoured H.264 video, so on what I call compromised devices, other formats must be encoded to be easier to decode (and are therefore bigger in file size).  We may later see hardware support for WebM, depending on how the format war plays out.

What does this little lot mean for this project?  Mostly, it means compromise.  At some point in the future it will be easier to write the video and HTML needed to deliver video from a streaming server, but right now you either have to code for a restricted set of clients or write ‘clever’ code that inspects the web browser and delivers the correct video format in the correct mark-up.  I found a free-to-use project, Video JS, that combines JavaScript, browser interrogation and some document rewriting to present the correct video with either HTML5 or Flash mark-up.  With three versions of the same video, encoded for the lowest common denominator of mobile devices, the video should play back on nearly all recent devices.
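For reference, this is roughly the kind of HTML5 mark-up involved, generated here in Python: one <video> element with a <source> per encoding, so the browser picks the first format it can play.  It deliberately omits the Flash fallback that Video JS adds, and it assumes (hypothetically) that the three encodes share a base URL and differ only in extension.  The type strings name the common codec pairings: H.264 Baseline with AAC-LC in MP4, VP8 with Vorbis in WebM, and Theora with Vorbis in Ogg.

```python
from html import escape

# Container extension -> MIME type with a codecs parameter, in preference order.
SOURCE_TYPES = [
    ("mp4", 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'),  # H.264 Baseline + AAC-LC
    ("webm", 'video/webm; codecs="vp8, vorbis"'),
    ("ogv", 'video/ogg; codecs="theora, vorbis"'),
]


def video_markup(base_url: str, poster: str, width: int = 640, height: int = 360) -> str:
    """Return an HTML5 <video> element offering every encoding of one video.

    Assumes (hypothetically) that the encodings live at base_url + '.' + extension.
    """
    lines = [
        f'<video controls width="{width}" height="{height}" poster="{escape(poster)}">'
    ]
    for ext, mime in SOURCE_TYPES:
        # escape() turns the inner double quotes into &quot;, which browsers decode
        lines.append(f'  <source src="{escape(base_url)}.{ext}" type="{escape(mime)}">')
    # Shown only by browsers with no <video> support at all.
    lines.append(f'  <a href="{escape(base_url)}.mp4">Download the video</a>')
    lines.append("</video>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(video_markup("/video/clip", "/video/clip.jpg"))
```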

That’s great but, unfortunately for this project, there is a further wrinkle.  The above works, but you need a web server that can stream video and that is sympathetic to the client: it must allow the client to jump to any portion of the video using the client’s chosen method.  This works with Apache (with h264_streaming_module and some workarounds for Apple mobile clients), but we’re dealing with Apache’s Tomcat, which doesn’t have the h264_streaming_module available to it, and the DSpace code doesn’t have anything similar (that I know of).

The problem we want to solve is the ability to jump in the video to a specific part.  I took two approaches to solving this:

1. I looked at code that supports byte streaming.  Byte streaming (HTTP range requests) allows ‘clever’ browsers to ask a web server for a chunk of a file from anywhere within the file.  This is the way forward, and it’s likely that all web clients will support it.  The function is supported in Tomcat (Cocoon) but is switched off in DSpace because it broke a popular PDF client.  I switched it back on in the hope that we’ll find a workaround for broken PDF readers.  I also wrote a PHP script that Apache calls when the video’s file name ends in .webm, .mp4 or .ogg; the PHP supports byte streaming.  This method doesn’t support Flash, and so breaks support for Android and for old web browsers that only support Flash and not HTML5 video.

2. Inspiration hit, and I thought I’d cracked the problem.  I know that I can, broadly, support video using Apache.  One way would be to create a video streaming server that uses VideoJS (with h264_streaming_module and my workarounds for Apple) and store the video objects there.  In practice, this would break the paradigm of a repository.  Only the URL pointing to the video would be kept in DORA, and there would be an administrative overhead: should we change DORA, we would have to remember to change the streaming server too.  Inspiration came in the form of redirects.  I thought that if I could create, on the Tomcat file system, a symbolic link to the object (which has a meaningless name) with a file name that has meaning to Apache, I could get Tomcat to ask the browser to redirect within the same transaction, handing the streaming responsibility to Apache.  The correct HTML is delivered to the browser, the symbolic link is created if it hasn’t been created before, and the browser is told to get the video from Apache instead.  The solution was quick to code but, and it’s a big but, although all web clients support redirects in a general way, not all browsers support redirects within their video playback functionality, and Flash playback doesn’t support them at all.
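The redirect trick from the second approach can be sketched as follows, in Python rather than the Java/Cocoon code actually involved.  It assumes a directory that Apache serves (public_dir) and a public URL for it; both, and the naming scheme for the link, are hypothetical.  As noted above, the weak point is the 302 itself: players that don’t follow redirects, including Flash, never reach Apache.

```python
import os
from pathlib import Path
from typing import Dict, Tuple
from urllib.parse import quote


def redirect_via_symlink(
    asset_path: str, public_dir: str, public_name: str, public_base_url: str
) -> Tuple[int, Dict[str, str]]:
    """Expose a repository file to Apache under a meaningful name and redirect to it.

    asset_path: the bitstream in the asset store (meaningless file name).
    public_dir / public_base_url: a directory Apache serves, and its URL (hypothetical).
    public_name: a name Apache can act on, e.g. ending in .mp4 (hypothetical scheme).
    Returns the status code and headers the servlet would send.
    """
    link = Path(public_dir) / public_name
    if not link.is_symlink():
        try:
            os.symlink(asset_path, link)  # created once, reused on later requests
        except FileExistsError:
            pass  # another request created it first
    location = public_base_url.rstrip("/") + "/" + quote(public_name)
    return 302, {"Location": location}
```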

As it stands, we are using method 1 because, even if we don’t come back to it, this is the method that newer browsers and mobile devices should support later.  Given the chance, I would look for a streaming solution for Tomcat that supports both byte streaming and Flash-style streaming.
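Since method 1 is the one in use, here is the core of what byte streaming asks of the server: parsing an HTTP Range header (as specified in RFC 7233) and answering with 206 Partial Content.  This is a Python sketch of the technique, not the PHP script mentioned above; it handles a single range only, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# A single byte range: "bytes=500-999", "bytes=500-" or the suffix form "bytes=-500".
_SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")


def parse_range(header: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (start, end) bytes to send, or None to send the whole file.

    Raises ValueError when the range cannot be satisfied (answer 416).
    Multiple ranges are not supported and fall back to the whole file.
    """
    if not header:
        return None
    match = _SINGLE_RANGE.fullmatch(header.strip())
    if match is None or match.groups() == ("", ""):
        return None  # not something we understand: ignore it and reply 200
    first, last = match.groups()
    if first == "":
        # Suffix range: the last N bytes of the file.
        start, end = max(0, size - int(last)), size - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return None  # syntactically invalid, so the header is ignored
        end = min(int(last), size - 1) if last else size - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end


def partial_headers(start: int, end: int, size: int) -> Dict[str, str]:
    """Headers for a 206 Partial Content reply."""
    return {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }
```

A player seeking into the middle of a video sends something like `bytes=500000-`; the server replies 206 with the matching Content-Range, and the player can start decoding from there without downloading the beginning of the file.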

Jul 20 2011

As a representative of the EXPLORER project I attended the Kultivate conference on 15th July. 

It was an excellent opportunity to talk to people about the project and to find out what Kultivate has been up to.  To find out more about Kultivate go to http://www.vads.ac.uk/kultur2group/projects/kultivate/.  My thanks to Marie-Therese for giving us the opportunity to join in.

Many discussions focused on how to display non-text outputs in repositories, and it appears that the work we are doing on EXPLORER will be of help to other DSpace users in this area.  We are able to display videos correctly within the repository and are working on other media formats.  Expect an update from Owen soon on this.

Otherwise, we are currently working on some new advocacy materials for DORA and reviewing policies and protocols.  The DMU website is currently being updated, and DORA will be included in this later in the year.  More on this soon, we hope.