When I lived in L.A. it seemed like everyone wanted to be a movie star: the Starbucks barista waiting to be discovered as he pronounced “Frappuccino,” friends scheming to be placed on a reality show and win a trip to a tropical island, and the many writers trying to get their latest script into the hands of Steven Spielberg. The recent boom in online video and its associated capture hardware has created a new class of stars. The next American Idol might submit a cover song to YouTube, and videos of a child’s first steps are uploaded to the Web for the world to see. How can search engines discover these new sources of video, extract relevant information, and successfully handle user queries? In this post I will look at the present state of video search, explain how machines make sense of movies, and take a peek inside the state of the art.
A multiplexed video file contains a set of sequenced pictures with an accompanying audio track. Digital video often adds a header describing the recorded work to assist in playback and location. Because a video is composed of pictures and sound, many technologies from image search and audio search still apply, with a few optimizations that take advantage of the larger amount of correlated data.
A general search engine contains links from all over the Web, including links to video files. A specialized video index may be formed by combing through a link index looking for links adhering to known file extensions:
- Audio Video Interleave, an older format popular on Windows machines.
- QuickTime container, popular on Apple computers.
- MPEG-4 Part 14 files. M4V is a popular expression used by Apple’s iTunes.
- Windows Media video.
- Streaming video using Microsoft technologies. The Advanced Streaming Format is not exclusive to video as it may contain streaming audio.
- Adobe Flash videos.
- DivX Media Format.
- 3G mobile phone format.
- RealVideo format by RealNetworks.
- MPEG-1 or MPEG-2 video file.
- Theora video format.
Windows Live Search allows users to restrict searches to pages that link to files with one or more given file extensions, such as a search for mov, wmv, or m4v files on pages mentioning “dance.” Bookmarking site del.icio.us uses this method to identify videos bookmarked by its users.
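The extension-matching approach described above can be sketched in a few lines of Python. This is only an illustration: the extension-to-format table is my own assumption based on the extensions commonly associated with the formats listed, and the function names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Assumed mapping of commonly used extensions to the formats listed above.
VIDEO_EXTENSIONS = {
    ".avi": "Audio Video Interleave",
    ".mov": "QuickTime",
    ".mp4": "MPEG-4 Part 14",
    ".m4v": "MPEG-4 Part 14 (iTunes)",
    ".wmv": "Windows Media video",
    ".asf": "Advanced Streaming Format",
    ".flv": "Adobe Flash video",
    ".divx": "DivX Media Format",
    ".3gp": "3G mobile phone format",
    ".rm": "RealVideo",
    ".mpg": "MPEG-1 or MPEG-2",
    ".mpeg": "MPEG-1 or MPEG-2",
    ".ogg": "Theora",
}


def video_format(url):
    """Return a format name if the URL path ends in a known video extension."""
    path = urlparse(url).path  # ignore any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return VIDEO_EXTENSIONS.get(extension)


def filter_video_links(links):
    """Keep only the links from a link index that look like video files."""
    return [(url, video_format(url)) for url in links if video_format(url)]
```

Running `filter_video_links` over a crawl's link index would yield the starting set for a specialized video index. Extensions can mislead, so a real system would likely treat the result as a candidate list rather than a final answer.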
Videos found in the wild are often described and referenced within HTML pages. Here’s an example of how a video file might be described within a web page link:
<a href="firststeps.mov" type="video/quicktime" hreflang="en-us" title="A longer description of the target video"> A short description</a>
The href attribute points to the location of the video file. The type value provides a hint for user agents about the type of file on the other end of the link. The hreflang attribute communicates the base language of the linked content. The title attribute provides more information about the linked resource, and may be displayed as a tooltip in some browsers. The element value, “A short description,” is the linked text on the page.
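To make the link anatomy concrete, here is a minimal sketch of how a crawler might collect those descriptive attributes using Python's standard html.parser module. The class and function names are hypothetical, and a production crawler would need to cope with far messier markup.

```python
from html.parser import HTMLParser


class LinkDescriptionParser(HTMLParser):
    """Collect href, type, hreflang, title, and link text for each anchor."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "href": attributes.get("href"),
                "type": attributes.get("type"),
                "hreflang": attributes.get("hreflang"),
                "title": attributes.get("title"),
                "text": "",
            }

    def handle_data(self, data):
        # Text between <a> and </a> is the linked text on the page.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def describe_links(html):
    """Return one dictionary of descriptive data per link in the markup."""
    parser = LinkDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Any attribute the publisher left out simply comes back as None, which matches how sparse this data tends to be in practice.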
It’s not very likely publishers will provide more data than the functional minimum of href. The title attribute is semi-visible and therefore more likely to be included, but it is still uncommon. It’s possible to identify video by a given MIME type such as video/quicktime, but few sites provide the advisory type hint in their HTML markup. Collecting a file’s MIME type from the server requires “touching” the remote file, and will most likely return the default values of popular hosting applications such as Apache or IIS, so a search engine is likely better off relying on a local list of mapped extensions and helper application behaviors.
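A minimal sketch of that local-list strategy follows. It trusts an advisory type hint when one exists, then a hand-maintained extension table, and finally Python's standard mimetypes module. The table contents and the function name are my own assumptions for illustration, and nothing here touches the remote file.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse

# Assumed local table of extensions and their usual video MIME types.
LOCAL_VIDEO_TYPES = {
    ".avi": "video/x-msvideo",
    ".mov": "video/quicktime",
    ".mp4": "video/mp4",
    ".wmv": "video/x-ms-wmv",
    ".flv": "video/x-flv",
    ".3gp": "video/3gpp",
    ".mpg": "video/mpeg",
    ".mpeg": "video/mpeg",
}


def guess_video_type(url, advisory_type=None):
    """Guess a video MIME type without requesting the remote file."""
    # 1. Trust the publisher's advisory hint if it names a video type.
    if advisory_type and advisory_type.lower().startswith("video/"):
        return advisory_type.lower()
    # 2. Fall back to the local extension table.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in LOCAL_VIDEO_TYPES:
        return LOCAL_VIDEO_TYPES[extension]
    # 3. Last resort: the platform's own extension database.
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed and guessed.startswith("video/"):
        return guessed
    return None
```

The ordering is a design choice: the advisory hint is rare but comes from the publisher, while the extension table is always available and fully under the search engine's control.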
Some videos are embedded in the page, complete with plugin handler descriptions that allow a viewer to play back the video directly from its page context. This content may take the form of an object or an embed element in the page markup. The old-style embed element seems to be preferred by the autogenerated HTML of popular video sites, presumably for backwards compatibility with more web browsers. Embedded content often specifies a preferred handler plugin and possibly a “movie” parameter, but it’s difficult to tell from the markup alone whether the referenced file is a video.
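As a rough sketch, a crawler could gather candidate media URLs from embedded content like this. The class name and the shape of the result are hypothetical, and as noted above the output is only a list of candidates: nothing in the markup proves that a given URL is a video.

```python
from html.parser import HTMLParser


class EmbedCandidateParser(HTMLParser):
    """Collect URLs referenced by embed elements and "movie" parameters."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "embed" and attributes.get("src"):
            # Old-style embed: the media URL lives in src, the handler in type.
            self.candidates.append(
                {"element": "embed", "url": attributes["src"],
                 "handler": attributes.get("type")}
            )
        elif tag == "object" and attributes.get("data"):
            self.candidates.append(
                {"element": "object", "url": attributes["data"],
                 "handler": attributes.get("type")}
            )
        elif tag == "param":
            name = (attributes.get("name") or "").lower()
            if name == "movie" and attributes.get("value"):
                self.candidates.append(
                    {"element": "param", "url": attributes["value"],
                     "handler": None}
                )


def embedded_candidates(html):
    """Return every embedded resource that might be a video."""
    parser = EmbedCandidateParser()
    parser.feed(html)
    parser.close()
    return parser.candidates
```

A page that nests an embed inside an object will usually report the same URL twice, so a caller would likely deduplicate before doing anything else with the list.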
A search engine may apply special handling to embeds from well-known video hosts to gather link data for resource discovery and ranking. A YouTube video embed references the same identifier used to construct the URL of the full web page, and could be counted toward that page’s total citations.
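Here is one way that special handling might look for YouTube, sketched under the assumption that embeds use a path of the form /v/ID (or the newer /embed/ID) and that the full page lives at /watch?v=ID. The function names are hypothetical, and the URL patterns belong to the host, so they could change at any time.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed embed path shapes: /v/<id> (old style) or /embed/<id>.
_EMBED_PATH = re.compile(r"^/(?:v|embed)/([A-Za-z0-9_-]+)")


def youtube_watch_url(embed_url):
    """Map a YouTube embed URL to the URL of its full web page, if possible."""
    parsed = urlparse(embed_url)
    if parsed.netloc.lower() not in ("youtube.com", "www.youtube.com"):
        return None
    match = _EMBED_PATH.match(parsed.path)
    if match is None:
        return None
    return "http://www.youtube.com/watch?v=" + match.group(1)


def count_citations(embed_urls):
    """Count how many embeds across the crawl point at each watch page."""
    pages = (youtube_watch_url(url) for url in embed_urls)
    return Counter(page for page in pages if page is not None)
```

Each embed found on a third-party page then acts like an inbound link to the canonical watch page, which is the citation count a ranking function could use.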
It is possible for a publisher to provide more information about a video item and its alternate formats using a syndication format namespace extension such as Yahoo! Media RSS. Details such as bitrate, framerate, audio channels, rating, thumbnail, total duration, and even acting credits can be applied to information about the remote resource without actually “touching” the file. This method is currently used by large publishers such as CNN to provide Yahoo! with constant updates for its sites.
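The sketch below shows how an indexer might read those details from a feed using Python's standard ElementTree parser. The sample feed, its URLs, and the function name are made up for illustration. The namespace URI and the media:content attribute names are the ones I understand the Media RSS specification to define, but check them against the specification before relying on them.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace URI

# Hypothetical feed used only to exercise the parser below.
SAMPLE_FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example videos</title>
    <item>
      <title>First steps</title>
      <media:content url="http://example.com/firststeps.mov"
                     type="video/quicktime" bitrate="500" framerate="25"
                     channels="2" duration="42"/>
      <media:thumbnail url="http://example.com/firststeps.jpg"/>
      <media:credit role="actor">Example Child</media:credit>
    </item>
  </channel>
</rss>"""


def media_items(feed_xml):
    """Extract per-video details from a feed without fetching any video."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        thumbnail = item.find(".//{%s}thumbnail" % MEDIA_NS)
        credits = [c.text for c in item.iter("{%s}credit" % MEDIA_NS)]
        for content in item.iter("{%s}content" % MEDIA_NS):
            results.append({
                "url": content.get("url"),
                "type": content.get("type"),
                "bitrate": content.get("bitrate"),
                "framerate": content.get("framerate"),
                "channels": content.get("channels"),
                "duration": content.get("duration"),
                "thumbnail": thumbnail.get("url") if thumbnail is not None else None,
                "credits": credits,
            })
    return results
```

Calling `media_items(SAMPLE_FEED)` returns one dictionary per media:content element. Attribute values stay as strings here, so a real indexer would convert and validate them.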
Producers of QuickTime, MPEG-4, or H.264 video may provide more information about their content using Apple’s podcasting namespace. Extra information such as subtitle, total duration, rating, thumbnail, and keywords may be associated with video content using this namespace. This data is displayed in the iTunes Store and by other compatible applications.
MPEG-4 video (drawing by Apple)
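Data in Apple’s podcasting namespace can be read in much the same way as any other feed extension. The following is a minimal sketch with a made-up sample feed and a hypothetical function name. I am assuming the commonly documented namespace URI and the subtitle, duration, explicit, and keywords elements, so verify them against Apple’s own documentation.

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"  # assumed namespace URI

# Hypothetical video podcast feed used only for illustration.
SAMPLE_PODCAST = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example video podcast</title>
    <item>
      <title>First steps</title>
      <enclosure url="http://example.com/firststeps.m4v"
                 type="video/x-m4v" length="1048576"/>
      <itunes:subtitle>A child's first steps</itunes:subtitle>
      <itunes:duration>0:42</itunes:duration>
      <itunes:explicit>no</itunes:explicit>
      <itunes:keywords>baby, walking, family</itunes:keywords>
    </item>
  </channel>
</rss>"""


def podcast_videos(feed_xml):
    """Return enclosure and iTunes details for every item in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # nothing to play, so nothing to index
        keywords = item.findtext("{%s}keywords" % ITUNES_NS) or ""
        results.append({
            "url": enclosure.get("url"),
            "type": enclosure.get("type"),
            "subtitle": item.findtext("{%s}subtitle" % ITUNES_NS),
            "duration": item.findtext("{%s}duration" % ITUNES_NS),
            "rating": item.findtext("{%s}explicit" % ITUNES_NS),
            "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
        })
    return results
```

The keywords element is a comma-separated string, so the sketch splits it into a list that is easier to feed into an index.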