On June 30 Google and Adobe announced a new indexer for Flash (SWF) files discovered by Google's web crawlers. The partnership takes advantage of a server-side Flash Player optimized for a search engine indexing environment and limited to unidirectional text (i.e. no Hebrew or Arabic). Search engines previously discovered the location of a SWF file on the Web and perhaps indexed its metadata, but did not take a deep look inside its binary content. Last month's announcement was a big change for both Adobe and the major search engines: it is now possible to run a heavily GUI-based Flash file at the command line and interpret both its text content and its interaction opportunities. In this post I will walk through what we currently know about the search engine Flash runtime and how it affects search engine optimization in Flash.

Build for a blind, deaf user

Search engine indexers are blind and deaf. They open a file, examine its contents, and try to deduce meaning from your page's structure and content. A web page designed for screen readers will also expose more content to search engines, which do not evaluate your page's full rendered state of content, layout, and interactions.

Search engines utilizing Flash player indexing are still restricted to this screen reader approach. Accessible Flash applications complete with names, labels, reading order and XMP should continue to be more search engine friendly than other SWF files on the Web. Google’s tips for creating accessible, crawlable sites still apply, but in a new Flash context.

The server-side Flash Player

If we want to understand how search engines such as Google might interpret Flash content, we'll first need to take a look at the Flash Player itself. Adobe provides few details in its official SWF searchability FAQ, but we can infer a few implementation details. How would you rewrite Flash Player for server-side indexing of SWF content?

The search engine Flash Player is likely a scaled-down, secure version optimized for machine readers. Strip out video, audio, fonts, and file system access. The server-side Flash Player should open a binary SWF file, pull out the functionality it understands, and create a data tree of all possible actions. These features are actually quite similar to a screen reader interface, but Adobe is instead targeting a Linux-based headless runtime. I believe the guts of the Flash Player for servers are built using the same accessibility abstraction layer Adobe currently uses for Windows, which could extend to platform-level bindings on Mac and Linux desktops.

The Adobe Flash Player creates a list of objects on the screen at each render and records this list into an accessible data tree (according to a 2005 white paper by Bob Regan). This data tree is updated with each change in the application state, allowing any application listening in to update an object model of clickable buttons, labels, and links.
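The data tree and its listeners can be pictured with a short Python sketch. Everything here is hypothetical: `AccessibleObject`, `AccessibilityTree`, `IndexerListener`, and the rule that objects with an explicit `tab_index` are read first are simplifications loosely modeled on the accessible names and `tabIndex` values Flash authors can set, not Adobe's actual player internals. Each render pushes the whole tree to subscribed listeners, and an indexer-style listener rebuilds its model of clickable items and readable text in reading order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AccessibleObject:
    """One on-screen object as an assistive technology (or indexer) might see it."""
    role: str                        # e.g. "button", "link", "text"
    name: str                        # accessible name / label text
    tab_index: Optional[int] = None  # explicit reading order, if the author set one
    children: List["AccessibleObject"] = field(default_factory=list)


def reading_order(root: AccessibleObject) -> List[AccessibleObject]:
    """Flatten depth-first, then move objects with an explicit tab_index to the front.

    sorted() is stable, so objects without a tab_index keep their tree order.
    """
    flat: List[AccessibleObject] = []

    def walk(node: AccessibleObject) -> None:
        flat.append(node)
        for child in node.children:
            walk(child)

    walk(root)
    return sorted(flat, key=lambda o: (o.tab_index is None, o.tab_index or 0))


Listener = Callable[[AccessibleObject], None]


class AccessibilityTree:
    """Hypothetical stand-in for the player's accessibility tree: every render is pushed to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def render(self, root: AccessibleObject) -> None:
        for listener in self._listeners:
            listener(root)


class IndexerListener:
    """Keeps an object model of the current state: clickable items and readable text."""

    def __init__(self) -> None:
        self.actions: List[str] = []
        self.text: List[str] = []

    def __call__(self, root: AccessibleObject) -> None:
        ordered = reading_order(root)
        # Rebuild the model on every render, since the application state may have changed.
        self.actions = [o.name for o in ordered if o.role in ("button", "link")]
        self.text = [o.name for o in ordered if o.role not in ("button", "link") and o.name]


tree = AccessibilityTree()
indexer = IndexerListener()
tree.subscribe(indexer)
tree.render(AccessibleObject("window", "", children=[
    AccessibleObject("text", "Welcome to our store"),
    AccessibleObject("button", "Products", tab_index=2),
    AccessibleObject("button", "Contact", tab_index=1),
]))
print(indexer.actions)  # ['Contact', 'Products']
print(indexer.text)     # ['Welcome to our store']
```

Rebuilding the model on each render, rather than appending, keeps the listener's view in step with the current application state.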

Adobe currently interfaces with OS-level accessibility frameworks on Windows and could extend this model to every major desktop platform. The Windows version of Flash Player binds to Microsoft Active Accessibility. Mac versions of Flash could bind to Universal Access. On GNOME the player could bind to the Assistive Technology Service Provider Interface (AT-SPI). A server-side version of Flash likely builds upon this same abstracted accessibility object model, passing screen objects to the search engine indexer for further interpretation or interaction.

Windows Live Search was noticeably missing from the server-side Flash player announcement for search engines. It’s possible Adobe has developed a server-side Flash Player for Linux that is not yet compatible with the Windows Server environment of Microsoft’s Windows Live Search.

Accessing deep content

Googlebot can fill out forms, click buttons, and navigate deep within your site. Clickable Flash objects will likely behave the same way, exposing new content paths for Googlebot within your larger SWF. Flash websites can help ensure deep indexing of SWF content by adding individual SWF fragments to their sitemap. Reading order will likely play a role in selecting important content on your page, and I expect Googlebot may follow the first item in your reading order sooner than the last.
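Listing SWF files in a sitemap takes only a few lines. This Python sketch uses the standard library's `xml.etree.ElementTree` and the sitemaps.org 0.9 namespace; the `build_sitemap` name and the `example.com` URLs are placeholders, and which SWF fragments deserve their own entries is a judgment call for your site.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a sitemap document with one <url><loc> entry per URL."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." instead of an ns0: prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = url  # ElementTree escapes "&" in query strings, as sitemaps require
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "http://example.com/flash/products.swf",
    "http://example.com/flash/contact.swf",
]))
```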

Googlebot still throws out references to an anchor name fragment in the URL (e.g. #section=menu), and this announcement does not change the general behavior of Google's URL storage and analysis.
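The effect is easy to model. This Python sketch uses `urllib.parse.urldefrag` to stand in for a crawler that discards fragments; it illustrates the behavior described above rather than reproducing Googlebot's code, and `canonical_crawl_url` is a hypothetical name. Fragment-based deep links into a single Flash page collapse to one stored URL.

```python
from urllib.parse import urldefrag


def canonical_crawl_url(url: str) -> str:
    """Drop the #fragment, as a crawler that ignores fragments would before storing the URL."""
    return urldefrag(url).url


deep_links = [
    "http://example.com/site.html#section=menu",
    "http://example.com/site.html#section=contact",
]
print({canonical_crawl_url(u) for u in deep_links})  # {'http://example.com/site.html'}
```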

Do Flash versions matter?


The official announcement from Google and Adobe makes it seem like all Flash is now universally indexed regardless of your Flash version, but I think that's bogus. If a search engine wanted to index JavaScript, it might run Rhino on the server and interpret the results. If you wanted to build an advanced interpreter of Flash content, you might use Tamarin or its derivatives, an AVM2 (Flash 9+) virtual machine. I believe AVM2-compatible SWF files will enjoy better search exposure than binaries built for the older AVM1. I can't prove it; it's just a hunch.
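While the hunch itself can't be tested from outside Google, you can at least check which virtual machine a SWF targets. The header's version byte alone doesn't settle it, since a SWF published for Flash Player 9 can still contain ActionScript 2 (AVM1) code; the `ActionScript3` flag in the `FileAttributes` tag does. The Python sketch below follows the published SWF file format layout for uncompressed (`FWS`) and zlib-compressed (`CWS`) files; `inspect_swf` and `SwfInfo` are hypothetical names, and LZMA-compressed files are deliberately left out. It also reports the `HasMetadata` flag, which signals an embedded XMP metadata tag.

```python
import struct
import zlib
from typing import NamedTuple

FILE_ATTRIBUTES = 69  # FileAttributes tag code (required first tag in SWF 8+)
END = 0               # End tag code


class SwfInfo(NamedTuple):
    compressed: bool     # True for zlib-compressed "CWS" files
    version: int         # target Flash Player version from the header byte
    actionscript3: bool  # ActionScript3 flag: the file carries AVM2 bytecode
    has_metadata: bool   # HasMetadata flag: an XMP Metadata tag is present


def inspect_swf(data: bytes) -> SwfInfo:
    signature, version = data[:3], data[3]
    if signature == b"FWS":
        body = data[8:]                   # 8-byte header: signature, version, file length
    elif signature == b"CWS":
        body = zlib.decompress(data[8:])  # everything after the header is zlib-compressed
    else:
        raise ValueError(f"unsupported signature {signature!r}")  # e.g. LZMA "ZWS" is not handled
    compressed = signature == b"CWS"

    # Skip the frame-size RECT: a 5-bit field width, then four signed fields of that width.
    nbits = body[0] >> 3
    pos = (5 + 4 * nbits + 7) // 8
    pos += 4                              # FrameRate (UI16) and FrameCount (UI16)

    # Walk the tag records until FileAttributes or End.
    while pos + 2 <= len(body):
        (code_and_length,) = struct.unpack_from("<H", body, pos)
        pos += 2
        code, length = code_and_length >> 6, code_and_length & 0x3F
        if length == 0x3F:                # long tag header: the real length follows as UI32
            (length,) = struct.unpack_from("<I", body, pos)
            pos += 4
        if code == FILE_ATTRIBUTES:
            flags = body[pos]
            return SwfInfo(compressed, version, bool(flags & 0x08), bool(flags & 0x10))
        if code == END:
            break
        pos += length

    # No FileAttributes tag (pre-SWF 8 files): treat as AVM1 without metadata.
    return SwfInfo(compressed, version, False, False)
```

Running it on a file you publish, for example `inspect_swf(open("movie.swf", "rb").read())` with a hypothetical `movie.swf`, shows whether that build targets AVM2 and carries XMP.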

Dynamic object insertion

Googlebot will detect common JavaScript libraries such as SWFObject used to dynamically insert Flash content at page load. Publishers can back up the dynamic insertion JavaScript with a noscript element just in case Google doesn’t discover your dynamic insertion. Sticking with standard dynamic insertion libraries will help ensure your content is discovered through expected behaviors.
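A crawler that does not execute JavaScript only sees SWF references present in the static markup. This Python sketch uses the standard library's `html.parser` to list what such a crawler could find; `StaticSwfFinder` and `static_swf_urls` are hypothetical names, and this is not how Googlebot actually detects SWFObject. It shows why a `<noscript>` fallback matters: a SWF referenced only inside an `embedSWF` call is invisible to it.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urlsplit


def _is_swf(url: Optional[str]) -> bool:
    return bool(url) and urlsplit(url).path.lower().endswith(".swf")


class StaticSwfFinder(HTMLParser):
    """Collect SWF URLs from static markup; <script> bodies are parsed as opaque text."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        candidates = []
        if tag == "object":
            candidates.append(a.get("data"))
        elif tag == "embed":
            candidates.append(a.get("src"))
        elif tag == "param" and (a.get("name") or "").lower() == "movie":
            candidates.append(a.get("value"))
        for url in candidates:
            if _is_swf(url) and url not in self.urls:
                self.urls.append(url)


def static_swf_urls(html: str) -> List[str]:
    finder = StaticSwfFinder()
    finder.feed(html)
    finder.close()
    return finder.urls


dynamic_only = ('<div id="flash"></div><script>'
                'swfobject.embedSWF("menu.swf", "flash", "600", "400", "9.0.0");</script>')
with_fallback = dynamic_only + (
    '<noscript><object type="application/x-shockwave-flash" data="menu.swf" '
    'width="600" height="400"><param name="movie" value="menu.swf"></object></noscript>')
print(static_swf_urls(dynamic_only))   # []
print(static_swf_urls(with_fallback))  # ['menu.swf']
```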

Summary

The new search version of Flash Player opens the binary SWF format to interpretation by text-focused search engines. Flash developers can take additional steps to package SWF content for accessibility and search discoverability: developing for modern virtual machines, adding accessibility hooks, and wrapping your SWF in XMP metadata.