July 2008 Archives

  1. Jul 23

    Writing Flash for search engines

    On June 30 Google and Adobe announced a new indexer optimized for Flash (SWF) files discovered by web crawlers. The partnership takes advantage of a server-side Flash player optimized for a search engine indexing environment and unidirectional text (e.g. no Hebrew or Arabic). Search engines previously discovered the location of a SWF file on the Web and perhaps indexed its metadata but did not take a deep look inside its binary content. Last month's announcement was a big change for both Adobe and major search engines, as it is now possible to run a heavily GUI-based Flash file at the command line and interpret both its text content and interaction opportunities. In this post I will walk through what we currently know about the search engine Flash runtime and how it affects search engine optimization in Flash.

    Build for a blind, deaf user

    Search engine indexers are blind and deaf. They open a file, examine its contents, and try to deduce meaning through your page structure and its content. A web page designed for screen readers will also expose more content to search engines not evaluating your page's full render state of content, layout, and interactions.

    Search engines utilizing Flash player indexing are still restricted to this screen reader approach. Accessible Flash applications complete with names, labels, reading order and XMP should continue to be more search engine friendly than other SWF files on the Web. Google's tips for creating accessible, crawlable sites still apply, but in a new Flash context.

    The server-side Flash Player

    If we want to understand how search engines such as Google might interpret Flash content we'll first need to take a look at the Flash Player itself. Adobe provides few details in its official SWF searchability FAQ but we can infer a few implementation details. How would you rewrite Flash Player for server-side indexing of SWF content?

    The search engine Flash Player is likely a scaled-down, secure version optimized for machine readers. Strip out video, audio, fonts, and file system access. The server-side Flash Player should open a binary SWF file, pull out the functionality it understands, and create a data tree of all possible actions. These features are actually quite similar to a screen reader interface, but Adobe is instead targeting a Linux-based headless runtime. I believe the guts of the Flash Player for servers are built using the same accessibility abstraction layer Adobe currently uses for Windows, a layer that could also extend to platform-level bindings on Mac and Linux desktops.

    The Adobe Flash Player creates a list of objects on the screen at each render and records this list into an accessible data tree (according to a 2005 white paper by Bob Regan). This data tree is updated with each change in the application state, allowing any application listening in to update an object model of clickable buttons, labels, and links.

    Adobe interfaces with OS-level accessibility frameworks on Windows currently and could extend this model to every major desktop platform. The Windows version of Flash Player binds to Microsoft Active Accessibility. Mac versions of Flash could bind to Universal Access. On GNOME the player could bind to the Assistive Technology Service Provider Interface (at-spi). A server-side version of Flash likely builds upon this same abstracted accessibility object model, passing screen objects to the search engine indexer for further interpretation or interaction.

    Windows Live Search was noticeably missing from the server-side Flash player announcement for search engines. It's possible Adobe has developed a server-side Flash Player for Linux that is not yet compatible with the Windows Server environment of Microsoft's Windows Live Search.

    Accessing deep content

    Googlebot can fill out forms, click buttons, and navigate deep within your site. Clickable Flash objects will likely behave the same way, exposing new content paths for Googlebot within your larger SWF. Flash websites can help ensure deep indexing of SWF content by adding individual SWF fragments to their sitemap. Reading order will likely play a role in selecting important content on your page, and I expect Googlebot may follow the first item in your reading order sooner than the last.
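
    A sitemap listing individual SWF files could be generated with a short script. The sketch below uses only Python's standard library. The example.com URLs and the build_sitemap name are placeholders for illustration, and nothing here is confirmed to change how Googlebot crawls Flash content.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical SWF files that make up a larger Flash site.
swf_urls = [
    "http://example.com/flash/menu.swf",
    "http://example.com/flash/gallery.swf",
]
sitemap_xml = build_sitemap(swf_urls)
```

    Each SWF gets its own entry, so a crawler has a direct path to content that might otherwise be reachable only through interaction inside a parent SWF.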

    Googlebot still throws out references to an anchor name fragment in the URL (e.g. #section=menu) and this announcement does not change the general behavior of Google's URL storage and analysis.
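
    The effect of discarding a fragment can be illustrated with Python's standard library. This only mimics the kind of URL normalization described here; it is not Googlebot's code, and the deep link is a made-up example.

```python
from urllib.parse import urldefrag

# Hypothetical deep link into a Flash application.
deep_link = "http://example.com/site.swf#section=menu"

# urldefrag splits a URL into the part before "#" and the fragment itself.
stored_url, fragment = urldefrag(deep_link)

print(stored_url)  # http://example.com/site.swf
print(fragment)    # section=menu
```

    Two deep links that differ only in their fragment collapse to the same stored URL, so application states addressed purely by fragment are not separately addressable in the index.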

    Do Flash versions matter?

    The official announcement from Google and Adobe makes it seem like all Flash is now universally indexed regardless of your Flash version but I think that's bogus. If a search engine wanted to index JavaScript they might run Rhino on the server and interpret results. If you wanted to build an advanced interpreter of Flash content you might use Tamarin or its derivatives, an AVM2 (Flash 9+) virtual machine. I believe AVM2-compatible SWF files will enjoy better search exposure than binaries built for the older AVM. I can't prove it; just a hunch.

    Dynamic object insertion

    Googlebot will detect common JavaScript libraries such as SWFObject used to dynamically insert Flash content at page load. Publishers can back up the dynamic insertion JavaScript with a noscript element just in case Google doesn't discover your dynamic insertion. Sticking with standard dynamic insertion libraries will help ensure your content is discovered through expected behaviors.

    Summary

    The new search version of Flash Player opens the binary SWF format to interpretation by text-focused search engines. Flash developers can take additional steps to package SWF content for accessibility and search discoverability: developing for modern virtual machines, adding accessibility hooks, and wrapping your SWF in XMP.

  2. Jul 08

    Google App Engine optimizations

    I have developed a few web applications powered by Google App Engine since its launch in May. It has been a fairly easy transition from my traditional programming in Python and Django backed by MySQL to the distributed App Engine environment, Bigtable, and the limitations of each. I would like to share a few App Engine best practices I have learned over the past month, mostly through trial and error. In this post I will share data optimization tips for Google's hosted Bigtable instance, show how to reduce the errors and resource usage of your application, and add a few steps to your deployment checklist.

    Key-based lookups

    I program Django applications referenced by a set of short unique object labels named slugs. A slug column is uniquely queried across a model and easily indexed for fast scans. In the Bigtable world of Google App Engine slugs are optimally stored as a model's key name. Key names are limited to 500 bytes and must be unique across your defined entity. This unique key lookup directly copies the entity into memory without needing to scan an entire distributed hashtable.

    Entity key names provide very fast lookups for developers who like to plan ahead. You cannot alter the key name once it's set, and it cannot start with a number or an underscore. If you can accept these limitations within your code you'll experience even snappier reads from your data store.
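
    The key name limits described above can be checked before a slug is ever stored. The sketch below is plain Python and does not import the App Engine SDK. The is_valid_key_name helper is a hypothetical name, and counting bytes as UTF-8 is an assumption on my part.

```python
MAX_KEY_NAME_BYTES = 500  # limit quoted in this post


def is_valid_key_name(slug):
    """Check a slug against the key name limits described in this post."""
    if not slug:
        return False
    # Assumption: the byte limit is measured on the UTF-8 encoding.
    if len(slug.encode("utf-8")) > MAX_KEY_NAME_BYTES:
        return False
    # No leading digit and no leading underscore.
    if slug[0].isdigit() or slug.startswith("_"):
        return False
    return True


print(is_valid_key_name("widget-summit-2008"))  # True
print(is_valid_key_name("2008-widget-summit"))  # False
```

    In the App Engine Python datastore API, a slug that passes this check would typically be supplied as key_name when the entity is created and read back with the model's get_by_key_name class method.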

    Reduce indexed columns

    It's tempting to choose a Datastore property by its input helper or based on names similar to a SQL equivalent. So what's the difference between a short String and Text? An index.

    According to Guido, a 300 byte string stored as Text takes the same space as the same string stored as String, but without an index. If you have a short string you never query or sort by, you'll optimize your data queries by storing it as Text.

    Define a favicon

    App Engine developers should define favicon.ico, robots.txt, and other frequently requested file paths. Google App Engine logs frequent errors inside your administrative console if it has to hunt for your icon with every browser request.

    Define the location of your static favicon file directly from app.yaml for fast response times:

    - url: /favicon.ico
      static_files: static/favicon.ico
      upload: static/favicon.ico
    

    You should follow a similar pattern for robots.txt and optionally the verification files from Google Webmaster Tools, Yahoo! Site Explorer, and Windows Live Search.

    Define default 404 and 500 response templates

    Your site is not perfect. Visitors will inevitably request pages that do not exist or generate an internal server error. Your site should define default templates for 404 and 500 status codes or risk displaying whatever is sitting on Google's NetScaler.

    Google App Engine default 500 page

    The screenshot above shows an error page of an App Engine application without a defined 500 handler. A link on the page suggests a visit to Google's support website where your visitors will find no support options of interest.

    Django developers should define 404.html and 500.html in their app's templates directory. Django will load and render each file for the default page_not_found and server_error views respectively.

    Deploy and request

    Developers should prime Google's distributed server networks by issuing requests for key URLs a few minutes after deploy. These automated requests trigger your memcache storage and distribute your app instance across Google's distributed servers. The first request requires more CPU cycles and memory than subsequent requests as Google tries to prioritize active application instances and their versions. You can speed things up by always issuing one or more requests after a successful deploy.

    This process is not unlike flushing and re-populating CDN PoPs with new content from your origin server or propagating dynamic handlers across your front-end cluster. It's best to kick off the process early and have the latest version of your content waiting for new visitors on subsequent requests.
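
    The priming step described in this section could be scripted and run after each deploy. This is a minimal sketch using only the standard library. The prime_urls name, the app address, and the paths are all hypothetical, and the optional fetch argument exists only so the function can be exercised without network access.

```python
import urllib.request


def prime_urls(base_url, paths, fetch=None):
    """Request each path once and return a {url: HTTP status} mapping."""
    if fetch is None:
        def fetch(url):
            # Default fetcher: a plain GET with a short timeout.
            # urlopen raises an exception for 4xx and 5xx responses.
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.status
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        results[url] = fetch(url)
    return results


# Hypothetical usage, a few minutes after a successful deploy:
# prime_urls("http://your-app-id.appspot.com", ["/", "/feed/", "/about/"])
```

    Listing the pages that populate your most expensive cache entries first means real visitors are less likely to pay the cost of a cold request.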

    Summary

    Google App Engine simplifies the scaling process but is not a magic cloud that will erase all latency and resource usage issues in your app. App Engine requires new approaches to data storage, data latency, and resource requirements in a metered and opaque environment. Hopefully my trials and experience will speed up your App Engine web apps as you create new services in the cloud.

  3. Jul 01

    Announcing Widget Summit 2008

    I am hosting my third annual Widget Summit conference November 3rd and 4th at Hotel Nikko in San Francisco. The two-day widget event will once again educate and connect a widget ecosystem of publishers, toolmakers, developers, and service providers across a variety of platforms including desktop, mobile, web, and social networks. I enjoy taking a look beyond the hype with a sold-out audience interested in building better syndicated content experiences through distributed widgets.

    The widget industry is constantly evolving as publishers extend their reach beyond their web address and into remote locations already bustling with activity. The popularity of a single site pales in comparison to the aggregate crowds gathered in front of their Windows Vista desktops, iPhones, or My Yahoo! homepages. In the past year we've seen new context added to our widget environments connecting us to the location, friend list, or shared application of our widget community wherever they may interact with our content. Today's smartest widgets enjoy a close bind with their parent platform's features, regularly poll their home base for relevant updates, and reach new audiences through targeted and integrated content interactions.

    At my first widget conference in 2006 we struggled with the name "widget" and this new distribution network most people interpreted as a Flash badge on MySpace. Last year iPhone web applications and the social canvas of Facebook were all the rage, with new opportunities in the enterprise slowly emerging through the rollout of Windows Vista and personal information dashboards powered by software-as-a-service offerings from established consumer brands such as Google and Netvibes.

    A lot has changed in the widget space in the 8 months since the last Widget Summit. Widgets are going mainstream, with the startup valuations and press coverage to match. Somewhere among the fog of hype are useful opportunities to reach targeted audiences on their platform of choice. Let's take a look at some of the big changes we've seen since October 2007.

    • New collaborative technologies such as OpenSocial and its open-source reference container Apache Shindig are quickly creating new widget environments at companies that could not afford to create their own implementations from scratch. MySpace, Orkut, Hi5, LinkedIn, and Yahoo! have all committed to a standard set of widget APIs.
    • The Facebook platform is in the middle of its first big changes since its 2.0 release in May 2007. Shifting concepts of profile display, authoring, and member interaction will require new upgrades or fresh opportunities for completely new applications.
    • The iPhone continues to spark interest in mobile web app development based on single-browser environments. iPhone 2.0 will put smartphones in the hands of a worldwide audience for about the price of a ubiquitous iPod and hopefully expand mobile data opportunities.
    • Advertising networks have created separate product offerings specifically focused on widgets. DoubleClick syndicates and tracks widgets through its DART platform. AOL's Platform-A recently announced widget-specific advertising and sponsorship powered by TACODA's trail of cookie bounties.
    • The enterprise continues to adopt software as a service and widgets are no exception. Google, IBM, and Microsoft are extending their hosted software into large companies and bundling the latest widget technologies inside an integrated package.
    • Consumer electronics ship with widgets built-in. Your next car, GPS unit, television, or alarm clock may contain customized widget content.

    These are just a few of the large trends creating new opportunities for publishers extending the reach of their content through widgets. We'll cover all the major widget platforms and opportunities at this year's Widget Summit, providing the business sense and development basics to kick off your new widget initiatives in 2009.

    You may have noticed this blog grow quiet over the past few months as I rebuilt the conference software behind Widget Summit and aligned the many business details needed to create the best possible experience. In the next week I'll share some of the technical details behind my new sites and services.

    Registration for Widget Summit is now open with early bird pricing of $795 for the two-day conference in downtown San Francisco on November 3rd and 4th (the Monday and Tuesday before Web 2.0 Summit). I hope you can join us for what should be our best conference yet!

Niall Kennedy is a web technologist in San Francisco, California in the United States. I am very interested in the world of...
