Recently in Programming Category

Programming tips, news, and libraries.

  1. Mar15

    Measuring efficiency in the cloud

    In the world of cloud computing, every action has a cost. Every HTTP request fires off a chain of actions, each measured on a variety of billable meters. Gone are the days of idle or unused resources on our local servers. Cloud computing charges by the sip (when sips are available), aligning the business goal of resource efficiency with its cost. The cloud computing world shares many similarities with the plug-in-and-go world of electricity, including the need to run green for the sake of both resources and cost savings. What can the world of green energy teach us about the future of cloud computing? How can we measure computing resources in the cloud for efficiency, replacement costs, and cost savings? I shared a few ideas on green clouds at last week's Ignite at ETech.

    Marginal resources at a marginal cost

    Cloud computing marginal cost components

    At the most basic level, cloud computing is a marginal measure of resource consumption across processor time, memory use, disk use, and bandwidth consumption. Cloud consumption meters are much more precise than a traditional hosting bill, measuring every process and its dependent system APIs. Wasteful programs, processes, or external libraries carry a real and measurable cost. Upgrading to the latest version of an application or library, complete with bug fixes and efficiency improvements, can save real money each month. We might even choose between similar software packages based on their measured efficiency.

    Google App Engine measures each HTTP method, differentiates between database selects and updates, and even measures the system cost of each e-mail message. Below is a highly simplified breakdown of App Engine's consumption meters, listing the resources available each day. App Engine users may purchase additional access to most APIs.

    Meter                              Daily allowance

    Request handling
      CPU time (h:mm)                  6:30
      HTTP requests                    1,333,328
      Outgoing bandwidth               1 GB
      Incoming bandwidth               1 GB
      HTTPS requests                   1,333,328
      Secure outgoing bandwidth        1 GB
      Secure incoming bandwidth        1 GB

    Database
      Database calls                   10,368,000
      Database CPU time (h:mm)         62:07
      Database size                    1 GB
      Inserts or updates               12 GB
      Selects                          116 GB

    Memcache
      Calls                            8,640,000
      add, set                         10 GB
      fetch                            50 GB

    Email
      Within Google                    5,000
      Outside Google                   2,000
      Size of message body             61.4 MB
      Number of attachments            2,000
      Size of attachments              102 MB

    External requests
      HTTP requests                    657,084
      POST, PUT                        4 GB
      GET                              4 GB

    Dynamic images
      Transformations                  2,592,000
      Source images                    1 GB
      Manipulations                    5 GB

    Deployments                        250

    App Engine is extremely precise in its application meters, and I have included only a small sample! A reduction in resource consumption carries real cost savings and opens up additional headroom for other processes within your application.
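The CPU time row in the table is written in hours and minutes, so 6:30 works out to 23,400 CPU-seconds per day. A minimal Python sketch of that arithmetic, assuming a handler's average CPU cost per request is already known (the 50 ms and 40 ms figures are made up for illustration), shows how shaving milliseconds off each request turns into daily headroom:

```python
# Daily CPU allowance from the table: 6 hours 30 minutes of CPU time.
DAILY_CPU_QUOTA_SECONDS = 6 * 3600 + 30 * 60  # 23,400 CPU-seconds


def requests_within_quota(avg_cpu_ms_per_request, quota_seconds=DAILY_CPU_QUOTA_SECONDS):
    """How many requests fit in the daily CPU allowance at a given average cost."""
    quota_ms = quota_seconds * 1000
    return quota_ms // avg_cpu_ms_per_request


if __name__ == "__main__":
    # Hypothetical handler profiles: before and after an optimization pass.
    for cpu_ms in (50, 40):
        print(cpu_ms, "ms ->", requests_within_quota(cpu_ms), "requests/day")
```

In this hypothetical case, trimming 10 ms of CPU per request raises the daily ceiling from 468,000 to 585,000 requests.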

    Writing efficient code for the cloud

    The specialized cloud stack and its meters expose code inefficiencies that may have gone undetected in a standard hosting environment. Programmers who learn the inner workings of their virtual machine under an environment of constraints will ultimately write better code in any system. Java developers with experience on mobile have operated under device constraints that serve them well in an environment of assumed excess. A cloud programmer taught to tune his code for the interpreter and its inner workings will similarly benefit inside and outside the cloud.

    Coding inside a constrained environment such as App Engine has changed how I write web applications. I was able to look through Guido van Rossum's code, notice different styles and techniques, and inquire about his coding style. It turns out my code was wasteful in ways I had not considered, yet by observing the virtual machine's architect and tuner I learned how to provide the right processing hints and optimizations to speed up code and reduce its resource consumption. Training engineers for cloud computing isn't a specialized investment competing with the cost of machines; it's a long-term investment in style and proficiency that should pay off in a measurable way under a metered runtime environment.
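One habit a metered runtime rewards is batching: when every API call is counted, fetching twenty records in one round trip is cheaper than twenty separate calls. The sketch below is not Guido's code or an App Engine API; `MeteredStore`, `get`, and `get_multi` are hypothetical stand-ins for any per-call-billed service, assuming that service accepts a list of keys in a single call.

```python
class MeteredStore:
    """Hypothetical key-value service that counts every billable call."""

    def __init__(self, data):
        self._data = dict(data)
        self.calls = 0

    def get(self, key):
        self.calls += 1  # one metered call per key
        return self._data.get(key)

    def get_multi(self, keys):
        self.calls += 1  # one metered call for the whole batch
        return {k: self._data[k] for k in keys if k in self._data}


def load_one_by_one(store, keys):
    """Naive loop: N keys cost N metered calls."""
    return [store.get(k) for k in keys]


def load_batched(store, keys):
    """Batched lookup: N keys cost one metered call; missing keys map to None."""
    found = store.get_multi(keys)
    return [found.get(k) for k in keys]
```

Both functions return the same values in the same order; only the number of metered calls differs.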

    Measuring efficiency of packaged applications

    What if installable web applications carried their own cloud efficiency ratings similar to Energy Star ratings placed on home appliances? We will have Energy Star ratings for web servers and datacenters beginning in May to create a direct comparison of energy usage across server vendors. Similar measurements can be applied to installable software packages or library dependencies in the cloud.

    WordPress cloud rating mockup

    The mockup above shows a possible cloud rating for WordPress inside a PHP cloud instance. A potential WordPress customer could compare efficiency ratings and total cost of operation of WordPress's code base over the course of a month or a year. He might then compare WordPress against products of similar functionality but varying operational costs such as Drupal or Serendipity. Measuring inefficiencies could motivate software vendors to reduce waste in their products to speed up execution times and save customers real money. Directly measuring resource consumption in this way motivates change up and down the value chain.
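A cloud rating of this kind boils down to multiplying a package's measured resource profile by traffic and unit prices. A minimal sketch, assuming only two meters (CPU and outgoing bandwidth); the prices, the per-request profiles, and the names "package A" and "package B" are hypothetical and are not measurements of WordPress, Drupal, or Serendipity:

```python
from dataclasses import dataclass

# Hypothetical unit prices; real rates vary by provider and change over time.
PRICE_PER_CPU_HOUR = 0.10
PRICE_PER_GB_OUT = 0.12
KB_PER_GB = 1_000_000  # decimal gigabytes, for simplicity


@dataclass
class ResourceProfile:
    name: str
    cpu_ms_per_request: int
    kb_out_per_request: int


def monthly_cost(profile, requests_per_day, days=30):
    """Estimated cost of serving `days` of traffic with a given code base."""
    requests = requests_per_day * days
    cpu_hours = profile.cpu_ms_per_request * requests / 3_600_000
    gb_out = profile.kb_out_per_request * requests / KB_PER_GB
    return cpu_hours * PRICE_PER_CPU_HOUR + gb_out * PRICE_PER_GB_OUT


if __name__ == "__main__":
    # Made-up profiles for two interchangeable packages at 100,000 requests/day.
    for p in (ResourceProfile("package A", 180, 50), ResourceProfile("package B", 120, 40)):
        print(p.name, round(monthly_cost(p, 100_000), 2))
```

Under these assumptions package A costs about $33 a month and package B about $24.40, a difference a rating label could surface at a glance.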

    We are already somewhat familiar with energy ratings in our daily lives. We evaluate a new refrigerator or washing machine based on its initial price as well as its total cost to operate and repair. Applications bundled into a machine image for easy deployment could be measured the same way, whether provisioned directly to a cloud or through a managed cloud provider.

    Rolling back the meter

    Solar panels, wind turbines, Google AdSense

    Google, Microsoft, and Amazon operate both cloud computing farms and advertising platforms that can help customers pay back the cost of operation. I expect the AdSense, AdCenter, and Amazon Associates programs will offer hosting discounts to customers generating ad revenue through the same company powering their web presence. Google and Microsoft could also make calls to their own APIs free or cheap for cloud customers, since such requests would not need to touch the public Internet.

    Consumption dashboards

    Tendril Vantage customer portal

    Cloud dashboards of the future might provide insights into our software in the same ways smart meters and home monitoring solutions hope to measure our electricity use. Electricity monitoring startups are trying to raise awareness around resource consumption, highlight wasteful outliers, and ultimately effect change. Cloud computing companies such as Google can apply lessons learned from funding smarter power meters to the cloud computing dashboards of the future.
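A dashboard that highlights wasteful outliers could start with something as simple as comparing each request handler against the median. A minimal sketch, assuming per-handler average CPU milliseconds are already being collected; the 2x threshold and the handler names are arbitrary choices for illustration:

```python
from statistics import median


def wasteful_outliers(cpu_ms_by_handler, factor=2.0):
    """Return handlers whose average CPU cost exceeds `factor` times the median."""
    typical = median(cpu_ms_by_handler.values())
    return sorted(
        name for name, cpu_ms in cpu_ms_by_handler.items() if cpu_ms > factor * typical
    )


if __name__ == "__main__":
    # Hypothetical per-handler averages in CPU milliseconds.
    usage = {"/": 40, "/feed": 35, "/search": 210, "/post": 45, "/admin": 50}
    print(wasteful_outliers(usage))
```

A median is used rather than a mean so that a single runaway handler does not drag the baseline up and hide itself.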

    Summary

    In the world of cloud computing, every action has a direct and measurable cost. Companies can calculate the savings of a business decision, such as increasing cache times from 5 minutes to 10 minutes, or of a code decision, such as updating a dependent library. The meters of the cloud will make us much more aware of our server computing consumption and provide new motivations for change.
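The cache-time decision above can be priced with simple arithmetic. A minimal sketch, assuming a hot key that is requested continuously, so it is rebuilt roughly once per expiry window; the 200 ms rebuild cost is a made-up figure:

```python
SECONDS_PER_DAY = 86_400


def daily_regenerations(ttl_seconds):
    """Approximate rebuilds per day for a key that is requested continuously.

    Each time the cached copy expires, the next request pays to regenerate it.
    """
    return SECONDS_PER_DAY // ttl_seconds


def daily_cpu_savings_ms(old_ttl, new_ttl, regen_cpu_ms):
    """CPU milliseconds saved per key per day by changing the cache lifetime."""
    saved = daily_regenerations(old_ttl) - daily_regenerations(new_ttl)
    return saved * regen_cpu_ms


if __name__ == "__main__":
    print(daily_regenerations(5 * 60), daily_regenerations(10 * 60))
    print(daily_cpu_savings_ms(5 * 60, 10 * 60, regen_cpu_ms=200))
```

Moving from 5 to 10 minutes cuts rebuilds from 288 to 144 per day, saving 28,800 CPU-milliseconds per hot key at the assumed rebuild cost; multiplying by the number of hot keys and the provider's CPU rate prices the decision.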

    Cloud computing changes the infrastructure we use to power our applications. It also changes how we program by introducing constraints, optimization rewards, and systems designed for parallelization and scale. Some of our fear over change and lock-in is a lack of familiarity in operating at scale across distributed nodes. Programming against cloud computing systems retrains software engineers for a world of symmetric multiprocessing and better prepares us for a future of multiple computing processes in our racks or in the cloud.

  2. Mar14

    The anatomy of cloud computing

    Cloud computing is changing the way we provision hardware and software for on-demand capacity fulfillment. Lately I have been thinking about the ways on-demand servers, storage, and CDNs are changing the way we develop web applications and make business decisions. Gone are the days of idle CPUs, empty memory, or unused drive space. The cloud charges us for what we use as we use it (assuming capacity is available). In this post I will provide an overview of the cloud hosting landscape with a particular focus on cloud utilization by web companies. I will walk through a managed infrastructure stack and examine a few major business targets.

    1. The hardware
    2. The platforms
    3. The managed cloud stack
      1. High availability
      2. Security
      3. Stable, efficient OS
      4. Programming Language Business Logic
    4. The client layer
      1. Attached storage
      2. Database
      3. Cache
    5. Cloud consumers
      1. Web application developers
      2. Back office tasks
      3. Excess capacity
      4. Disaster recovery
    6. Summary

    The hardware

    In 1943 Thomas J. Watson of IBM is famously quoted as saying "there is a world market for maybe five computers." Today we look back and laugh at such a proclamation, but the statement really did hold up for approximately 10 years. Into the 1950s IBM designed computers for a possible market of 20 companies, of which 5 were expected to purchase such a machine. In 1953 IBM was pleasantly surprised to find 18 of 20 companies purchased the IBM 701, proving out the business of back office processing and creating a new division for the tabulating giant.

    Last week Rick Rashid of Microsoft was quoted as saying around 20 percent of the world's servers are sold to a handful of companies: Microsoft, Google, Yahoo!, and Amazon. Three of those four companies are cloud resellers, renting small slices of their compute farms to businesses all over the world. 198-megawatt datacenters may be the new mainframe, with consumption units charged in minutes and bytes much like the time-sharing relationships of the 1970s.

    IBM again caught my interest last year with its Kittyhawk project from Jonathan Appavoo, Volkmar Uhlig, and Amos Waterland in New York. IBM is currently researching ways to repurpose the massively parallel Blue Gene supercomputers for the datacenters of the Web. It's possible your future web application will run on a computer originally designed for gene sequencing and nuclear weapons testing.

    Hardware and data operations are again consolidating towards major players. These specialist providers are building at a scale and specialization most web businesses can't match. On-demand infrastructure of the cloud makes it cheaper and more efficient to outsource needed operational functions to teams of experts already keeping some of the largest web companies in the world running every day.

    The platforms

    Microsoft and Google are the newest entrants into the cloud computing arena, focusing their efforts on their respective programming languages of expertise. Microsoft's Windows Azure services platform will likely be the best platform for C# and ASP.Net development as it is tuned by the creators of .Net, IIS, and SQL Server. Google has similarly applied its expertise in the Python language and distributed web nodes to its Google App Engine product. The App Engine cloud is tuned by top contributors to the Python language including its BDFL, Guido van Rossum. App Engine utilizes custom Google software, Google Front End and Megastore, for web serving and storage. Cloud developers on either platform are using a similar set of hardware and software as the proven web-scale platforms of Live.com and Google. I expect Google App Engine will add support for Java in the near future, its second major language offering and the most popular language among Google's own services.

    Language specialists are building managed stacks on top of generic cloud platforms such as Amazon Web Services' EC2. Engine Yard sells a custom, managed AMI optimized for the Ruby language and its Rails framework. Rackspace's Mosso subsidiary and others optimize for the latest versions of PHP + MySQL, attracting performance-minded applications in search of a tuned cloud instance. I am not aware of any major language contributors of Ruby or PHP employed at either company but the platforms do attempt to find their own niche among a broad offering of scalable hosting providers.

    Amazon's EC2 is the most well-known cloud computing provider and, as previously mentioned, the baseline service for other companies building value-added solutions. The Amazon Machine Image (AMI), a packaged machine image deployed in the Amazon cloud, is the basic building block of EC2 virtualization and the primary interaction point of Amazon's customers. Amazon resells premium operating system and application packages on behalf of companies such as Microsoft, IBM, and Oracle, but it's possible such specializations will instead be absorbed by the software publishers themselves as they roll out their own hosted clouds (such as Azure or IBM Blue Cloud).

    The cloud computing software stack is trending towards an integrated, managed experience maintained by some of the top contributors to each programming language and related components. More generic cloud platforms will need to stay up-to-date with managed technologies on their platform and/or establish a strong reseller relationship to more specialized cloud managers.

    The managed cloud stack

    Cloud stack illustration

    Managed cloud providers handle an entire stack of infrastructure needed to deliver web applications at scale. A solid cloud computing environment abstracts the basics of a computing environment away from the implementors and lets them focus on adding value with each new application. Managed cloud hosting providers need to offer the following basic layers to stay relevant in a web developer's world.

    High availability

    Any web application needs to be available to legitimate visitors from all over the world. A true cloud spans the entire globe, defeating the speed of light on behalf of its customers with server points of presence in multiple locations simultaneously. The cloud provider needs to effectively receive and route incoming requests to the appropriate virtualized application instance on behalf of its customers.

    Google and Microsoft replicate each application instance to multiple physical locations. AT&T Synaptic Hosting spans multiple locations for its enterprise customers.
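To make "route to the appropriate instance" concrete, here is a toy sketch that sends a client to the geographically nearest point of presence by great-circle distance. It is only an illustration under stated assumptions: the `POPS` table and its coordinates are hypothetical, the client's location is assumed to be known already, and the sketch ignores network topology, load, and instance health, which real routing layers must weigh.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical points of presence as (latitude, longitude).
POPS = {
    "us-west": (37.4, -122.1),
    "us-east": (38.9, -77.0),
    "europe": (53.3, -6.3),
    "asia": (35.7, 139.7),
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_pop(client_location, pops=POPS):
    """Name of the point of presence closest to the client."""
    return min(pops, key=lambda name: haversine_km(client_location, pops[name]))
```

For example, a visitor located near London would be sent to the "europe" entry and one near Los Angeles to "us-west".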

    Security

    Web applications should be protected from intrusion and abuse at the network layer. In a cloud computing world application security is a lot like click fraud in advertising: every bad action carries a marginal cost. Cloud providers need to guard customers against potential external abuse and intrusion.

    Google, Microsoft, and Amazon have their eyes on many incoming requests each day. Google serves App Engine requests off the same hardware handling Google Front End, keeping bad requests away from search, ads, and your apps.

    Stable, efficient OS

    Web applications rely on a stable, efficient operating system to interface with hardware, manage filesystems, and allocate resources. The cloud server operating system is a stripped down version of standard installations without a need for direct hard drive interfaces or other peripherals.

    Amazon EC2 AMI Quick Start

    Amazon EC2 highlights the operating system behind every machine image. Older versions of Fedora and Windows Server are the default "quick start" options available to each new account. Google and Microsoft clouds run on custom operating systems tailored for web use. Windows Azure is a stripped-down version of the latest Windows Server. Google runs a Linux-based OS tuned by its infrastructure team.

    Programming Language Business Logic

    Every managed cloud platform includes a dynamic language virtual machine and an appropriate web services gateway. Language functions too closely associated with the parent operating system and its libraries are stripped away, leaving only a pure operating environment for a machine interpreter. External dependencies such as GNU tools and custom compilers will not function within the cloud language abstraction layer. Cloud services bundle a dynamic language runtime into an easily spawned instance for standard and efficient interpretation across many application instances.

    Google App Engine supports most functions of the Python language with additional support for the Django framework, WebOb, and PyYAML. Developers may replace these built-in libraries with newer or customized versions at an additional performance and usage cost. App Engine passes web requests into the programming language environment through the Web Server Gateway Interface.
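A WSGI application is just a callable that receives the request environment and a `start_response` function. The minimal sketch below uses only the standard library's `wsgiref` module and is written for current Python 3; it does not show App Engine's own frameworks or deployment configuration, and the `call_app` helper is a hypothetical test harness that invokes the app the way a gateway would, without opening a network socket.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable: the gateway hands in the request as `environ`."""
    path = environ.get("PATH_INFO", "/")
    body = ("Hello from %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke `application` as a WSGI gateway would, capturing status and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

Because the contract is this small, the same callable can be served by a local development server or handed to any hosting runtime that speaks WSGI.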

    The cloud client layer

    Attached storage

    Cloud applications don't operate in a vacuum. Dynamic applications persist their application state and logic through database and file storage. In the cloud world the database and the file server are cloud services unto themselves, operating in an isolated and specialized layer. This isolation makes the storage layer swappable from the rest of the cloud stack and presents new opportunities for competition.

    Static files fall into two major categories based on their planned consumption. Files under 1 MB in size can be consumed by most clients in a single request, matching the expected simple request/response model of the platform. Files over 1 MB in size need to be broken into more manageable parts, or ranges, for a sequenced download. Static cloud storage can be broken up into differing solutions by file size or file type, providing the best possible solution for the storage and delivery task at hand.
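Splitting a large file into ranges maps onto the HTTP `Range` request header, whose byte offsets are inclusive (`bytes=0-499` is the first 500 bytes). A minimal sketch, assuming 1 MB (1,048,576-byte) chunks to match the threshold above; the chunk size is an illustrative choice, not a platform requirement:

```python
ONE_MB = 1024 * 1024


def byte_ranges(total_size, chunk_size=ONE_MB):
    """Split a file into inclusive (start, end) byte offsets for Range requests."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start, end):
    """Value for an HTTP Range request header covering bytes start..end."""
    return "bytes=%d-%d" % (start, end)


if __name__ == "__main__":
    for start, end in byte_ranges(2_500_000):
        print(range_header(start, end))
```

A 2,500,000-byte file becomes three requests, the last one shorter than the others.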

    Google App Engine offers static file storage separate from its dynamic runtime. App Engine supports up to 1,000 files and has a 10 MB HTTP response limit.

    Amazon Web Services offers static file serving through its Simple Storage Service (S3) origin server and CloudFront CDN services. Amazon allows private and public file storage and can even charge individual users of third-party services for their use through DevPay.

    Attached storage is by far the most diverse service offering for companies evaluating a specialized solution. I prefer storage providers with widely supported file management APIs, smart settings for MIME types and caching HTTP headers, and a primary focus on serving files out to the World Wide Web. I expect popular storage providers will bundle more CDN services in the future through an exclusive up-sell partnership. I also expect a new class of storage middleware optimized for minimizing files, cleaning up images, or transcoding video will set up new programmable front-ends backed by popular storage providers.

    Database storage

    Databases are the preferred way of persisting structured data powering web applications. Cloud service providers have tuned and rewritten database functionality for the cloud, opening up new opportunities for scalable data services across multiple dynamic application instances. Cloud databases are distributed, replicated, and largely transactional. Cloud databases can be separated from the rest of the cloud stack through RESTful APIs between different vendors, but there is a definite latency advantage to coupling data with its interpreter.

    Microsoft offers SQL Server as a web service as part of the Azure services stack. Google App Engine offers Megastore, an abstraction layer on top of BigTable, as a service API within an App Engine instance or as a separate remote API. Amazon's SimpleDB brings together EC2 processing with S3 data storage. Greenplum offers PostgreSQL as a stand-alone cloud offering.

    Cloud databases are typically more limited in functionality than their local counterparts. App Engine returns up to 1000 results. SimpleDB times out within 5 seconds. Joining records from two tables in a single query breaks databases optimized for scale. App Engine offers specialized storage and query types such as geographical coordinates.
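One common way to work within a per-query result cap is keyset pagination: order by a unique key and restart each query just after the last key returned. In the sketch below, `run_query` is a hypothetical in-memory stand-in for a capped query, not the App Engine datastore API; only the 1,000-row cap comes from the text above.

```python
MAX_RESULTS_PER_QUERY = 1000  # per-query cap described above


def run_query(rows, after_key=None, limit=MAX_RESULTS_PER_QUERY):
    """Hypothetical stand-in for a capped query ordered by a unique key."""
    ordered = sorted(rows, key=lambda row: row["key"])
    matching = [row for row in ordered if after_key is None or row["key"] > after_key]
    return matching[:min(limit, MAX_RESULTS_PER_QUERY)]


def fetch_all(rows):
    """Page past the cap by restarting each query just after the last key seen."""
    results, last_key = [], None
    while True:
        page = run_query(rows, after_key=last_key)
        results.extend(page)
        if len(page) < MAX_RESULTS_PER_QUERY:
            return results
        last_key = page[-1]["key"]
```

Filtering on "key greater than the last key seen" keeps each query cheap for an indexed store, unlike skipping over an ever-growing offset.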

    The database layer of a cloud instance can be abstracted as a separate best-of-breed layer within a cloud stack but developers are most likely to use the local solution for both its speed and simplicity.

    Cache

    Our web applications receive multiple requests for the exact same resource. We should be able to place a pre-assembled version of our web pages, images, and XHR data into a local memory cache for fast serving on multiple requests. On our own servers we frequently use memcached, Varnish, Squid, etc. The cloud stack should include a storage cache as its first layer of request processing.

    Google App Engine includes a memcache API written by Brad Fitzpatrick, creator of memcached. Windows Azure will supposedly support Velocity caching in the near future.
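Placing a pre-assembled page in a cache usually follows the cache-aside pattern: look in the cache first, and only on a miss render the page and store it with an expiry. A minimal sketch, assuming a memcached-like get/set interface; `TTLCache` is a toy in-process substitute, not the App Engine memcache API, and it does not distinguish a cached `None` from a miss.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry, standing in for memcached."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_or_render(cache, key, render, ttl_seconds=300):
    """Cache-aside: serve the cached copy, or render once and store it."""
    value = cache.get(key)
    if value is None:
        value = render()
        cache.set(key, value, ttl_seconds)
    return value
```

Repeated requests within the TTL are served from memory; the render function runs again only after the entry expires.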

    Cloud consumers

    The target market of a cloud computing platform will affect its stack completeness, feature sets, and future support. Cloud terminology seems to be thrown around as a magical buzzword, but clear use cases are emerging.

    Web application developers

    New web applications start small and may sometimes experience exponential growth on a worldwide basis. Web developers evaluating the cloud stack are likely starting from scratch without the concerns of switching from a legacy system or alternate implementation.

    Cloud computing abstracts tiered architecture, operations planning, and other nuances away from companies specializing in bringing new ideas to market quickly. Web developers prefer a cloud stack tuned for fast web performance. Geographically distributed dynamic instances are important, at least as an upgrade option, to protect a new business from a rewrite at varying levels of scale.

    I believe cloud providers offering a complete managed stack will attract web development specialists to their platform. Google App Engine, Mosso, and Windows Azure compete in this space.

    Back office tasks

    Enterprise applications are moving out of the local server closet and into the cloud. Medium- to large-sized companies are replacing in-house maintenance of machines and applications with software and infrastructure as a service. Project management, employee tracking, payroll, and many other common functions have made their way into the software-as-a-service realm. More customized applications will migrate to cloud hosting and take their place alongside the anchor tenants of the groupware and collaboration suites.

    Windows Azure, Salesforce's Force.com, and Google App Engine show strong promise as integrated back office add-ons. Microsoft and Google already have a solid footing in enterprise groupware services through Exchange Online and Google Apps respectively. Force.com can be closely tied to the popular Salesforce CRM application for sales and marketing teams.

    More generic back office functions can operate on any cloud hosting provider with a properly maintained disk image. A new class of hosting provider operates as an abstraction layer between multiple clouds by maintaining the appropriate images and deployment scripts for any given task. Companies such as Aptana, CohesiveFT, RightScale, and many others span multiple cloud hosting providers with a single management interface. Cloud management companies can monitor multiple providers and create a spot-pricing market for computing resources.

    Back office solutions represent the largest possible growth area for cloud hosting providers. Platforms with strong existing anchor tenants can add on new services combining software-as-a-service and infrastructure-as-a-service. Generic cloud hosting providers will likely be tapped for general tasks directly or through a cloud management layer.

    Microsoft is promoting its cloud hosting solutions through its partner channels. Microsoft partners receive a 12% commission on the first year of revenues and 6% commission on all future revenues. Google offers a 20% discount to Google Apps Authorized Resellers over the life of the account.
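The two partner structures above accumulate differently over the life of an account. A minimal sketch of the arithmetic, assuming a customer spends a made-up $10,000 per year, and treating Google's discount as reseller margin only under the further assumption that the reseller bills the customer at list price:

```python
def partner_commission(yearly_revenue, first_year_rate=0.12, later_rate=0.06):
    """Commission under a first-year / later-years split such as 12% then 6%."""
    return sum(
        revenue * (first_year_rate if year == 0 else later_rate)
        for year, revenue in enumerate(yearly_revenue)
    )


def reseller_discount(yearly_revenue, rate=0.20):
    """Margin from a flat lifetime discount, assuming resale at list price."""
    return sum(yearly_revenue) * rate


if __name__ == "__main__":
    three_years = [10_000, 10_000, 10_000]  # hypothetical customer spend
    print(round(partner_commission(three_years), 2), round(reseller_discount(three_years), 2))
```

Over three hypothetical years the commission model yields $2,400 and the flat discount $6,000; the comparison depends entirely on those assumptions.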

    Excess capacity

    Hosting solutions need to scale up to meet peak demand. Peak demand could occur for an hour each day, one day a year (Black Friday in the retail sector), or one month out of twelve (college basketball playoffs). Cloud computing lets businesses pay only for what they use when they use it. Servers are not sitting around in your datacenter depreciating in value and consuming resources while you wait for peak load to occur.
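The gap between owning for the peak and paying per use is easy to quantify in server-hours. A minimal sketch, assuming a made-up day that needs 2 servers for 23 hours and 10 servers for a single peak hour:

```python
def owned_server_hours(hourly_demand):
    """Owned capacity is sized for the peak and runs every hour."""
    return max(hourly_demand) * len(hourly_demand)


def on_demand_server_hours(hourly_demand):
    """Pay-per-use capacity follows demand hour by hour."""
    return sum(hourly_demand)


if __name__ == "__main__":
    day = [2] * 23 + [10]  # hypothetical hourly server needs
    print(owned_server_hours(day), on_demand_server_hours(day))
```

In this example owned capacity burns 240 server-hours to cover a day that only needs 56.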

    Excess capacity needs may be predictable and cyclical, allowing a business to integrate cloud computing into their computing workflow with ease. Generic cloud computing platforms offer the best migration costs as businesses clone their own local machine images for execution in a cloud computing environment.

    Disaster recovery

    Business operations need to stay online when catastrophe strikes. An earthquake in California, a hurricane in Florida or Texas, or a power outage anywhere in the world could knock your business offline instantly. A hot backup in the cloud spins up when your primary site is down. An on-demand backup facility is a lot cheaper than physical investments as companies invest in contingency planning.

    Amazon Web Services recently introduced reserved machine instances for companies who must be absolutely sure they will be able to operate in an environment of strained cloud capacity. Reserved instances receive priority allocation of cloud resources in exchange for an upfront fee and lower hourly usage charges. Reserved instances are the VIP treatment of the cloud hosting world.

    Demand response programs are common in utility sectors such as electricity. Businesses can opt to be the last ones kicked off the grid in a low-capacity environment in exchange for higher consumption costs.

    Summary

    Cloud computing is picking up steam and there are a few early winners. The most promising solutions from large vendors are still in a technology preview stage but should be open for general use by the end of the year. Startups developing new applications should pick the best solutions provider based on the strength of their stack offering and usage pricing. Some cloud layers can easily be abstracted to best-of-breed solutions.

    I hope you enjoyed this summary of the world of cloud hosting! There is a lot going on and this post just scratches the surface of how our computing world is changing.

  3. Oct03

    Better Design Through Code

    Every day our web applications ignore useful visitor data. We respond to each request based on a domain and a path without listening to the capabilities, location, preferences, and favorite interactions of our visitors and their requesting agent. A few weeks ago I challenged a room full of designers at PARC to rethink what's possible on the Web and rely on adaptive programming techniques to serve the right content to the right audience at the right time. I titled the 50-minute talk "Better Design Through Code" and walked through latent capabilities of servers and browsers ready and waiting to deliver personalized, adaptive content to unique Web visitors.

    BayCHI presentation slide capture

    I prefer recorded presentations to static shared slideshows. Each movie has 3GPP timed text chapters indexed by slide if you would like to jump ahead to a particular part of the presentation. The whole process is very experimental yet an interesting way to reach new audiences.

    Classify incoming requests

    Incoming requests contain more than a domain and a path. Servers can listen to full request data and segment your audience based on key factors such as preferred language, browser capabilities, or requesting device such as a TV or mobile phone. Listening for key navigation clues reduces visitor input and delivers the best content possible quickly and easily.
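Preferred language is one of the easiest signals to read: browsers send an `Accept-Language` header such as `fr-CH, fr;q=0.9, en;q=0.8`, where `q` weights the preferences. A simplified sketch (not a complete implementation of the HTTP grammar); the `supported` set and the `"en"` default are hypothetical site settings:

```python
def preferred_languages(accept_language):
    """Parse an Accept-Language header into tags ordered by q-value, then position."""
    weighted = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        q = 1.0  # weight defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            weighted.append((-q, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(accept_language, supported, default="en"):
    """First supported match, trying the full tag and then its base language."""
    for tag in preferred_languages(accept_language):
        full = tag.lower()
        base = full.split("-")[0]
        if full in supported:
            return full
        if base in supported:
            return base
    return default
```

A visitor sending `de-DE,de;q=0.9,en;q=0.5` to a site offering English and German would get German without touching a language menu.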

    Location filters

    Broad data options can be quickly narrowed through location-based targeting. Web sites can store simple lookup tables to identify the location of their audience at various confidence levels, from as broad as a home country to as specific as a postal code. New data-driven location services such as Gears or Loki offer even greater location precision by looking up mapped devices on your local network or within radio range, or even by receiving signals from GPS satellites.
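A country-level lookup table is typically a sorted list of IP address ranges searched with a binary search. A minimal sketch using the standard `ipaddress` and `bisect` modules; the table is hypothetical and uses the RFC 5737 documentation ranges as placeholders, whereas a real site would load ranges from a geolocation database:

```python
import bisect
import ipaddress

# Hypothetical lookup table: (first address, last address, country code).
RANGES = sorted(
    (int(ipaddress.ip_address(first)), int(ipaddress.ip_address(last)), country)
    for first, last, country in [
        ("192.0.2.0", "192.0.2.255", "US"),
        ("198.51.100.0", "198.51.100.255", "FR"),
        ("203.0.113.0", "203.0.113.255", "JP"),
    ]
)
STARTS = [start for start, _, _ in RANGES]


def country_for(ip):
    """Binary-search the sorted range table; None if the address is unmapped."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, n) - 1  # last range starting at or before n
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```

Returning `None` for unmapped addresses lets the site fall back to a default rather than guess.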

    Detecting installed software

    Software installed on our computers leaves browser-addressable footprints in the form of MIME types and URL schemes meant to connect our browsers, webpage embeds, or downloaded files with the appropriate installed application. We can detect installed software on the requesting visitor's machine by testing these known MIME footprints and establishing connections between our web application and the best possible handler on the client. Want to send a photo RSS feed to iPhoto without confusing your users with technical jargon? Test it. Need to communicate a physical address or seamlessly hand off a podcast subscription? Identify tethered GPS or music players on your visitors' machines and dynamically create links to desktop-addressable software from within your webpage.

    Detect favorite websites

    The final part of my presentation focused on identifying the favorite websites and web services of a visiting user to improve site content. Browsers leave a history trail to help us quickly navigate to our favorite resources and identify previously viewed content. We can connect our audience to the web applications and services they care about by testing websites of interest against the current browser history and displaying the best activity prompts to each unique visitor.

    Summary

    Every time a web page loads we throw out potentially useful data. With just a little effort we can thrill our users with custom, adaptive experiences based on their unique computing and personality profiles for increased engagement and conversions. This presentation outlines some of the reasonably easy methods of customization available to site owners seeking more intelligent methods of visitor interaction through smart server- and client-side applications.

  4. Jul23

    Writing Flash for search engines

    On June 30 Google and Adobe announced a new indexer optimized for Flash (SWF) files discovered by Google's web crawlers. The new partnership takes advantage of a server-side Flash player optimized for a search engine indexing environment and unidirectional text (e.g., no Hebrew or Arabic). Search engines previously discovered the location of a SWF file on the Web and perhaps indexed its metadata, but did not take a deep look inside its binary content. Last month's announcement was a big change for both Adobe and major search engines, as it is now possible to run a heavily GUI-based Flash file at the command line and interpret both its text content and interaction opportunities. In this post I will walk through what we currently know about the search engine Flash runtime and how it affects search engine optimization in Flash.

    Build for a blind, deaf user

    Search engine indexers are blind and deaf. They open a file, examine its contents, and try to deduce meaning through your page structure and its content. A web page designed for screen readers will also expose more content to search engines not evaluating your page's full render state of content, layout, and interactions.

    Search engines utilizing Flash player indexing are still restricted to this screen reader approach. Accessible Flash applications complete with names, labels, reading order and XMP should continue to be more search engine friendly than other SWF files on the Web. Google's tips for creating accessible, crawlable sites still apply, but in a new Flash context.

    The server-side Flash Player

    If we want to understand how search engines such as Google might interpret Flash content, we'll first need to take a look at the Flash Player itself. Adobe provides few details in its official SWF searchability FAQ, but we can infer a few implementation details. How would you rewrite Flash Player for server-side indexing of SWF content?

    The search engine Flash Player is likely a scaled-down, secure version optimized for machine readers: strip out video, audio, fonts, and file system access. The server-side Flash player should open a binary SWF file, pull out the functionality it understands, and create a data tree of all possible actions. These features are actually quite similar to a screen reader interface, but Adobe is instead targeting a Linux-based headless runtime. I believe the guts of the Flash Player for servers are built using the same accessibility abstraction layer Adobe currently uses for Windows, which could extend to platform-level bindings on Mac and Linux desktops.

    The Adobe Flash Player creates a list of objects on the screen at each render and records this list into an accessible data tree (according to a 2005 white paper by Bob Regan). This data tree is updated with each change in the application state, allowing any application listening in to update an object model of clickable buttons, labels, and links.

    Adobe interfaces with OS-level accessibility frameworks on Windows currently and could extend this model to every major desktop platform. The Windows version of Flash Player binds to Microsoft Active Accessibility. Mac versions of Flash could bind to Universal Access. On GNOME the player could bind to the Assistive Technology Service Provider Interface (at-spi). A server-side version of Flash likely builds upon this same abstracted accessibility object model, passing screen objects to the search engine indexer for further interpretation or interaction.
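    To make the "accessible data tree" idea concrete, here is a minimal Python sketch of how an indexer might walk such a tree and read out named objects in reading order. The AccessibleNode class and the index_text function are hypothetical names invented for illustration; Adobe has not published the internals of its server-side player, and the real object model is surely richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical node type: NOT Adobe's accessibility API, only a stand-in
# for "an on-screen object with a role, a name, and a reading order".
@dataclass
class AccessibleNode:
    role: str                 # e.g. "button", "label", "link"
    name: str                 # accessible name a screen reader would speak
    reading_order: int        # author-assigned reading/tab index
    children: list[AccessibleNode] = field(default_factory=list)


def flatten(node: AccessibleNode) -> list[AccessibleNode]:
    """Return every node in the tree, depth-first, including the root."""
    nodes = [node]
    for child in node.children:
        nodes.extend(flatten(child))
    return nodes


def index_text(root: AccessibleNode) -> list[str]:
    """List 'role: name' strings in reading order, skipping unnamed nodes."""
    named = [n for n in flatten(root) if n.name]
    named.sort(key=lambda n: n.reading_order)
    return [f"{n.role}: {n.name}" for n in named]


stage = AccessibleNode("stage", "", 0, [
    AccessibleNode("link", "Contact us", 3),
    AccessibleNode("label", "Spring menu", 1),
    AccessibleNode("button", "View specials", 2),
])
print(index_text(stage))
# ['label: Spring menu', 'button: View specials', 'link: Contact us']
```

    Note how an unnamed object contributes nothing to the index, which is why names, labels, and reading order matter for search as much as for screen readers.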

    Windows Live Search was noticeably missing from the server-side Flash player announcement for search engines. It's possible Adobe has developed a server-side Flash Player for Linux that is not yet compatible with the Windows Server environment of Microsoft's Windows Live Search.

    Accessing deep content

    Googlebot can fill out forms, click buttons, and navigate deep within your site. Clickable Flash objects will likely behave the same way, exposing new content paths for Googlebot within your larger SWF. Flash websites can help ensure deep indexing of SWF content by adding individual SWF fragments to their sitemap. Reading order will likely play a role in selecting important content on your page, and I expect Googlebot may follow the first item in your reading order sooner than the last.
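    Adding SWF fragments to a sitemap is a small scripting job. The sketch below builds a sitemaps.org-style urlset with Python's standard ElementTree; the example.com URLs are hypothetical placeholders for your own pages and SWF files, and only the required loc element is emitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for u in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = u
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site: the main page plus individual SWF fragments.
xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/swf/menu.swf",
    "http://www.example.com/swf/gallery.swf",
])
print(xml)
```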

    Googlebot still throws out references to an anchor name fragment in the URL (e.g. #section=menu), and this announcement does not change the general behavior of Google's URL storage and analysis.
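    A quick way to see the effect: if a crawler stores URLs with the fragment removed (an assumption about Googlebot's normalization, consistent with the behavior described above), every #section deep link into the same SWF page collapses into one stored URL. Python's urllib.parse.urldefrag performs exactly that split.

```python
from urllib.parse import urldefrag


def crawler_key(url: str) -> str:
    """Drop the '#fragment', assuming the crawler stores URLs without it."""
    return urldefrag(url).url


a = crawler_key("http://www.example.com/site.html#section=menu")
b = crawler_key("http://www.example.com/site.html#section=contact")
print(a)        # http://www.example.com/site.html
print(a == b)   # True: both deep links map to one stored URL
```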

    Do Flash versions matter?

    The official announcement from Google and Adobe makes it seem like all Flash is now universally indexed regardless of your Flash version but I think that's bogus. If a search engine wanted to index JavaScript they might run Rhino on the server and interpret results. If you wanted to build an advanced interpreter of Flash content you might use Tamarin or its derivatives, an AVM2 (Flash 9+) virtual machine. I believe AVM2-compatible SWF files will enjoy better search exposure than binaries built for the older AVM. I can't prove it; just a hunch.

    Dynamic object insertion

    Googlebot will detect common JavaScript libraries such as SWFObject used to dynamically insert Flash content at page load. Publishers can back up the dynamic insertion JavaScript with a noscript element just in case Google doesn't discover your dynamic insertion. Sticking with standard dynamic insertion libraries will help ensure your content is discovered through expected behaviors.

    Summary

    The new search version of Flash Player opens the binary SWF format to interpretation by text-focused search engines. Flash developers can take additional steps to package SWF content for accessibility and search discoverability: developing for modern virtual machines, adding accessibility hooks, and wrapping your SWF in XMP metadata.

  5. Aug22

    Flash Player adds H.264, AAC support

    Yesterday Adobe released a beta version of its Flash Player browser plugin capable of decoding H.264 video, AAC audio, and associated rich metadata. Web browsers running Flash Player 9.0.60.184 or higher will now be able to play back content encoded for digital television, iPods, and high-end mobile phones using international standards. Adobe's support for these standardized audio and video codecs will streamline the production process for desktop and web video, hopefully reducing time-to-market and opening more video catalogs to online viewers. A beta version of the new player, Flash Player 9 Update 3 Beta 2 "Moviestar", is available from Adobe Labs.

    Flash video sites such as MySpace or YouTube currently encode video content using the On2 TrueMotion VP6 codec and MP3 audio built into Flash 8 and above. Some sites also output content in H.264 with AAC audio for playback on handheld devices such as the iPod, iPhone, or Nokia N-series handsets. The new Flash Player lets publishers skip the extra step of VP6 encoding and pipe in H.264 content using their existing web players. Flash programs rely on the same NetStream class used for existing Flash video, with a few new optional callbacks for metadata and encoding types.

    Adobe licensed core codec technologies from MainConcept for x86, PowerPC, and ARM processor architectures. The new media technologies will be bundled with the next major Flash Player release and Adobe AIR (formerly code-named Apollo), both expected this Fall. The new technology will also power Adobe Media Player (formerly code-named Philo), expected in early 2008.

    Hardware acceleration

    AAC and H.264 are ISO standards introduced in 1997 and 2003 respectively. Over the past 4-10 years hardware manufacturers have introduced specialized hardware encoders and decoders for the professional video industry to speed up the production and presentation process. Like most new hardware technologies, initial solutions cost thousands of dollars and were beyond the reach of most consumers, but we're finally starting to see low-priced hardware optimized for multimedia encoding and decoding. The recent acceleration in hardware encoding and decoding solutions is partially driven by the large data processing requirements of high-definition H.264 video on Blu-ray and HD-DVD media.

    Current H.264 hardware sampling

    Enhanced metadata support

    Flash Player now supports 3GPP timed text tracks, iTunes metadata ("ilst" atom), and chapter listings for easy-to-navigate playback and searchability. Flash developers will need to listen for and handle each format, but publishers may choose to output a full transcript or keyword markers with every video.

    Chapters technology lets publishers define addressable parts of a movie. The nightly news might contain a chapter marker for each story, or a music video countdown might list the start of each new video as a distinct chapter.
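    Once chapter markers reach the player, the two basic operations are "which chapter is playing now?" and "seek to this chapter". The Python sketch below shows both with a sorted list and a binary search; the chapter titles and times are hypothetical, and a real Flash player would receive this data through its metadata callbacks rather than a hard-coded list.

```python
from bisect import bisect_right

# Hypothetical chapter list for a newscast: (start time in seconds, title),
# sorted by start time.
CHAPTERS = [
    (0.0, "Headlines"),
    (95.0, "City council vote"),
    (410.0, "Weather"),
    (600.0, "Sports"),
]
STARTS = [start for start, _ in CHAPTERS]


def chapter_at(seconds: float) -> str:
    """Return the title of the chapter playing at the given time."""
    i = bisect_right(STARTS, seconds) - 1   # last chapter starting <= seconds
    return CHAPTERS[max(i, 0)][1]


def seek_to(title: str) -> float:
    """Return the start time a player would seek to for a chapter title."""
    for start, name in CHAPTERS:
        if name == title:
            return start
    raise KeyError(title)


print(chapter_at(120.0))   # City council vote
print(seek_to("Weather"))  # 410.0
```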

    Timed text is a closed-caption format for audio and video. A content producer might sync a full transcript to audio or video input to improve the parsing abilities of search engines, foreign language translations, or persons with disabilities.

    Technical notes

    The new Flash player decodes Base, Main, and High H.264 profiles and Main, LC, and HE AAC profiles. Sound is mixed down to two channels and resampled to 44.1 kHz, according to Adobe developer Tinic Uro. This downmixing is a limitation of the current Flash sound engine, which dates back to 1996 and will likely need to be rewritten for the current publishing environment and ActionScript 3 architecture.
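    To illustrate what "mixed down to two channels and resampled to 44.1 kHz" involves, here is a deliberately naive Python sketch. The 5.1 fold-down uses common textbook coefficients (center and surrounds at about -3 dB, LFE dropped) and the resampler is plain linear interpolation without anti-alias filtering; neither is Adobe's documented algorithm, they only show the shape of the work. A 48 kHz AAC track, for instance, must be converted by the exact ratio 147/160.

```python
import math
from fractions import Fraction


def downmix_51_to_stereo(fl, fr, c, lfe, sl, sr):
    """Fold one 5.1 sample frame down to (left, right).

    Textbook-style coefficients, NOT Adobe's algorithm: center and
    surrounds are attenuated by 1/sqrt(2) and the LFE channel is dropped.
    """
    k = 1 / math.sqrt(2)
    return fl + k * c + k * sl, fr + k * c + k * sr


def resample_ratio(src_hz: int, dst_hz: int = 44_100) -> Fraction:
    """Exact output/input sample-rate ratio, e.g. 48 kHz -> 44.1 kHz."""
    return Fraction(dst_hz, src_hz)


def resample_linear(samples, src_hz, dst_hz=44_100):
    """Naive linear-interpolation resampler (no anti-alias filter)."""
    n_out = len(samples) * dst_hz // src_hz
    out = []
    for j in range(n_out):
        pos = j * src_hz / dst_hz            # position in input samples
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


print(resample_ratio(48_000))                         # 147/160
print(len(resample_linear([0.0] * 48_000, 48_000)))   # 44100
```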

    There is currently no support for third-party streaming services. Media companies who would like to stream H.264 and AAC content to the new Flash Players need to use the upcoming Flash Media Server 3.

    Conclusion

    Web video and its production process just received a major upgrade with Adobe's latest decoders in Flash 9. New opportunities for hardware acceleration, streamlined encoding, and multiple device support will increase the amount of video available for playback within web pages. Media companies have a new level of archival confidence this week as well, with one major international formatting option delivering quality video for the foreseeable future.

    We will not see a change in online video overnight. Once Adobe releases the final version of this new Flash 9 player users will need to upgrade, either automatically through the Player's built-in update system or through a separate download, before publishers can feel confident switching their Flash video players to H.264 sources.

    One big story that has yet to play out is Flash Lite and AIR on mobile systems. Adobe would like to compete with Microsoft and Sun in this application space and already has a major proving ground in Japan. Flash Lite 3 is based on Flash 8 and already shipping on devices such as Chumby so it may be too late for the ActionScript 3 player paired with the underlying ARM codecs. Adobe AIR may be bundled separately with mobile carrier contracts and is expected to have Flash 9 features such as H.264 and AAC included.

  6. Jun18

    Widgets on your iPhone

    Steve Jobs announced the iPhone development platform at last week's Worldwide Developer Conference to sighs of disappointment. Mac developers were anxious to develop new applications for the most anticipated consumer electronics device in years, only to be told they should code fancy websites instead. The 9-minute iPhone development demonstration during the WWDC keynote was a bit confusing for anyone new to Apple widget development. In this post I'll break down a few Apple widget components, transport you to the iPhone development world, and explain a few restrictions and lock-downs common in the mobile phone industry.

    Dashboard under the hood

    Apple's Dashboard application acts as a bridge between web technologies and your desktop. Basic widgets contain HTML, CSS, and JavaScript describing widget structure, styling, and interaction respectively. Multiple widgets utilizing this same base technology form a single process group on OS X 10.5 (Leopard) and minimize the total amount of system resources (CPU, memory, etc.) required by each new widget.

    Dashboard widgets can access the local operating system's look and feel through the Apple JavaScript classes inside your system's WidgetResources directory. These specialized JavaScript resources expose a scrollbar, slider, buttons, animations, and widget flip controls specific to the operating system and Apple's UI of the moment.

    Apple Dashboard widgets may also tap into local resources such as your computer's iSight camera, your MacBook's current battery levels, songs in your iTunes libraries, or contacts in your Address Book. Any application may add a widget-plugin as a Cocoa bundle to allow widget access to application-specific data and functionality.

    Today's Dashboard widgets take advantage of web browsing technology, plugins, and local application resources exposed to the widget engine via specialized plugin interfaces. Dashboard is a part of your computer's Dock application. Dashboard widgets exist behind a single Dashboard icon; they do not have individually callable Dock icons out of the box.

    Dashboard experience ported to iPhone

    What if Apple's desktop widgets were ported to a mobile device such as the iPhone? The iPhone runs OS X and contains the essential components necessary for widget operation on a mobile device.

    iPhone widgets would operate inside the mobile WebKit library. They would have access to device-specific UI elements such as stylized buttons, smooth transitions, and personalization options. Pieces of the underlying operating system and installed applications would be exposed via widget plugins. Widget files would be distributed as a bundle, downloaded to the iPhone over the air or via a tethered sync. Each widget could have access to limited system resources such as iPhone battery life, WiFi signal strength, the local Address Book, or the iPhone's built-in camera.

    iPhone developer features announced at WWDC

    There were two types of iPhone announcements at WWDC last week: public statements made by Steve Jobs and Scott Forstall during the conference keynote and NDA-bound statements to developers during conference sessions. I'll only cover the public statements in this post.

    iPhone developer features

    iPhone applications will "utilize the full Safari engine" and "look exactly like apps on the iPhone." Interpretation: Applications created for the iPhone will be powered by WebKit technology and have access to Apple-specific JavaScript libraries to create the look-and-feel of the underlying Apple OS. This behavior is similar to the current Dashboard experience.

    Write applications using Asynchronous JavaScript and XML (Ajax). Interpretation: The iPhone's web browsing technology supports XMLHttpRequest as a data retrieval method. This statement could also mean Apple will support JavaScript programmability of a local sandboxed CoreData store delivered as XML, but that's more advanced and unlikely given the desktop browser's current lack of offline storage support.

    "Integrate with iPhone services." You can make a phone call, send an e-mail, or look up a location in the built-in Google Maps application from any web application. Interpretation: The iPhone's Safari browser contains the same data detection features for phone numbers, e-mail addresses, and address data seen in Mail.app in Leopard. This detected data can be passed into its default handler as an automatically-generated hyperlink. This statement could also mean WebKit applications will have access to special plugins created for system-level services similar to the desktop API, but that may be too hopeful.

    "Instant distribution." "Easy to update." "Sandboxed on iPhone." Interpretation: Widget bundles are not stored on the iPhone. All files are retrieved from a remote server and treated as a web resource. Your files are cached and have the same access restrictions as a standard public Internet site.

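    The data detection described under "Integrate with iPhone services" amounts to finding phone numbers in text and wrapping them in a link that the phone's dialer handles. The Python sketch below shows the idea with a deliberately simple, hypothetical pattern for US-style numbers and a tel: link; Apple's actual detectors are not public and are far more thorough.

```python
import re

# Hypothetical, simplified pattern for numbers such as 415-555-0123 or
# (415) 555-0123. Real data detectors handle many more formats.
PHONE = re.compile(r"\(?(\d{3})\)?[ .-]?(\d{3})[ .-](\d{4})")


def link_phone_numbers(text: str) -> str:
    """Wrap each detected phone number in a tel: hyperlink."""
    def to_link(m: re.Match) -> str:
        digits = "".join(m.groups())
        return f'<a href="tel:{digits}">{m.group(0)}</a>'
    return PHONE.sub(to_link, text)


print(link_phone_numbers("Call us at (415) 555-0123 today."))
# Call us at <a href="tel:4155550123">(415) 555-0123</a> today.
```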
    Safari vs. widgets

    iPhone widgets are small applications powered by WebKit launched from the iPhone application menu. Web applications created by third-party developers for the iPhone are three clicks away from the same home screen -- Web, Bookmarks, bookmark name -- but have similar functionality. Personalization data such as stock tickers or your local ZIP code is stored inside a browser cookie.

    iPhone widgets store resource files such as images, HTML, CSS in the iPhone's local storage and update the entire widget with the operating system. iPhone widgets pull data updates such as stock prices or the latest weather report from a remote server or could also access locally stored data such as a dictionary.

    Safari-based applications request each resource from a remote server and poll for cache updates with each page load. If your weekly weather display contains a sun, cloud, and cloud with rain your application might poll a remote server for possible changes to each of the three images with every display of your weather page.
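    The cost of that polling is easy to model. The Python sketch below simulates a browser cache that revalidates each of the three weather images on every display using an ETag and If-None-Match style check, one standard HTTP validation mechanism (the post does not say which validator Safari uses). The server dictionary and image names are hypothetical. Only the first display downloads image bytes, but every display still costs one round trip per image, whereas a widget that ships its images locally would make none.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    etag: str
    body: bytes = b""


# Hypothetical server state: image name -> (current ETag, image bytes).
SERVER = {
    "sun.png": ("v1", b"<sun>"),
    "cloud.png": ("v1", b"<cloud>"),
    "rain.png": ("v1", b"<rain>"),
}


def serve(name, if_none_match=None):
    """Server side: reply 304 Not Modified when the client's ETag is current."""
    etag, body = SERVER[name]
    if if_none_match == etag:
        return Response(304, etag)
    return Response(200, etag, body)


class BrowserCache:
    """Client side: keeps bodies and ETags but revalidates on every display."""

    def __init__(self):
        self.store = {}      # name -> (etag, body)
        self.requests = 0    # round trips to the server
        self.downloads = 0   # responses that carried image bytes

    def get(self, name):
        cached = self.store.get(name)
        resp = serve(name, cached[0] if cached else None)
        self.requests += 1
        if resp.status == 200:
            self.downloads += 1
            self.store[name] = (resp.etag, resp.body)
        return self.store[name][1]


cache = BrowserCache()
for _ in range(5):  # the weather page is displayed five times
    for image in ("sun.png", "cloud.png", "rain.png"):
        cache.get(image)
print(cache.requests, cache.downloads)  # 15 3
```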

    AT&T or Apple restriction?

    Apple developers wanted at least iPhone widget-level application marketing and were visibly disappointed by Apple's keynote announcement last week. It's still unclear if AT&T or Apple is keeping third-party developers off the main app menu. I can only postulate based on existing developer programs from each company.

    AT&T certifies applications to operate on phones in its network across multiple operating systems. Productivity applications receive an enterprise solution certification after successfully passing security, reliability, and network usage tests and paying fees starting at $1000 a test. Enterprise applications are usually available for free and side-loaded (updated via a tether) by corporate IT departments. Consumer applications are typically distributed through AT&T's MEdia Net portal after similar testing and certification fees for a purchase fee split between AT&T and the developer. This process is the "orifice" Steve Jobs referenced in a 2004 Wall Street Journal interview.

    Current video iPods feature games purchased from the iTunes Store. Apple currently distributes 14 games created in-house and through external companies such as Astraware who specialize in porting games to PDAs and cell phones. The current iPod games platform is not open to third party or "homebrew" creations. Anyone can create their own iQuiz, using a specially formatted text file (essentially a fancy Note).

    New developers could enter the iPhone application menu through AT&T, Apple, or both.

    Ten days until iPhone

    The iPhone will be available at 6 p.m. on June 29, or a little over 10 days from the time I write this post. More developer documentation may emerge after the device's official release. Hardware and software hackers will likely pull the device apart in search of custom modifications already present on Palm Treo devices or the Sony PSP.

    Hopefully this long post clarifies the data we already know about applications and widgets on the iPhone. The device and its software were certainly under a tight release schedule, and it's reasonable to assume new features are on their way in new hardware and software versions expected over the next six months. There is a developer story on the iPhone, but Apple has not communicated this story very well to their developer base over the past 6 months. They're battling the same closed carrier system as any other mobile application provider, so expect slow change assisted by market leverage.

  7. Mar21

    Adobe Apollo: beyond the hype

    Adobe released early bits of its next big product bet on Monday morning, a web and desktop hybrid code-named Apollo. Apollo is the first child born out of the Adobe-Macromedia merger of April 2005, bringing together the desktop strength of Adobe PDF with Macromedia's web-savvy Flash and Apple's web browser engine. Apollo will continue to receive heavy marketing from Adobe building towards a 1.0 launch in the fall. In this post I'll break down the components of Adobe's Apollo framework, identify opportunities for application development, and compare the promised features against other software offerings.

    What is Apollo?

    Apollo combines Adobe Reader, Flash Player, and Apple's Safari browser engine into a single desktop application for the Windows and Mac platforms. Apollo applications have access to the local file system and are placed on your taskbar or dock just like you'd expect from an application.

    Apollo will be available as an autoupdate for Flash 9 users and will have similar distribution to existing free products from Adobe such as Reader and Flash.

    Adobe Internet TV: Philo

    Adobe Philo Rocketboom

    Adobe's first big Apollo app is an Internet video application code-named Philo. The pervasiveness of Flash Player created multi-billion dollar Internet video startups powered by the Flash video format. The Philo team hopes to expand the display size and quality of distributed videos and get publishers encoding with the latest Flash video encoders. Publishers can skin the entire video player, delivering MTV content in what looks like an MTV video player, or a Rocketboom-themed player shown above.

    I have not seen or used Philo, but it should directly compete with Democracy player and possibly Joost. The popularity of online video and cobranded players should help accelerate Apollo's adoption.

    Apollo components

    Expect Apollo to use the latest version of all available components at ship time. Apollo requires ActionScript 3, meaning content must be written for Flash 9 and above in order to interact with the Apollo program. HTML, JavaScript, and CSS are handled by WebKit, the open source browser engine behind Apple Safari, Apple Dashboard widgets and the Nokia S60 browser. PDFs will support features such as digital signing and approval, and it's probably best to develop on the recently released PDF 1.7 format.

    Step out of the browser

    Modern web applications are pushing the web browser to its limits and already taking advantage of desktop functionality such as JavaScript execution and browser plugins such as Flash or Quicktime. Publishers can take that same web application running inside a browser tab, wrap it in Apollo descriptors, and create a cross-platform desktop application.

    Most Apollo applications will likely be repurposed web pages running inside a specialized environment. Mac users already have some of this functionality today as fans of a particular service have created WebKit-based applications combining desktop familiarity with a constantly connected web application.

    Imagine your heavy, always-open web apps leaving your browser tab and creating an application-like presence in your taskbar. With a few extra hooks into the Apollo runtime the web application could access files on your hard drive such as your address book, music library, or calendar.

    Plugins such as Flash are second-class citizens within the web browser, receiving limited resources even when displayed in the active window. Safari and WebKit lead Dave Hyatt recently explained some of the plugin issues in detail, including the complicated balance between system resources and user expectations. A stand-alone application removes these resource constraints, letting an Apollo application write more data to disk, consume more CPU cycles, and interact with other application data on your computer (for better or worse).

    Existing WebKit apps

    Pyro is a desktop application for the Mac that takes 37signals' Campfire chat application out of the browser and into the desktop environment. Pyro accesses local UI elements such as new message displays, and supports system notifications using Growl. Campfire might stay open during your entire workday, and it's useful to have a separate application window and desktop features associated with that workflow.

    PandoraMan takes Pandora's Flash-based streaming music player out of a web browser window and into an application in your dock. You might listen to music throughout the day or enjoy quitting your web browser often, and a desktop application such as PandoraMan helps the music keep playing.

    Mac OS X (Objective-C/Cocoa) applications taking advantage of the WebCore framework compete with Apollo on the Mac platform. Mac applications built on WebCore can bind to other libraries on the Mac system, taking advantage of notifications, system libraries, and native UI elements.

    Windows Presentation Foundation

    Windows Presentation Foundation (WPF) is part of Microsoft's .Net Framework version 3.0. The presentation layer runs on a user's graphics card, taking advantage of the specialized hardware to create the glassy look of Windows Vista or offload other system tasks. WPF UI effects will likely become expected behavior from future applications.

    Windows developers can take advantage of Internet Explorer libraries on the machine to render HTML while maintaining the same zones and access controls defined by other parts of the system. A C# developer should be able to write a small application to wrap a website, including plugins such as Flash.

    Microsoft's planned release of WPF/E will extend the reach of Microsoft runtimes across operating systems and web browsers. WPF/E is the most direct competitor to Apollo outside the native application space with support for animation, graphics, and common audio and video codecs.

    Summary

    Apollo extends the reach of the Flash development community onto the desktop, creating new opportunities for application development using ActionScript 3. The ActionScript development community can now deploy applications onto cell phones using Flash Lite, inside a web browser using Flash Player, and onto the desktop using Apollo.

    Apollo's PDF electronic document support will play a role within the enterprise, opening up smarter form handling and reliability. An enterprise already dependent on PDF workflow and accountability may tap into Apollo for a consistent work flow across the company.

    I've heard the write-once, run-anywhere promise many times over the past 10 years, but few applications have actually delivered. Java and Java Web Start are the closest historical comparisons, but the demand for multimedia content creates a new breed of competitors in the form of Flash, Apollo, and WPF/E.

    I'm still a fan of native application development to create the most feature-rich and well-integrated application possible in the smallest resource footprint. Java and ActionScript programmers can extend the reach of their code base without learning too many new things, and I definitely understand that attraction, but serious applications should be well integrated.

    Adobe has allocated $100 million towards investing in companies that enhance its engagement platform and is especially interested in funding Apollo companies. As of last month Adobe had invested in 6 companies, including word processing company Virtual Ubiquity. Companies might develop for Apollo to take advantage of a strategic investment from Adobe at reasonable terms.

    Apollo in its current form seems overhyped, but the cross platform development space will definitely look different in a year as we see new toolkits from big companies executed inside and outside of the browser. It's not too difficult for a web application to pop out of the web browser and into a standalone web technology, and the marketing and investment dollars being spent by large companies such as Adobe and Microsoft should help boost the visibility of cutting edge web apps.

  8. Feb22

    Yahoo! centralizes its JavaScript network with free hosting

    Yahoo! is opening up the JavaScript powering its websites a bit more tonight, encouraging developers to directly reference libraries on its servers from within their webpages. Yahoo! User Interface Hosting opens up versioned access to the popular YUI Library, creating faster load times for sites across the web using Yahoo's optimized, geo-distributed, and reliable data centers.

    Yahoo! UI hosting sample code

    Many websites rely on common JavaScript libraries such as Prototype, script.aculo.us, dojo, and many others to create drop-down menus, retrieve files, or render charts. If five Ruby on Rails sites use the same script.aculo.us library for effects, you'll download the same file(s) five times, once from each of the five different domains. Centralized resources such as YUI Hosting create a single download source, requiring one file download regardless of the number of sites taking advantage of the YUI library.
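    The arithmetic follows from browsers caching by full URL. This Python sketch counts downloads for a URL-keyed cache, assuming every cached copy stays fresh for the session; the site names and both library URLs are hypothetical placeholders, not YUI's real hosting addresses.

```python
def library_downloads(site_origins, shared_url=None):
    """Count how often a browser with a URL-keyed cache downloads a library.

    Each site either hosts its own copy (the default) or points at one
    shared URL. Assumes every cached copy stays fresh for the whole session.
    """
    cache = set()
    fetches = 0
    for origin in site_origins:
        url = shared_url or f"{origin}/javascripts/effects.js"
        if url not in cache:      # cache miss: the file is downloaded
            cache.add(url)
            fetches += 1
    return fetches


sites = [f"http://site{i}.example.com" for i in range(1, 6)]
print(library_downloads(sites))                                       # 5
print(library_downloads(sites, "http://cdn.example.net/effects.js"))  # 1
```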

    Yahoo! is a global company and spends a lot of money serving up web content as fast as possible in London, San Francisco, or Tokyo. The central YUI files are on that same network, creating a shorter path from a user's browser to the files needed to enhance a website. Pulling files from a separate domain also creates an opportunity for more parallel content downloads, circumventing the two-connections-per-host limit in Firefox and Internet Explorer.
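    A rough model shows why a second domain helps. The Python sketch below assumes equal-sized files, at most two simultaneous connections per host, and independent hosts downloading in parallel, then counts the sequential "rounds" the busiest host needs. The hostnames are hypothetical, and real browsers are messier (file sizes, latency, connection reuse).

```python
import math
from collections import Counter
from urllib.parse import urlparse


def download_rounds(urls, per_host_limit=2):
    """Rough model of per-host connection limits.

    Assumes equal-sized files, that each host serves at most
    per_host_limit files at a time, and that different hosts download in
    parallel. Returns how many sequential rounds the busiest host needs.
    """
    per_host = Counter(urlparse(u).hostname for u in urls)
    return max(math.ceil(n / per_host_limit) for n in per_host.values())


images = [f"http://www.example.com/img/{i}.png" for i in range(4)]
scripts = [f"http://www.example.com/js/{i}.js" for i in range(4)]
print(download_rounds(images + scripts))    # 4: eight files on one host

offloaded = [u.replace("www.example.com", "yui.example.net") for u in scripts]
print(download_rounds(images + offloaded))  # 2: four files on each host
```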

    Yahoo! will be logging each request and its page origination, so if you are worried about privacy and providing pageview numbers to outside sources the hosted version of YUI may not be for you (grab a download, host your own).

    A web widget feature

    Version 2.2 of YUI, released on Tuesday, includes support for a new global variable named YAHOO.env. Web widget developers can reference this variable to determine if YUI is already present on the page for additional functionality or before loading a conflicting library. It's a useful feature for blog sidebars, letting your widget peacefully co-exist with del.icio.us, Flickr, or MyBlogLog widgets and badges without unnecessarily weighing down the page.

    Summary

    I think the Yahoo! User Interface Library will continue to gain traction thanks to its heavy development, extensions, and documentation. It's already being used by large sites such as The Wall Street Journal and SmugMug and across the revamp of the Yahoo! network, which are some key votes of confidence important in new technology adoption.

  9. Nov30

    Google Mondrian: web-based code review and storage

    Guido van Rossum unveiled his first Google project, Mondrian, tonight during a Python tech talk at the Google campus in Mountain View. Mondrian is a web-based code review system built on top of a Perforce and BigTable backend with a Python-powered front-end. Mondrian is a pretty impressive system and is currently in use across Google.

    Shared Development Environment

    Google uses a company-wide Perforce depot with almost no developer branches. Each developer has their own NFS workspace readable by anyone in the company, including automated processes. An administrative process takes snapshots of each developer workspace including local development environments accessed over SSH. Files within these snapshots can be compared to checked-in data, encrypted, and archived.

    Previous methods of review

    Before Mondrian, code review was conducted largely over e-mail using Google command-line wrappers built on top of Perforce. A developer could initiate a code review from within the g4 mail tool, which would fire off an e-mail and begin a review thread. When the developer received a response of "looks good to me," or lgtm for short, they could proceed to check in. Changes could be compared using tkdiff.

    Design-level reviews are often conducted by e-mailing around Word documents or editing a team wiki. Recently some design reviews have moved onto an internal version of Google Docs.

    Web-based collaboration meets code review

    The Mondrian tool creates a much better workflow by creating task-specific dashboards, in-line commenting, well-tracked statistics, and more. The application is built on top of Python open source libraries such as the Django framework, smtpd.py mail service, and the wsgiref web server software.

    Code reviews can be initiated and completed from within the Mondrian interface. A developer requests a review from another user or a group of users to kick off the process. Each invited reviewer can add comments directly underneath a line of code or reference the entire file. You can request and diff the file against previous versions as well. It's a pretty slick interface, lightly highlighting each line of code as you hover, and popping open a comment box in response to a double-click. Comments can be saved as a draft and shared at a later time.
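    Diffing a file against a previous version is the core operation behind such a review page. The post does not describe Mondrian's diff engine, so as a stand-in this Python sketch uses the standard library's difflib to produce a unified diff between two hypothetical revisions; the "#1" and "#2" labels merely mimic Perforce-style revision numbers.

```python
import difflib

# Two hypothetical revisions of the same file, as lists of lines.
rev1 = ["def greet(name):\n", "    print('Hello ' + name)\n"]
rev2 = ["def greet(name):\n", "    print('Hello, ' + name + '!')\n"]

diff = list(difflib.unified_diff(rev1, rev2,
                                 fromfile="greet.py#1", tofile="greet.py#2"))
print("".join(diff))
```

    A web front end would then render each "-" and "+" line with highlighting and attach reviewer comments to individual line positions.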

    Putting the entire code review process online means you never have to worry about referencing the most recent version of a file or losing e-mails. Mondrian captures every outgoing e-mail related to the workflow, looks for key data such as revision numbers, and updates a to-do list accordingly.
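    Scanning workflow mail for key data can be sketched with Python's email module and a couple of regular expressions. The conventions here are assumptions for illustration only: a change is named as "change NNNN" and a reply containing "lgtm" approves it. Mondrian's real parsing rules are not described in the talk summary.

```python
import re
from email import message_from_string

# Hypothetical conventions: "change NNNN" identifies a change list and a
# reply containing "lgtm" approves it.
CHANGE_RE = re.compile(r"\bchange\s+(\d+)\b", re.IGNORECASE)
LGTM_RE = re.compile(r"\blgtm\b", re.IGNORECASE)


def update_todo(raw_mail, todo):
    """Scan one review e-mail and record the state of each change it names."""
    msg = message_from_string(raw_mail)
    body = msg.get_payload()          # a str for a simple, non-multipart mail
    subject = msg["Subject"] or ""
    state = "approved" if LGTM_RE.search(body) else "needs review"
    for number in CHANGE_RE.findall(subject + "\n" + body):
        todo[int(number)] = state
    return todo


mail = (
    "From: reviewer@example.com\n"
    "To: author@example.com\n"
    "Subject: Re: Code review for change 12345\n"
    "\n"
    "lgtm, ship it.\n"
)
print(update_todo(mail, {}))  # {12345: 'approved'}
```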

    More on BigTable

    Mondrian uses BigTable as backend storage for user data. More specifically, it's used to store:

    • Change metadata such as a description or list of files
    • Comments entered through the web interface or via e-mail
    • Encrypted file snapshots taken from user workspaces
    • Per-user data such as active changes or last view dates

    Summary

    The Mondrian web code review system is pretty impressive. Guido estimates he has spent about 25% of his work time on the project since joining Google in December 2005. Mondrian served as Guido's introduction to Google technologies and processes with the help of a few other Googlers treating it as a side-project. The application is so deeply intertwined with Google technologies it's not likely to be available as open source until Subversion and a backend such as SQLite can be supported.

    Guido's full talk, including a demo of Mondrian, is available on Google Video.

  10. Oct04

    Google Code Search

    Google has a new search product focused on source code. It peeks inside tarballs and other recognized formats, allowing you to search the index by regex, license, or language. It's pretty easy to see how many projects are using a given library (such as feedparser or magpie), and you'll likely keep inventing new ways to explore software.

    You can access the code search engine through a GData Atom feed for easy integration wherever you choose.
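    Consuming an Atom feed takes only a few lines. The Python sketch below pulls titles and links out of a minimal, hypothetical results feed using the standard Atom namespace; real GData responses carry extra namespaces and elements, and the example.com URLs are placeholders rather than actual Code Search endpoints.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample shaped like a search-results feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Code search results</title>
  <entry>
    <title>feedparser.py</title>
    <link href="http://www.example.com/src/feedparser.py"/>
  </entry>
  <entry>
    <title>rss_fetch.inc</title>
    <link href="http://www.example.com/src/rss_fetch.inc"/>
  </entry>
</feed>"""


def result_links(feed_xml):
    """Return (title, href) pairs for each Atom entry in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (entry.findtext(f"{ATOM}title"), entry.find(f"{ATOM}link").get("href"))
        for entry in root.findall(f"{ATOM}entry")
    ]


for title, href in result_links(SAMPLE):
    print(title, href)
```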

    I find Google Code Search is easier to use than Koders, and may come in handy when looking for different ways of approaching a particular programming problem or library.

Niall Kennedy is a web technologist in San Francisco, California in the United States. I am very interested in the world of...
