Measuring efficiency in the cloud

Analog electricity meter

In the world of cloud computing every action has a cost. Every HTTP request fires off a chain of actions, each uniquely measured on a variety of billable meters. Gone are the days of idle or unused resources on our local servers. Cloud computing charges by the sip (when sips are available) aligning business goals of resource efficiency and its cost. The cloud computing world shares many similarities with the plug-in and go world of electricity, including the need to run green for the sake of resources and cost savings. What can the world of green energy teach us about the future of cloud computing? How can we measure computing resources in the cloud for efficiency, replacement costs, and cost savings? I shared a few ideas on green clouds at last week’s Ignite at ETech.

Marginal resources at a marginal cost

Cloud computing marginal costs

At the most basic level cloud computing is a marginal measure of resource consumption across processor time, memory use, disk use, and bandwidth consumption. Cloud consumption meters are much more precise, measuring every process and its dependent system APIs. Wasteful programs, processes, or external libraries carry a real and measurable cost. Upgrading to the latest version of an application or library, complete with bug fixes and efficiency improvements, can save real money each month. We might even choose between similar software packages based on their measured efficiency.

Google App Engine measures each HTTP method, differentiates between database selects and updates, and even measures the system cost of each e-mail message. Below is a highly simplified break-down of App Engine’s consumption meters detailing available resources in a day. App Engine users may purchase additional access to most APIs.

Request Handling
  CPU Time                    6:30
  HTTP requests               1,333,328
  Outgoing bandwidth          1 GB
  Incoming bandwidth          1 GB
  HTTPS requests              1,333,328
  Secure outgoing bandwidth   1 GB
  Secure incoming bandwidth   1 GB
Database
  Database calls              10,368,000
  Database CPU time           62:07
  Database size               1 GB
  Inserts or Updates          12 GB
  Selects                     116 GB
Memcache
  add, set                    10 GB
  fetch                       50 GB
E-mail
  Within Google               5,000
  Outside Google              2,000
  Size of message body        61.4 MB
  Number of attachments       2,000
  Size of attachments         102 MB
External requests
  HTTP requests               657,084
Dynamic images
  Source images               1 GB
  Manipulations               5 GB

App Engine is extremely precise in its application meters and I only included a small sampling! A reduction in resource consumption carries real cost savings and opens up additional headroom for other processes within your application.
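
As a rough illustration of how metered consumption turns into a bill, here is a minimal Python sketch. The meter names loosely echo the table above, but the usage figures and per-unit prices are hypothetical placeholders, not App Engine's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meter:
    """One billable meter: how much was used and what one unit costs."""
    name: str
    used: float        # units consumed in the billing period
    unit_price: float  # hypothetical price per unit, in dollars


def total_cost(meters):
    """Sum the cost of every meter reading."""
    return sum(m.used * m.unit_price for m in meters)


# Hypothetical daily readings before and after a library upgrade.
before = [
    Meter("CPU hours", 8.0, 0.10),
    Meter("Outgoing bandwidth (GB)", 2.0, 0.12),
    Meter("Database CPU hours", 4.0, 0.10),
]
after = [
    Meter("CPU hours", 6.0, 0.10),
    Meter("Outgoing bandwidth (GB)", 2.0, 0.12),
    Meter("Database CPU hours", 3.0, 0.10),
]

daily_savings = total_cost(before) - total_cost(after)
print(f"Daily savings: ${daily_savings:.2f}")
```

Comparing two sets of readings this way puts a dollar figure on a code or library change.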

Writing efficient code for the cloud

The specialized cloud stack and its meters expose code inefficiencies that may have gone undetected in a standard hosting environment. Programmers who learn the inner workings of their virtual machine under an environment of constraints will ultimately write better code in any system. Java developers with experience on mobile have operated under device constraints that serve them well in an environment of assumed excess. A cloud programmer taught to tune code for the interpreter and its inner workings will similarly benefit inside and outside the cloud.

Coding inside a constrained environment such as App Engine has changed how I write web applications. I was able to look through Guido van Rossum’s code, notice different styles and techniques, and inquire about his coding style. It turns out my code was wasteful in ways I had not considered, yet by observing the virtual machine’s architect and tuner I learned how to provide the right processing hints and optimizations to speed up code and reduce its resource consumption. Training engineers for cloud computing isn’t a specialized investment competing with the cost of machines; it’s a long-term investment in style and proficiency that should pay off in a measurable way under a metered runtime environment.

Measuring efficiency of packaged applications

What if installable web applications carried their own cloud efficiency ratings similar to Energy Star ratings placed on home appliances? We will have Energy Star ratings for web servers and datacenters beginning in May to create a direct comparison of energy usage across server vendors. Similar measurements can be applied to installable software packages or library dependencies in the cloud.

WordPress cloud rating mockup

The mockup above shows a possible cloud rating for WordPress inside a PHP cloud instance. A potential WordPress customer could compare efficiency ratings and total cost of operation of WordPress’s code base over the course of a month or a year. He might then compare WordPress against products of similar functionality but varying operational costs such as Drupal or Serendipity. Measuring inefficiencies could motivate software vendors to reduce waste in their products to speed up execution times and save customers real money. Directly measuring resource consumption in this way motivates change up and down the value chain.

We are already somewhat familiar with energy ratings in our daily lives. We evaluate a new refrigerator or washing machine based on its initial price as well as its total cost to operate and repair. Applications bundled into a machine instance for easy deploys could be similarly measured in a direct to cloud and managed cloud provisioning structure.

Rolling back the meter

Solar panels, wind turbines, Google AdSense

Google, Microsoft, and Amazon operate cloud computing farms and advertising platforms to pay back the cost of operation. I expect AdSense, AdCenter, and Amazon Associates programs will offer discount premiums for customers generating ad revenue from the same company powering their web presence. Google and Microsoft could also make calls to their own APIs free or cheap to cloud customers since such requests would not need to touch the public Internet.

Consumption dashboards

Tendril monitoring dashboard view

Cloud dashboards of the future might provide insights into our software in the same ways smart meters and home monitoring solutions hope to measure our electricity use. Electricity monitoring startups are trying to raise awareness around resource consumption, highlight wasteful outliers, and ultimately effect change. Cloud computing companies such as Google can apply lessons learned from funding smarter power meters to cloud computing dashboards of the future.


In the world of cloud computing every action has a direct and measurable cost. Companies can calculate the savings of a business decision such as increasing cache times from 5 minutes to 10 minutes, or of a code decision such as updating a dependent library. The meters of the cloud will make us much more aware of our server computing consumption and provide new motivations for change.
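
As a rough worked example of the cache-time decision, the Python sketch below counts how often one cached resource must be regenerated at each cache lifetime. It assumes the resource is requested at least once in every cache window, and the cost of a single regeneration is a made-up figure for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def regenerations_per_day(cache_seconds):
    """Upper bound on cache refills per day for one always-requested resource."""
    return SECONDS_PER_DAY // cache_seconds


def monthly_cost(cache_seconds, cost_per_regeneration, days=30):
    """Cost of regenerating one resource for a month at a given cache lifetime."""
    return regenerations_per_day(cache_seconds) * days * cost_per_regeneration


# Hypothetical cost of rebuilding the resource once (CPU + datastore), in dollars.
COST_PER_REGENERATION = 0.0005

five_minutes = monthly_cost(5 * 60, COST_PER_REGENERATION)
ten_minutes = monthly_cost(10 * 60, COST_PER_REGENERATION)
print(f"5-minute cache:  ${five_minutes:.2f} per month")
print(f"10-minute cache: ${ten_minutes:.2f} per month")
print(f"Savings:         ${five_minutes - ten_minutes:.2f} per month")
```

Doubling the cache lifetime halves the number of regenerations (288 per day down to 144), so the metered cost of rebuilding that resource is halved as well.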

Cloud computing changes the infrastructure we use to power our applications. It also changes how we program by introducing constraints, optimization rewards, and systems designed for parallelization and scale. Some of our fear over change and lock-in is a lack of familiarity in operating at scale across distributed nodes. Programming against cloud computing systems retrains software engineers for a world of symmetric multiprocessing and better prepares us for a future of multiple computing processes in our racks or in the cloud.

The anatomy of cloud computing

Cloud computing is changing the way we provision hardware and software for on-demand capacity fulfillment. Lately I have been thinking about the ways on-demand servers, storage, and CDNs are changing the way we develop web applications and make business decisions. Gone are the days of idle CPUs, empty memory, or unused drive space. The cloud charges us for what we use as we use it (assuming capacity is available). In this post I will provide an overview of the cloud hosting landscape with a particular focus on cloud utilization by web companies. I will walk through a managed infrastructure stack and examine a few major business targets.

  1. The hardware
  2. The platforms
  3. The managed cloud stack
    1. High availability
    2. Security
    3. Stable, efficient OS
    4. Programming Language Business Logic
  4. The client layer
    1. Attached storage
    2. Database
    3. Cache
  5. Cloud consumers
    1. Web application developers
    2. Back office tasks
    3. Disaster recovery
  6. Summary

The hardware

In 1943 Thomas J. Watson of IBM famously proclaimed “there is a world market for maybe five computers.” Today we look back and laugh at such a proclamation but the statement really did hold up for approximately 10 years. Into the 1950s IBM designed computers for a possible market of 20 companies, of which 5 were expected to purchase such a machine. In 1953 IBM was pleasantly surprised to find 18 of 20 companies purchased the IBM 701, proving out the business of back office processing and establishing a new division for the tabulating giant.

Last week Rick Rashid of Microsoft was quoted as saying around 20 percent of the world’s servers are sold to a handful of companies: Microsoft, Google, Yahoo!, and Amazon. Three of those four companies are cloud resellers, renting small slices of their compute farms to businesses all over the world. 198 megawatt datacenters may be the new mainframe, with consumption units charged in minutes and bytes much like the time sharing relationships of the 1970s.

IBM again caught my interest last year with its Kittyhawk project from Jonathan Appavoo, Volkmar Uhlig, and Amos Waterland in New York. IBM is currently researching ways to repurpose the massively parallel Blue Gene supercomputers for the datacenters of the Web. It’s possible your future web application will run on a computer originally designed for gene sequencing and nuclear weapons testing.

Hardware and data operations are again consolidating towards major players. These specialist providers are building at a scale and specialization most web businesses can’t match. The on-demand infrastructure of the cloud makes it cheaper and more efficient to outsource operational functions to teams of experts already keeping some of the largest web companies in the world running every day.

The platforms

Microsoft and Google are the newest entrants into the cloud computing arena, focusing their efforts on their respective programming languages of expertise. Microsoft’s Windows Azure services platform will likely be the best platform for C# and ASP.NET development as it is tuned by the creators of .NET, IIS, and SQL Server. Google has similarly applied its expertise in the Python language and distributed web nodes to its Google App Engine product. The App Engine cloud is tuned by top contributors to the Python language including its BDFL, Guido van Rossum. App Engine utilizes custom Google software, Google Front End and Megastore, for web serving and storage. Cloud developers on either platform are using a similar set of hardware and software as the proven web-scale platforms of Microsoft and Google. I expect Google App Engine will add support for Java in the near future, its second major language offering and the most popular language among Google’s own services.

Language specialists are building managed stacks on top of generic cloud platforms such as Amazon Web Services’ EC2. Engine Yard sells a custom, managed AMI optimized for the Ruby language and its Rails framework. Rackspace’s Mosso subsidiary and others optimize for the latest versions of PHP + MySQL, attracting performance-minded applications in search of a tuned cloud instance. I am not aware of any major language contributors of Ruby or PHP employed at either company but the platforms do attempt to find their own niche among a broad offering of scalable hosting providers.

Amazon’s EC2 is the best-known cloud computing provider and, as previously mentioned, the baseline service for other companies building value-added solutions. The Amazon Machine Image (AMI), a deployment packaged as a machine image for the Amazon cloud, is the basic building block of EC2 virtualization and the primary interaction point of Amazon’s customers. Amazon resells premium operating system and application packages on behalf of companies such as Microsoft, IBM, and Oracle but it’s possible such specializations will instead be absorbed by the software publishers themselves as they roll out their own hosted clouds (such as Azure or IBM Blue Cloud).

The cloud computing software stack is trending towards an integrated, managed experience maintained by some of the top contributors to each programming language and related components. More generic cloud platforms will need to stay up-to-date with managed technologies on their platform and/or establish a strong reseller relationship to more specialized cloud managers.

The managed cloud stack

Cloud computing stack

Managed cloud providers handle an entire stack of infrastructure needed to deliver web applications at scale. A solid cloud computing environment abstracts the basics of a computing environment away from the implementors and lets them focus on adding value with each new application. Managed cloud hosting providers need to offer the following basic layers to stay relevant in a web developer’s world.

High availability

Any web application needs to be available to legitimate visitors from all over the world. A true cloud spans the entire globe, defeating the speed of light on behalf of its customers with a server point of presence in multiple simultaneous locations. The cloud provider needs to effectively receive and route incoming requests to the appropriate virtualized application instance on behalf of its customers.

Google and Microsoft replicate each application instance to multiple physical locations. AT&T Synaptic Hosting spans multiple locations for its enterprise customers.

Security

Web applications should be protected from intrusion and abuse at the network layer. In a cloud computing world application security is a lot like click fraud in advertising: every bad action carries a marginal cost. Cloud providers need to guard customers against potential external abuse and intrusion.

Google, Microsoft, and Amazon have their eyes on many incoming requests each day. Google serves App Engine requests off the same hardware handling Google Front End, keeping bad requests away from search, ads, and your apps.

Stable, efficient OS

Web applications rely on a stable, efficient operating system to interface with hardware, manage filesystems, and allocate resources. The cloud server operating system is a stripped down version of standard installations without a need for direct hard drive interfaces or other peripherals.

Amazon EC2 AMI quick start

Amazon EC2 highlights the operating system behind every machine image. Older versions of Fedora and Windows Server are the default “quick start” options available to each new account. Google and Microsoft clouds run on custom operating systems tailored for web use. Windows Azure is a stripped-down version of the latest Windows Server. Google runs a Linux-based OS tuned by its infrastructure team.

Programming Language Business Logic

Every managed cloud platform includes a dynamic language virtual machine and an appropriate web services gateway. Language functions too closely associated with the parent operating system and its libraries are stripped away, leaving only a pure operating environment for a machine interpreter. External dependencies such as GNU tools and custom compilers will not function within the cloud language abstraction layer. Cloud services bundle a dynamic language runtime into an easily spawned instance for standard and efficient interpretation across many application instances.

Google App Engine supports most functions of the Python language with additional support for the Django framework, WebOb, and PyYAML. Developers may replace these built-in libraries with newer or customized versions at an additional performance and usage cost. App Engine passes web requests into the programming language environment through the Web Server Gateway Interface.
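
To make the gateway hand-off concrete, here is a minimal sketch of a WSGI application. It is written for Python 3 and uses the standard library's wsgiref module rather than App Engine's own runtime; `call_app` is a hypothetical helper that plays the role of the gateway so the example runs without a web server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI callable: the server passes in the request, we return the body."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path):
    """Invoke a WSGI app directly, the way a gateway would, without a network."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application, "/meters")
print(status, body.decode("utf-8"), end="")
```

The application never touches sockets or the operating system. It only sees the request dictionary and a callback, which is what lets a provider run the same code across many instances.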

The cloud client layer

Attached storage

Cloud applications don’t operate in a vacuum. Dynamic applications persist their application state and logic through database and file storage. In the cloud world the database and the file server are cloud services unto themselves, operating in an isolated and specialized layer. This isolation makes the storage layer swappable from the rest of the cloud stack and presents new opportunities for competition.

Static files fall into two major categories based on their planned consumption. Files under 1 MB in size can be consumed by most clients in a single request, matching the expected simple request/response model of the platform. Files over 1 MB in size need to be broken into more manageable parts, or ranges, for a sequenced download. Static cloud storage can be broken up into differing solutions by file size or file type, providing the best possible solution for the storage and delivery task at hand.
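
One way to picture the sequenced download is as a series of HTTP Range requests. The sketch below only computes the Range header values for a file of a given size; the 1 MB chunk size follows the threshold above, and the 2.5 MB file is a hypothetical example.

```python
ONE_MB = 1024 * 1024


def byte_ranges(file_size, chunk_size=ONE_MB):
    """Yield inclusive (start, end) byte offsets covering the whole file."""
    start = 0
    while start < file_size:
        end = min(start + chunk_size, file_size) - 1
        yield start, end
        start = end + 1


def range_headers(file_size, chunk_size=ONE_MB):
    """Build one HTTP Range header value per chunk, in download order."""
    return [f"bytes={start}-{end}" for start, end in byte_ranges(file_size, chunk_size)]


# A hypothetical 2.5 MB file split into 1 MB requests.
for header in range_headers(int(2.5 * ONE_MB)):
    print("Range:", header)
```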

Google App Engine offers static file storage separate from its dynamic runtime. App Engine supports up to 1,000 files and has a 10 MB HTTP response limit.

Amazon Web Services offers static file serving through its Simple Storage Service (S3) origin server and CloudFront CDN services. Amazon allows private and public file storage and can even charge individual users of third-party services for their use through DevPay.

Attached storage is by far the most diverse service offering for companies evaluating a specialized solution. I prefer storage providers with widely supported file management APIs, smart settings for MIME types and caching HTTP headers, and a primary functionality of serving files out to the World Wide Web. I expect popular storage providers will bundle more CDN services in the future through an exclusive up-sell partnership. I also expect a new class of storage middleware optimized for minimizing files, cleaning up images, or transcoding video will set up new programmable front-ends backed by popular storage providers.

Database storage

Databases are the preferred way of persisting structured data powering web applications. Cloud service providers have tuned and rewritten database functionality for the cloud, opening up new opportunities for scalable data services across multiple dynamic application instances. Cloud databases are distributed, replicated, and largely transactional. Cloud databases can be separated from the rest of the cloud stack through RESTful APIs between different vendors but there is a definite latency advantage to coupling data with its interpreter.

Microsoft offers SQL Server as a web service as part of the Azure services stack. Google App Engine offers Megastore, an abstraction layer on top of BigTable, as a service API within an App Engine instance or as a separate remote API. Amazon’s SimpleDB brings together EC2 processing with S3 data storage. Greenplum offers PostgreSQL as a stand-alone cloud offering.

Cloud databases are typically more limited in functionality than their local counterparts. App Engine returns up to 1000 results. SimpleDB times out within 5 seconds. Joining records from two tables in a single query breaks databases optimized for scale. App Engine offers specialized storage and query types such as geographical coordinates.
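
A result cap changes how application code reads large data sets: instead of one big query, the application walks the data a page at a time. The sketch below uses an in-memory list as a stand-in for the datastore; `fetch_page` and `fetch_all` are hypothetical helpers, not App Engine APIs, and the 1000-row cap simply mirrors the limit mentioned above.

```python
RESULT_LIMIT = 1000  # per-query cap mentioned above


def fetch_page(rows, after_key=None, limit=RESULT_LIMIT):
    """Stand-in for one capped query: rows with key > after_key, in key order."""
    matching = [row for row in rows if after_key is None or row["key"] > after_key]
    return sorted(matching, key=lambda row: row["key"])[:limit]


def fetch_all(rows, limit=RESULT_LIMIT):
    """Walk the full result set one capped page at a time."""
    results = []
    after_key = None
    while True:
        page = fetch_page(rows, after_key, limit)
        results.extend(page)
        if len(page) < limit:
            return results
        after_key = page[-1]["key"]


# Hypothetical data set larger than a single query may return.
rows = [{"key": n, "value": n * n} for n in range(2500)]
everything = fetch_all(rows)
print(len(everything))
```

Each page is a separate metered query, so reading 2,500 rows costs three round trips here rather than one.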

The database layer of a cloud instance can be abstracted as a separate best-of-breed layer within a cloud stack but developers are most likely to use the local solution for both its speed and simplicity.

Cache

Our web applications receive multiple requests for the exact same resource. We should be able to place a pre-assembled version of our web pages, images, and XHR data into a local memory cache for fast serving on multiple requests. On our own servers we frequently use memcached, Varnish, Squid, etc. The cloud stack should include a storage cache as its first layer of request processing.

Google App Engine includes a memcache API written by Brad Fitzpatrick, creator of memcached. Windows Azure will supposedly support Velocity caching in the near future.
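
The cache-first request flow can be sketched in a few lines of Python. `SimpleCache` below is a small in-process stand-in written for this example, not the memcache API itself, and `render_page` is a hypothetical placeholder for whatever expensive work assembles a page.

```python
import time


class SimpleCache:
    """A tiny in-process cache with per-entry expiry, standing in for memcache."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)


render_count = 0


def render_page(path):
    """Hypothetical expensive page assembly; counts how often it really runs."""
    global render_count
    render_count += 1
    return f"<html><body>Content for {path}</body></html>"


def handle_request(cache, path, ttl_seconds=300):
    """Serve from the cache when possible; otherwise render and store."""
    page = cache.get(path)
    if page is None:
        page = render_page(path)
        cache.set(path, page, ttl_seconds)
    return page


cache = SimpleCache()
for _ in range(3):
    handle_request(cache, "/home")
print(render_count)
```

Three requests for the same path trigger only one render. On a metered platform the two cache hits skip the CPU and database charges of assembling the page again.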

Cloud consumers

Building corporate clouds

The target market of a cloud computing platform will affect its stack completeness, feature sets, and future support. Cloud terminology seems to be thrown around as a magical buzzword but there are major use cases emerging.

Web application developers

New web applications start small and may sometimes experience exponential growth on a worldwide basis. Web developers evaluating the cloud stack are likely starting from scratch without the concerns of switching from a legacy system or alternate implementation.

Cloud computing abstracts tiered architecture, operations planning, and other nuances from companies specializing in bringing new ideas to market quickly. Web developers prefer a cloud stack tuned for fast web performance. Geographically distributed dynamic instances are important at least as an upgrade option to protect a new business from a rewrite at varying levels of scale.

I believe cloud providers offering a complete managed stack will attract web development specialists to their platform. Google App Engine, Mosso, and Windows Azure compete in this space.

Back office tasks

Enterprise applications are moving out of the local server closet and into the cloud. Medium- to large-sized companies are replacing in-house maintenance of machines and applications with software and infrastructure as a service. Project management, employee tracking, payroll, and many other common functions have made their way into the software-as-a-service realm. More customized applications will migrate to cloud hosting and take their place alongside the anchor tenants of the groupware and collaboration suites.

Windows Azure, Salesforce’s platform, and Google App Engine show strong promise as integrated back office add-ons. Microsoft and Google already have a solid footing in enterprise groupware services through Exchange Online and Google Apps respectively. Salesforce’s platform can be closely tied to the popular Salesforce CRM application for sales and marketing teams.

More generic back office functions can operate on any cloud hosting provider with a properly maintained disk image. A new class of hosting provider operates as an abstraction layer between multiple clouds by maintaining the appropriate images and deployment scripts for any given task. Companies such as Aptana, CohesiveFT, RightScale, and many others span multiple cloud hosting providers with a single management interface. Cloud management companies can monitor multiple providers and create a spot pricing market for computing resources.

Back office solutions represent the largest possible growth area for cloud hosting providers. Platforms with strong existing anchor tenants can add on new services combining software-as-a-service and infrastructure-as-a-service. Generic cloud hosting providers will likely be tapped for general tasks directly or through a cloud management layer.

Microsoft is promoting its cloud hosting solutions through its partner channels. Microsoft partners receive a 12% commission on the first year of revenues and 6% commission on all future revenues. Google offers a 20% discount to Google Apps Authorized Resellers over the life of the account.

Excess capacity

Hosting solutions need to scale up to meet peak demand. Peak demand could occur for an hour each day, one day a year (Cyber Monday in the retail sector), or one month out of twelve (college basketball playoffs). Cloud computing lets businesses pay only for what they use when they use it. Servers are not sitting around in your datacenter depreciating in value and consuming resources while you wait for peak load to occur.

Excess capacity needs may be predictable and cyclical, allowing a business to integrate cloud computing into their computing workflow with ease. Generic cloud computing platforms offer the best migration costs as businesses clone their own local machine images for execution in a cloud computing environment.
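
A back-of-the-envelope comparison shows why predictable peaks suit on-demand capacity. Every number in this sketch is hypothetical: the load profile, the monthly cost of an owned server, and the hourly rental rate are invented for illustration and do not reflect any provider's pricing.

```python
def owned_cost(peak_servers, monthly_cost_per_server, months=12):
    """Owning enough servers for peak load means paying for them all year."""
    return peak_servers * monthly_cost_per_server * months


def on_demand_cost(hourly_servers, hourly_rate):
    """Pay only for the server-hours actually used."""
    return sum(hourly_servers) * hourly_rate


HOURS_PER_YEAR = 365 * 24

# Hypothetical load: 2 servers most of the year, 20 servers for a 24-hour peak.
load = [2] * (HOURS_PER_YEAR - 24) + [20] * 24

owned = owned_cost(peak_servers=20, monthly_cost_per_server=100.0)
rented = on_demand_cost(load, hourly_rate=0.40)
print(f"Owned for peak: ${owned:,.2f}")
print(f"On demand:      ${rented:,.2f}")
```

The shape of the result matters more than the exact figures: the sharper and rarer the peak, the more of an owned fleet sits idle.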

Disaster recovery

Business operations need to stay online when catastrophe strikes. An earthquake in California, a hurricane in Florida or Texas, or a power outage anywhere in the world could knock your business offline instantly. A hot backup in the cloud spins up when your primary site is down. An on-demand backup facility is a lot cheaper than physical investments as companies invest in contingency planning.

Amazon Web Services recently introduced reserved machine instances for companies who must be absolutely sure they will be able to operate in an environment of strained cloud capacity. Reserved instances receive priority allocation of cloud resources in exchange for an upgrade fee and lower monthly usage charges. Reserved instances are the VIP treatments of the cloud hosting world.

Demand response programs are common in utility sectors such as electricity. Businesses can opt to be the last ones kicked off the grid in a low-capacity environment in exchange for higher consumption costs.

Summary

Cloud computing is picking up steam and there are a few early winners. The most promising solutions from large vendors are still in a technology preview stage but should be open for general use by the end of the year. Startups developing new applications should pick the best solutions provider based on the strength of their stack offering and usage pricing. Some cloud layers can easily be abstracted to best-of-breed solutions.

I hope you enjoyed this summary of the world of cloud hosting! There is a lot going on and this post just scratches the surface of how our computing world is changing.

Google App Engine 1.1.9 boosts capacity and compatibility

Google App Engine logo

Google App Engine released hosted platform version 1.1.9 earlier this week with big boosts in capacity and compatibility. The new App Engine supports standard HTTP libraries and larger files, triples the response deadline, and removes limitations on CPU-intensive processes.

Standard HTTP libraries

App Engine now supports Python’s standard web requesters urllib, urllib2, and httplib. Programmers on App Engine were previously required to use Google’s proprietary urlfetch API, which still provides the best integration for the Google request gateway. Support for standard Python request libraries means better compatibility with open source libraries developers would like to include in their web application.

30 seconds or die

Mechanical stopwatch

App Engine scripts now have up to 30 seconds to respond to any incoming request, a big change from the previous 10 second limit. I often request data from the web, process and interpret results, write to the datastore, and then issue a response to the consuming agent. The extra headroom opens up new possibilities for data processing, especially in a programming environment without queues or background tasks.

10 MB response sizes

App Engine processes can now send and receive files up to 10 MB in size, a 10x boost from the previous limit of 1 MB. Need to share a podcast, PDF, or large image? It’s now possible. Developers can deploy up to 1000 static files of up to 10 MB each, creating up to 10 GB of geo-distributed static file storage per App Engine instance.

No more high CPU limitations

App Engine scripts are no longer limited to 2 CPU-intensive requests per minute. Scripts are still limited to 30 active dynamic connections at any given moment (under the free plan). Processing power is made available on-demand for up to the full 30 seconds of your process.

Datastore now supports IN operator

Google App Engine’s datastore now supports the IN operator for Megastore queries. You can now query on a list of values instead of chaining data requests through a for loop, a big efficiency change for me.
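
The difference between chained requests and a single IN query is easy to see in a small example. The sketch below uses SQLite from Python's standard library as a stand-in, since it runs anywhere; it is not the App Engine datastore API, and the table and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [("ann", "Meters"), ("bob", "Clouds"), ("cat", "Caches"), ("ann", "Quotas")],
)

authors = ["ann", "cat"]

# Before: one query per value, chained in a for loop.
looped = []
for author in authors:
    rows = conn.execute(
        "SELECT title FROM posts WHERE author = ? ORDER BY id", (author,)
    )
    looped.extend(title for (title,) in rows)

# After: a single query using IN with one placeholder per value.
placeholders = ", ".join("?" for _ in authors)
single = [
    title
    for (title,) in conn.execute(
        f"SELECT title FROM posts WHERE author IN ({placeholders}) ORDER BY id",
        authors,
    )
]

print(sorted(looped) == sorted(single))
```

Both approaches return the same rows, but the IN version issues one call instead of one per value, which matters when every database call is metered.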


Google App Engine raised its capacity this week and opened up new possibilities for developers creating web applications on its service. The platform is still in free “preview release” without the ability to purchase additional processing headroom but the per-process restrictions have loosened up a lot.

Economic anecdotes from SF restaurants

The macroeconomic climate of a global marketplace is on everyone’s mind these days, including small business. On Friday evening I had drinks with owners of two well-known restaurants in San Francisco. Our conversation turned to business, marketing, and what changes (if any) are occurring within the service industry. In this post I will share a few trends observed in the front lines of the San Francisco food and beverage industry that may apply to broader business.

More diners in 2009

The total number of diners is up year-over-year. People are connecting with friends over food, often bringing small groups into restaurants. Splitting plates has become more common, as have split bills paid with a credit card. Tips are increasingly paid with a credit card as well, leading to more income tax reporting for the recipients.

Menu options

Restaurants have introduced new menu items targeting diners at both ends of the price spectrum. Wine is available as a taste, glass, or bottle in 75 mL, 150 mL, and 750 mL portions respectively. The “taste” offering is becoming more popular and sometimes leads to additional beverage purchases.

Popular dishes are sometimes offered in smaller portions to encourage a food purchase from the bar or more casual customers. Restaurant owners I have spoken with view this method as a hook towards repeat visits.

Daily or weekly specials have proven a good way to extract additional revenue from the high-end of the price band. Rib-eye steaks, a catch of the day, or rotating dessert specialities encourage a spending stretch for special offerings.

Partnerships with San Francisco’s Visitors Bureau on prix-fixe offerings were unpopular with the owners I spoke with. Partnerships with media outlets or trade associations around a particular featured ingredient have worked well (e.g. celebrate the month of bacon).


When given the choice between hiring more people or working more paid hours the staff has taken the additional hours. Restaurants hiring new staff have been overwhelmed with applicants for both cooks and servers and pleasantly surprised with the quality of talent.

Lessons for web companies

Pricing tiers extract revenue from different classes of self-service customers. Get Satisfaction and 37signals have shown good iterations over this concept.

Advertise special options for larger customers. “Pro” or “custom” customer levels can attract the big deals and bigger spenders.

Communication between managers and staff addresses uncertainty. Your staff are the experienced workers adding value to the business and possibly cutting costs.

How will Twitter make money?

Twitter cash in bird's nest

Micro-blogging service Twitter will celebrate its third birthday in March and may have a revenue model to support the company over the long-term. Last month Twitter CEO Evan Williams told Kevin Maney that the company will kick off new revenue streams by March 2009 to avoid raising another round of venture capital funding. Twitter’s deeply engaged community would love to see a sustainable business develop around the site, its services, and the community. In this post I will take a deeper look at Twitter and its revenue potential as publicly hinted by its founders.

  1. What is Twitter?
  2. SMS revenue
  3. Brand monitoring
  4. Summary

What is Twitter?

Blog platform

Twitter is a hosted blogging platform that limits blog posts to 140 characters or fewer. Short Twitter messages were designed as an archived status message communicated to friends via instant message, text message, or even radio bursts. A 140-character Twitter message fits within the 140-octet payload of a single SMS message when each character encodes to a single octet (as plain ASCII text does in UTF-8), a design constraint that has spurred new content creation with little effort.
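
The octet budget is easy to check in code. This sketch follows the UTF-8 framing used above and simply counts encoded bytes; carriers typically encode SMS text with the GSM 7-bit alphabet or UCS-2 rather than UTF-8, so treat it as an illustration of the constraint rather than a model of real SMS delivery. The function names are made up for the example.

```python
SMS_PAYLOAD_OCTETS = 140  # payload size of a single SMS, per the text above


def utf8_octets(message):
    """Number of octets the message occupies when encoded as UTF-8."""
    return len(message.encode("utf-8"))


def fits_in_one_sms(message, limit=SMS_PAYLOAD_OCTETS):
    """True if the UTF-8 encoding of the message fits in one payload."""
    return utf8_octets(message) <= limit


ascii_post = "a" * 140
accented_post = "\u00e9" * 140  # each "é" takes two octets in UTF-8

print(utf8_octets(ascii_post), fits_in_one_sms(ascii_post))
print(utf8_octets(accented_post), fits_in_one_sms(accented_post))
```

A 140-character post of plain ASCII fits exactly, while the same number of accented characters needs twice the space.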

Feed reader

Twitter builds feed reading directly into the blogging service. Members subscribe to other Twitter accounts (“follow” in the Twitter vernacular) to receive updates in a centralized timeline. Twitter members create new posts referencing other members and their posts or comment privately via direct message, creating a publishing system that feeds off each uniquely assembled list of content.

Twitter also operates a near real-time search engine against its public content. The search engine receives direct updates from the Twitter blogging service over a streaming API interface nicknamed “the firehose.” Twitter currently allows trusted external partners to consume this full streaming update of its data. Other consuming clients are limited to a public timeline snapshot of 20 Twitter updates every 60 seconds.

Twitter Services Inc.

Technically there are two Twitters on record with the state of California: Twitter Inc. and Twitter Services Inc. The difference may be a remnant from Twitter’s beginning in incubator Obvious Corp. or it could be a sign of the company’s plans to operate its services business separate from the blogging platform.

SMS revenue

Twitter SMS revenue share illustration

Twitter currently pays vanity short code fees and SMS fees for mobile-terminated and mobile-originated messages in the United States and Canada. Twitter accepts mobile-originated messages to a long number in England. An intermediary such as Sybase 365 sits in between Twitter servers and the cellular networks of AT&T, T-Mobile, Sprint, Verizon, Bell Canada, Rogers, and more. A subset of Twitter users take advantage of SMS update notifications but this feature is currently a drain on Twitter cash reserves.

SMS fees typically reverse at volume, with carriers paying application providers for SMS revenue generation. A carrier such as AT&T might charge 5 cents per SMS and pass a fraction of a cent per message to large applications such as Yahoo! or Google. Twitter could be approaching an SMS volume that provides bargaining power with carriers for mobile-terminated and mobile-originated message revenue domestically and mobile-originated message revenue internationally. New hire Kevin Thau, Twitter’s Director of Mobile Business, will likely be creating closer partnerships with mobile carriers to lower Twitter’s SMS spend and possibly extract revenue from this feature.

Brand monitoring

Twitter has publicly mentioned future revenue extraction opportunities from businesses on Twitter. Dell directly measures revenue generated by its multiple Twitter channels. Comcast offers “digital care” for its customers through a special Twitter account. Whole Foods discusses groceries with over 22,000 subscribers. Twitter has yet to extract value from the thousands of brand connections it currently enables. Brand monitoring will likely be the first corporate product from Twitter.

Twitter dashboard mockup

Brands monitor conversations on Twitter through specially-crafted search queries. Jet Blue might track all conversations around its brand, for example, likely leading to customers or potential customers experiencing issues in need of real-time solutions or clarity. Twitter does not currently provide much information about the people behind such brand mentions directly in search results, leading to extra required steps for brand managers. What might brands want in a real-time brand management dashboard for Twitter?

  • Better context around the person behind each brand mention. Who are they, where do they live, and how many people subscribe to their updates on Twitter?
  • Easy integration with existing CRM and customer support systems. Companies should be able to track existing customers on Twitter by importing a list of e-mail addresses.
  • Brokered communications channel (direct messages) to Twitter members who are not already subscribed to the brand’s Twitter account. Similar to LinkedIn’s paid InMail feature.
  • Account analytics. Twitter does not currently share the number of times a particular message or profile was viewed, even in channels it controls such as web or widget views.
  • Sponsored account suggestions. Twitter suggests other members you might want to follow. These suggestions appear, or have appeared at some point in the past, on the Twitter homepage, public timeline, and individually-tailored suggestions. Twitter has even experimented with small textual call-outs linking to another account in every member’s sidebar. These sponsored listings are currently untapped, under-marketed, or both.

Potential competition

The limited data sharing of full Twitter updates limits the open competition to data services developed by Twitter in-house. Nielsen BuzzMetrics, Biz360, TNS Cymfony, and many others currently tap into broad social media streams for brand insights. Twitter could compete with these dashboards in a more real-time environment, resell their solution to larger monitoring firms, or both.


Summary

Twitter plans to roll out revenue-generating services to serve corporate customers within the next two months. The company is also rumored to be raising additional capital that could extend its runway before Twitter needs real revenue under its wings. I believe SMS revenue-sharing and brand monitoring will provide Twitter revenue in 2009 with additional paid features made available to enthusiast users later this year. Twitter’s best path to realized revenue may be an acquisition but large companies always like a bird with wings.

Facebook v. Power Ventures


Facebook filed eight legal complaints in United States federal court against Power Ventures, operators of a social aggregator site (story via NYT Bits blog). Facebook claims Power collected Facebook usernames and passwords, stored Facebook data on its servers, used the Facebook trademark without license, sent e-mails posing as Facebook, and knowingly circumvented Facebook’s attempts to block access. The lawsuit, filed on December 30th in San Jose, comes one month after Facebook initially contacted Power regarding its violation and attempted to transition Power to an acceptable method of access: Facebook Connect. Power is headquartered in Rio de Janeiro, Brazil with additional offices in San Francisco and Hyderabad, India. Power raised $8 million from Draper Fisher Jurvetson, DFJ affiliate FIR Capital, Esther Dyson, and other investors. Facebook is seeking triple damages for willful violation including all revenue generated by Power in the month of December. Facebook may be able to claim $10,000 for each Facebook account accessed by Power under California Penal Code section 502 due to repeat violations.

  1. The password anti-pattern
  2. Social data distribution
  3. Dispute timeline
  4. Tips for business partnerships
  5. Summary

The password anti-pattern

Facebook login bar

Collecting Facebook usernames and passwords is at the heart of the dispute. Power impersonates a Facebook user after collecting their username and password. The site imports friends lists from Facebook and other social providers to create a meta profile for its over-networked members trying to keep their many personas in sync. Facebook Connect, announced in May and available for beta testing shortly after, provides account linking between Facebook and other sites, SSL transport, and friend imports. Facebook Connect limits the flow of Facebook user data in ways a direct login would not. Power assumed full user powers as a remote agent of a Facebook user, instead of acting as an authorized proxy, to accomplish its own goals and violated Facebook terms of service in the process.

I covered some of these data portability issues and best practices in my Data Portability, Authentication, and Authorization post last year.

Social data distribution

[T]he sole end for which mankind are warranted, individually or collectively, in interfering with the liberty of action of any of their number, is self-protection. That the only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others. His own good, either physical or moral, is not a sufficient warrant…In the part which merely concerns himself, his independence is, of right, absolute. Over himself, over his own body and mind, the individual is sovereign.

John Stuart Mill, On Liberty

Modern society mostly allows people to commit self-harm as long as that action is not also harming others. Facebook restricts access to another person’s member data beyond the original intent of that person’s sharing. New data use must explicitly receive permission to participate in shared data beyond the walls of Facebook (you may invite me into this new context but I am not automatically imported). Data is shared within a friend context on Facebook with the understanding such information is protected and may be limited to only a group of approved friends. Once that friend data starts propagating outside its initial use (by a Facebook member or Facebook itself) the trust associated with sharing data is violated. If you have ever thought twice about posting an e-mail address on a web page out of fear of automated data harvesters you have experienced the difference between communicating with a known community of site visitors and other uses. Facebook wants to be an identity hub of real data about real people and takes certain steps to protect that data exchange. Power knowingly violated the Facebook Terms of Service and encouraged Facebook members to do the same.

Dispute timeline

Power launched to a United States audience on December 1, 2008. The site previously focused on the Brazilian market with support for Flogão and Google-owned Orkut since launching in August. Facebook contacted Power on December 1, according to the lawsuit, notifying the team of their terms of service violation.

Power Ventures CEO Steven Vachani responded to the Facebook inquiry on December 12 (11 days later) promising to delete all existing Facebook data stored on Power servers and implement Facebook Connect as a replacement by December 26. The next business day Facebook acknowledged the e-mail and waited for confirmation of data deletion and Connect switch-over. Vachani confirmed the transition progress on December 22 (4 days before the supposed switch).

Vachani e-mailed Facebook legal counsel after the close of business on December 26 and communicated a “business decision” not to comply with Facebook’s request to stop collecting and storing Facebook logins on Power. Vachani claimed the site would implement Facebook Connect but such integration would take over 5 weeks to complete. Power kicked off a “launch promotion” that same day with a $100 reward for the Facebook user who invites the most friends to join Power using their Facebook credentials. Facebook implemented an IP-address block against Power servers on the evening of December 26 to prevent further abuse. Power circumvented the IP block and continued its marketing campaigns. Power set up a Facebook event page to promote its $100 signup give-away and used the existing Facebook accounts in its system to send event invites to friends lists.

Facebook took legal action against Power Ventures on December 30, one business day after the Christmas holiday weekend, to prevent further abuse after civil discussions obviously broke down. Facebook accused Power of trespassing on Facebook servers in San Jose (a modern form of ToS violation), spamming Facebook members (violation of CAN-SPAM), knowingly circumventing data protections (DMCA), and unlicensed use of the Facebook trademark.

Tips for business partnerships

Power Ventures could take proactive steps to look like a legitimate, responsible business in the eyes of potential business partners such as Facebook.

Create a meaningful WHOIS record

Power’s domain data currently lists “DiscountDomainRegistry” as a technical contact. “Power Assist Inc” is listed as a registrant and “Leigh Power” is listed as an administrative contact. That is not good identity management.


If you are going to collect member login credentials from other sites you should at least use a SSL certificate for more secure data transfer. Self-sign if you must, but $30 will buy you a certificate recognized by major browsers. If you can afford extended validation certificates and the verification process that entails, even better.

Register your company with the partner website

Facebook allows its members to join one or more corporate networks. Register your company on Facebook and at least associate executive and developer accounts. This additional verification step helps Facebook identify your employees. Other social networks have similar verification and associations.

Power Ventures is not listed in the Facebook corporate network directory.

Summary

Power violated Facebook terms of service by accessing and storing Facebook member data on its servers. Facebook immediately contacted Power regarding this violation and attempted to work with the site as it transitioned to the official data API, Facebook Connect. Power reneged on its agreement hours before promised delivery and immediately launched a marketing campaign to financially reward further violations. Facebook decided enough is enough and blocked Power through technical measures followed by legal measures when the site did not comply.

I have little sympathy for Power and its actions. I hope other sites violated by Power such as Google, Microsoft, MySpace, and Hi5 put a stop to websites like Power harvesting user data instead of using permitted access methods such as OAuth. Locating your business in Brazil with servers in Canada and development in India does not shield companies from the consequences of abusive practices.

2008 in review: iPhone apps

Content developed exclusively for the iPhone helped web publishers rethink content display beyond the desktop browser. Reimagining web content for small screens with bandwidth, latency, and interaction constraints provided publishers with an introduction to widget concepts and a broader web strategy. The potential audience of an iPhone web application shows up directly in website server logs, providing direct and actionable data in ways other widget options such as MySpace or Vista gadgets just can’t match. The iPhone also provides access to a relatively affluent user base capable of paying $200 for a new handset and at least $70 a month in service fees. I expect more web companies in the United States will develop specialized content for high-end mobile handsets in 2009. The recent launch of BlackBerry OS 4.0 from Research in Motion and Android on HTC may spark new interest for publishers interested in business or youth usage respectively.

Web app vs. native app

AP News iPhone application

iPhone web applications now have access to home screen icons, full-screen view areas, database storage, hardware-accelerated transitions, and a larger file cache in iPhone OS 2.0. Web development teams with experience in HTML, CSS, and JavaScript will need to upgrade their understanding of current HTML 5, CSS 3, and other features present in a modern, stable browser platform, but these new skills are also applicable to the main business of running a website. The shared mobile WebKit base of iPhone, Android, and BlackBerry also allows some code reuse between mobile platforms.

Urbanspoon iPhone restaurant selection screen

iPhone native applications are written in Objective-C, compiled, and distributed primarily through Apple’s iTunes App Store. Native applications have access to native Apple libraries including the user’s current location, address book, and even Bonjour networking. Websites such as Urbanspoon, Pandora, and SmugMug have captured a new class of users in the mobile space through their native iPhone apps.


I hope to see a lot more iPhone web applications in 2009 as web publishers leverage the skill sets of their existing web development teams. I expect many contracted native applications will go stale as the iPhone OS continues to upgrade, evidenced by a shift to push notifications, and web publishers separate platform flirtations from long-term interest. The iPhone has ignited new enthusiasm for mobile development in the United States which may carry over to BlackBerry and Android handsets in 2009.

OpenSocial REST for social data interchange

OpenSocial is best known for its social applications: canvas and profile views powered by JavaScript and Flash. Applications and widgets are just one part of the full OpenSocial offering. Over the past few months the OpenSocial spec has grown to include JSON, Atom, and XML outputs over a RESTful interface. OpenSocial containers MySpace, LinkedIn, and Plaxo already expose social data over these protocols, with additional support from large networks such as Yahoo! expected in the near future.

OpenSocial server client illustration

Exposing data over OpenSocial REST formats is not limited to widget containers. Social web apps such as Flickr, Twitter, or even Facebook could support OpenSocial data standards without ever adding OpenSocial application support to their web pages. Last week I turned TwitterFE into an OpenSocial RESTful container, opening up Twitter data for OpenSocial clients. OpenSocial 0.9, scheduled for release on December 19, will help solidify these new protocols across containers (I found many errata in 0.81 and I am pushing for changes in 0.9). In this blog post I will provide a brief overview of the OpenSocial RESTful protocols and their data implementation for any website interested in standardized descriptors of social data.

  1. OpenSocial background
  2. People
  3. Activities
  4. Advertising OpenSocial support
  5. Summary

OpenSocial background

OpenSocial applications request and interpret data via JavaScript requests. An application might request profile data on the logged-in member viewing the app, write a new story to a member’s social news feed, or store custom data such as a member’s favorite color. These data objects are called Person, Activity, and AppData respectively. Each of these data objects contains a minimal set of required information and a long list of optional data that varies by implementation. Most social applications store a member ID, username, profile photo, and a profile URL, for example, but specified views on romance or religion are less common.

Yet OpenSocial isn’t just for widget containers. Social web apps can export and import user data via anonymous and/or authenticated requests. You just have to speak the language of OpenSocial data to achieve fluid data interchange between servers. Data requests may occur with or without a login but additional data may be exposed to requesters with proper OAuth credentials for a particular account.


People

People are the center of a social network experience, connecting us to new data and interactions. OpenSocial maps common profile components across containers including e-mail addresses, profile pictures, location, and member bios. Friends lists are a collection of people objects mapped to a particular owner.

The only required person data from a container are a display name and a container-specific identifier such as the numeric auto-increment id you are storing in your users table. Websites need to stick to a specific Person vocabulary to ensure compatibility across sites.

Portable Contacts and OpenSocial RESTful Person objects are wire-compatible formats. The specs are currently aligned, so you might hear the two names used interchangeably in conversations.


<person xmlns="">
  <displayName>John Smith</displayName>
  <name>
    <formatted>John Smith</formatted>
  </name>
</person>

In the above example I’ve defined some basic data about a fictional user using the OpenSocial Person vocabulary in an XML format. Consuming agents can write a single interpreter for multiple OpenSocial containers and easily display, export, or annotate profile and friend data over this interface.
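As a rough sketch of the mapping described above, the Python below turns a row from a site's own users table into a minimal Person object and prints it as JSON, one of the output formats mentioned earlier. The `to_person` helper and the `id` and `full_name` column names are hypothetical; only the `id`, `displayName`, and `name.formatted` fields come from the text and the XML example.

```python
import json


def to_person(user_row):
    """Map a site-specific user record to a minimal Person dict.

    `user_row` and its keys ("id", "full_name") are hypothetical stand-ins
    for whatever a site stores in its own users table.
    """
    missing = [key for key in ("id", "full_name") if not user_row.get(key)]
    if missing:
        raise ValueError("missing required person data: " + ", ".join(missing))
    return {
        "id": str(user_row["id"]),  # container-specific identifier
        "displayName": user_row["full_name"],
        "name": {"formatted": user_row["full_name"]},
    }


person = to_person({"id": 42, "full_name": "John Smith"})
print(json.dumps(person, indent=2))
```

Rejecting rows that lack an identifier or display name keeps the output within the required minimum; optional fields can be added to the dict as a site chooses to expose them.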


Activities

Activities are small application updates usually posted to a social news feed. When a member adds an event, posts a status update, uploads a photo, or takes some other action websites usually write a new activity into the member’s feed. These actions are normalized in the OpenSocial context into a specific Activity vocabulary.


<activity xmlns="">
  <title>Updating my MySite account</title>
</activity>

The above example normalizes a text-based status update into an OpenSocial activity expressed in XML. I can post this message into an OpenSocial activity stream, open up export capabilities for members, or interface with a wider array of applications (desktop, mobile, etc.) that already support activity stream display.
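The normalization step can be sketched in a few lines of Python. This is only an illustration: the `status_to_activity_xml` helper is hypothetical, it emits just the `title` element shown in the example, and it omits the namespace because the example above does not spell one out.

```python
import xml.etree.ElementTree as ET


def status_to_activity_xml(status_text):
    """Wrap a plain-text status update in a minimal <activity> element.

    Only the <title> element from the example is emitted; a real container
    would add the namespace and any other fields its spec version requires.
    """
    activity = ET.Element("activity")
    ET.SubElement(activity, "title").text = status_text
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(activity, encoding="unicode")


print(status_to_activity_xml("Updating my MySite account"))
```

Building the element tree instead of concatenating strings means member-supplied text cannot break the XML output.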

Advertising OpenSocial support

OpenSocial RESTful resources are described using XRDS-Simple. If you have used OpenID or OAuth you’re likely already familiar with this markup and discovery process. Agents can probe possible supporting containers for application/xrds+xml response support to receive a full descriptor set.
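A probing agent might look something like the sketch below, which only builds the discovery request and checks a response's media type without touching the network. The helper names and the `http://example.com/` URL are placeholders; the `application/xrds+xml` media type is the one named above.

```python
import urllib.request

XRDS_MIME = "application/xrds+xml"


def build_discovery_request(site_url):
    """Build (but do not send) a GET request that asks for an XRDS document."""
    return urllib.request.Request(site_url, headers={"Accept": XRDS_MIME})


def is_xrds_response(content_type_header):
    """Return True when a Content-Type header names the XRDS media type."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == XRDS_MIME


request = build_discovery_request("http://example.com/")  # placeholder URL
print(request.get_header("Accept"))
```

An agent would pass the request to `urllib.request.urlopen` and run `is_xrds_response` on the returned Content-Type header before trying to parse a descriptor set.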

OpenSocial REST containers advertise supported data objects by the object’s name. A type of* advertises RESTful support, data objects available (person, activity, etc.), possible query types, and a hint of specversion (currently “2008”). You might choose to support some or all of the OpenSocial data objects and your XRDS document will serve as the central discovery resource for such data.


Summary

OpenSocial is about more than just widgets and applications rendered in a web browser. The project exposes standardized interfaces and object descriptors for social web components while offering interoperability with very large social networks around the world. Any social website can allow public and private access to member data using OpenSocial RESTful protocols and responses. You will open up new API opportunities, allow import and export of data between sites, and even expose more granular data to crawlers such as Google (if you choose). Interesting stuff that’s just getting started.

Rewriting Twitter for web best practices

TwitterFE screenshot

Last week I decided to rewrite the Twitter front-end on Google App Engine to incorporate modern front-end programming best practices, deliver exceptional performance, and establish a solid platform for further development. TwitterFE is a fully-functional read-only clone of Twitter designed to make your web browser sing. I created the site as an example of web development best practices anyone can integrate into their web presence.

The new web front-end on TwitterFE features localized templates, expressive markup, distinct URL structures, integrated site search, geo-distributed dynamic and static servers, and more available features than Twitter. In this post I will outline some of the changes I’ve applied to the Twitter front-end reproduction as they apply to general front-end web development.

  1. Global audience
  2. Unique usage models
  3. Consolidate URLs
  4. Expressive markup
  5. Split the page load
  6. Know your cache settings
  7. Review access controls
  8. Expose site search
  9. Summary

Global audience

Twitter most popular countries

I added a localization framework to the Twitter front-end to deliver site content in multiple languages. According to Google Trends, Twitter’s top regions speak English, Portuguese, Japanese, Chinese, German, and Spanish. I isolated the site’s template strings and translated all key phrases into Spanish. Visitors with an Accept-Language header of es will receive template strings in Spanish.
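The language selection step might look roughly like the Python sketch below. It is not the code running on the site; the `pick_language` helper and its simplified header parsing (quality values and primary subtags only) are assumptions for illustration.

```python
def pick_language(accept_language, supported=("en", "es"), default="en"):
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Compare on the primary subtag so "es-MX" matches "es".
        primary = tag.strip().lower().split("-")[0]
        candidates.append((-quality, position, primary))
    # Highest quality first; ties keep the order the browser sent.
    for neg_quality, _, primary in sorted(candidates):
        if neg_quality < 0 and primary in supported:
            return primary
    return default


print(pick_language("es-MX,es;q=0.8,en;q=0.5"))
```

Falling back to a default language keeps the site usable for visitors whose preferred languages have not been translated yet.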

TwitterFE Spanish language example

The most difficult part of localization is isolating your template strings and choosing common wording across the site. Twitter uses the terms “person,” “user,” “account,” and more to reference a profile owner. Websites need to pick common concepts to explain their site interactions before requesting translations.

Modern websites rely on crowd-sourcing to translate a site into new languages. Porting a web application to your native language is a point of pride for many communities. Something as simple as “favourites” instead of “favorites” for the Brits could help create identity around a product in other countries. Facebook Translations and Google in Your Language are just two examples of large localization efforts led by an engaged community.

Unique usage models

Twitter has three main visitor interactions: anonymous visitor, annotator, and author. The current Twitter website loads resources for all possible interaction types, weighing down the page and interfering with the intended experience. I tore the site down and started from scratch, building up each interaction model starting with the anonymous visitor.

Anonymous visitor

Twitter logged-out view

The anonymous, public visitor is anyone browsing Twitter content in a non signed-in state. This audience typically makes up the majority of site traffic and includes both humans and search engines. Websites need to clearly and concisely communicate content to this new audience, who are likely inexperienced with your site, while smartly driving business objectives such as member signups or advertising.


Annotator

Twitter Al Gore example

The annotator is a logged-in member browsing site content with an opportunity for annotation. They may want to discover new social network friends, mark content as a favorite, or otherwise engage with existing content. Annotations are typically short and asynchronous, posting new associations between an account holder and a unique content identifier. The majority of pages presented to logged-in users on Twitter follow this annotation interaction model.


Author

Twitter status update field

Twitter is a message authoring platform. Logged-in users may publish new text updates from their homepage while browsing other subscribed content. Authors type into a text area and receive real-time feedback on authoring limitations while they type. The author commits new content to Twitter’s servers after hitting a submit button, and the service responds with a confidence indicator for accepted updates.

Consolidate URLs

How many possible URLs represent the same content on your website? Websites should avoid duplicate content spread across multiple paths, subdomains, and protocols. There should be one strong public-facing match for your distinct content.

Which one of the following URLs represents the profile page of Twitter CEO Evan Williams (username ev)?

  • and more…

Websites need to pick a winner and funnel visits into that best representation of content. Search engines are crawling multiple versions of Twitter right now and splitting authority between many different options.

Be aware of URL propagation when you introduce new subdomains and schemes with relative URLs inherited from a common template. You might be buying new servers to keep up with a crawl load across your millions of pages as a result.
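One way to pick a winner is to normalize every incoming URL and redirect anything that differs from its canonical form. The Python below is a minimal sketch under stated assumptions: `example.com` is a placeholder host, the canonical scheme is plain `http`, paths are treated as case-insensitive, and query strings are dropped, which would not suit pages such as search results.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "http"       # assumption for illustration
CANONICAL_HOST = "example.com"  # placeholder host, not a real site setting


def canonical_url(url):
    """Collapse scheme, subdomain, case, and trailing-slash variants of a URL."""
    parts = urlsplit(url)
    # Assumes paths are case-insensitive and query strings can be dropped.
    path = parts.path.rstrip("/").lower() or "/"
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, "", ""))


def redirect_target(url):
    """Return the URL to permanently redirect to, or None if already canonical."""
    target = canonical_url(url)
    return None if target == url else target


print(redirect_target("https://www.example.com/Ev/"))
```

A request handler would answer with a 301 redirect whenever `redirect_target` returns a value, so crawlers consolidate authority on a single URL.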

Expressive markup

TwitterFE uses an xHTML vocabulary to express content, CSS for positioning and styling, and JavaScript for progressive enhancement and interactions. Gone are the table-based layouts of Twitter and its heavy DOM footprint. Resources are split into dynamic and static content and served from geographically-distributed datacenters for optimal performance. Twitter currently stores static assets such as profile pictures on Amazon S3 and does not use a distributed CDN to address speed of light issues.

Comparing a user such as Al Gore on TwitterFE vs. Al Gore on Twitter shows a 41% reduction in required resources sent over the wire (89 vs. 152 KB). The new site also reduces the total DOM footprint for faster parsing, layout, rendering, and addressing.

Microformats expose unique structured objects within each page such as people, relationships, and feed mappings. Search engines such as Yahoo! tap into microformat content to expose deeper information about a page. I cleaned up microformat support on Twitter pages and added support for Internet Explorer 8 Web Slices.

Expressive markup helps web browsers and search engines better understand the content within your pages. Sites can fully utilize xHTML vocabulary sets independent of default styling to best define the content and the rendered display of each page.

Split the page load

Web pages should respond quickly with progressive enhancement added after the main page content renders on the page. We might, for example, load a page and then apply search field listeners, autocomplete, or menu expansions as a second wave. Splitting our pages into “must have” and “nice to have” segments helps us deliver core content quickly while still providing the on-page interactions and magic sprinkles that thrill our visitors.

Twitter’s profile pages load 36 of the profile owner’s following list onto each page. That’s 36 tiny little 700 byte profile images all waiting in line for a remote connection and display on page. I tripled the total number of displayed member pics but loaded the list asynchronously after the rest of the page finished loading. I can pre-fetch these components into cache on the original profile request and respond very quickly to the async request after page load.

Know your cache settings

How long should browsers and other requesting agents hold on to a piece of content before requesting a fresh copy? A frequently changing profile page might expire its HTML every 5 minutes or so while static assets such as a site logo or icon should be kept in browser cache for a long period of time instead of requested with each page. In some cases Twitter sets image Expires headers 5 minutes into the future, slowing down pages and increasing bandwidth costs for the company and its visitors.
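A simple way to keep these decisions in one place is a lookup from content type to lifetime. The sketch below is illustrative only: the `cache_headers` helper and the specific lifetimes (five minutes for HTML, one year for images) are assumptions, not values taken from any real site.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Illustrative lifetimes; the right values depend on how often content changes.
MAX_AGE_SECONDS = {
    "text/html": 5 * 60,               # frequently changing pages
    "image/png": 365 * 24 * 60 * 60,   # long-lived static assets
}


def cache_headers(content_type, now=None):
    """Return Cache-Control and Expires headers for a response of this type."""
    now = now or datetime.now(timezone.utc)
    max_age = MAX_AGE_SECONDS.get(content_type, 0)  # unknown types: no caching
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }


print(cache_headers("image/png"))
```

Long lifetimes for static assets work best when an asset's URL changes whenever its content changes, so a new logo is never masked by a cached copy.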

Review access controls

Some sites split pages into public-facing and login-required access models. Twitter places pages such as a following list or a full-sized profile picture behind a login screen while exposing the same data over their APIs without such restrictions. Twitter is losing search engine exposure and logged-out user browsing capabilities due to these inconsistencies in implementation, not policy.

Expose site search

Twitter OpenSearch example

The OpenSearch format exposes site search options to web browsers and search engines alike. If your website offers site search you should be lighting up the browser chrome with new search options for the given page. Twitter acquired a search company in July but has not exposed available search hooks in their main website’s front-end. Think about how you might want to scope a search to the currently viewed user account as well as an expanded site-wide option.
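An OpenSearch description document is a small XML file, and generating one takes only a few lines. The sketch below builds a minimal document with Python's standard library; the `opensearch_description` helper, the names passed to it, and the `example.com` search URL are placeholders.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name, description, template):
    """Build a minimal OpenSearch 1.1 description document as a string.

    `template` is the site's search URL containing the {searchTerms} placeholder.
    """
    ET.register_namespace("", OPENSEARCH_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "text/html", "template": template},
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical site-wide search; a user-scoped variant would use its own template.
doc = opensearch_description(
    "Example", "Search Example", "http://example.com/search?q={searchTerms}"
)
print(doc)
```

Pages advertise the document with a link element of rel "search" and type "application/opensearchdescription+xml", which is what prompts the browser to offer the new search option.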


Summary

TwitterFE is a read-only clone of Twitter’s front-end that fixes many of my frustrations with the site’s front-end engineering and creates a new platform for future third-party development. Any site could roll these types of improvements back into their core services. Twitter APIs are full-featured enough I can clone the Twitter front-end without creating yet another stand-alone Twitter-like site.

There is a difference between a website or widget rendering in a browser and having the same site perform exceptionally well. Established web teams should revisit their web content to optimize experiences. TwitterFE is the result of one person working part-time for a week to re-write the front end of a website serving millions of monthly visitors. Similar lessons apply throughout the Web world.

I now have a new platform to develop features beyond what’s currently offered on Twitter. If you’re an iPhone developer in need of a headless Twitter API proxy for push updates let me know.

What other front-end features do you wish established websites would invest time and effort to improve?

Syndication and Widgets Primer

The publishing world is continuously evolving, creating new opportunities for plugged-in companies to reach new audiences like never before. Today’s publishers need to think beyond the fixed location of their website and fully integrate with the large hubs of user activity on the desktop, mobile phone, social networks, blogs, and web pages at large. Syndication and widgets power new opportunities to carry content beyond the walls of a single site and into some of the largest brands in the world yet some publishers still haven’t gotten the message. I recorded a 1-hour video presentation earlier this week to better explain the syndication and widget landscape to web publishers. This summary document helped shape the Widget Summit program and new publisher opportunities.

A widget in its simplest sense breaks apart a website into its essential components, broadcasts those components to anyone who will listen, and reassembles the content on a remote system while tapped into local resources. It’s a bit like writing your website’s front end in a remote location powered by local assembly methods, cached resources, and rich interactions. I compare syndication and widgets to television broadcasting and international shipping: we build our products to exact specifications to take advantage of new audiences and standardized transports.

Graph of available audience populations of major widget platforms

Widgets let publishers take their content to the audience instead of waiting for the audience to come to them. Throughout my presentation I used Twitter as an example. Twitter attracted approximately 2.3 million unique visitors in the U.S. in September according to Nielsen Online. That’s a pretty strong audience but it’s tiny compared to the repeat daily activities of Facebook, MySpace, Google, or everyone reaching Windows Vista. We package up and redistribute our content to reach these larger audiences spending time away from our site instead of waiting for a new visit into a fixed domain.

In the presentation I broke syndication strategy into two major components: syndicate data using Atom and piece your data back together on major platforms of interest using widgets. I dove into a few examples of major feed reading and widget platforms, and even spent some time on advanced topics such as contextual awareness.

I am still experimenting with creating presentations for online video distribution but I hope you enjoy these multimedia conversations.