Google App Engine optimizations

I have developed a few web applications powered by Google App Engine since its launch in May. The transition from my traditional programming in Python and Django backed by MySQL to the distributed App Engine environment, Bigtable, and the limitations of each has been fairly easy. Over the past month I have picked up a few App Engine best practices, mostly through trial and error, and would like to share them. In this post I will share data optimization tips for Google’s hosted Bigtable instance, show how to reduce the errors and resource usage of your application, and add a few steps to your deployment checklist.

Key-based lookups

I program Django applications whose objects are referenced by short unique labels named slugs. A slug column is unique across a model and easily indexed for fast scans. In the Bigtable world of Google App Engine, slugs are best stored as a model’s key name. Key names are limited to 500 bytes and must be unique within an entity kind. A lookup by key name copies the entity directly into memory without needing to scan an entire distributed hashtable.

Entity key names provide very fast lookups for developers who like to plan ahead. You cannot alter a key name once it’s set, and it cannot start with a number or underscores. If you can accept these limitations within your code, you’ll experience even snappier reads from your data store.
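Here is a minimal sketch of how a slug could be turned into a key name that respects the limits described above. The helper name key_name_for_slug, the "slug:" prefix, and the BlogPost model in the comments are my own illustrative assumptions, not part of the App Engine API. The fixed prefix is just one way to make sure a slug such as "2008-review" never produces a key name that starts with a number.

```python
MAX_KEY_NAME_BYTES = 500  # key name limit quoted in this post


def key_name_for_slug(slug, prefix="slug:"):
    """Build a datastore key name from a slug.

    The fixed prefix (a hypothetical convention) guarantees the key name
    never starts with a digit or an underscore, whatever the slug is.
    """
    if not slug:
        raise ValueError("slug must be a non-empty string")
    if not prefix or prefix[0].isdigit() or prefix[0] == "_":
        raise ValueError("prefix must not start with a digit or underscore")
    key_name = prefix + slug
    # The limit is in bytes, so measure the UTF-8 encoding, not characters.
    if len(key_name.encode("utf-8")) > MAX_KEY_NAME_BYTES:
        raise ValueError("key name exceeds %d bytes" % MAX_KEY_NAME_BYTES)
    return key_name


# Intended use inside an App Engine app (not executed here; BlogPost is a
# hypothetical db.Model subclass):
#   post = BlogPost(key_name=key_name_for_slug("my-first-post"))
#   post.put()
#   post = BlogPost.get_by_key_name(key_name_for_slug("my-first-post"))
```

Because the key name cannot change after the entity is saved, a helper like this is best settled before the first write.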

Reduce indexed columns

It’s tempting to choose a Datastore property by its input helper or based on names similar to a SQL equivalent. So what’s the difference between a short String and Text? An index.

According to Guido, a 300 byte string stored as Text is the same size as String but without an index. If you have a short string you never filter or sort on, storing it as Text spares the Datastore from maintaining an index it will never use.

Define a favicon

App Engine developers should define favicon.ico, robots.txt, and other frequently requested file paths. Google App Engine logs frequent errors inside your administrative console if it has to hunt for your icon with every browser request.

Define the location of your static favicon file directly from app.yaml for fast response times:

- url: /favicon.ico
  static_files: static/favicon.ico
  upload: static/favicon.ico

You should follow a similar pattern for robots.txt and optionally the verification files from Google Webmaster Tools, Yahoo! Site Explorer, and Windows Live Search.

Define default 404 and 500 response templates

Your site is not perfect. Visitors will inevitably request pages that do not exist or generate an internal server error. Your site should define default templates for 404 and 500 status codes or risk displaying whatever is sitting on Google’s NetScaler.

Google App Engine default 500 page

The screenshot above shows an error page of an App Engine application without a defined 500 handler. A link on the page suggests a visit to Google’s support website where your visitors will find no support options of interest.

Django developers should define 404.html and 500.html in their app’s templates directory. Django will load and render each file for the default page_not_found and server_error views respectively.

Deploy and request

Developers should prime Google’s distributed server network by issuing requests for key URLs a few minutes after each deploy. These automated requests populate your memcache entries and spread your app instance across Google’s servers. The first request requires more CPU cycles and memory than subsequent requests as Google tries to prioritize active application instances and their versions. You can speed things up by always issuing one or more requests after a successful deploy.

This process is not unlike flushing and re-populating CDN PoPs with new content from your origin server or propagating dynamic handlers across your front-end cluster. It’s best to kick off the process early and have the latest version of your content waiting for new visitors on subsequent requests.
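The post-deploy requests can be scripted. Below is a rough sketch using only the Python standard library, meant to run from your own machine after a deploy. The function names, the example host, and the list of paths are placeholders of mine; substitute the URLs that matter for your app.

```python
import urllib.error
import urllib.request


def _fetch_status(url, timeout):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # A 4xx/5xx answer still means the app responded.
        return error.code


def warm_up(base_url, paths, fetch=None, timeout=10):
    """Request each path once and return {path: HTTP status or None}."""
    if fetch is None:
        fetch = _fetch_status
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        try:
            results[path] = fetch(url, timeout)
        except OSError:
            # Network failure or timeout: record it and keep going.
            results[path] = None
    return results


# Example call (hypothetical host and paths):
#   warm_up("https://your-app-id.appspot.com", ["/", "/feed/", "/about/"])
```

The fetch parameter exists only so the request function can be swapped out, for instance to add a delay between requests or to test the script without touching the network.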

Summary

Google App Engine simplifies the scaling process but is not a magic cloud that will erase all latency and resource usage issues in your app. App Engine requires new approaches to data storage, data latency, and resource requirements in a metered and opaque environment. Hopefully my trials and experience will speed up your App Engine web apps as you create new services in the cloud.