I have developed a few web applications powered by Google App Engine since its launch in May. It has been a fairly easy transition from my traditional programming in Python and Django backed by MySQL to the distributed App Engine environment, Bigtable, and the limitations of each. Over the past month I have learned a few best practices for App Engine development, mostly through trial and error, and would like to share them. In this post I will share data optimization tips for Google’s hosted Bigtable instance, show how to reduce the errors and resource usage of your application, and add a few steps to your deployment checklist.
I program Django applications referenced by a set of short unique object labels named slugs. A slug column is uniquely queried across a model and easily indexed for fast scans. In the Bigtable world of Google App Engine slugs are optimally stored as a model’s key name. Key names are limited to 500 bytes and must be unique across your defined entity. This unique key lookup directly copies the entity into memory without needing to scan an entire distributed hashtable.
Entity key names provide very fast lookups for developers who like to plan ahead. You cannot alter the key name once it’s set, and it cannot start with a number or underscores. If you can accept these limitations within your code you’ll experience even snappier reads from your data store.
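Because a key name cannot be changed later, it can help to validate a slug before it is ever stored. The following is a minimal sketch of the constraints described above, not part of the App Engine SDK: the helper name `slug_to_key_name` is hypothetical, and it takes a conservative reading of the rules by rejecting any leading digit or underscore.

```python
MAX_KEY_NAME_BYTES = 500  # limit quoted in the text above


def slug_to_key_name(slug):
    """Return the slug unchanged if it is usable as a key name.

    Raises ValueError when the slug breaks one of the constraints
    described in the post (hypothetical helper, not an SDK function).
    """
    if not slug:
        raise ValueError("key name must not be empty")
    # The limit is in bytes, so measure the encoded form, not the characters.
    if len(slug.encode("utf-8")) > MAX_KEY_NAME_BYTES:
        raise ValueError("key name exceeds %d bytes" % MAX_KEY_NAME_BYTES)
    # Conservative check: refuse a leading digit or underscore.
    if slug[0].isdigit() or slug[0] == "_":
        raise ValueError("key name must not start with a digit or underscore")
    return slug
```

With the Python datastore API, the validated value would then be passed as the `key_name` argument when creating an entity and fetched later with the model's `get_by_key_name` class method.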
Reduce indexed columns
According to Guido, a 300 byte string stored as Text is the same size as String but without an index. If you have a short string you never query or sort, you’ll optimize your data queries if it’s stored as Text.
Define a favicon
App Engine developers should define favicon.ico, robots.txt, and other frequently requested file paths. Google App Engine logs frequent errors inside your administrative console if it has to hunt for your icon with every browser request.
Define the location of your static favicon file directly from app.yaml for fast response times:
- url: /favicon.ico
  static_files: static/favicon.ico
  upload: static/favicon.ico
You should follow a similar pattern for robots.txt and optionally the verification files from Google Webmaster Tools, Yahoo! Site Explorer, and Windows Live Search.
500 response templates
Your site is not perfect. Visitors will inevitably request pages that do not exist or generate an internal server error. Your site should define default templates for 500 status codes or risk displaying whatever is sitting on Google’s NetScaler.
The screenshot above shows an error page of an App Engine application without a defined 500 handler. A link on the page suggests a visit to Google’s support website where your visitors will find no support options of interest.
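One way to guarantee your own error pages is to wrap the application at the WSGI layer. The sketch below assumes your handlers are exposed as a WSGI callable; `demo_app`, `with_error_pages`, and the placeholder HTML strings are hypothetical names for illustration, and a real site would render its own templates instead.

```python
import logging
import sys

# Placeholder markup; a real site would render its own templates here.
NOT_FOUND_HTML = "<h1>Page not found</h1>"
SERVER_ERROR_HTML = "<h1>Something went wrong</h1>"


def html_response(start_response, status, html, exc_info=None):
    """Send an HTML body with the given status line."""
    body = html.encode("utf-8")
    headers = [("Content-Type", "text/html; charset=utf-8"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers, exc_info)
    return [body]


def with_error_pages(app):
    """Wrap a WSGI app so uncaught exceptions produce a custom 500 page."""
    def wrapped(environ, start_response):
        try:
            # Build the full body here so errors raised during iteration are caught.
            return list(app(environ, start_response))
        except Exception:
            logging.exception("Unhandled error for %s", environ.get("PATH_INFO"))
            return html_response(start_response, "500 Internal Server Error",
                                 SERVER_ERROR_HTML, sys.exc_info())
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical application: one real page, one broken page."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        return html_response(start_response, "200 OK", "<h1>Home</h1>")
    if path == "/broken":
        raise RuntimeError("simulated failure")
    # Anything else falls through to the custom "not found" page.
    return html_response(start_response, "404 Not Found", NOT_FOUND_HTML)


application = with_error_pages(demo_app)
```

Logging the exception before responding keeps the failure visible in your application logs while visitors see a page you control.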
Deploy and request
Developers should prime Google’s distributed server network by issuing requests for key URLs a few minutes after each deploy. These automated requests populate your memcache storage and distribute your app instance across Google’s servers. The first request requires more CPU cycles and memory than subsequent requests as Google tries to prioritize active application instances and their versions. You can speed things up by always issuing one or more requests after a successful deploy.
This process is not unlike flushing and re-populating CDN PoPs with new content from your origin server or propagating dynamic handlers across your front-end cluster. It’s best to kick off the process early and have the latest version of your content waiting for new visitors on subsequent requests.
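The post-deploy requests can be scripted. This is a minimal sketch using only the Python standard library; the function name `warm_up` and its parameters are assumptions for illustration, and the `fetch` argument exists only so the HTTP call can be swapped out.

```python
from urllib.request import urlopen


def warm_up(base_url, paths, fetch=urlopen, timeout=10):
    """Request each path once and return {path: status code or error text}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        try:
            response = fetch(url, timeout=timeout)
            results[path] = response.getcode()
        except Exception as exc:
            # One failed warm-up request should not stop the remaining ones.
            results[path] = "error: %s" % exc
    return results
```

A call such as `warm_up("http://your-app-id.appspot.com", ["/", "/favicon.ico", "/robots.txt"])`, where the host and paths are placeholders for your own, could run as the last step of a deploy script.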
Google App Engine simplifies the scaling process but is not a magic cloud that will erase all latency and resource usage issues in your app. App Engine requires new approaches to data storage, data latency, and resource requirements in a metered and opaque environment. Hopefully my trials and experience will speed up your App Engine web apps as you create new services in the cloud.