Feed publishing best practices

Web feed syndication is made up of two base vocabularies: RSS 2.0 and the Atom Syndication Format. These base vocabularies are extended using namespaces to create a common set of expressions for your web feed data. In this post I’ll walk through some best practices for publishers syndicating their data via web feeds.

Should I use RSS or Atom?

The RSS 2.0 syndication format has been around for about four years and over that time it has been used by web publishers large and small to represent their data for syndication. The New York Times publishes its top stories via RSS to deliver updates to readers with appropriate viewing software. NPR distributes audio attachments commonly referred to as “podcasts” using RSS enclosures to iTunes and other specialized subscription programs.

The Atom Syndication Format was released in December 2005 under the standardization process of the Internet Engineering Task Force (IETF). A few popular uses include Google GData for API responses, FeedBurner resyndication, and Six Apart blogging products.

Choosing RSS or Atom for feed syndication is a bit like selecting GIF or JPEG as your image format: publishers have preferences for the best representation of the original data, but most renderers support both. There are a few easy answers, however. If you syndicate audio or video in your feed, RSS offers more reliable compatibility across deployed players. If you would like to use your feed as a lightweight API or present data for government consumption, Atom should be your format of choice.

Extended vocabularies

RSS and Atom take advantage of XML to express data not included in their base vocabularies. A number of groups and companies have authored namespace extensions to represent a variety of data. Here’s a look at some of the more popular namespace expressions:

Dublin Core metadata: The Dublin Core namespace might be used to specify an author name, a contributor, or copyrights to an individual feed item. Many Dublin Core elements are better expressed using Atom base elements.
Comments: Comment feeds and counts can be included with a feed item. Slash and Well-Formed Web namespaces are popular additions to RSS while Atom feeds may use Atom Threading Extensions.
Photo, audio, and video: Publishers may add more information about media enclosures using Yahoo! Media RSS or the iTunes podcast namespace. Yahoo! Media RSS lets a publisher describe multiple available data types, such as MP3 and AAC. The iTunes namespace enhances your listings within the iTunes Store.
Search results: OpenSearch expresses search results and related data for consumption by search aggregators and the built-in search features of Internet Explorer 7 and Firefox 2.
Creative Commons: The Creative Commons namespace declares license data inside an RSS feed. Atom publishers can use the rights element instead.
Geographical coordinates: Publishers can express latitude and longitude coordinates using the W3C Basic Geo vocabulary. A geotagged set of photos might be syndicated with coordinates, or a traffic conditions feed might publish the location of each report.
Item pricing: Buy.com product module uses a specialized namespace for pricing, thumbnail image, text-only description, and SKU.
Weather conditions: Yahoo! Weather publishes weather forecast data using a specialized namespace. The National Weather Service uses Digital Weather Markup Language.
Forums: Jive Forums namespace covers forum issues such as total post messages and individual threads.
Calendar: Google Calendar namespace is one way of expressing calendar data.
List formatting: Microsoft’s Simple List Extensions define a unique ordering of feed items such as a Top 10 list or upcoming movies in your rental queue.
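
To make the list above a little more concrete, here is a minimal sketch of how a consumer might read two of those extensions, Dublin Core and W3C Basic Geo, out of an RSS 2.0 item using only the Python standard library. The sample feed, its titles, the author name, and the coordinates are all made up for illustration, and read_items is a hypothetical helper rather than part of any feed library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the Dublin Core and W3C Basic Geo vocabularies.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# Hypothetical sample feed: every value below is invented for this example.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <title>Example photo feed</title>
    <link>http://example.com/</link>
    <description>Geotagged photos</description>
    <item>
      <title>Golden Gate Bridge</title>
      <link>http://example.com/photos/1</link>
      <dc:creator>Jane Example</dc:creator>
      <geo:lat>37.8199</geo:lat>
      <geo:long>-122.4783</geo:long>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per RSS item, including the extension data if present."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        # Extension elements are looked up by prefix through the NS mapping.
        lat = item.findtext("geo:lat", namespaces=NS)
        lon = item.findtext("geo:long", namespaces=NS)
        items.append({
            "title": item.findtext("title"),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "location": (float(lat), float(lon)) if lat and lon else None,
        })
    return items


for entry in read_items(SAMPLE_RSS):
    print(entry["title"], entry["creator"], entry["location"])
```

A consumer that does not know a namespace simply skips those elements, which is why sticking to well-known extensions matters.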

Avoid confusion of tongues

Given how much can already be expressed in the base formats and in the widely deployed extension namespaces, a new feed publisher is well advised to stick to these vocabularies where possible. Just as the color value “cyan” may mean nothing to a color picker with a limited vocabulary, your data might never be parsed or understood by feed parsers if you become overly inventive.

Most applications that consume feeds don’t actually walk the XML of each feed themselves. They rely on feed parser libraries to handle feed errors, reconcile similar markup across different publication formats, and retrieve remote files from your server. A parser such as Universal Feed Parser contains built-in support for over 40 namespaces and attempts to normalize the various ways of expressing title, author name, etc. A newly invented namespace is less likely to be supported by these intermediate libraries than existing methods of data definition.
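
As a rough illustration of what normalization means, the sketch below maps RSS 2.0 items and Atom entries onto the same three fields. It is my own simplified example built on the Python standard library, not how Universal Feed Parser or any other library is actually implemented, and normalize_entries is a hypothetical name. Real libraries also cope with malformed XML, relative URLs, dates, and many more namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"


def normalize_entries(feed_xml):
    """Return a list of {"title", "author", "link"} dicts for RSS 2.0 or Atom."""
    root = ET.fromstring(feed_xml)
    entries = []
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            # Simplification: take the first <link> and ignore its rel attribute.
            link = entry.find(ATOM + "link")
            entries.append({
                "title": entry.findtext(ATOM + "title"),
                "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
                "link": link.get("href") if link is not None else None,
            })
    elif root.tag == "rss":
        for item in root.findall("channel/item"):
            entries.append({
                "title": item.findtext("title"),
                # Prefer dc:creator for a display name; fall back to <author>.
                "author": item.findtext(DC + "creator") or item.findtext("author"),
                "link": item.findtext("link"),
            })
    else:
        raise ValueError("unrecognized feed root element: %s" % root.tag)
    return entries
```

Given an RSS item and an Atom entry that describe the same post, both come back as the same dictionary, which is the property downstream code relies on.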

Here’s a sampling of some of the popular feed parsing libraries by programming language:

Windows/C#: Windows RSS Platform
Apple Leopard/Cocoa: Apple Syndication Platform (unreleased)
Python: Universal Feed Parser
PHP: Magpie
Java: Rome
Perl: XML::FeedPP
Ruby: Simple RSS

Check for errors

Once you’ve published your feed you’ll want to check for XML and feed errors. Some parsers are more liberal than others, but a single error could result in users of specific services not receiving your latest updates.

You can check your files for errors with Feed Validator or the W3C Feed Validation Service. You can program web services directly against the W3C interface, or you can download the feed validator code for local use.
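
Before reaching for a full validator, a quick local well-formedness check catches the most common breakage, such as an unescaped ampersand. The sketch below uses only the Python standard library; check_well_formed is a hypothetical helper name. It only tests that the XML parses and says nothing about whether the feed is valid RSS or Atom, so it complements the validators above rather than replacing them.

```python
import xml.etree.ElementTree as ET


def check_well_formed(feed_xml):
    """Return (True, None) if the XML parses, else (False, parser error message)."""
    try:
        ET.fromstring(feed_xml)
    except ET.ParseError as err:
        # The message from the parser includes the line and column of the error.
        return False, str(err)
    return True, None


# An unescaped "&" in a title is a classic way to break a feed.
ok, message = check_well_formed(
    "<rss><channel><title>AT&T news</title></channel></rss>"
)
print(ok, message)
```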

Feed marketing

Once you’ve published a feed using well-understood element sets and valid markup, you’ll want to be sure the world can find your latest updates. Aggregators and search engines support ping notifications, a quick way of letting a service know it should visit your website and/or feed and discover new updates.

Ping

Most ping servers accept update notifications delivered via XML-RPC. The weblogUpdates.ping method sends your website title and website URL, while weblogUpdates.extendedPing sends the same data plus a feed URL. You can send update notifications to a variety of services for quick inclusion in a search index or feed aggregator. Below are just a few popular ping endpoints serving a general audience:

Google: http://blogsearch.google.com/ping/RPC2
Yahoo!: http://api.my.yahoo.com/RPC2; http://ping.blo.gs/
NewsGator: http://services.newsgator.com/ngws/xmlrpcping.aspx
Bloglines: http://www.bloglines.com/ping
Technorati: http://rpc.technorati.com/rpc/ping
VeriSign: http://rpc.weblogs.com/RPC2
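
Here is a minimal sketch of what a ping looks like from the publisher's side, using Python's standard xmlrpc.client module. The function names and the example.com values are hypothetical. The parameter order for weblogUpdates.extendedPing (title, site URL, URL of the changed page, feed URL) is my assumption about the common convention, so check each service's documentation before relying on it. Whether any particular endpoint in the list above still answers is also not something this example can promise.

```python
import xmlrpc.client


def build_ping(site_title, site_url, feed_url=None):
    """Return the XML-RPC request body for a weblogUpdates ping."""
    if feed_url is None:
        method = "weblogUpdates.ping"
        params = (site_title, site_url)
    else:
        method = "weblogUpdates.extendedPing"
        # Assumption: the third parameter is the page to check for changes;
        # the site URL is reused here for simplicity.
        params = (site_title, site_url, site_url, feed_url)
    return xmlrpc.client.dumps(params, methodname=method)


def send_ping(endpoint, site_title, site_url):
    """Send weblogUpdates.ping to one endpoint and return the server's reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(site_title, site_url)


# Inspect the request body without touching the network.
print(build_ping("Example Blog", "http://example.com/"))
```

Calling send_ping with one of the endpoint URLs performs the actual network request, so it is worth wrapping in error handling and looping over several endpoints in a real publishing script.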

Create new subscriptions

A few search services restrict their index to user feed subscriptions. If you’re not already a user, create a new account and subscribe to your feed, adding notes and tags where appropriate. Be sure to cover popular online aggregators such as My Yahoo!, Google Reader, Bloglines, etc.

These additional actions give your feed a few extra importance points, since at least one user cares enough about the data to subscribe.

Claim your site, claim your feed

Some search services allow a publisher to verify their website and/or feed for more frequent updates, statistics tracking, or highlighted search results listings. You’ll likely have to place a specially issued code within a web page or feed to prove your account has the ability to edit the site you would like to claim. Here are a few search services that offer author claiming:

Local Resources

This blog post is meant to serve as a general overview of the worldwide market for feed publishers. My views are skewed towards blogs published in English inside the United States. If you publish content in other languages or focused on a particular national audience, research the integration opportunities available with those specific services.

Summary

Feed publishing is a pretty busy space! Millions of customers are ready to receive regularly delivered content updates, either through their feed aggregator or through a search engine. Structured data delivered in easily digestible chunks is a good thing.

Feeds can serve many purposes, from lightweight APIs and data interchange formats to news updates. Each use has an intended audience and a possible extended audience. Creating well-described data in commonly understood formats will extend your distribution reach and allow the many parsers and feed interfaces already present on the web to begin remixing your data in new ways for custom delivery and interpretation.