Feed publishing best practices

Web feed syndication is made up of two base vocabularies: RSS 2.0 and the Atom Syndication Format. These base vocabularies are extended using namespaces to create a common set of expressions for your web feed data. In this post I’ll walk through some best practices for publishers syndicating their data via web feeds.

Should I use RSS or Atom?

The RSS 2.0 syndication format has been around for about four years and over that time it has been used by web publishers large and small to represent their data for syndication. The New York Times publishes its top stories via RSS to deliver updates to readers with appropriate viewing software. NPR distributes audio attachments commonly referred to as “podcasts” using RSS enclosures to iTunes and other specialized subscription programs.

The Atom Syndication Format was released in December 2005 under the standardization process of the Internet Engineering Task Force (IETF). A few popular uses include Google GData for API responses, FeedBurner resyndication, and Six Apart blogging products.

Choosing RSS or Atom for feed syndication is a bit like selecting GIF or JPEG as your image format: publishers have preferences for the best representation of the original data, but most renderers support both. There are a few easy answers, however. If you syndicate audio or video in your feed, RSS offers more reliable compatibility across deployed players. If you would like to use your feed as a lightweight API or present data for government consumption, Atom should be your format of choice.

Extended vocabularies

RSS and Atom take advantage of XML to express data not included in their base vocabularies. A number of groups and companies have authored namespace extensions to represent a variety of data. Here’s a look at some of the more popular namespace expressions:

Dublin Core metadata
The Dublin Core namespace might be used to attach an author name, a contributor, or a copyright statement to an individual feed item. Many Dublin Core elements are better expressed using Atom base elements.
Comments
Comment feeds and counts can be included with a feed item. Slash and Well-Formed Web namespaces are popular additions to RSS while Atom feeds may use Atom Threading Extensions.
Photo, audio, and video
Publishers may add more information about media enclosures using Yahoo! Media RSS or the iTunes podcast namespace. Yahoo! Media RSS lets a publisher describe multiple available data types, such as MP3 and AAC. The iTunes namespace enhances your listings within the iTunes Store.
Search results
OpenSearch expresses search results and related data for consumption by search aggregators and the built-in search features of Internet Explorer 7 and Firefox 2.
Creative Commons
Use the Creative Commons namespace to declare license data inside an RSS feed. Atom publishers can use the rights element instead.
Geographical coordinates
Publishers can express latitude and longitude coordinates using the W3C Basic Geo vocabulary. A geotagged set of photos might be syndicated with coordinates, or a traffic conditions feed might publish a corresponding location.
Item pricing
The Buy.com product module uses a specialized namespace for pricing, thumbnail image, text-only description, and SKU.
Weather conditions
Yahoo! Weather publishes weather forecast data using a specialized namespace. The National Weather Service uses Digital Weather Markup Language.
Forums
The Jive Forums namespace covers forum data such as total message counts and individual threads.
Calendar
The Google Calendar namespace is one way of expressing calendar data.
List formatting
Microsoft’s Simple List Extensions define a unique ordering of feed items such as a Top 10 list or upcoming movies in your rental queue.
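
To make the extension mechanism above concrete, here is a minimal sketch that adds Dublin Core and W3C Basic Geo elements to an RSS 2.0 item using Python's standard library. The function names, feed title, and example.com URLs are my own placeholders, not part of any of the namespaces described above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for two of the extensions listed above.
DC_NS = "http://purl.org/dc/elements/1.1/"
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("geo", GEO_NS)


def build_item(title, link, author, lat, lon):
    """Return an RSS <item> carrying Dublin Core and Basic Geo elements."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{DC_NS}}}creator").text = author
    ET.SubElement(item, f"{{{GEO_NS}}}lat").text = str(lat)
    ET.SubElement(item, f"{{{GEO_NS}}}long").text = str(lon)
    return item


def build_feed(items):
    """Wrap items in a bare-bones RSS 2.0 channel (placeholder metadata)."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example photo feed"
    ET.SubElement(channel, "link").text = "http://example.com/"
    ET.SubElement(channel, "description").text = "Geotagged photos"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    photo = build_item("Pier", "http://example.com/p/1", "Jane Doe", 37.8, -122.4)
    print(build_feed([photo]))
```

A parser that doesn't know either namespace should simply skip the prefixed elements, which is why extending through existing namespaces is a fairly low-risk way to add data.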

Avoid confusion of tongues

[Image: Paul Gustave Doré, “The Confusion of Tongues”]

Given the amount of expression available in both the base vocabularies and the widely deployed namespace extensions, a new feed publisher would be well served by sticking to these vocabularies where possible. Just as the color value “cyan” may mean nothing to a color picker with a limited vocabulary, your data might never be parsed or understood by feed parsers if you become overly inventive.

Most feed consumers don’t actually walk the XML of each feed themselves. They rely on feed parser libraries to handle feed errors, reconcile similar markup across different publication formats, and retrieve remote files from your server. A parser such as Universal Feed Parser contains built-in support for over 40 namespaces and attempts to normalize the various ways of expressing title, author name, etc. A newly invented namespace is less likely to be supported by these intermediate libraries than existing methods of data definition.
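
To give a feel for the normalization these libraries perform, here is a deliberately tiny sketch in standard-library Python. It is not how Universal Feed Parser or any other library is actually implemented; the function name and the returned dictionary shape are my own assumptions, and real parsers handle far more cases (malformed XML, many more namespaces, multiple links per entry).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def normalize_entries(xml_text):
    """Return a list of {'title', 'link', 'author'} dicts for RSS 2.0 or Atom."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == ATOM_NS + "feed":
        for entry in root.findall(ATOM_NS + "entry"):
            # Simplification: take the first <link>, ignoring its rel attribute.
            link = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title"),
                "link": link.get("href") if link is not None else None,
                "author": entry.findtext(f"{ATOM_NS}author/{ATOM_NS}name"),
            })
    elif root.tag == "rss":
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title"),
                "link": item.findtext("link"),
                # RSS authors appear in more than one form; try dc:creator first.
                "author": item.findtext(DC_NS + "creator") or item.findtext("author"),
            })
    return entries
```

The same post published as RSS with dc:creator or as Atom with an author name comes out as an identical dictionary. A freshly invented author element would fall through both branches and be lost.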

Here’s a sampling of some of the popular feed parsing libraries by programming language:

Windows/C#
Windows RSS Platform
Apple Leopard/Cocoa
Apple Syndication Platform (unreleased)
Python
Universal Feed Parser
PHP
Magpie
Java
Rome
Perl
XML::FeedPP
Ruby
Simple RSS

Check for errors

Once you’ve published your feed you’ll want to check for XML and feed errors. Some parsers are more liberal than others, but a single error could result in users of specific services not receiving your latest updates.

You can check your files for errors with Feed Validator or the W3C Feed Validation Service. You can program web services directly against the W3C interface, or you can download the feed validator code for local use.
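
Before reaching for a full validator, a quick local check can catch the most common failure: XML that isn't well-formed. The sketch below uses only Python's standard library; the function name is my own. It confirms well-formedness only and says nothing about whether the document is a valid RSS or Atom feed, so it complements the validators above rather than replacing them.

```python
import xml.etree.ElementTree as ET


def xml_error(feed_text):
    """Return None if feed_text is well-formed XML, else a short error string."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple.
        line, column = exc.position
        return f"line {line}, column {column}: {exc}"
    return None


if __name__ == "__main__":
    # An unescaped ampersand is a classic way to break a feed.
    broken = "<rss><channel><title>News & Notes</title></channel></rss>"
    print(xml_error(broken))
```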

Feed marketing

Once you’ve published a feed using well-understood element sets and valid markup, you’ll want to be sure the world can find your latest updates. Aggregators and search engines support ping notifications, a quick way of letting a service know it should visit your website and/or feed to discover new updates.

Ping

Most ping servers accept update notifications delivered via XML-RPC. The weblogUpdates.ping method takes a website title and website URL; weblogUpdates.extendedPing takes the same data plus a feed URL. You can send notification updates to a variety of services for quick inclusion in a search index or feed aggregator. Below are just a few popular ping endpoints serving a general audience:

Google
http://blogsearch.google.com/ping/RPC2
Yahoo!
http://api.my.yahoo.com/RPC2
http://ping.blo.gs/
NewsGator
http://services.newsgator.com/ngws/xmlrpcping.aspx
Bloglines
http://www.bloglines.com/ping
Technorati
http://rpc.technorati.com/rpc/ping
VeriSign
http://rpc.weblogs.com/RPC2
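
A ping of the kind described above can be sketched with Python's standard xmlrpc.client module. The function names are my own, and the extendedPing parameter list follows the simple description in this post (title, site URL, feed URL); individual services may expect additional or differently ordered parameters, so check each service's documentation before relying on it.

```python
import xmlrpc.client


def build_ping(site_name, site_url, feed_url=None):
    """Return the XML-RPC request body for a ping.

    Uses weblogUpdates.ping for title + URL, or weblogUpdates.extendedPing
    when a feed URL is supplied (parameter order is an assumption).
    """
    if feed_url is None:
        return xmlrpc.client.dumps(
            (site_name, site_url), methodname="weblogUpdates.ping"
        )
    return xmlrpc.client.dumps(
        (site_name, site_url, feed_url), methodname="weblogUpdates.extendedPing"
    )


def send_ping(endpoint, site_name, site_url):
    """Call weblogUpdates.ping on the given endpoint and return the reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(site_name, site_url)


if __name__ == "__main__":
    # Print the request body rather than hitting a live server.
    print(build_ping("My Blog", "http://example.com/"))
```

Calling send_ping with one of the endpoints listed above performs the actual network request; building the body separately makes it easy to inspect what will be sent first.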

Create new subscriptions

A few search services restrict their index to user feed subscriptions. If you’re not already a user, create a new account and subscribe to your feed, adding notes and tags where appropriate. Be sure to cover popular online aggregators such as My Yahoo!, Google Reader, Bloglines, etc.

These additional actions give your feed a few extra importance points, since at least one user cares enough about the data to subscribe.

Claim your site, claim your feed

Some search services allow a publisher to verify their website and/or feed for more frequent updates, statistics tracking, or highlighted search results listings. You’ll likely have to place a specially issued code within a web page or feed to prove your account has the ability to edit the site you would like to claim. Here are a few search services that offer author claiming:

Local resources

This blog post is meant to serve as a general overview of the worldwide market for feed publishers. My views are skewed towards blogs published in English inside the United States. If you publish content in other languages or focus on a particular national audience, research the integration opportunities available with the services that cater to that audience.

Summary

Feed publishing is a pretty busy space! Millions of customers are ready to receive regularly delivered content updates, either through their feed aggregator or through a search engine. Structured data delivered in easily digestible chunks is a good thing.

Feeds can serve many purposes, from lightweight APIs and data interchange formats to news updates. Each use has an intended audience and a possible extended audience. Creating well-described data in commonly understood formats extends your distribution reach and lets the many parsers and feed interfaces already present on the web begin remixing your data in new ways for custom delivery and interpretation.


5 comments

Commentary on "Feed publishing best practices":

  1. nate wrote:

    Personally, for PHP, I use and recommend SimplePie over Magpie any day.

  2. Jeffrey McManus wrote:

    Excellent resource, thanks for posting Niall.

  3. Dominic Jones wrote:

    This is most useful and clear. Except I don’t quite understand why this is so:

    > …present data for government consumption, Atom should be your format of choice.

    Why? Because it has been through a standardization process?

  4. Niall Kennedy wrote:

    Dominic,

    Yes, the standardization process of the Atom Syndication Format is important to governments and data interchange. When two countries in the EU supply data on common initiatives a central body needs to reliably parse and present that data.

    There are a few documents out there from government entities discussing data interchange requirements and recommendations if it’s an area of interest.

  5. Pascal Van Hecke wrote:

    Hi,

    just for the sake of completeness: you can claim your feeds at Bloglines as well

    (Home – my account – publisher tools)

    This is especially useful if you have a lot of versions of your feed around (WordPress can have up to 6 urls for the feed, 2 for each flavor), and/or want to move your feed to another location (like FeedBurner) and make sure Bloglines gets the redirect.