Rewriting Digg feeds using Atom 1.0

Digg currently uses RSS 2.0 as a lightweight API, adding its own namespaced elements to carry Digg-specific values within the XML. The current Digg feed reinvents some elements (digg:category, for one) that I feel could be better marked up with existing standards and namespaces. I’ll use Digg’s data in this post to show how some complex data and relationships can be expressed using Atom 1.0.

Simplifying drives adoption

It’s important to express your data inside pre-defined elements and attributes whenever possible, so that it can be parsed easily by the many feed libraries developers use all over the web. PHP developers don’t write their own parsers; they use something like Magpie instead. Python developers might use Universal Feed Parser. Windows developers might use the Windows RSS Platform. Each abstracted view of your feed might hide your proprietary namespaced data, or at least make it more difficult for a programmer to reach your one-off namespace.
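
As a rough sketch of the difference, here is the same category expressed once in a one-off namespace and once with Atom’s own category element. The digg namespace URI is borrowed from the sort definition later in this post; the term, label, and scheme URL are assumptions made up for illustration, not copied from Digg’s live feed.

```xml
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:digg="http://digg.com/docs/diggrss">
  <!-- id, title, updated and links omitted to keep the sketch short -->

  <!-- One-off namespaced element: a generic feed library may not surface it -->
  <digg:category>Technology</digg:category>

  <!-- Standard Atom category: term is the machine value, label the display text.
       The scheme URL is a hypothetical identifier for Digg's category list. -->
  <category term="technology"
            scheme="http://digg.com/view/"
            label="Technology" />
</entry>
```

A developer using a general-purpose feed library would typically find the second form in whatever category or tag list that library already exposes.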

Feed-level Identifier

<id>tag:digg.com,2006:technology</id>

Globally unique identifiers are a good thing. They help aggregators figure out when they have seen a particular resource before, and store or display that information accordingly. You can use a URL as your identifier, but URLs tend to change and may not represent the same resource over time. The tag URI scheme, RFC 4151, is another way to create an unchanging, globally unique URI, as in the example above. See Mark Pilgrim’s How to make a good ID in Atom for more information.

Simple List Extensions

<cf:treatAs>list</cf:treatAs>
<cf:sort ns="http://digg.com/docs/diggrss" element="diggCount" label="Digg Count" data-type="number" />

Digg’s feed is an ordered list and therefore a good candidate for Microsoft’s Simple List Extensions namespace. The first line excerpted above defines Digg’s feed as a list. The second line defines a sort option that may be rendered in a user interface such as Internet Explorer’s feed view, letting a reader sort items by the number of “diggs” each has received.
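
To show where those two lines might sit in a complete feed, here is a minimal sketch. It rests on a few assumptions: the cf namespace URI is the one I recall from Microsoft’s specification, the cf:listinfo wrapper reflects my reading of that specification (sort definitions are grouped inside it), and the digg:diggCount element and its value are invented for illustration.

```xml
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005"
      xmlns:digg="http://digg.com/docs/diggrss">
  <id>tag:digg.com,2006:technology</id>
  <!-- title, updated and feed-level links omitted -->

  <!-- Tell list-aware clients to treat the feed as an ordered list -->
  <cf:treatAs>list</cf:treatAs>
  <cf:listinfo>
    <!-- ns + element name the per-entry value a client can sort on -->
    <cf:sort ns="http://digg.com/docs/diggrss" element="diggCount"
             label="Digg Count" data-type="number" />
  </cf:listinfo>

  <entry>
    <!-- id, title, updated and links omitted -->
    <!-- Hypothetical value read by the sort definition above -->
    <digg:diggCount>512</digg:diggCount>
  </entry>
</feed>
```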

Multiple link relations

A Digg story page is the appropriate HTML alternate link for an entry in the feed, but it is possible to provide additional link relations for the individual story. The via relation signifies the source of the information in the entry, which in this case is the URL originally submitted to Digg.
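
A minimal sketch of the two link relations side by side might look like the following. Both URLs, the entry id, and the title are placeholders I made up for illustration.

```xml
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:digg.com,2006:story/example</id>
  <title>Example story</title>
  <updated>2006-08-02T09:30:00Z</updated>

  <!-- alternate: the Digg story page, an HTML view of this entry -->
  <link rel="alternate" type="text/html"
        href="http://digg.com/technology/Example_story" />

  <!-- via: the URL originally submitted to Digg -->
  <link rel="via" type="text/html"
        href="http://www.example.com/original-article" />
</entry>
```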

Published vs. Updated

A Digg story is originally published when a user first submits it to Digg’s servers. The story is then continually updated as members leave comments and “digg” it over time. The Atom 1.0 specification defines updated as “modified in a way the publisher considers significant”, which in this case could mean new comments, new diggs, or being significantly buried by the user base.
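
Here is a small sketch of how the two timestamps might differ for one story. The dates, id, and title are invented; what counts as a significant update is a choice the publisher makes, not something the format dictates.

```xml
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:digg.com,2006:story/example</id>
  <title>Example story</title>

  <!-- Set once, when the story was first submitted -->
  <published>2006-08-01T14:05:00Z</published>

  <!-- Moves forward whenever the publisher considers a change significant,
       for example a new comment or a new digg -->
  <updated>2006-08-02T09:30:00Z</updated>
</entry>
```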

Comment information

The Atom Threading Extensions help publishers define information about comment counts and the location of comments about the entry, among other things. With them, an entry can state where users can read comments about the entry, the number of comments available at last update, and when the last comment was submitted for a given story.
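
A minimal sketch of those three pieces of comment information, using the replies link relation and the thr:count and thr:updated attributes from the Atom Threading Extensions (RFC 4685), could look like this. The comment URL, count, and timestamps are assumptions for illustration.

```xml
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:thr="http://purl.org/syndication/thread/1.0">
  <id>tag:digg.com,2006:story/example</id>
  <title>Example story</title>
  <updated>2006-08-02T09:30:00Z</updated>

  <!-- href: where the comments can be read
       thr:count: number of comments at last update
       thr:updated: when the most recent comment was submitted -->
  <link rel="replies" type="text/html"
        href="http://digg.com/technology/Example_story#comments"
        thr:count="42"
        thr:updated="2006-08-02T09:12:00Z" />
</entry>
```

The type attribute matters here: without it, a replies link is assumed to point at an Atom document rather than an HTML page.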

Citing the source

I defined the Digg submitter using the source element, including the username, profile picture, profile web page, a feed of all submissions by that user, and the user’s last submission.
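
One way to sketch that mapping is shown below. The choice of child elements is my assumption about how each piece of submitter data could fit inside source, and the username, URLs, and dates are all hypothetical.

```xml
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:digg.com,2006:story/example</id>
  <title>Example story</title>
  <updated>2006-08-02T09:30:00Z</updated>

  <source>
    <id>tag:digg.com,2006:users/exampleuser</id>
    <title>Stories submitted by exampleuser</title>
    <!-- Username and profile web page -->
    <author>
      <name>exampleuser</name>
      <uri>http://digg.com/users/exampleuser</uri>
    </author>
    <!-- Profile picture -->
    <icon>http://digg.com/users/exampleuser/icon.png</icon>
    <!-- Feed of all submissions by this user -->
    <link rel="self" type="application/atom+xml"
          href="http://digg.com/users/exampleuser/submitted.atom" />
    <!-- Time of the user's most recent submission -->
    <updated>2006-08-01T14:05:00Z</updated>
  </source>
</entry>
```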

Conclusion

There are many ways to express data and take advantage of the feed aggregators already deployed in the market today. The Atom 1.0 IETF standard is about 9 months old and introduces new ways of describing data that a wide range of feed parsers and interpreters can understand. Digg is just one example of translating data described in a format such as HTML into easily digestible individual entries in Atom.