Many current and future feed publishers create content that is targeted at individuals for personal use and is not meant for widespread consumption. You may have a customized feed from Netflix, FeedBurner, or WordPress.com to track your movie queue, subscriber count, or blog stats, respectively. Some feeds offer privacy through obfuscated URLs, and others rely on just a one-time token exchange at the time of subscription. Now that online aggregators share their back-ends with search and other methods of open discovery, how can a feed publisher opt out of a public index?
One solution using existing element sets may be to overload the category element in RSS and Atom 1.0. Using the domain attribute in RSS or the scheme attribute in Atom, a publisher can indicate the type of data communicated at either the feed level or the individual item level.
- <category domain="http://www.robotstxt.org/wc/meta-user.html">noindex</category>
- <category term="noindex" scheme="http://www.robotstxt.org/wc/meta-user.html" />
The domain and scheme attribute values communicate “categorization” according to the Atom and RSS 2.0 specifications, and this use case seems to fall within that specified use. Multiple values can be specified using multiple category elements.
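To make the idea concrete, here is a minimal sketch of how a subscription agent might read these values at the feed level. The function names (robots_values, feed_robots_values) are hypothetical, and the sketch assumes the robotstxt.org URL shown above is the agreed-upon domain/scheme value; nothing here is part of any published specification.

```python
import xml.etree.ElementTree as ET

# Assumption: the URL from the examples above identifies robots-style values.
ROBOTS_SCHEME = "http://www.robotstxt.org/wc/meta-user.html"
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def robots_values(element):
    """Collect robots values from the direct category children of element."""
    values = set()
    # RSS 2.0 form: <category domain="...">noindex</category>
    for cat in element.findall("category"):
        if cat.get("domain") == ROBOTS_SCHEME and cat.text:
            values.add(cat.text.strip().lower())
    # Atom 1.0 form: <category term="noindex" scheme="..." />
    for cat in element.findall(ATOM_NS + "category"):
        if cat.get("scheme") == ROBOTS_SCHEME and cat.get("term"):
            values.add(cat.get("term").strip().lower())
    return values


def feed_robots_values(feed_xml):
    """Return the feed-level robots values declared in an RSS or Atom document."""
    root = ET.fromstring(feed_xml)
    # RSS keeps feed-level metadata under <channel>; Atom keeps it on <feed>.
    container = root.find("channel") if root.tag == "rss" else root
    if container is None:
        return set()
    return robots_values(container)
```

An aggregator could skip public indexing whenever "noindex" appears in the returned set. The same robots_values helper could be called on an individual RSS item or Atom entry element to honor item-level values.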
A subscription agent could also check the domain’s robots.txt and the meta robots value of the feed’s alternate HTML page for a more complete picture. Some aggregators take the position that, since a feed is requested by a user and not a spider, they should not need to check these extra locations. Adding robot exclusion to the feed itself seems like the most reliable approach.
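For comparison, the robots.txt check mentioned above could be sketched with Python's standard urllib.robotparser module. The function name robots_txt_allows and the "ExampleAggregator" user agent are placeholders, and the sketch assumes the agent has already downloaded the robots.txt body as text.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_allows(robots_txt, feed_url, user_agent="ExampleAggregator"):
    """Report whether the given robots.txt text permits fetching feed_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, feed_url)
```

This only covers the robots.txt half of the picture. Reading the meta robots value from the feed's alternate HTML page would need a separate fetch and HTML parse, which is part of why a value carried in the feed itself is simpler to honor.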
What do you think?