Gathering and distributing search results as RSS

I often receive questions about how search engines gather and distribute subscribed search results, such as those provided by Technorati, Feedster, Blogpulse, PubSub, and MSN or Yahoo! Search. It’s worth a brief explanation. Caveat lector: I work for Technorati, but I will try to deliver an impartial view of the world of ping and search while keeping the post relatively brief.

All of the aforementioned sites output search results in the RSS 2.0 feed format. Your feed results will differ between services based on what is indexed, how frequently it is indexed, and what content is made available to each indexer.

Notification

Every site but Blogpulse and MSN Search hosts its own ping beacon, a server that site authors can notify of new or changed content to prompt a fresh crawl. There are also publicly available change files from independent ping beacons such as Weblogs.com and from hosted blog services such as Blogger or LiveJournal.
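As a rough illustration of what a ping looks like on the wire, here is a minimal Python sketch. It assumes the beacon accepts the `weblogUpdates.ping` XML-RPC call popularized by Weblogs.com, taking a blog name and a blog URL; that method name, the `http://` scheme on the endpoint, and the blog name and URL are assumptions for illustration, not details confirmed by any of the services described here.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Serialize a weblogUpdates.ping call as an XML-RPC request body.

    Assumption: the beacon follows the Weblogs.com convention of
    weblogUpdates.ping(name, url).
    """
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


def send_ping(endpoint: str, blog_name: str, blog_url: str):
    """POST the ping to a beacon and return whatever the beacon answers.

    The endpoint is supplied by the caller, for example
    "http://rpc.technorati.com/rpc/ping" (scheme assumed).
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Hypothetical blog details, used only to show the request body.
EXAMPLE_REQUEST = build_ping_request("Example Blog", "http://blog.example.com/")
```

Printing `EXAMPLE_REQUEST` shows the XML payload a beacon would receive. `send_ping` is deliberately not called here, since it needs a live beacon to answer.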

Short definitions

Technorati
    Blog focused.
    Indexes HTML, RSS, and Atom.
    XMLRPC ping beacon at rpc.technorati.com/rpc/ping.
    Ping submittal via a web form at technorati.com/ping.html.
Feedster
    Feed focused.
    Indexes RSS and Atom.
    XMLRPC ping beacon at api.feedster.com/ping.php.
Blogpulse
    Blog focused.
    Indexes HTML, RSS, and Atom.
    No ping beacon.
    One-time blog submittal via a web form.
PubSub
    Feed focused for future events.
    Indexes RSS and Atom.
    XMLRPC ping beacon at xping.pubsub.com/ping/.
MSN Search
    Indexes anything its crawlers can find and interpret.
    No ping beacon.
Yahoo! Search
    Indexes anything its crawlers can find and interpret.
    XMLRPC ping beacon at api.my.yahoo.com/rss/ping.

Search result feed

So why might your results differ between services? Let’s take a look at an RSS search feed for “Mark Felt”, recently revealed as Deep Throat and a hot news topic, on each service.

Technorati, Feedster, Blogpulse, and PubSub output search results in reverse chronological order: last in, first out. MSN and Yahoo! apply their ranking algorithms to your search query and return the results of whatever happens inside their black box. PubSub starts monitoring its data stream for matches only once you create a request, so you cannot find out what was said about Mark Felt yesterday. Blogpulse indexes a feed once per day, so the publication dates in its results are measured in days, not minutes.
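To make the ordering concrete, here is a small Python sketch that reads an RSS 2.0 result feed, sorts the items newest first, and then mimics a PubSub-style subscription by keeping only items published after the subscription was created. The sample feed, its titles and links, and the function names are all invented for illustration; none of this is any service's actual implementation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Invented sample data standing in for a search result feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample search results for Mark Felt</title>
    <item>
      <title>Older post</title>
      <link>http://a.example.com/1</link>
      <pubDate>Tue, 31 May 2005 18:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Newest post</title>
      <link>http://b.example.com/2</link>
      <pubDate>Wed, 01 Jun 2005 09:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Middle post</title>
      <link>http://c.example.com/3</link>
      <pubDate>Wed, 01 Jun 2005 02:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def newest_first(feed_xml):
    """Return (published, title, link) tuples in reverse chronological order."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # RSS 2.0 pubDate uses the RFC 822 date format.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        items.append((published, item.findtext("title"), item.findtext("link")))
    return sorted(items, key=lambda entry: entry[0], reverse=True)


def since_subscription(items, subscribed_at):
    """Keep only items published at or after the subscription was created."""
    return [entry for entry in items if entry[0] >= subscribed_at]


ORDERED = newest_first(SAMPLE_FEED)
# Hypothetical subscription created at midnight UTC on 1 June 2005.
FUTURE_ONLY = since_subscription(
    ORDERED, datetime(2005, 6, 1, tzinfo=timezone.utc)
)
```

In this sample, `ORDERED` lists all three posts from newest to oldest, while `FUTURE_ONLY` drops the 31 May post because it predates the subscription. A ranked result set from MSN or Yahoo! would not follow either rule.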

Wrap-up

Now you have the inputs and outputs of a variety of search services. Maybe in the future I will dive into what happens in the middle.

3 comments

Commentary on "Gathering and distributing search results as RSS":

  1. Jeff Clavier wrote:

    Couple of comments:
    – My understanding is that MSN Search gets its blog search functionality from Moreover Technologies, which has a ping server.
    – Ping-o-Matic ought to be used to notify all ping beacons, with FeedMesh offering the long term solution (?).

  2. Niall Kennedy wrote:

    Moreover provides RSS content for My MSN, Microsoft’s portal site.

  3. Natalie Glance wrote:

    Niall,
    Thanks for the useful side-by-side comparison. A couple of corrections concerning BlogPulse (I work for Intelliseek, the company behind BlogPulse).
    (1) We are blog-focused and index HTML, Atom and RSS. (2) Publication date for a post is the actual publication date associated with the post, either according to the feed or the blog HTML, depending.