I have been reading through some of the posts about Google Blog Search and have some new thoughts on its possible infrastructure, although Google has not officially stated anything.
I’ve read about how fast Google’s results come back. I would hope so: the entire index covers only about 90 days.
Powered by Google Fusion?
We already know that Google Blog Search indexes only feeds, and that its index appears to be separate from the main Google index. We also know that the feed search index contains only posts since June 2005, and that Google plans to add a form for submitting feeds in the future.
Pictured above is a form field available on Google Fusion, Google’s personalized homepage and feed aggregator. This service launched in July, so data going back to June is certainly a good possibility. Given all of the information we do have, it appears Google Blog Search is based on the same set of data used by Google’s feed reader.
Update: Google just posted a page with information about FeedFetcher, the feed retrieval robot for Google Fusion. FeedFetcher disobeys robots.txt and differs in other ways from what Google claims for its Blog Search product, so perhaps I am wrong.