Correcting Kottke

Popular blogger Jason Kottke recently posted an entry criticizing blog search companies for the incompleteness of their results compared to his internal search tool powered by Movable Type. I happen to know both Movable Type and blog search pretty well, so I decided to dig into the data and see where search engines might have missed the mark, in the interest of improving quality. I found that Jason’s criticisms were a bit unfounded, yet they may still alter the perceptions of many people who are heavily influenced by what they read on his blog.

Jason found more results searching his installation of Movable Type 3.15 than he was able to find using many search engines. I manually checked every page on Jason Kottke’s Movable Type install for mention of the word “Freakonomics” and found some disconnects between what was presented to Jason in his Movable Type search results page and what is presented to the world at large, including search engines.

Jason’s installation of Movable Type is located at Yoink.org. I searched all blogs on his Movable Type installation for “Freakonomics” over the past 6 months. (Update: Jason has since deactivated public search.) I chose 6 months because has only been indexing feeds since June and I wanted a good base for comparison.

Movable Type returned Jason’s most recent blog post as well as 9 posts from his link blog.

  1. The economics of sex… posted on December 12. The term “Freakonomics” appears nowhere in the entire source code of the page.
  2. Profile by Michael Lewis of Mike Leach posted on December 7. There is a link to freakonomics.com near the end of the post but the word “Freakonomics” appears nowhere in the post text.
  3. A pair of Boston economists… posted on December 5. “Freakonomics” appears nowhere in the entire source code of the page.
  4. …People who don’t clean up after their dogs… posted on October 7. “Freakonomics” appears nowhere in the entire source code of the page.
  5. Unique Planned Parenthood pledge drive posted on September 19. There is a link to freakonomics.com at the end of the post but the word “Freakonomics” appears nowhere in the post text.
  6. Oakland A’s are rolling posted on August 16. There is a link to freakonomics.com near the end of the post but the word “Freakonomics” appears nowhere in the text of the post.
  7. Crime fell because of rap music posted on August 9. There is a link to freakonomics.com in the post but the word “Freakonomics” appears nowhere in the text of the post.
  8. Where did all the crack go posted on August 8. There is a link to freakonomics.com at the beginning of the post but the word “Freakonomics” appears nowhere in the text of the post.
  9. Economics of poker written on July 18. The word “Freakonomics” appears nowhere in the entire source code of the page.

4 out of the 9 posts surfaced by Movable Type’s search functionality contained no mention of “Freakonomics” anywhere in the published post. The word “Freakonomics” may occur in a field that is not output to the final page, such as keywords, excerpt, or extended entry, but there is no content that anyone could expect a search engine to match for the desired query. Jay Allen wrote the search engine built into Movable Type and I’m sure he could answer any questions about your individual install.
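The page-by-page check described above was done by hand, but the same three-way classification can be approximated in a few lines. The following is a minimal sketch using only Python's standard library; the names `TermLocator` and `classify` are hypothetical, and "absent" here only means the term is in neither the text nodes nor a link target of the published page.

```python
from html.parser import HTMLParser


class TermLocator(HTMLParser):
    """Collects text nodes and link targets from a published HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # every text node, including script/style content
        self.hrefs = []       # href values of <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def classify(html, term):
    """Return 'text', 'link-only', or 'absent' for a case-insensitive term."""
    parser = TermLocator()
    parser.feed(html)
    parser.close()
    needle = term.lower()
    if needle in " ".join(parser.text_parts).lower():
        return "text"
    if any(needle in href.lower() for href in parser.hrefs):
        return "link-only"
    return "absent"
```

A post whose only mention is a link to freakonomics.com would come back as `link-only`, which is the situation in 5 of the 9 results listed above.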

5 out of the 9 posts contain a link URL that partially matches the search term. A search engine could pull out “freakonomics” from the URL if it chooses, and a query term contained in a URL is one factor used to rank results in large search engines such as Google. Technorati tries to optimize the various search indexes available to user queries by limiting search possibilities. If you are searching for a link, a query analyzer should only look through a list of available links and not keywords. If you are looking for a keyword, a query analyzer should throw away any link data and search only against the words in a post.
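To make the link-versus-keyword split concrete, here is a toy sketch of that routing. It is not Technorati's implementation: the `looks_like_link` heuristic, the `search` function, and the shape of the `posts` records are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def looks_like_link(query):
    """Heuristic (an assumption): no spaces, and the host part contains a dot."""
    q = query.strip()
    if " " in q:
        return False
    # Prefix bare hostnames with "//" so urlparse fills in netloc.
    parsed = urlparse(q if "://" in q else "//" + q)
    return "." in parsed.netloc


def search(query, posts):
    """Route a query to the link index or the keyword index, never both."""
    q = query.strip().lower()
    if looks_like_link(q):
        # Link query: look only at each post's outbound links.
        return [p["title"] for p in posts
                if any(q in link.lower() for link in p["links"])]
    # Keyword query: ignore link data and match only the post text.
    return [p["title"] for p in posts if q in p["text"].lower()]
```

Under this split, a keyword search for "freakonomics" skips a post that only links to freakonomics.com, while a link search for "freakonomics.com" finds it.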

I am not sure where The New York Times sourced its data but it didn’t come through me.

If you have any questions about “what they are telling us is actually true” and would like some answers for your own posts or research you can contact me to find out more about how search works. I’m a big fan of researched blog posts and adding more original and thoughtful content into the world.

Update 12/22: Jason updated his post based on this new information. I e-mailed him last night with a link to the post, an alert that his search interface was showing, and an invitation to further conversation. He wishes I had just stuck to an e-mail instead of a full post, but I don’t see it as “airing dirty laundry.” Thousands of people would read his post while he was asleep, and I had a chance to TrackBack and provide some extra information for people viewing the web page and believing all search engines suck.
