Recent Innovations in Search presentation

On Tuesday, April 12, I attended BayCHI’s monthly program at PARC in Palo Alto. Peter Norvig of Google, Ken Norton of Yahoo!, Mark Fletcher of Ask Jeeves, Udi Manber of A9, and Jakob Nielsen of Nielsen Norman Group participated in a panel discussion of “Recent Innovations in Search and Other Ways of Finding Information.”

The auditorium was packed. Every seat was filled and many attendees sat in the aisle or in front of the first row of seating. Closed-circuit televisions were set up in the lobby to accommodate the overflow.

Each panelist presented his organization’s unique view on usability and search before joining the discussion panel.

Peter Norvig, Google

Peter mentioned that users will sometimes fax a copy of the Google home page with a query written inside the search box. These users are obviously confused about how to get started, or maybe just playing a practical joke.

Google recently launched Google Answers: a database of facts extracted from their crawls of the Web. Not all sites have the same answer to the same query, but Google aggregates the answers and displays the majority opinion at the top of its search results.
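The aggregation Peter described can be sketched as a simple majority vote. This is only an illustration of the idea, not Google's implementation; the function name and the sample answers are made up.

```python
from collections import Counter


def majority_answer(extracted_answers):
    """Return the most common answer and the share of sources that agree with it."""
    if not extracted_answers:
        return None, 0.0
    counts = Counter(extracted_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(extracted_answers)


# Hypothetical answers extracted from five different sites for one query.
answers = ["answer A", "answer B", "answer A", "answer A", "answer C"]
print(majority_answer(answers))
```

A real system would presumably normalize answers and weight sources before counting, but the presentation only described showing the majority opinion first.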

Ken Norton, Yahoo!

Yahoo! launched its search technology in February 2004 and has been rapidly adding new features over the past 14 months. Yahoo! Search’s vision is “to enable people to find, use, share and expand all human knowledge.”

Y!Q brings Yahoo! search directly to the page so users may conduct a search at the point of inspiration. Flickr, Yahoo! 360, and My Yahoo! demonstrate the direction the company is moving in the social media space.

Mark Fletcher, Ask Jeeves

Mark started Bloglines because he had over 100 sites bookmarked and it was taking him too much time to keep up with every site on the list. Bloglines’ main goal is to allow users to search, subscribe, share, and publish online content. Bloglines currently adds 1.6 million articles to its database daily and is about to pass 400 million stored articles.

Future Bloglines search features include the ability to search over a defined set such as a group of friends and to easily search and filter articles.

Udi Manber, A9

Udi Manber showed off A9 Yellow Pages. A9 used GPS technology and digital cameras to capture over 28 million images of storefronts across the United States. This visual browsing technique allows users to easily search by proximity to a known landmark. A9 captured the images using specialized equipment placed inside rental SUVs.

A9 SUV

Amazon’s IT department would not allow Udi’s team to use an unlocked laptop in the field, so A9 devised a specialized mouse to prevent the laptop computer in the passenger seat from going to sleep after a period of input inactivity.

Jakob Nielsen, Nielsen Norman Group

Jakob presented statistics of search activity over time. The mean length of query strings has increased from 1.3 words in 1994 to 2.2 words in 2004. 42% of search users surveyed felt they had found their desired result. Jakob commented that “search on websites are a miserable failure” and “a disgrace for the field.”

Jakob showed a screen capture with audio of one woman’s search experience on AOL while searching for headache cures. The audience had a good laugh as the woman could not locate the proper text box for her search or navigate the search results. When she finally reached a destination website she almost clicked an unrelated skyscraper-style advertisement along the side of the page, thinking it might have an answer.

Usability problems pie chart

Panel discussion

The first question posed to the panel was about the evolution of search queries. Mark Fletcher commented that Ask Jeeves has seen query length decrease, possibly because users expect relevant information with less input. Udi Manber mentioned that research shows the length of a query is directly related to the size of the search input box. Jakob Nielsen commented that the average search input box is 18 characters wide but 90% of queries need at least 27 characters.

The second question was about tags as a new search interface. Ken Norton mentioned that tags are not much different than Yahoo!’s current analysis and weighting of anchor text. Jakob commented that it is less work to make a tag than to make a weblog entry.

During the audience question-and-answer session there was a question about search engine abuse and web spam. Peter Norvig of Google said Google is aware of bad actors and takes steps to identify them. Web spam has centralized targets, making it an easier problem to solve than e-mail spam. Spam is also expensive to do well. “If we keep catching them, penalizing, and setting them back to square one we will demotivate and they will disappear.”

John Battelle starts FM Publishing

John Battelle recently announced his new company, FM Publishing, which will act as a publisher for more authors, much like his role as band manager at Boing Boing.

I plan to partner with site authors, acting as a platform which provides important services to them – revenue (in the form of advertising), back end support, and the like. In essence, FM will act as a publisher to sites which need and want a publisher.

John’s move helps validate the world of weblog authors and provides a way for authors to reach a wider audience and make some money in the process while still focusing on what they love: their writing. John, I wish you all the best and will try to help send some good people your way.

Nine Inch Nails releases new single as GarageBand file

Earlier tonight I downloaded the latest Nine Inch Nails single as a GarageBand file after reading a short post from Trent Reznor on the Nine Inch Nails news section (weblog?).

What I’m giving you in this file is the actual multi-track audio session for “the hand that feeds” in GarageBand format. This is the entire thing bounced over from the actual Pro Tools session we recorded it into. I imported and converted the tracks into AppleLoop format so the size would be reasonable and the tempo flexible.

It is interesting to listen to the different components of the track individually. Ambience fills the entire track giving it a bit of a scratchy sound.

I had some fun and created some remixes. Trent has talked in the past about wanting to tour with a string quartet, so I isolated the lead vocals and added a small string section.

The included license from Interscope Records references compact discs and is definitely confusing to a fan like me who just wants to play around and share a derivative work with the world. Statement 4 of the license seems to prohibit sharing the work I created with all of you, so it will just stay on my personal devices for now.

This license expressly forbids resale, relicensing or other distribution of any of these sounds, either as they exist upon downloading, or any modification thereof.

The above legal statement seems in stark contrast to Trent wanting to “see what comes of it” and have his fans create remixes and experiment.

Trent worked for id Software and created the sound effects for Quake. The NIИ logo even appeared on the nailgun ammo boxes. Trent also worked as sound engineer on Doom 3 for a while but took off early and his work was never released in the final product. The license accompanying the GarageBand file expressly prohibits remixing the track into video games, a bit odd given the history.

I am really excited I get to play around with a Nine Inch Nails track at such a detailed level. It would be good if the legal text allowed fans to feel comfortable swapping remixes with each other with no commercial intent.

Movable Type 3.16 coming Monday

Six Apart plans to release version 3.16 of Movable Type this Monday, April 18. The new release includes over 100 bug fixes and improvements, including significant security fixes, making this a must-have for all Movable Type installations.

Some highlights from the private changelog:

  • Added support for Creative Commons 2.0 licenses.
  • Improved sanitization of user-submitted HTML (e.g. comments).
  • DateTime perl module is no longer required.
  • New “DebugMode” configuration parameter can enable/disable unsightly warning messages. (Defaults to off).
  • Subcategories are displayed hierarchically in the administrative interface.
  • MTCategoryCount no longer includes draft entries.
  • MTEntryAuthorLink and MTCommentAuthorLink default behavior is to not display commenter’s e-mail address.
  • TrackBack discovery is now more forgiving when domain name is mismatched.
  • Easier dynamic publishing setup.
  • post_save callbacks now have access to the original object as well as the object which was saved.

Woo-hoo! Some of the things I have tweaked in my install made it into the core.

Social bookmarks article in D-Lib magazine

Tony Hammond, Timo Hannay, Ben Lund, and Joanna Scott of Nature Publishing Group contributed a long and detailed look at social bookmarking tools in the April 2005 issue of D-Lib Magazine. The article takes a look at the history of bookmarks and reviews nine social bookmarking tools.

The authors’ employer, Nature Publishing Group, created the social bookmarking tool Connotea, which was released under the GPL today.

D-Lib Magazine is sponsored by DARPA.

New York Times photo shoot

Today I was photographed for a New York Times article on the tension between employers and employees over weblogs. I was told after a phone interview yesterday that if the article is accepted by the paper it will run in the business section next Monday, April 18. I will be checking my feed aggregator on Sunday night to see what direction the paper decided to take with the story.

The photo shoot was fun. We walked around within about a two-block radius of the Technorati office, shooting pictures for about an hour in Ritch Alley and around SBC Park.

To all the photo geeks out there: the photographer used a Nikon D100 with a wide-angle fixed lens for most shots.

Corporate blogging policies

The ease with which weblogs publish content to a worldwide audience does not create a new problem of corporate communication. The modern tools that power weblogs lower the barrier to entry and engage a wider population with less effort than other media. In the past ten years corporations have had to adapt as e-mail, chat rooms, message boards, and instant messaging entered the communications realm. The rise of weblogs and the transparent communication weblogs provide happened to coincide with the communications restrictions of the Sarbanes-Oxley Act. Millions of people with full-time jobs write e-mails, send instant messages, and post to weblogs every day, and each activity has the potential to negatively affect an employer.

Most employers have confidentiality agreements protecting trade secrets. If an employee discloses trade secrets such as sales numbers or future product direction it’s grounds for firing and legal action whether or not the activity took place at a keyboard or a cocktail party.

Many employers also have Internet policies informing employees that their words and actions online are logged and that outsiders can trace those actions, and reminding employees not to send chain letters or e-mail racy jokes around work.

Could you imagine a company banning an employee from having an e-mail or instant message account for use at home? Fear of blogging is simply fear of something new, but once the paranoia dies down corporations will see that it is time to meet the new medium.

Employers considering a blogging policy are not thinking broadly enough. Employees need to be trained as better, more effective communicators. Any blogging policy is really just an extension of an existing corporate communications policy, intended to educate everyone in much the same way an executive may receive training to deal with the press and the public. Treat your employees as an important yet independent public voice and prepare to reap the rewards.

The world of weblogs we know today is only the beginning. Cameraphones are entering the workplace. File transfers and uploads to Internet sharing services are becoming more common. Individuals are being empowered with more tools to create new methods of flexible communication. The wave of change is unavoidable and it is time for employers to positively participate.

Perseus blog study estimates 31.6 million hosted weblogs

Perseus Development Corporation surveyed ten thousand weblogs on twenty leading weblog hosting services. They conclude 31.6 million weblogs have been created on these top hosted services including 10 million weblogs created in the first three months of 2005.

The numbers

Perseus published some subscriber number estimates as of March 31, 2005 from the major services.

  • 8 million Blogger accounts use Blog*Spot hosting.
  • 6.6 million LiveJournal accounts.
    • 51,900 LiveJournal accounts with syndicated feeds (a free feature).
  • 4.5 million MSN Spaces accounts.
  • 211,500 Bolt accounts blogging.
  • Over 564,000 MySpace accounts blogging.

This survey has a confidence interval of 0.98% for a 95% confidence level.

Interesting numbers. Perseus has a blog survey weblog to respond to questions.

How do you extrapolate a sample of ten thousand to numbers as large as 8 million with such a small confidence interval? There must be more to the study than has been published. I have seen the MSN Spaces and LiveJournal numbers before but the Blog*Spot estimate is new to me.
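A rough check on that 0.98% figure: it is what the standard worst-case margin-of-error formula gives for a sample of ten thousand. The sketch below assumes a simple random sample and a 50% proportion, which is my assumption; Perseus has not said how the number was calculated.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Margin of error for a sampled proportion.

    z=1.96 corresponds to a 95% confidence level; proportion=0.5 is the
    worst case. Both defaults are assumptions about the Perseus method.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Ten thousand weblogs surveyed, per the Perseus summary.
print(f"{margin_of_error(10_000):.2%}")
```

If that is where the number comes from, it describes the precision of percentages measured within the sample, not the precision of totals such as 8 million accounts, so the extrapolation question still stands.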

Contagious Media Showdown

Eyebeam has kicked off the first contagious media showdown to track the hottest meme online between May 19 and June 9. More than $5,000 is up for grabs, including $1,000 based on traffic measurement from Alexa and Technorati, unique visitors measured by the media server, and the top content licensed under a Creative Commons Attribution-ShareAlike license.

It seems a bit odd to restrict the Creative Commons prize to a specific license instead of a level of openness or better. Overall a very cool idea!

Seems like a search engine optimization play taking advantage of the energy of the blogosphere to hype your ego. “Your entry can have its own domain name if it is mapped to the contagious server.” You now have a month to come up with ways to participate without contributing to the site’s search juice.

SpamLookup testing

Over the weekend I installed Brad Choate’s SpamLookup Movable Type plugin after rave reviews. So far I have had mixed results.

SpamLookup discards TrackBack pings from TypePad and other weblog authoring tools by default as the server IP address may not match the IP address of the authoring tool. In the case of TypePad, a recent ping came from 66.151.149.25 and not the expected TypePad domain IP address of 66.151.149.10.
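The check described above can be sketched roughly as follows. This is not SpamLookup's actual code (the plugin is written in Perl); the function name, the example hostname, and the stand-in resolver are all hypothetical, and only the two IP addresses come from my logs.

```python
import socket
from urllib.parse import urlparse


def ping_ip_matches(source_url, sender_ip, resolve=socket.gethostbyname_ex):
    """Return True if the ping's sender IP is one of the addresses the
    source URL's hostname resolves to."""
    host = urlparse(source_url).hostname
    if host is None:
        return False
    _name, _aliases, addresses = resolve(host)
    return sender_ip in addresses


def fake_resolve(host):
    # Stand-in for a DNS lookup so the example runs offline; it returns
    # the address I expected for the TypePad domain.
    return host, [], ["66.151.149.10"]


# Hypothetical weblog URL; the ping arrived from 66.151.149.25.
url = "http://example.typepad.com/weblog/entry.html"
print(ping_ip_matches(url, "66.151.149.25", resolve=fake_resolve))
```

A hosted service that sends pings from a different machine than the one serving its weblogs will always fail a strict comparison like this, which is why the default behavior catches legitimate TypePad pings.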

It’s more than a bit comical that Six Apart code discards a post from a Six Apart property by default. (I’m not sure what intellectual property rights Six Apart provides employees like Brad, but Brad also codes the Movable Type core.)

If you have attempted to send a TrackBack and failed, sorry about that, but I am living on the bleeding edge to keep the bad pings away. TrackBacks cannot be moderated in the current build of SpamLookup, so your pings may be discarded. I see your pings in my logs and I know how to add your ping manually when I find a moment.

Update: Brad mentions in the comments that moderation of TrackBacks is currently supported if MT-Moderate is installed.