Recently in Search Category

Search and discovery across large data sets.

  1. Aug28

    Internet Explorer 8 Search Suggestions

    Microsoft released a second beta of its upcoming Internet Explorer 8 browser yesterday afternoon. The new browser will reach full release by the end of the year, changing the way most Windows users view the Web. There are many new features of IE 8 for web developers, including completely new ways to light up the browser chrome. Microsoft has extended the OpenSearch protocol with a new search suggestions data format expressed in XML or JSON. The new format will display real-time search results, summaries, images, and even search result classifications inside the browser chrome for any site owner supplying the appropriate format. In this post I'll teach you how to add search suggestions to your OpenSearch description document for instant search suggestions in IE8.

    1. Search Suggestions
    2. Suggestions Format
      1. Quick element definitions
    3. Summary
    IE8 instant search Wikipedia

    Internet Explorer's Instant Search provides suggestions based on text already entered into the search box. The functionality is very similar to Google Suggest and its JavaScript feed, expanded to support multiple categories, graphics, and short descriptions.

    The new format covers search suggestions and not necessarily search results. Syndicated search results should continue to be exposed through a Url element whose type attribute is Atom or RSS.

    Search suggestion data files are exposed through a new MIME type referenced in an OpenSearch descriptor's Url element: application/x-suggestions+xml.

    <Url type="application/x-suggestions+xml" template="http://example.org/suggest?q={searchTerms}" />

    Suggestions URLs should follow the same fill-in-the-blank parameters defined by OpenSearch such as result count or language scope.
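
    As a rough illustration of the fill-in-the-blank idea, here is a minimal Python sketch that substitutes OpenSearch-style parameters into a URL template. The fill_template helper and the n and lang query string names are my own assumptions for the example; only the {searchTerms}, {count}, and {language} parameter names come from OpenSearch, where a trailing question mark marks a parameter as optional.

```python
import re
from urllib.parse import quote

# Matches OpenSearch template parameters such as {searchTerms} or {count?}.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\??)\}")


def fill_template(template, **values):
    """Substitute OpenSearch-style parameters into a URL template.

    Hypothetical helper for illustration; not part of any browser API.
    """
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so it is safe inside a query string.
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left blank
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(replace, template)


# The n and lang query names below are assumptions, not a real endpoint.
url = fill_template(
    "http://example.org/suggest?q={searchTerms}&n={count?}&lang={language?}",
    searchTerms="afc ajax",
    count=5,
)
print(url)
```

    A required parameter with no value raises an error, while an unset optional parameter is simply replaced with an empty string.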

    Suggestions Format

    How do you make your site light up with pretty pictures in the search box? Just add XML descriptors of possible intended searches.

    <?xml version="1.0"?>
    <SearchSuggestion>
      <Query>ajax</Query>
      <Section>
        <Separator title="Web Development"/>
        <Item>
          <Text>AJAX Developer Center</Text>
          <Description>Asynchronous JavaScript and XML</Description>
          <Url>http://developer.mozilla.org/En/AJAX</Url>
          <Image source="http://example.org/ajax.jpg" alt="AJAX Web Development"
                 height="50" width="50" align="middle"/>
        </Item>
        <Separator title="Soccer"/>
        <Item>
          <Text>AFC Ajax</Text>
          <Description>Amsterdamsche Football Club Ajax</Description>
          <Url>http://english.ajax.nl/</Url>
          <Image source="http://example.org/afcajax.jpg" alt="AFC Ajax"
                 height="50" width="50" align="middle"/>
        </Item>
        ...
      </Section>
    </SearchSuggestion>
    

    In the example above I provided search suggestions broken into two sections with Separators: web development and soccer. Each section has a list of Items defining title text, a short summary, and a related image.
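
    If your suggestions come out of a database rather than a static file, a document like the one above can be generated on the fly. The sketch below uses Python's standard ElementTree module; the build_suggestions function and its input dictionary layout are assumptions made for illustration, and images are left out to keep it short.

```python
import xml.etree.ElementTree as ET


def build_suggestions(query, sections):
    """Build a SearchSuggestion document from {separator title: [items]}.

    Hypothetical helper; each item is a dict with text, description, url.
    """
    root = ET.Element("SearchSuggestion")
    ET.SubElement(root, "Query").text = query
    section = ET.SubElement(root, "Section")
    for title, items in sections.items():
        # Each group of results starts with its own Separator.
        ET.SubElement(section, "Separator", title=title)
        for item in items:
            node = ET.SubElement(section, "Item")
            ET.SubElement(node, "Text").text = item["text"]
            ET.SubElement(node, "Description").text = item["description"]
            ET.SubElement(node, "Url").text = item["url"]
    return ET.tostring(root, encoding="unicode")


xml_doc = build_suggestions("ajax", {
    "Soccer": [{
        "text": "AFC Ajax",
        "description": "Amsterdamsche Football Club Ajax",
        "url": "http://english.ajax.nl/",
    }],
})
print(xml_doc)
```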

    Internet Explorer 8 also supports results in JSON format if you prefer.

    Quick element definitions

    Separator
    A distinct grouping of your search result set. Correlates well with site categories.
    Item
    An individual result wrapper.
    Text
    Your result title. Internet Explorer will highlight text in your result title matching the text entered in the search box.
    Description
    A short summary of the search result. Similar to an HTML meta description or search result snippets.
    Image
    An image you would like to display alongside the search result. Internet Explorer 8 will pass along the heights and widths of its drop-down configuration -- max width, row height, and section height -- if you set up a few extra parameters in your OpenSearch description. The image should be relatively small to fit alongside a search result title and description (~75px).

    Summary

    Internet Explorer 8 opens up more of the browser chrome to user search customizations including instant search suggestions. Webmasters can enhance their search results for supporting web browsers with suggested terms or results served directly within prime browser real estate. Images, highlighted text, and short summaries will help your results stand out and should drive increased search usage by loyal customers.

    Google crawls web forms, and I expect its search team will look for signals described on the page such as OpenSearch or site suggest to make an educated guess about the best ways to probe your site for deep results. Adding search suggest markup to your site could also help machines discover deeper content within your site among popular keywords.

    Internet Explorer 8 just released beta 2 and these features are not frozen. Search suggestion XML is a good feature to track for sites with deep content and engaged users.

  2. Mar08

    AskCity lets you draw your search area

    Ask.com's local search product AskCity launched new tools last night that allow searchers to define the scope of a search by drawing on a map. Ask has always been focused on liberal search queries you might ask a concierge and this new search feature again puts the user in charge while abstracting some complexities of local search.

    AskCity search drawing tools

    In my example search above I searched for coffee near The Palace Hotel, a popular web conference spot. A search for the hotel put a marker in the middle of Market Street, but it didn't matter in the case of my search. I drew a circle with about a one city block radius, a reasonable walking distance for a meeting or general break from the conference action. AskCity plotted a few options on the map and opened a new pane showing search summaries from IAC partner CitySearch.

    A typical local search asks the user for a ZIP code or street address as a center point and searches for nearby points of interest. Users can limit their search via drop-down menu selections with choices including things like 1, 3, or 5 miles from the specified center. If you're on the edge of a ZIP code boundary or would like to limit your search for after-dinner drinks to the two blocks around your restaurant you're often out of luck. AskCity's drawing tools let users draw their search boundaries, constructing longitude and latitude ranges while the user applies a brush stroke.
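
    I don't know how AskCity implements this internally, but the general idea of turning a drawn shape into longitude and latitude ranges can be sketched in a few lines of Python. The function names and sample coordinates below are hypothetical, and the sketch ignores edge cases such as shapes that cross the 180th meridian.

```python
def bounding_box(points):
    """Return (min_lat, max_lat, min_lon, max_lon) for drawn (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), max(lats), min(lons), max(lons)


def within(box, place):
    """Check whether a (lat, lon) place falls inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    lat, lon = place
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# Hypothetical brush stroke: a few points sampled while the user draws.
stroke = [(37.7880, -122.4030), (37.7895, -122.4010), (37.7885, -122.3995)]
box = bounding_box(stroke)

print(within(box, (37.7888, -122.4015)))  # inside the drawn area
print(within(box, (37.8000, -122.4015)))  # too far north
```

    A real implementation would likely test against the drawn shape itself rather than just its bounding rectangle, but the range check is a cheap first filter.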

    Despite its unique features such as walking directions and search boundary drawing tools AskCity continues to frustrate me with other UI elements that just get in my way. The map would adjust its center and its zoom level after every shape-based search, leading to extra work trying to pan and zoom back to my search area. It wasn't immediately obvious how to make the search result pane go away -- you maximize the map pane to minimize the search results -- and the search listings and reviews displayed seemed to be out of date, not listing a Peet's that opened last year, and a restaurant I first visited about a year ago was listed as "soon to be open."

    I'm a fan of AskCity's willingness to experiment and rethink approaches to local search based on our actual behavior. I just want my experience with the whole product to be as productive as my interactions with the single mapping pane.

  3. Jan02

    Search is not a zero cost switch

    New search startups come and go, but the ever-present meme seems to be the "zero switching cost" between an established competitor and a newcomer. In the old days this was certainly true, as any user could simply update their browser bookmarks or homepage, replacing WebCrawler with Excite or AltaVista. As the web grew, search became an integrated component of large websites, networks, and the desktop software powering the entire experience.

    Google is spending billions to integrate its search products into the Apple operating system, new Dell PCs, MySpace, Firefox, and more. Google commands about a 50% share of the U.S. toolbar search market according to comScore. Google powers search on sites with lots of pages such as newspaper, university, and personal websites. A developer platform further diversifies these sources of traffic, turning the long tail of search origination into site revenue.

    If a new search engine comes online, it not only has to launch a compelling destination site and service, it also has to unseat the entrenched sources of traffic (and revenue) spread across the entire technology landscape. Google gained existing user traffic not just from Excite and AltaVista, but also the Thunderstone site searchers, Yahoo! toolbar users, and anyone who bucked the trend of the default pre-installed homepage and tools.

  4. Dec23

    Wikiasari: Wikipedia success applied to social search?

    Wikia Search icon

    Wikia will release a new search engine early next year according to an interview with Jimmy Wales in today's Times of London. The new Wikia search engine project is named Wikiasari and will apply wisdom of the crowds features to search engine results, letting individual users rank sources of information and their relevancy to a particular query.

    Of course the article takes a "gunning for Google" angle, citing the PageRank algorithm used since Google was founded in 1998. Search engines grow over time, and incorporate multiple ranking factors beyond the math of inbound links and source authority. Google (synonym for big search engine for simplicity's sake) can assign domains of trust to highlight trusted content beyond their PageRank calculation. The Mayo Clinic might be a trusted source of health news. Government domains could be an authority for government searches. External links found in Wikipedia could carry additional weights as curated sources.

    Google also incorporates click-through rates into its advertising algorithms, relying on the preferences of a crowd to select the most relevant result. Google presents a searcher with a title, contextual summary, and domain name to help him or her select the result best matching their query. Click-through tracking can be grouped on a personal level (search results you previously visited), geographic level (popular in San Francisco), network-specific (other people on your corporate network liked these results), within an affinity group (search originates at Sierra Club or through a Google Co-op group), and much much more.

    Wikia and its investor Amazon may have an edge incorporating a user's purchase history, news preferences, and other profiling data into each search. You could place a set of eyes on the billions of web pages currently in existence, hoping that new stem cell review center achieves appropriate annotations for discovery, but I'm skeptical. Sites such as Google, Yahoo!, and Windows Live already have the crowds clicking on search results every day, submitting bookmarks, and, in some cases, flagging spam. Wikia would need a critical mass of users to maintain a useful search index and query analyzer to supply Britney Spears' fans, medical research, and the many many other search queries submitted every day. The same search engine pickpockets wandering through Google's search index will continue to target any significant source of traffic and, unlike Wikipedia, you can't just lock down a contested (or heavily profitable) area and still maintain balance.

  5. Nov27

    Feed publishing best practices

    Web feed syndication is made up of two base vocabularies: RSS 2.0 and the Atom Syndication Format. These base vocabularies are extended using namespaces to create a common set of expressions for your web feed data. In this post I'll walk through some best practices for publishers syndicating their data via web feeds.

    Should I use RSS or Atom?

    The RSS 2.0 syndication format has been around for about four years and over that time it has been used by web publishers large and small to represent their data for syndication. The New York Times publishes its top stories via RSS to deliver updates to readers with appropriate viewing software. NPR distributes audio attachments commonly referred to as "podcasts" using RSS enclosures to iTunes and other specialized subscription programs.

    The Atom Syndication Format was released in December 2005 under the standardization process of the Internet Engineering Task Force (IETF). A few popular uses include Google GData for API responses, FeedBurner resyndication, and Six Apart blogging products.

    Choosing RSS or Atom for feed syndication is a bit like selecting GIF or JPEG as your image format: publishers have preferences for the best representation of the original data but most renderers support both. There are a few easy answers however. If you syndicate audio or video in your feed, RSS offers more reliable compatibility across deployed players. If you would like to use your feed as a lightweight API or present data for government consumption, Atom should be your format of choice.

    Extended vocabularies

    RSS and Atom take advantage of XML to express data not included in their base vocabularies. A number of groups and companies have authored namespace extensions to represent a variety of data. Here's a look at some of the more popular namespace expressions:

    Dublin Core metadata
    The Dublin Core namespace might be used to specify an author name, a contributor, or copyrights to an individual feed item. Many Dublin Core elements are better expressed using Atom base elements.
    Comments
    Comment feeds and counts can be included with a feed item. Slash and Well-Formed Web namespaces are popular additions to RSS while Atom feeds may use Atom Threading Extensions.
    Photo, audio, and video
    Publishers may add more information about media enclosures using Yahoo! Media RSS or the iTunes podcast namespace. Yahoo! Media RSS lets a publisher describe multiple available data types, such as MP3 and AAC. The iTunes namespace enhances your listings within the iTunes Store.
    Search results
    OpenSearch expresses search results and related data for consumption by search aggregators and the built-in search features of Internet Explorer 7 and Firefox 2.
    Creative Commons
    Publishers can declare Creative Commons license data inside an RSS feed. Atom publishers can use rights instead.
    Geographical coordinates
    Publishers can express latitude and longitude coordinates using the W3C Basic Geo vocabulary. A geotagged set of photos might be syndicated with coordinates or traffic conditions might publish a corresponding location.
    Item pricing
    Buy.com product module uses a specialized namespace for pricing, thumbnail image, text-only description, and SKU.
    Weather conditions
    Yahoo! Weather publishes weather forecast data using a specialized namespace. The National Weather Service uses Digital Weather Markup Language.
    Forums
    Jive Forums namespace covers forum issues such as total post messages and individual threads.
    Calendar
    Google Calendar namespace is one way of expressing calendar data.
    List formatting
    Microsoft's Simple List Extensions define a unique ordering of feed items such as a Top 10 list or upcoming movies in your rental queue.
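
    To make the namespace idea concrete, here is a small Python sketch that adds W3C Basic Geo coordinates to an RSS item using the standard ElementTree module. The geotagged_item helper, the photo title, and the example.com link are invented for illustration; the namespace URI is the one published for the Basic Geo vocabulary.

```python
import xml.etree.ElementTree as ET

# W3C Basic Geo (WGS84) vocabulary namespace.
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
ET.register_namespace("geo", GEO_NS)


def geotagged_item(title, link, lat, lon):
    """Build an RSS item element carrying geo:lat and geo:long children."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Extension elements live in their own namespace next to the base ones.
    ET.SubElement(item, f"{{{GEO_NS}}}lat").text = str(lat)
    ET.SubElement(item, f"{{{GEO_NS}}}long").text = str(lon)
    return item


# Hypothetical photo entry and coordinates.
item = geotagged_item(
    "Ferry Building at dusk", "http://example.com/photos/1", 37.7955, -122.3937
)
print(ET.tostring(item, encoding="unicode"))
```

    A parser that does not understand the geo prefix can skip those two elements and still read the title and link, which is the main appeal of extending by namespace.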

    Avoid confusion of tongues

    Paul Gustave Dore Confusion of Tongues

    Given the amount of expression available in both the base vocabularies and the widely deployed extended namespaces, a new feed publisher would be well served by sticking to these vocabularies where possible. Just as the color value "cyan" may have no value to a color picker with a limited vocabulary of expressions, your expressed data might never be parsed or understood by feed parsers if you become overly inventive.

    Most applications that consume feeds don't actually walk the XML of each feed themselves. They rely on feed parser libraries to handle feed errors, similar markup across different publication formats, and retrieving remote files from your server. A parser such as Universal Feed Parser contains built-in support for over 40 namespaces and attempts to normalize various ways of expressing title, author name, etc. A newly invented namespace is less likely to be supported by these intermediate libraries than existing methods of data definition.
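
    The normalization step is easier to picture with a toy example. The Python sketch below is not how Universal Feed Parser works internally; it is a deliberately tiny stand-in, using only the standard library, that reads entry titles from either RSS 2.0 or Atom and returns them in one common shape. The entry_titles function and both sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_titles(feed_xml):
    """Return entry titles from an RSS 2.0 or Atom feed as a plain list."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":
        # RSS 2.0: un-namespaced item elements inside channel.
        return [i.findtext("title", default="") for i in root.iter("item")]
    if root.tag == ATOM + "feed":
        # Atom: namespaced entry elements directly under feed.
        return [
            e.findtext(ATOM + "title", default="")
            for e in root.iter(ATOM + "entry")
        ]
    raise ValueError(f"unrecognized feed root: {root.tag}")


rss = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title></item></channel></rss>"""

atom = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title></entry></feed>"""

print(entry_titles(rss) == entry_titles(atom))
```

    A real library also has to cope with malformed markup, character encodings, and dozens of extension namespaces, which is exactly why most developers lean on one.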

    Here's a sampling of some of the popular feed parsing libraries by programming language:

    Windows/C#
    Windows RSS Platform
    Apple Leopard/Cocoa
    Apple Syndication Platform (unreleased)
    Python
    Universal Feed Parser
    PHP
    Magpie
    Java
    Rome
    Perl
    XML::FeedPP
    Ruby
    Simple RSS

    Check for errors

    Once you've published your feed you'll want to check for XML and feed errors. Some parsers are more liberal than others, but a single error could result in users of specific services not receiving your latest updates.

    You can check your files for errors with Feed Validator or the W3C Feed Validation Service. You can program web services directly against the W3C interface, or you can download the feed validator code for local use.
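
    Before reaching for a full validator you can catch the most basic class of error, XML that is not well-formed, with a few lines of Python. This sketch only checks well-formedness using the standard library; it knows nothing about RSS or Atom rules, so it is a first pass and not a replacement for the validators mentioned above. The xml_error helper name is my own.

```python
import xml.etree.ElementTree as ET


def xml_error(feed_text):
    """Return None if the text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(feed_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


good = "<rss version='2.0'><channel><title>Example</title></channel></rss>"
bad = "<rss version='2.0'><channel><title>Example</title></rss>"

print(xml_error(good))
print(xml_error(bad))
```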

    Feed marketing

    Once you've published a feed using well-understood element sets and valid markup you'll want to be sure the world can find your latest updates. Aggregators and search engines support ping notifications, a quick way of letting a service know they should visit your website and/or feed and discover new updates.

    Ping

    Most ping servers accept update notifications delivered via XML-RPC. The weblogUpdates.ping method sends a website title and website URL, and weblogUpdates.extendedPing sends the same data plus a feed URL. You can send notification updates to a variety of sources for quick inclusion in a search index or feed aggregator. Below are just a few popular ping endpoints serving a general audience:

    Google
    http://blogsearch.google.com/ping/RPC2
    Yahoo!
    http://api.my.yahoo.com/RPC2
    http://ping.blo.gs/
    NewsGator
    http://services.newsgator.com/ngws/xmlrpcping.aspx
    Bloglines
    http://www.bloglines.com/ping
    Technorati
    http://rpc.technorati.com/rpc/ping
    VeriSign
    http://rpc.weblogs.com/RPC2
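
    Here is a minimal sketch of a weblogUpdates.ping request built with Python's standard xmlrpc.client module. It only constructs and prints the XML-RPC payload rather than contacting a server; the blog title and URL are placeholders, and whether any particular endpoint above still accepts pings is something you would need to verify yourself.

```python
import xmlrpc.client


def ping_payload(site_title, site_url):
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps(
        (site_title, site_url), methodname="weblogUpdates.ping"
    )


# Placeholder site details for illustration.
payload = ping_payload("Example Blog", "http://example.com/")
print(payload)

# Sending it for real is a network call, so it is left commented out here:
# server = xmlrpc.client.ServerProxy("http://rpc.weblogs.com/RPC2")
# server.weblogUpdates.ping("Example Blog", "http://example.com/")
```

    Most blogging software sends these notifications for you after each post, so hand-rolled pings are mainly useful for custom publishing systems.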

    Create new subscriptions

    A few search services restrict their index to user feed subscriptions. If you're not already a user, create a new account and subscribe to your feed, adding notes and tags where appropriate. Be sure to cover popular online aggregators such as My Yahoo!, Google Reader, Bloglines, etc.

    These additional actions give your feed a few extra importance points, since at least one user cares enough about the data to subscribe.

    Claim your site, claim your feed

    Some search services allow a publisher to verify their website and/or feed for more frequent updates, statistics tracking, or highlighted search results listings. You'll likely have to place a specially issued code within a web page or feed to prove your account has the ability to edit the site you would like to claim. Here are a few search services that offer author claiming:

    Local Resources

    This blog post is meant to serve as a general overview of the worldwide market for feed publishers. My views are skewed towards blogs published in English inside the United States. If you publish content in other languages or focused on a particular national audience, research the integration opportunities available with those specific services.

    Summary

    Feed publishing is a pretty busy space! Millions of customers are ready to receive regularly delivered content updates, either through their feed aggregator or through a search engine. Structured data delivered in easily digestible chunks is a good thing.

    Feeds can serve many purposes, from lightweight APIs and data interchange formats to news updates. Each use has an intended audience and possible extended audience, and creating well described data in commonly understood data formats will extend your distribution reach and allow the many parsers and feed interfaces already present on the web to begin remixing your data in new ways for custom delivery and interpretation.

  6. Nov26

    Declaring alternate web content for searchability and discoverability

    Web authors may declare alternate versions of a single web page, exposing additional languages available or various file formats. HTML documents express these relationships using the link element in the document header.

    Alternate language

    Wikipedia main language offerings

    A single Wikipedia article about "search" might have alternate representations and translations, such as "buscar" in Spanish, "suche" in German, "rechercher" in French, etc. A search engine or web browser software can discover the availability of these alternate document versions if declared by the publisher.

    <link title="Arabic" href="http://ar.example.com/" rel="alternate" hreflang="ar" type="text/html" charset="ISO-8859-6" />

    The example markup above advertises an alternate version of example.com available in Arabic expressed in the ISO character set 8859-6. If a user capable of reading Arabic arrives at the page they can now take appropriate action.

    Alternate format

    The HTML specification also allows publishers to associate alternate file formats with a web page. A publisher might declare alternate versions of the page available in plain text, PDF, or a web feed format such as RSS or Atom.

    <link title="Print Me" href="http://example.com/index.pdf" rel="alternate" media="print" type="application/pdf" />

    Modern browsers take advantage of these alternate file format declarations, lighting up a special icon when a web feed is discovered. Internet Explorer 7, Firefox 2, and Opera 9 advertise the availability of a web feed corresponding to the viewed web page.
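
    The discovery step a browser or crawler performs can be approximated with Python's built-in HTML parser. This is a simplified sketch, not how any particular browser does it: the AlternateLinkParser class is my own, and the sample page reuses the example.com links from this post plus a stylesheet link that should be ignored.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect the attributes of every link element with rel="alternate"."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values:
            self.alternates.append(attributes)


page = """<html><head>
<link title="Arabic" href="http://ar.example.com/" rel="alternate" hreflang="ar" type="text/html" />
<link title="Print Me" href="http://example.com/index.pdf" rel="alternate" media="print" type="application/pdf" />
<link rel="stylesheet" href="/site.css" />
</head><body></body></html>"""

parser = AlternateLinkParser()
parser.feed(page)
for link in parser.alternates:
    print(link.get("type"), link.get("href"))
```

    Filtering the collected links by type (for example application/rss+xml or application/atom+xml) is how a client would narrow the list down to web feeds.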

    Internet Explorer 7 web feed highlight

    The ease-of-use and availability of these new feed discovery tools will convert website visitors into website subscribers, strengthening each user's relationships with your content.

    This post is part 1 of 2 of a 15-minute feed syndication best practices presentation from WebmasterWorld PubCon 2006 in Las Vegas. Part 2, Feed publishing best practices, is much longer.

  7. Nov21

    The Spam Farms of the Social Web

    Blogs and other social media tools have changed the publishing landscape over the past few years, making it easier than ever to share information with the world. The ease of use and focused attention of the medium has also helped create new opportunities for spammers to automatically generate content, buy links, and get noticed by search engines and other points of aggregation. In this post I will break down the operations of one spam network utilizing social media technologies such as WordPress, Digg, del.icio.us, and more to climb the search results and generate revenue through ads and affiliate programs.

    Last weekend I noticed a Digg submission about weight loss tips had climbed the site's front page, earning a covetous position in the top 5 technology stories of the moment. The 13 sure-fire tips were authored by "Dental Geek" and posted to the "Discount Dental Plan" category on his WordPress blog. Scanning the sidebar links and adjacent content it was obvious this content was out of place on a page optimized for dental insurance. The webmaster of i-dentalresources.com had inserted some Digg bait, seeded a few social bookmarking services, and waited for links and page views to roll in, creating a new node in a spam farm fueled by high-paying affiliate programs and identity collection for resale.

    eBizzSol portfolio snapshot

    The spammer's domain is managed by eBizzSol, a company with fake domain registration information including the address block of a Christian church in Fullerton, California. The dental site is registered to an address in Dhaka, the capital of Bangladesh. Based on the broken English I've found on the network's sites an offshore base of operations would not surprise me. eBizzSol mentions about 200 sites in its portfolio, including real estate, mortgage, casinos, and more. They even advertise a content generation service for SEOs offering six blog posts a month for $75 optimized for specific keywords, including guarantees for blog directory and ping submissions. There are other sources of content generation available for hire online, creating a flow of content republished across a target category optimized for specific terms.

    Follow the money

    Why would someone want to create a site optimized for dental services? A search engine such as Google or Yahoo! discovers the site, indexes its pages, and starts including its content in search results for targeted keywords. Web searchers associate search engine rank with authority on a subject such as lowering an insurance premium or mortgage, and each action they take can generate a large amount of money. This particular site collects $40 or more per dental plan sold through a dental plan reseller, targets specific keywords of value, and boasts on its pages of search engine index inclusion in "just a few hours."

    The dental terms targeted cost up to $18 a click, offering incentives for top organic search conversion. Below is a price estimate from Google for keyword targeting in the United States.

    Google AdWords pricing
    Search term              CPC ($)
    teeth whitening          18.66
    sedation dentistry       12.80
    cosmetic dentistry       12.76
    dental plans              9.78
    dental implant            6.85
    pediatric dentist         6.77
    discount dental plans     5.93
    oral surgery              4.95
    braces                    3.39
    cavity                    1.88

    Gathering links

    Directories

    Yahoo! directory pricing

    This webmaster bought links from the Yahoo! directory, the Microsoft Small Business Directory, Business.com, and a few others, placing a link to their site within targeted categories. They are cheaper than the $1000 links purchased on sites such as the W3C, but these listings are often just as spammy.

    Virality

    Digg sample count

    The article link was submitted to Digg by a user who joined Digg last month yet is already ranked in the top 150. The story received over 900 Diggs and is currently buried. A newly minted user posted to Reddit, posted to Newsvine, and posted to del.icio.us using the same name on each service. Seeding and voting up the content worked, as the blog post made its way to the top story listings on each social news service.

    As of this evening the spam site has 353 inlinks from 212 external pages, mostly due to its viral marketing efforts on social networks. Some social bookmarking users include their bookmarked links in their blog sidebar, creating additional direct links throughout their entire site in addition to the original bookmarking service location. The spam network had successfully spread a piece of content throughout multiple user communities, and onto individual blogs in the process.

    Summary

    Certain topics are especially well suited for baiting the technology-oriented crowds of social news and bookmarking sites. Stories focused on Apple, Firefox, Google, Nintendo, history of computers, top X lists, or the target social site itself are common baiting practices used to attract attention and place a new content node on the map. Opportunists will continue to jump into new networks of influence and promote their own sites, gathering search engine juice even when the brief blip of attention has passed and the crowd moves on to another story of the moment.

    World of Warcraft female human with shovel

    I believe social media accounts are currently available for rent or for sale, rewarding active users with paid placements or account resells in much the same way as a World of Warcraft character might be resold on eBay. Social media sites and search engines need to stay on top of this new form of content creation, continually analyzing data and scrubbing out the dirt. Sites overrun with web spam quickly lose their utility and might be banned from search engines.

    Social media sites continue to change the way we interact with data but expect more activity and content shaping in the future from marketers targeting the social media space for a quick link injection.

  8. Nov16

    Social network marketing, spam, and gaming

    I spent the last few days among webmasters at the PubCon conference, where most conversations were focused on marketing yourself online to humans and search engines. The 2000 attendees focused on ranking themselves as high as possible in search engine result pages and driving site traffic. Methods of achieving these goals cover a full spectrum of white hat to black. Social networking and crowdsourcing sites are new focuses of the search engine marketing sector, taking advantage of loose editing and account creation restrictions to boost a site's visibility.

    Social networking and e-commerce

    Should every item in your product catalog have a MySpace profile? A few retailers think so, and mentioned creating automated processes to create new accounts on sites such as MySpace and Vox. If a user wants to add Tickle Me Elmo Extreme to his or her friend list it might just be a profile created by a shopping comparison site, toy merchant, or an affiliate. Toymakers such as Mattel are likely not policing their brand on sites such as MySpace, leaving some opportunity for others to produce the content and gather links, affiliate fees, and more.

    Most web publishers aren't making a cent and would be happy to take a few dollars in exchange for a link. That's the opinion of a few new companies and webmasters specializing in buying links on weblogs and hobby sites on the web. A few dollars might buy a link on a recipe site making sure every mention of "sharp knife" points to a specific product. Marketers who pay a little more might buy a link in a blog plugin or theme. One consultant mentioned local trade associations are really easy to "buy off."

    The links are distributed across the web, look almost natural, and are tougher for a search engine to spot as purchased. A sponsorship such as Bizrate's placement on CPAN pages is one example, but the success of blog placement in search engine results creates cheaper and more distributed points of purchase.

    Gaming Digg

    Digg was a popular topic of discussion in the hallways, with lots of stories about how sites can tap into Digg's huge audience and secure a few choice links and good traffic. Some marketers create a story aimed at the Digg audience, such as the top 10 reasons Mac users love The Daily Show, and, with the appropriate submitters and human or bot-powered voting, rise towards the top. A few search engine marketing consultants are promoting their account status and influence on Digg to clients. User-powered content is a popular target, and some of the techniques used are pretty clever and advanced.

    Summary

    There is a lot of activity in the social networking and user generated content space from marketers and spammers. New services need to pay attention to a variety of attack vectors and patch holes and vulnerabilities quickly to stay relevant and useful and to keep performing well. I've summarized some of the already public and well-discussed vectors of exploitation, but there are a lot more advanced methods skewing search and discovery on today's social web I won't be blogging about.

  9. Nov10

    Speaking at PubCon on Tuesday

    I'll be in Las Vegas early next week speaking at PubCon on feed syndication best practices. The session takes place from 10:15-11:30 a.m. on Tuesday if you are attending the search conference.

    I have not been to Las Vegas in a few years so I'll be checking out new pieces of grandeur at the Wynn, new Caesars Palace, Treasure Island, etc.

    Hopefully there will be lots of search geeks in attendance leading to interesting conversations.

  10. Oct27

    Bookmarking and social sharing trends

    The ability to save a URL has been around since Mosaic 0.2 but is currently experiencing a transformation as we learn more about the pages and content behind the pointers and share our findings with others through social networks. Hotlists, bookmarks, and favorites are changing and this month's SF Tech Sessions next Monday will take a look at a few new companies changing the way we think about sharing bookmarks.

    The inspiration for this month's SF Tech Sessions came out of a conversation with Jeff Weiner and Joshua Schachter of Yahoo! earlier this month. We talked about different ways people share data on del.icio.us, Yahoo! My Web, and Yahoo! Shopping as well as within smaller communities of interest.

    The bookmarking space continues to change, driven by changes in desktop software as well as modern web usage such as bookmarklets, extensions, and social sharing, but it's clear we're just getting started. Let's take a look at current methods of bookmarking a web page, and how individuals choose to share personal and social browsing behavior.

    Local bookmarks

    NCSA Mosaic Advanced Hotlist Manager

    Local bookmarks are stored in our web browser profiles and are often used the same way we might dog-ear a book. Local bookmarks can list a frequently visited site, an article you want to be sure to revisit later, or a decision in progress such as choosing a vacation or shopping for a new couch.

    You might bookmark the news page of your son's school to stay up-to-date on snow closures, events, and other relevant news. Local bookmarks might be relevant only to you, enabling shortcuts for frequent activities.

    Further reading: Internet Explorer Favorites, Mozilla Firefox.

    Live bookmarks

    Some bookmarks contain trackable updated content, expressed as a web feed, calendar data, or simply a file modification. It's possible to subscribe to a web page, displaying updated content within the bookmark listings or simply noting the page has changed in some way.

    Mozilla Firefox Live bookmark

    A live bookmark lets users quickly glance over changing data, and track the updates of many sites at once. In this case a bookmark is more like a subscription, creating a shortcut for visiting a page, identifying new content, and then visiting the location of the new content.
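
    As a rough sketch of the subscription idea above, a live bookmark can be modeled as a feed plus a record of the links already seen. The sample feed, the `new_entries` name, and the example.org URLs below are all made up for illustration; a browser's real implementation will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a bookmarked page's updates.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example School News</title>
    <link>http://example.org/news</link>
    <item><title>Snow closure Friday</title><link>http://example.org/news/3</link></item>
    <item><title>Science fair results</title><link>http://example.org/news/2</link></item>
    <item><title>Welcome back</title><link>http://example.org/news/1</link></item>
  </channel>
</rss>"""


def new_entries(feed_xml, seen_links):
    """Return (title, link) pairs for feed items whose link is not in seen_links."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link")
        if link and link not in seen_links:
            entries.append((title, link))
    return entries


if __name__ == "__main__":
    already_seen = {"http://example.org/news/1", "http://example.org/news/2"}
    for title, link in new_entries(SAMPLE_FEED, already_seen):
        print(title, link)
```

    Tracking links rather than titles is a simplification; a feed reader would more likely key on each item's unique identifier when one is present.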

    Bookmark clusters

    Adding a bookmark used to mean saving the location of the current browser window. Today's modern browsers consist of multiple tabs, placing multiple web pages inside each window. These tabs might be organized collections, storing items you would like to recall on a regular basis or save as a collection.

    Mozilla Firefox tabbed interface

    Browser tabs form natural groupings and an easily saved state. I expect we'll see more bookmark collections in the future as tabs become common browsing tools in Internet Explorer, Firefox, and Opera. A user can save a group of bookmarks for a topic such as trip planning, home improvement, or baby names.

    Synchronized bookmarks

    Bookmarks were one of the first pieces of local data to travel into the cloud, offering synchronization across multiple computers or web access when you are on the go. Synchronization may occur through a browser toolbar or plugin, operating behind-the-scenes while connecting to a backend such as Yahoo! or Google.

    Yahoo! Toolbar bookmarks

    Google and Yahoo! account for more than 95% of toolbar searches in the U.S. and I expect many of those users automatically sync their bookmark data.

    Bookmarking in public

    Sites such as del.icio.us or Furl allow you to sync and share your bookmarks, exposing your web pages of interest to other site members or the entire Web. Your descriptive behavior may change as you add a title, description, and tag for your own use, for discovery by others, or both.

    del.icio.us add bookmark

    The integration of social bookmarking content in blog sidebars, spliced feeds, and site browsing has made bookmarking a substitute for a full blog post and commentary. Private bookmarks are a fairly recent addition to del.icio.us, which shows that public sharing is the default behavior of the site's users.

    Bookmarking for another individual

    Del.icio.us users can share a bookmark with a specific person, placing the pointer within the target person's bookmark stream. This bookmarking behavior is a virtual tap on the shoulder, suggesting new content of potential interest.

    A person's link behavior might be tied into a user account network, tracking the bookmarks of a group of people at once, and suggesting those same people as possible share points.

    Bookmarking for an affinity group

    Groups form in online communities, joining together people interested in squared circles, social networking or web design. Submitting links to a group creates a shared resource with a defined audience interest. Your work is archived, allowing new members to discover the group's past activity.

    ma.gnolia Identity 2.0 group

    Bookmark groups may also launch further conversation, either in real-time through a chat or through comments on the original submission. Adding a link to a new article in a trade publication might spark some debate, or a link to a corporate document might initiate further analysis.

    Try it: Ma.gnolia, Mugshot

    Shared collections

    Shared bookmark collections are a useful way of sharing research and soliciting input from others on multiple resources. Users can share their own personal resources such as the best coffee in San Francisco, waterfall hikes in Oregon, or the hottest prom dresses of the season.

    Kaboodle bridal collection

    Once a collection is shared it might be edited or commented upon by a group, enabling the wisdom of the crowd. Shared collections are an opportunity for revenue sharing, rewarding the recommendations and expert opinions of others when a reader completes a purchase or an advertisement is displayed.

    Check out: Amazon Guides, Kaboodle.

    Additional data collection and display

    A bookmarked URL can contain more information than just a URL text string. You can identify a bookmarked resource as an image, audio, or video and display the full content or a preview within your application. You can also recognize content from known structured sources such as Amazon, Flickr, and YouTube, pulling in additional data about the linked resource.

    Amazon product information

    Recognizing an Amazon URL and the ASIN within, a service could gather price, availability, product images, reviews, and more from the web page HTML or through available APIs. A Flickr or YouTube URL could be similarly recognized, with additional data gathered and the URL normalized based on the service's proprietary identifier and URL structures. I expect more social bookmarking services will build these specialized data displays as they seek to grow vertically and make their pages a bit less boring.
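
    Here is a minimal sketch of that kind of recognition and normalization. The URL patterns are my assumptions covering a few common shapes only, the `identify_resource` function is hypothetical, and the identifiers in the usage lines are placeholders rather than real products or videos. Fetching price or availability would be a separate step against each service's API and is not shown.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed URL shapes for illustration; the real services accept more variants.
ASIN_PATH = re.compile(r"/(?:dp|gp/product|exec/obidos/ASIN)/([A-Z0-9]{10})(?:/|$)")
FLICKR_PATH = re.compile(r"^/photos/([^/]+)/(\d+)")


def identify_resource(url):
    """Return (service, identifier, normalized_url), or None if unrecognized."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "amazon.com":
        match = ASIN_PATH.search(parts.path)
        if match:
            asin = match.group(1)
            return ("amazon", asin, "http://www.amazon.com/dp/" + asin)

    if host == "flickr.com":
        match = FLICKR_PATH.match(parts.path)
        if match:
            user, photo_id = match.groups()
            return ("flickr", photo_id,
                    "http://www.flickr.com/photos/%s/%s/" % (user, photo_id))

    if host == "youtube.com" and parts.path == "/watch":
        video_ids = parse_qs(parts.query).get("v")
        if video_ids:
            return ("youtube", video_ids[0],
                    "http://www.youtube.com/watch?v=" + video_ids[0])

    return None


if __name__ == "__main__":
    # Placeholder identifiers, not real items.
    print(identify_resource("http://www.amazon.com/gp/product/B00EXAMPLE/ref=sr_1_1?tag=x"))
    print(identify_resource("http://youtube.com/watch?v=abc123&feature=related"))
```

    Normalizing to a single URL per identifier means two users who saved the same product through different referral links end up pointing at the same bookmark.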

    Frequently visited non bookmarks

    You might frequently visit a site by typing some keywords into a search engine and clicking on the top result. I sometimes conduct the same search for a resource multiple times a month, visiting the top result. I consider these actions a type of soft bookmark. It's easier to initiate a search than save it, but my repeat visits are useful information to the search engine as it tries to shape personal search and social search preferences.

    Amazon product information

    The example search above shows a search for "NSI whois" on Google, my way of calling up Whois data for a domain on the occasions when I want the data within a few seconds.

    Bookmarks are searchable locally and inside of an online service, contributing strong signals about user preferences to the search process. A search for "digital camera" becomes more useful when you are reminded about previously bookmarked cameras. Search can serve as a recall for yourself and a filter for yourself and others, creating better results out of the millions of possible matches to your query. Your friend's guide to waterfall hikes is more valuable to you than a random publisher, and search engines with bookmarking abilities will continue to integrate your saved items, visited results, and more into your personalized search results.
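
    One simple way to picture bookmarks acting as a ranking signal is a re-scoring pass over the engine's results. This is only a sketch under my own assumptions: the `personalize` function, the boost factors, and the example URLs are invented for illustration and do not describe how any search engine actually weighs bookmarks.

```python
def personalize(results, my_bookmarks, friend_bookmarks,
                own_boost=2.0, friend_boost=1.5):
    """Re-rank (url, score) results, boosting bookmarked URLs.

    The boost factors are arbitrary illustrative values, not tuned weights.
    """
    rescored = []
    for url, score in results:
        if url in my_bookmarks:
            score *= own_boost
        elif url in friend_bookmarks:
            score *= friend_boost
        rescored.append((url, score))
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    results = [
        ("http://a.example/cameras", 1.0),
        ("http://b.example/cameras", 0.9),
        ("http://c.example/cameras", 0.8),
    ]
    mine = {"http://c.example/cameras"}
    friends = {"http://b.example/cameras"}
    for url, score in personalize(results, mine, friends):
        print(url, score)
```

    In this toy example the camera page I bookmarked myself moves to the top, and my friend's pick moves ahead of the unbookmarked result.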

    Summary

    There are many different approaches to bookmarking, and recent changes in web browsers, add-ons, and a web of participation will continue to fuel growth in the sector. There's still a lot of work to be done in terms of search and service integration and creating compelling reasons to generate useful content, connecting users with the information they care about.

    If you made it this far and you live in the San Francisco Bay area you might want to check out SF Tech Sessions next Monday, October 30, from 7-9 p.m. at CNET to learn more from the people behind current social bookmarking products.

Niall Kennedy is a web technologist in San Francisco, California in the United States. I am very interested in the world of...