Google phrase analysis within highly ranked websites

A few more details about Google’s possible analysis of page text are now available thanks to a recently published patent application from June 2006 by Googler Anna Patterson. The application details how a search engine like Google might analyze text phrases and date-based topics and associate a web page with related topics, even if the specific topic does not appear in the document itself. The 22-page document further emphasizes Google’s current work on “shingle” analysis to discover important phrases and concepts. (via Search Engine Land) Highly ranked websites are more likely to receive in-depth analysis through multiple index passes and phrase…

Google Blog Search overtakes Technorati’s market share according to Hitwise

Google Blog Search has overtaken Technorati’s market share in the United States according to LeeAnn Prescott of Hitwise. The success of Google Blog Search goes hand in hand with Google leveraging existing properties such as Google News and the Google homepage to drive traffic to its new property. Google Blog Search launched in September 2005. Technorati is the green line above, and Google Blog Search is shown in purple. Google Blog Search received a huge traffic boost in October after blog search appeared as an option on Google News pages. Google Blog Search later received a spot on the front page…

Wikiasari: Wikipedia success applied to social search?

Wikia will release a new search engine early next year according to an interview with Jimmy Wales in today’s Times of London. The new Wikia search engine project is named Wikiasari and will apply wisdom-of-the-crowds features to search engine results, letting individual users rank sources of information and their relevancy to a particular query. Of course the article takes a “gunning for Google” angle, citing the PageRank algorithm used since Google was founded in 1998. Search engines grow over time, and incorporate multiple ranking factors beyond the math of inbound links and source authority. Google (synonym…

del.icio.us API for URL top tags, bookmark count

Social bookmarking site del.icio.us has exposed a new API providing the top tags and total number of bookmarks for any URL in its system. Yahoo’s Developer Network provided a short preview earlier tonight of a soon-to-be-released del.icio.us web badge, but currently anyone can request data from the open API. It’s a useful feature to provide additional context for a URL, suggest tags, or measure one aspect of a site’s popularity. The endpoint is http://badges.del.icio.us/feeds/json/url/blogbadge and it takes one parameter, hash. Simply submit a request to the above API endpoint with a hex MD5 hash of the URL of interest as your hash parameter…
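As a rough sketch of the request described above, the following Python builds the request URL for a given page. The endpoint and the hash parameter name come from the post; passing the hash as a query-string parameter, encoding the URL as UTF-8 before hashing, and the helper name badge_request_url are assumptions made for illustration. The sketch only constructs the URL and does not contact the service.

```python
import hashlib
from urllib.parse import urlencode

# Endpoint as given in the post.
ENDPOINT = "http://badges.del.icio.us/feeds/json/url/blogbadge"


def badge_request_url(page_url: str) -> str:
    """Return the API request URL for page_url (hypothetical helper).

    Assumes the URL is hashed as UTF-8 bytes and that the hex MD5 digest
    is sent as a query-string parameter named "hash".
    """
    digest = hashlib.md5(page_url.encode("utf-8")).hexdigest()
    return ENDPOINT + "?" + urlencode({"hash": digest})


if __name__ == "__main__":
    # Print the request URL for an example page.
    print(badge_request_url("http://www.example.com/"))
```

Note that the hash is computed over the exact URL string, so under these assumptions "http://www.example.com" and "http://www.example.com/" would produce different hashes.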

Google Mondrian: web-based code review and storage

Guido van Rossum unveiled his first Google project, Mondrian, tonight during a Python tech talk at the Google campus in Mountain View. Mondrian is a web-based code review system built on top of a Perforce and BigTable backend with a Python-powered front-end. Mondrian is a pretty impressive system and is currently in use across Google. Shared Development Environment Google uses a company-wide Perforce depot with almost no developer branches. Each developer has their own NFS workspace readable by anyone in the company, including automated processes. An administrative process takes snapshots of each developer workspace including local development environments accessed…

Feed publishing best practices

Web feed syndication is made up of two base vocabularies: RSS 2.0 and the Atom Syndication Format. These base vocabularies are extended using namespaces to create a common set of expressions for your web feed data. In this post I’ll walk through some best practices for publishers syndicating their data via web feeds. Should I use RSS or Atom? The RSS 2.0 syndication format has been around for about four years and over that time it has been used by web publishers large and small to represent their data for syndication. The New York Times publishes its top stories via…

Declaring alternate web content for searchability and discoverability

Web authors may declare alternate versions of a single web page, exposing additional languages available or various file formats. HTML documents express these relationships using the link element in the document header. Alternate language A single Wikipedia article about “search” might have alternate representations and translations, such as “buscar” in Spanish, “suche” in German, “rechercher” in French, etc. A search engine or web browser software can discover the availability of these alternate document versions if declared by the publisher. <link title="Arabic" href="http://ar.example.com/" rel="alternate" hreflang="ar" type="text/html" charset="ISO-8859-6" /> The example markup above advertises an alternate version of example.com available in Arabic…
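To illustrate how software might discover such declarations, here is a minimal sketch using Python's standard html.parser module. It collects every link element whose rel attribute includes "alternate". The class and function names are hypothetical, and a real crawler would also need to resolve relative URLs and handle malformed markup.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collects the attributes of <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values:
            self.alternates.append(attributes)


def find_alternates(html_text):
    """Return a list of attribute dicts, one per alternate link found."""
    parser = AlternateLinkParser()
    parser.feed(html_text)
    return parser.alternates


if __name__ == "__main__":
    sample = (
        '<html><head>'
        '<link title="Arabic" href="http://ar.example.com/" rel="alternate" '
        'hreflang="ar" type="text/html" charset="ISO-8859-6" />'
        '</head></html>'
    )
    for link in find_alternates(sample):
        print(link.get("hreflang"), link.get("type"), link.get("href"))
```

The same loop would pick up alternate file formats as well as languages, since both are declared with rel="alternate" and differ only in attributes such as hreflang and type.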

The Spam Farms of the Social Web

Blogs and other social media tools have changed the publishing landscape over the past few years, making it easier than ever to share information with the world. The ease of use and focused attention of the medium have also helped create new opportunities for spammers to automatically generate content, buy links, and get noticed by search engines and other points of aggregation. In this post I will break down the operations of one spam network utilizing social media technologies such as WordPress, Digg, del.icio.us, and more to climb the search results and generate revenue through ads and affiliate programs. Last…

Social network marketing, spam, and gaming

I spent the last few days among webmasters at the PubCon conference, where most conversations were focused on marketing yourself online to humans and search engines. The 2,000 attendees focused on ranking themselves as high as possible in search engine result pages and driving site traffic. Methods of achieving these goals cover a full spectrum from white hat to black hat. Social networking and crowdsourcing sites are new focuses of the search engine marketing sector, taking advantage of loose editing and account creation restrictions to boost a site’s visibility. Social networking and e-commerce Should every item in your product catalog have…

Google Personalized Homepage for your domain

Users of Google Apps for Your Domain can now add a homepage with a custom set of configured gadgets for their users. The new feature lets companies configure mail messages, calendar data, specialized web feeds, and more as their employees’ portal to the web. The group customization feature was previously only available to large partnerships such as Dell and Gateway. The Apps for Your Domain program launched in August and includes custom branded and custom addressable access to Google Mail, Talk, Calendar, webpage creation, and now your own start page to bring it all together. The Google search box…