Startup Search: tracking the web startup ecosystem

Tonight I am unveiling a new site tracking the startup ecosystem. It’s a directory and analytics tool I’ve personally wanted for a long time, and I know others will enjoy it. Introducing Startup Search.

Startup Search tracks Web startups, their products, key employees, investment firms, and investment partners. Startup Search also tracks the success of each product since it was first introduced to the world, using publicly available metrics pulled into a single page. It’s a research tool, a discovery engine, and a fact-filled directory of our little Web startup world. I’ll walk through a few features.

Directory data

Startup Search is a data-driven website tracking facts and figures about the entire web industry. You might be familiar with a particular web product, but who is the parent company? Where do the founders live and work every day? Have they taken funding, and if so, from what firms or investors?

Until now, directory data about startups and their employees has been locked behind a paywall. A service such as VentureOne might call your company on behalf of a paying venture capitalist and ask questions about the company to help build a profile. Startups would never see this data; they would only provide information to someone they may never meet on behalf of a venture capitalist they may never meet. I want to change the flow of information, placing more power in the hands of anyone who would like to blog about, take a job with, or invest in some of the companies Startup Search covers.

Startup Search also covers some data you might never find inside of an existing directory. Who are the 12 angel investors in Dogster who collectively contributed $1 million? Who is the team behind Blinksale? Who is Felicis Ventures?

There may be some entirely new data areas covered as the directory expands. Good feedback will reveal just how much has changed.

Statistics

Each product is tied to a set of statistics I call buzz and traffic. Buzz measures the level of conversation around a product such as links from the web and from blogs, or mentions of the product’s name in blog posts or web searches. Traffic accumulates daily Alexa data together with monthly data from Quantcast and Compete into one single page.

I enjoyed deconstructing the Alexa data and putting it all back together again. All data is licensed using the appropriate APIs for each provider. I hope to increase my data coverage and site features over the next few days at Google Developer Day and Where Camp.

More to come

I wanted to release the product and iterate out in the open based on the tons of feedback available. I plan to introduce more features for startups who would like to claim their profile on the site. I’d also like to introduce more tools to help people track the business aspect of their startup such as tracking new statistics activity or perhaps researching interesting partners.

There is a lot more to be constructed from all the underlying data available at Startup Search. I could sort every possible venture capital investor based on their political affiliations and donations, or compute a possible burn rate based on a company’s business headquarters.

Summary

I hope you enjoy Startup Search and the tools contained within. The companies and features available now are just a beginning, with many iterations and expansions to come.

Startup Search is powered by Python, Django, and YUI. Bryan Veloso of Revyver designed the site using his new love for grid design. The site is currently sponsored by venture capital firm True Ventures for the month of June, and I hope to continue supporting some site costs through run-of-site sponsorships for the near future. There are many eggs and APIs coming together behind the scenes, and I’ll likely discover a few more sources of inspiration over the next few days.

Enjoy. startupsearch.org

Every good domain is taken. Here’s why.

Kevin Ham built a $300 million web company in Vancouver you’ve probably never heard of. You are likely familiar with his work as you drop vowels from the domain names of your favorite web startups such as Flickr or Tumblr, or try selecting a name for your new company or product. Kevin Ham built a domain name empire named Agoga, cutting exclusive deals with local registrars and national governments, attracting over 30 million unique visitors a month according to an article by Paul Sloan in the June issue of Business 2.0 magazine.

Think of a name, any name, and it’s probably already registered as a .com domain. Most sites blocking your path to a memorable and descriptive name are owned by domain squatters, businesses buying domain names in bulk and running targeted advertisements from Google and Yahoo!. Domain names are not only memorable, they influence a site’s ranking in search engines, since a site URL containing the phrase “kittens” has a higher possible correlation to kitten content than a URL without the keyword match. Kevin Ham owns over 300,000 domains, virtual real estate in our online lives. He assembled the portfolio over many years of work, taking advantage of changing technologies, business environments, and legal terms. He is just one example in a global domain industry collecting millions of dollars a year prospecting new names.

Paying off registrars

Each top-level domain on the Internet owns one or more root nodes defining the availability and location of sites such as MySpace or eBay. Ham wrote automated programs in the late 1990s that could download a listing of all registered domain names, a root zone file, on a daily basis, comparing multiple versions to identify expiring domains of value. In the beginning Ham ran the programs on a few desktop machines in his office, but he quickly stepped up his operations to stay ahead of competitors.
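The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not Ham's actual software: it assumes each daily listing has already been reduced to one domain name per line, and the function name and sample domains are hypothetical. A name that disappears between two snapshots is a candidate worth a closer look, not proof that the domain is available.

```python
def dropped_domains(previous_snapshot, current_snapshot):
    """Return names present in the earlier listing but absent from the later one."""
    # Normalize case and whitespace so the same name always compares equal.
    previous = {name.strip().lower() for name in previous_snapshot if name.strip()}
    current = {name.strip().lower() for name in current_snapshot if name.strip()}
    # Set difference: registered yesterday, missing today.
    return sorted(previous - current)


# Two hypothetical daily listings, one domain name per entry.
monday = ["example.com", "WeddingShoes.com", "kittens.com"]
tuesday = ["example.com", "kittens.com", "newstartup.com"]

print(dropped_domains(monday, tuesday))
```

Sets keep the comparison fast even for very large listings, since each lookup takes roughly constant time.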

Ham worked out exclusive deals with domain registrars, snapping up expiring domain names on a direct connection from the registrar’s computers into the VeriSign root nodes at the heart of the U.S. Internet. He paid 10 times the typical domain registration fee in exchange for such privileged access to the domain name ecosystem. Ham was able to snap up over 10,000 domains in late 2000, just months after imploding Internet brands vacated their offices and their web brands.

Today registrars directly engage in the game of expiring domains, letting customers set triggers and alerts for a premium price or directing them to a domain resale marketplace.

The Money

Wedding shoes domain screenshot

WeddingShoes.com is one of Ham’s many sites set up to serve visitors relevant ads from Yahoo. The site offers related keywords, often at higher CPC payouts, for site visitors who would like more advertisements.

A simple site targeting wedding shoes earns Ham’s business about $9,100 a year. Not bad for an $8 domain purchase and what he reports as about $7 in maintenance costs per year.

Free trial periods

Each top-level domain sets its own terms and conditions for ownership, sometimes requiring a company to operate within the borders of the assigned country, or settling conflict resolution terms. Each top-level domain often has a money back guarantee on purchased domain names varying from 5 to 30 days. Domain name prospectors purchase thousands of domains at a time, run advertisements on each site, and get their money back on the worst performers when their trial period has ended.
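The keep-or-refund decision at the end of a trial period might look something like the sketch below. Everything here is an assumption for illustration: the function name, the sample domains and revenue figures, and the simple rule of projecting trial earnings over a full year and comparing the result with the registration fee. Real prospectors presumably use richer models.

```python
def split_trial_portfolio(trial_revenue, registration_fee, trial_days, horizon_days=365):
    """Split domains into (keep, refund) lists based on projected yearly ad revenue."""
    keep, refund = [], []
    for domain, revenue in sorted(trial_revenue.items()):
        # Naive projection: assume the trial's daily earnings continue all year.
        projected = revenue / trial_days * horizon_days
        if projected > registration_fee:
            keep.append(domain)
        else:
            refund.append(domain)
    return keep, refund


# Hypothetical ad revenue, in dollars, earned during a 5-day trial.
trial = {
    "weddingshoes.example": 1.25,
    "qzxv.example": 0.00,
    "kittns.example": 0.10,
}

keep, refund = split_trial_portfolio(trial, registration_fee=8.00, trial_days=5)
print("keep:", keep)
print("refund:", refund)
```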

Buy out an entire country

I often joke about the top-level domains assigned to small islands in the South Pacific which exist for a few hours each day at low tide. The 12,000 inhabitants of Tuvalu, a member of the British Commonwealth, earn $4.2 million every year by leasing the .tv top-level domain. That’s roughly $350 a year per resident in exchange for helping videobloggers find a more desirable web address.

Cameroon map

Ham took things one step further, negotiating with the country of Cameroon for rights to the .cm top-level domain. A special deal with the Cameroon government gives Agoga control of any wildcard domains, domains requested by a web browser but not yet registered. It’s a gold mine attracting 8 million unique visitors per month to pages full of ads served by Yahoo.

Cameroon currently has about 167,000 online users spread across 39 ISPs according to the CIA World Fact Book. Leasing the unused space for .com typos makes good business sense for the local government.

Domain acquisition companies are already pursuing similar deals in Colombia and Oman, .co and .om respectively, hoping to capture a few more commercial typos without directly targeting the trademarked names and their legal troubles.

Summary

The Business 2.0 article is full of great stats and stories from the underworld of the domain trade. If someone were buying up property in the physical world and putting up hundreds of thousands of billboards at a time, I’m sure the governments and advertising industry would be forced to respond to the crowded skyline full of more advertisements than buildings. In the virtual world domain names are as plentiful as the numbers and characters that string them together, and on-demand advertising from large companies like Yahoo and Google helps line the pockets of these prospectors every month.

The domain portfolios of these domain companies are now big enough that they have begun to correlate users across multiple sites, targeting the best possible advertisements based on past visits to any member of their portfolios. The article mentions Ham now plans to build more features into his sites, perhaps selling a turnkey wedding shoe selling site along with his domain.

Crazy stuff, but at least I’ll know where to direct my anger the next time every good domain for my next idea is squatted.

Windows Live Gallery partners program

Windows Live Gadgets

Microsoft will offer three tiers of partner support for Windows Live Gallery according to a podcast with Chris Butler this week posted on LiveSide. Companies and brands may partner with Microsoft to highlight their Windows Live Gadget offerings for preferred inclusion in the Live.com personal homepage, Windows Live Spaces sidebar, and the Windows Vista sidebar.

Windows Live Gallery partners receive visual differentiation in the customization listings and programmatic access to the Gallery servers to add and update Microsoft gadget content.

Microsoft plans to offer external developers three tiers of support:

Gold partner
A premium offering associated with an established business relationship with Microsoft.
Silver partner
The most popular content in the Windows Live Gallery. Content enhancing the platform and elevated based on merit.
Regular contributor
Standard approved submissions to the Gallery.

I believe visual differentiation of official widget providers is an important step towards improving user experience and gaining trust in the widget environment. Brands are currently monetized by third parties motivated by goodwill or affiliate programs, and partner programs are a good way to differentiate official content from third-party submissions.

Widget publishers can also differentiate themselves from the pack by using official brand and trademark names in a widget directory’s author fields and promoting their widget content directly on their sites.

Google relaunches its search rankings and result pages

Google’s search result ranking algorithm received a major upgrade yesterday, incorporating its vertical search properties directly in the main search result page. The new design, Universal Search, integrates results from specialized Google verticals such as blogs, images, news, maps, and video. Results we’ve previously expected to find inside of a OneBox now appear anywhere in the users’ result listings thanks to rewritten ranking and content examination algorithms.

Examining vertical search

Google and other large search engines crawl the World Wide Web for new information every minute of every day. The modern web consists of billions of documents expressed in multiple formats and languages, created to serve the various purposes of their authors and intended audiences. Search has also recently expanded its reach into our libraries’ book shelves, converting dead trees and ink into their digital representation.

Each vertical search engine takes a specialized approach to data and its sources, extracting more information than a generic crawler such as Googlebot. The main crawler might recognize a webpage contains web feeds and pass its RSS and Atom content to a specialized engine such as Google Blog Search for further analysis. A local search engine contains an entire Yellow Pages full of local listings data and other interesting pieces of information such as hours of operation, payment methods, or other items of local interest. A patent search engine knows how to navigate intellectual property databases such as the United States Patent office, turning standardized forms into structured data and diagrams.

Current Google search offerings

I’m sure there are even more specialized public data search verticals supported by Google I’m leaving out of this list. All of these verticals contain possible pieces of relevant information for a given query entered into the Google search box. Until yesterday they were isolated from the main Google search box in a separate silo or perhaps a short summary inside OneBox, the integrated results section at the top of a search page.

Google Universal Search

Google’s new search process collects relevant information for each search query from each of its vertical search properties. The universal search for data needs to be processed by a universal ranking algorithm to determine the top 10 results shown for each query. A search for “Utah Jazz” might contain recent news stories, video highlights, pictures of star basketball players, as well as relevant search results from across the Web. Google’s algorithms need to weight results from each vertical, assign each a universal search rank, order the results, and return a response to the user as quickly as possible. No small feat, and the new algorithm and breadth of search definitely impress. Google also announced yesterday it will process your search query in multiple languages, returning results for your search term as if it were translated into its equivalent value in French, Spanish, and more, creating even more complicated problems of query extraction and result set analysis.
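To make the weight, rank, and order steps concrete, here is a toy sketch in Python. It is not Google's algorithm, which is not public; the per-vertical weights, relevance scores, and function name are all invented for illustration. Each vertical's score is scaled by a weight so results from different engines become comparable, and the best few are selected.

```python
import heapq


def universal_rank(results_by_vertical, vertical_weights, limit=10):
    """Blend per-vertical results into one list ordered by weighted score."""
    scored = []
    for vertical, results in results_by_vertical.items():
        # Unknown verticals fall back to a neutral weight of 1.0.
        weight = vertical_weights.get(vertical, 1.0)
        for title, score in results:
            scored.append((score * weight, vertical, title))
    # nlargest returns the top `limit` entries, highest score first.
    return heapq.nlargest(limit, scored)


# Hypothetical relevance scores for a "Utah Jazz" query.
results = {
    "web": [("Utah Jazz official site", 0.90), ("Utah Jazz on Wikipedia", 0.80)],
    "news": [("Jazz win playoff game", 0.95)],
    "video": [("Game highlights", 0.70)],
}
weights = {"web": 1.0, "news": 0.9, "video": 0.8}

for score, vertical, title in universal_rank(results, weights, limit=3):
    print(f"{score:.2f}  {vertical:<6}{title}")
```

Using a heap-based selection avoids sorting every candidate when only the top handful will ever be shown.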

Visual changes

All of these behind the scenes updates allow a few new tweaks to the front-end UI adapted to your query’s result set. I’ll walk through some of the major changes.

Search refinement

Google search iPod

Google lists search verticals with appropriate results for your queries at the top of this search result page. In this example I searched for “iPod” expecting to find information about Apple’s popular music player. Google assembles 10 “universal” results on the page, but clusters relevant search options across its vertical properties such as iPod patents, iPod products, and iPod news search enabling a quick refinement.

Inline video thumbnails and playback

Google search video results

Video thumbnails are included directly alongside video search results along with the video’s total length and user rating. You can even watch the video directly from the search result page using embedded players from YouTube and Google Video. Google gathers metadata from other popular video sites such as Metacafe, but embedded video playback is currently not available for these third-party sites. Searchers will see a thumbnail image from supported sites’ video content and will need to visit the site directly before playing back any videos.

Google’s Marissa Mayer mentioned during yesterday’s Searchology event that the limitation is a lack of support for external video players, not the actual site content. The Media RSS module can help smart publishers better define video thumbnails and an appropriate web browser media playback console. The thumbnail element provides search engines with visual search result data and the player element specifies the location of your preferred playback interface. I don’t expect Google to load remote code such as a Flash player inside of their search results pages, but at least you can be properly prepared.
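As a rough illustration of those two elements, the Python sketch below builds a single feed item carrying a Media RSS thumbnail and player. The helper name and the example.com URLs are placeholders, and a real feed would wrap the item in a full channel with more metadata. Whether any search engine reads these elements is up to that engine.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the Media RSS specification.
MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def build_video_item(title, page_url, thumbnail_url, player_url):
    """Build an RSS <item> with media:thumbnail and media:player children."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    # Still image a search engine could show next to the result.
    ET.SubElement(item, "{%s}thumbnail" % MEDIA_NS, {"url": thumbnail_url})
    # Location of the publisher's preferred playback page or console.
    ET.SubElement(item, "{%s}player" % MEDIA_NS, {"url": player_url})
    return item


item = build_video_item(
    "Game highlights",
    "http://example.com/videos/42",
    "http://example.com/videos/42/thumb.jpg",
    "http://example.com/videos/42/player",
)
print(ET.tostring(item, encoding="unicode"))
```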

Related content

Google results Jerry Falwell

The search result page also suggests alternate searches related to your current query where appropriate. In a sample search for “Jerry Falwell” I received five related searches, three results from the news archive broken out by year of publication, and three recent blog posts on Falwell’s recent death.

If a searcher makes it all the way to the bottom of the page it’s likely they never found what they were looking for, and a few quick suggestions might boost the relevancy of their experience.

Summary

The main Google search result page just received a major revamp yesterday with more content integrated from search verticals and new methods of displaying information about your query. Google’s sidebar advertising has always restrained itself to matching the layout and expectations of the main search result page, so perhaps Google Universal Search opens up new advertising options beyond a text summary such as maps, images, and video.

Publishers should now be even more motivated to list their content in a Google vertical and stand out on the main search result page. I expect video publishers and local businesses will pay even more attention to Google and its referral power now that their data can be highlighted in a more visually appealing search result than the competition.

Google Gadgets are now an AdSense unit

Webmasters will soon be able to auction off widget space on their sites and blogs, managed and marketed by advertising powerhouse Google. Advertisers will produce a Google Gadget in standard IAB unit sizes for distribution across the Google network at CPC or CPM billing rates. Google will bolster its current Google Analytics package to support better tracking of paid and free widget campaigns in this sub-page and asynchronous pageload environment. The Google Gadget advertising beta program was publicly announced during a marketing summit for the automotive industry according to Online Media Daily.

By the end of 2007 Google will offer its traditional text link advertising, display advertising, and interactive gadgets to its huge network of advertisers. AdSense publishers can select the interactive marketing unit that best suits their need, and the Google bidding system can select the most profitable ad content. The advertising content can behave like a miniature application, integrating tabs, updating its content on-the-fly using web feeds and other data protocols, and creating small interactive experiences across the web. Google Gadget advertisements will benefit from the same contextual analysis, click-through rate, and other measures of interactivity and success already measured by the Google system.

Pretty cool stuff, and it will be interesting to see what type of CPM might be commanded from a 300 x 250 pixel widget as an advertising vehicle. The new advertising system should be a huge boost to widget-producing design studios as big brands will be much more aware of this potential advertising spend.

Google is not the only company thinking about widgets as a paid advertising model. Startups such as Widgetbox and Clearspring feature widget analytics, directories, and plans for future advertising revenue. The 800-pound gorilla just entered the room, the same giant powering advertisements on widget-friendly MySpace for at least the next three years. The widget industry just shook a bit, and expect more announcements from Google’s gadget program in the next few months.

Podcast: Social media trends with Charlene Li

Social computing has changed the way we interact with the Web. Our information consumption and production benefit from the participation of the crowd in its various forms, creating niche audiences and new types of curators independent of space and time. We’re connected to local experts on hiking, cooking, parenting, programming, and much more. Yet social media extends beyond the realm of content creators, bolstered by the comments, ratings, rankings, sharing, and reading masses that help us find the content we seek.

Forrester Research released a report last week, Social Technographics, detailing levels of social media participation among 10,000 adults and youth. Their sample panel provides new insights and statistics into how users are currently engaging in social media activities, and the motivations which might drive such participation.

Last Friday I sat down with Forrester Research analyst Charlene Li to discuss her report’s findings and its implication for business on the Web. You can read more about the topics of our social media trends and engagement discussion on my podcast site, and view select results from the research report. Our podcast on social media trends is 20 minutes in length, a 9 MB download.

Tip: The full Social Technographics research report costs $279 and is available as a downloadable PDF. The accompanying free PowerPoint slide deck contains key statistics and other summary data you might find useful while keeping your wallet in your pocket.

Podcast: Taking Ajax offline

Rich Internet applications are stepping out of the web browser and onto the desktop, helped along by a new set of toolkits. Web developers are able to code against desktop resources using familiar languages and toolkits such as JavaScript, Ruby on Rails, or HTTP interactions. Offline access for web applications is about much more than planes, trains, and automobiles — it can accelerate performance and integrate with established desktop interactions as well.

Offline web applications are a hot topic, but often misunderstood. In this week’s podcast I step beyond the myths of offline web applications with special guest Brad Neuberg. Brad has spent years digging into reliable storage methods available within a browser environment, and most recently developed the Dojo Offline Toolkit for complete offline access. You can directly download the Offline Web Applications podcast or head on over to the podcast blog post to read more about discussed topics.

Beyond disconnect

Offline web application capabilities are about more than a missing Internet connection. Application data is stored on a local hard drive instead of a far away datacenter, boosting your load times. Web applications become searchable components of the local operating system, displayed inside a Windows Vista Search result or Mac OS X Spotlight. Your application data might become fully integrated with desktop calendar, address book, or web feed platforms, exposed to any requesting application including mobile phone synchronization or personal backups.

Summary

The offline web application space is a hot topic of discussion which may or may not apply to your product. Is offline access a graceful enhancement on top of your existing application? Are customers clamoring for it? Will you take your application offline using Adobe Apollo, Firefox 3, Joyent Slingshot, XULRunner, or Zimbra Offline? Those are just a few of the toolkits we know about this month, yet more are coming.

It’s time to demystify. I hope you enjoy my podcast with Brad Neuberg, one of the experts in the space of offline access for web applications, as a quick way to get your head around some of these larger issues in the future of web application development.

Nokia Widgets for Series 60

Weatherbug widget Nokia N95

Widgets are coming to Series 60 handsets this Fall, bringing tiny pieces of content onto the application menu of the world’s best-selling smartphone OS. The S60 Web Run-Time builds upon the existing open source technologies in the S60 browser and provides a development experience very close to Apple’s Dashboard widget environment. The widget software will be available in version 3.2 of Nokia’s operating system due out this Fall. The Series 60 operating system is currently installed on over 85 million mobile devices produced by Lenovo, LG, Nokia, and Samsung.

S60 widgets are marked up using HTML, CSS, and JavaScript tied together with a platform manifest. Plugins such as Flash Lite are currently not available inside the Web Run-Time environment to help minimize the total software footprint. S60 widgets appear on the application menu just like a native phone application, loading a locally stored user interface, content frame, and rich interaction supplemented with fresh data pulled in from the web over cellular data or a WiFi network. The load time is almost instant, unlike many of the mobile Java applications I have used. The user simply selects their content of choice launched on demand from a quick launch bar or application menu.

Any developer familiar with widget development inside personal homepages or Apple’s Dashboard environment should be able to easily port their work to Nokia’s S60 widget environment. Both Apple and S60 use the open source WebKit browser engine to render widgets. The mobile platform requires a few specialized changes such as adjusting how you collect a user preference such as a ZIP Code or a Flickr account name and altering your graphics display for a mobile phone’s small screen, colors, and resolution. WeatherBug ported their Apple Dashboard widget to S60 and claims the entire process took about 5 days.

The Web Run-Time and its widget environment is separated from the rest of the operating system for security reasons, but Nokia does plan to expose more functionality in future releases. Future widgets will be able to receive GPS data, access a local address book, and possibly even place phone calls. Imagine a widget responding to its local environment, narrowing a local search to 5 minute walking distance, and possibly messaging your nearby friends to come join you.

Widget distribution

S60 users will be able to download new S60 widgets directly to their phones from the Nokia mobile portal and from the WidSets distribution site. Any widget can also be sent to your phone and swapped with friends using the built-in Bluetooth connection.

Summary

Nokia smartphones are not readily available in the U.S. but they are a dominant force in Europe. Mobile-savvy European customers will soon have a new way to access their favorite mobile content inside a richer user interface than a traditional mobile browser window. Web content is as close as the quick launch bar, allowing web applications to earn their spot on the phone’s home screen.

Apple’s iPhone will have similar functionality on a larger display at a higher resolution but limited to EDGE cellular data transfer rates and latency. Both Nokia and Apple will be subject to the content wishes of mobile carriers, who could continue to make it difficult to place content on a mobile handset utilizing their networks.

I’m a fan of the new S60 Web Run-Time as yet another way to extend the reach of web content through widgets. The announced features open up the world of mobile content to developers with a web authoring skill set and create a lot more content for the S60 platform.

Google Feed API

Google Reader finally has its first official API. Any developer in the world can request the entire history of a web feed from Google’s geo-distributed server cloud in a normalized response for inclusion in their websites or products. I’ve been hoping for such an API since I first deconstructed the Google Reader backend in December 2005.

Most users will likely interact with the Google AJAX Feed API through a JavaScript library included on their site or a pre-configured badge generated on the Google site. The Feed API wrapper is part of a larger effort by Google to extend its search and advertising network onto more sites. Page authors can integrate slick-looking results from Google’s web search, news search, blog search, local search, and video properties.

Google HTTP response waterfall

It’s possible to route around the pre-configured JavaScript API libraries and program directly against the JSON and XML responses from Google’s servers. Advanced users can code directly against the service inside of client-side JavaScript or stand-alone programs to optimize user experience and efficiency, with three fewer HTTP requests and 5 KB less data transferred over the wire, resulting in about a half-second performance improvement per page load in my test. Here’s how.

Google AJAX Feed API endpoint

It’s much quicker and simpler for advanced users to directly code against Google’s feed API endpoint. I’ll walk through each required and optional parameter of the REST interface.

An example request for my Atom feed.

http://www.google.com/uds/Gfeeds
Base URL.
q
A properly escaped feed URL.
callback
Define a JavaScript callback function for client-side processing of JSON results. Set to blank for requests from your server.
context
DOM Level 1 document context for XML. Set to blank if not needed.
output
json, xml, or mixed. The JSON response is the JavaScript wrapper’s default and therefore most likely to be already cached for common feeds.
v
API version number, currently 1.0.
num
Optional. Maximum number of entries included in the response. Default is 4.
key
Optional. Lets Google track your requests for metering and other purposes. You can agree to a terms of service and receive a key if you’d like.
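The parameters above can be assembled into a request URL with a few lines of Python. This is a minimal sketch: the helper name and the example.com feed address are placeholders of my own, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/uds/Gfeeds"


def feed_request_url(feed_url, num=4, output="json", callback="", context="", key=None):
    """Build a Gfeeds request URL from the parameters described above."""
    params = [
        ("q", feed_url),         # urlencode takes care of escaping the feed URL
        ("callback", callback),  # left blank for server-side requests
        ("context", context),    # left blank when not needed
        ("output", output),      # json, xml, or mixed
        ("v", "1.0"),            # API version number
        ("num", num),            # maximum number of entries
    ]
    if key:
        # The key is optional, so include it only when one is supplied.
        params.append(("key", key))
    return BASE_URL + "?" + urlencode(params)


print(feed_request_url("http://example.com/atom.xml"))
```

Letting `urlencode` handle the escaping avoids the easy mistake of passing a raw feed URL whose own query string would be mixed up with the API's parameters.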

This endpoint is unsupported and technically in violation of the product’s terms of service. Yet it functions just fine for my needs and provides a quicker load for client- and server-side scripts.

Two feed syndication talks at Web 2.0 Expo

The Web 2.0 Expo officially kicked off yesterday at the Moscone conference center in San Francisco, bringing together thousands of web technologists to learn new things and market new web products. I participated in conference planning as a program chair, selecting a range of topics to educate technical product managers on the latest web technology, specifically in the Web 2.0 Fundamentals track. I’m leading two sessions at the conference on feed syndication technologies and I’ll be in attendance all three days if you’d like to say hello.

Intermediate to Advanced Syndication

Web frameworks and software packages now feature basic support for syndication technologies such as RSS 2.0 and the Atom Syndication Format. I’ll step outside the default settings, introducing session attendees to the many use cases enabled by a standardized syndication format and a well-deployed base of parsing software.

A few of the topics I’ll cover in my 50-minute talk:

  • How is feed syndication used today?
  • Atom Syndication Format walkthrough
  • Common syndication mistakes
  • Optimizing your metadata
  • Feeds as a data API
  • Popular feed parsing libraries
  • Authenticated and private feeds
  • Atom messaging
  • Questions, and possibly some answers

I could go on and on but I somehow have to cram it all into 50 minutes and still leave time for a question and answer period. If you’ve been wondering how Feeds are the Intel Inside, come check out my feed syndication talk at Web 2.0 Expo on Tuesday, April 17, from 4:50-5:40 p.m. in room 2008. I’m the last talk of the day, so if there are enough geeky people in attendance we can stay after the allotted time covering even more advanced or nuanced uses of feed syndication technology.

Feed Marketing Panel

Congratulations, you have a web feed. Now what? On Wednesday I will moderate a panel on feed syndication measurement, advertising, and search engine optimization. Bill Fritter of Pheedo, Don Loeb of FeedBurner, and Stephan M. Spencer of Netconcepts will share their feed marketing knowledge and answer questions from the crowd.

FeedBurner and Pheedo regularly trade veiled attacks on each company’s statistics and click-through rates, so this panel could get really interesting as panelists jockey for feed advertising dollars.

If you’re looking for marketing numbers to help build an argument for deeper involvement in feed syndication this is the session for you. The feed marketing panel takes place on Wednesday, April 18, from 1-1:50 p.m. in room 2002.