Google provides a full suite of services for the entry-level blog spammer. There are plenty of legitimate uses for all of these Google services, but Google’s market-leading position in search creates a spam ecosystem that inflates corporate revenues, index size, and user data. Google’s blog hosting service, Blog*Spot, received a lot of attention this week as blogosphere neighbors threw up their arms in protest of the host, which is like the seedy motel at the edge of town that rents by the hour. It’s cheap and inviting to those who know no better, but those in the know don’t want anything to do with it.
I will describe the Google elements that contribute to a spam farm in an attempt to create more understanding about how your content ends up where you may not want it.
Blogger’s Blog*Spot hosting is a quick and easy way to create new blogs. It’s free, you can post via e-mail, and many people think a Blog*Spot blog is the quickest way into Google’s search index: the blog hosting servers might be only a few rows away from the Google crawler, and of course Google knows how to find all of the content inside its own system.
The image above is a completely automated public Turing test to tell computers and humans apart, commonly referred to by its acronym: CAPTCHA. A CAPTCHA is supposed to be easy for a human to decipher, but difficult for computers using image recognition software.
Blogger requires users to solve the above CAPTCHA before creating a new blog. Yet the system is bypassed daily and thousands of new blogs are created.
A simple CAPTCHA can be broken using optical character recognition, the same technology that scans a printed page and converts the words to plain text.
A common way to bypass a CAPTCHA system is to offer humans a reward for successfully entering the scrambled word. Some sites trade free porn for CAPTCHA solutions; others hire people in low-income areas of the world to sit in front of a computer and solve CAPTCHAs all day.
Google provides a lot of free content for someone to repurpose on their newly created Blog*Spot blog. Search Google’s web, news, or blog results for the keyword of your choice and you will receive a list of content sources Google has determined are most relevant to the query. Copying from the top of these results is an easy way for spammers to obtain content already deemed relevant by Google for inclusion in its own pages.
You will often see spam blogs composed of a group of results, each including a title, link, and excerpt for targeted keywords. These pages are meant to attract search referrals for advertising or to create more pages linking to a site the spammer would like to promote.
Google blog search is the newest Google search service with relevant content available for scraping. Many of the cries from bloggers over the past week were most likely a result of a spammer using a script to retrieve the top search results on Google’s blog search ranked by relevance for inclusion on a newly created Blog*Spot blog.
Google AdWords places text advertisements across the web related to the textual content of a page. Every time someone clicks on a Google text ad for “refinance” it costs the advertiser over $35 and makes the site owner some money. “Vioxx” pays about $16.50 a click, “poker” pays about $2.50 a click, and “camcorder” pays about $2.60 a click on Google’s advertising network. The newly created blog can make money from these advertisements based on how many people search for its targeted keyword, how likely a visitor is to click on an ad, and the payout for that keyword.
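The arithmetic behind that last sentence can be sketched in a few lines of Python. This is only a rough illustration: the per-click prices are the figures quoted above (with “refinance” rounded down to $35), while the visitor count, the click-through rate, and the share of each click paid to the site owner are hypothetical placeholders, not numbers published by Google.

```python
# Per-click prices quoted in the post, in US dollars ("refinance" is "over $35").
COST_PER_CLICK = {
    "refinance": 35.00,
    "Vioxx": 16.50,
    "poker": 2.50,
    "camcorder": 2.60,
}


def estimated_revenue(visitors, click_rate, cost_per_click, owner_share):
    """Estimate site-owner revenue as visitors x click rate x price x share.

    click_rate and owner_share are fractions between 0 and 1. Both are
    assumptions made for illustration; the real values are not public.
    """
    if not 0.0 <= click_rate <= 1.0:
        raise ValueError("click_rate must be between 0 and 1")
    if not 0.0 <= owner_share <= 1.0:
        raise ValueError("owner_share must be between 0 and 1")
    clicks = visitors * click_rate
    return clicks * cost_per_click * owner_share


if __name__ == "__main__":
    # Hypothetical traffic: 1,000 search visitors, 2% click an ad,
    # and the site owner keeps half of what the advertiser pays.
    for keyword, price in COST_PER_CLICK.items():
        revenue = estimated_revenue(1000, 0.02, price, 0.5)
        print(f"{keyword}: ${revenue:,.2f}")
```

Whatever the real rates are, the estimate scales linearly with the keyword price, which is why spam blogs cluster around expensive terms like “refinance” rather than cheap ones like “poker”.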
The above process becomes even easier with automated tools for blog creation, content retrieval, and advertising placement. More expensive tools include pre-configured Blog*Spot blogs for a quick start.
Free web hosts have hidden costs. You don’t have friendly neighbors and it’s possible that search engines will not want to help others discover your area of the web.
Google has taken more steps to protect its e-mail service, Gmail, from spammers than it has taken to keep them away from Blog*Spot. There is a lot more that Google can do to reduce spam, reduce click fraud, and improve its Blogger service, but it might involve losing some advertising revenue in the short term. I think no company in the business of content generation, indexing, or payment can afford to ignore the problem.