The Spam Farms of the Social Web

Blogs and other social media tools have changed the publishing landscape over the past few years, making it easier than ever to share information with the world. The ease of use and focused attention of the medium has also helped create new opportunities for spammers to automatically generate content, buy links, and get noticed by search engines and other points of aggregation. In this post I will break down the operations of one spam network utilizing social media technologies such as WordPress, Digg, del.icio.us, and more to climb the search results and generate revenue through ads and affiliate programs.

Last weekend I noticed a Digg submission about weight loss tips had climbed the site’s front page, earning a coveted position in the top 5 technology stories of the moment. The 13 sure-fire tips were authored by “Dental Geek” and posted to the “Discount Dental Plan” category on his WordPress blog. Scanning the sidebar links and adjacent content, it was obvious this content was out of place on a page optimized for dental insurance. The webmaster of i-dentalresources.com had inserted some Digg bait, seeded a few social bookmarking services, and waited for links and page views to roll in, creating a new node in a spam farm fueled by high-paying affiliate programs and identity collection for resale.

[Image: eBizzSol portfolio snapshot]

The spammer’s domain is managed by eBizzSol, a company with fake domain registration information, including the address block of a Christian church in Fullerton, California. The dental site is registered to an address in Dhaka, the capital of Bangladesh. Based on the broken English I’ve found on the network’s sites, an offshore base of operations would not surprise me. eBizzSol mentions about 200 sites in its portfolio, including real estate, mortgage, casinos, and more. They even advertise a content generation service for SEOs offering six blog posts a month for $75, optimized for specific keywords and including guarantees for blog directory and ping submissions. There are other sources of content generation available for hire online, creating a flow of content republished across a target category and optimized for specific terms.

Follow the money

Why would someone want to create a site optimized for dental services? A search engine such as Google or Yahoo! discovers the site, indexes its pages, and starts including its content in search results for targeted keywords. Web searchers associate search engine rank with authority on a subject such as lowering an insurance premium or mortgage, and those searchers can generate a large amount of money per action. This particular site collects $40 or more per dental plan sold through a dental plan reseller, targets specific keywords of value, and boasts search engine index inclusion within “just a few hours” for its pages.

The dental terms targeted cost up to $18 a click in paid search, creating a strong incentive to capture the same visitors through top organic rankings. Below is a price estimate from Google for keyword targeting in the United States.

Google AdWords pricing
Search term              CPC ($)
teeth whitening          18.66
sedation dentistry       12.80
cosmetic dentistry       12.76
dental plans              9.78
dental implant            6.85
pediatric dentist         6.77
discount dental plans     5.93
oral surgery              4.95
braces                    3.39
cavity                    1.88
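
To put those prices in perspective, here is a rough Python sketch using the CPC figures from the table above and the $40 per-plan commission mentioned earlier. The click volume of 100 is a made-up number for illustration, and the function names are my own; nothing here reflects how the spammer or Google actually calculates anything.

```python
# CPC figures (USD) copied from the Google AdWords table above.
CPC_USD = {
    "teeth whitening": 18.66,
    "sedation dentistry": 12.80,
    "cosmetic dentistry": 12.76,
    "dental plans": 9.78,
    "dental implant": 6.85,
    "pediatric dentist": 6.77,
    "discount dental plans": 5.93,
    "oral surgery": 4.95,
    "braces": 3.39,
    "cavity": 1.88,
}

# Commission per dental plan sold, as reported in the post ("$40 or more").
COMMISSION_USD = 40.0


def paid_equivalent(term: str, organic_clicks: int) -> float:
    """Ad spend needed to buy the same number of clicks at the listed CPC."""
    return round(CPC_USD[term] * organic_clicks, 2)


def break_even_conversion(term: str, commission: float = COMMISSION_USD) -> float:
    """Share of paid clicks that must buy a plan for ad spend to break even."""
    return round(CPC_USD[term] / commission, 4)


# Assumption: 100 organic clicks per term, chosen only for illustration.
HYPOTHETICAL_CLICKS = 100

for term in CPC_USD:
    cost = paid_equivalent(term, HYPOTHETICAL_CLICKS)
    rate = break_even_conversion(term)
    print(f"{term:<22} ${cost:>9,.2f}  break-even conversion {rate:.1%}")
```

At these prices a paid click on “dental plans” would need roughly one sale in every four clicks just to cover a $40 commission, which is why free organic traffic for the same keywords is so attractive.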

Gathering links

Directories

[Image: Yahoo! directory pricing]

This webmaster bought links from the Yahoo! directory, the Microsoft Small Business Directory, Business.com, and a few others, placing a link to their site within targeted categories. These listings are cheaper than the $1000 links purchased on sites such as the W3C, but they are often just as spammy.

Virality

[Image: Digg sample count]

The article link was submitted to Digg by a user who joined Digg last month yet is already ranked in the top 150. The story received over 900 Diggs and is currently buried. A newly minted user posted to Reddit, posted to Newsvine, and posted to del.icio.us using the same name on each service. Seeding and voting up the content worked, as the blog post made its way to the top story listings on each social news service.

As of this evening the spam site has 353 inlinks from 212 external pages, mostly due to its viral marketing efforts on social networks. Some social bookmarking users include their bookmarked links in their blog sidebar, creating additional direct links throughout their entire site in addition to the original bookmarking service location. The spam network had successfully spread a piece of content throughout multiple user communities, and onto individual blogs in the process.
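
A small Python sketch of the sidebar effect described above. The inlink and page counts are the ones I reported; the 50-page blog is a made-up example, and the model (one link on the bookmarking service plus one per blog page showing the sidebar) is a simplifying assumption of mine, not a measurement.

```python
def inlinks_from_bookmark(blog_pages: int, in_sidebar: bool) -> int:
    """Links created by a single bookmark.

    Assumption: one link on the bookmarking service itself, plus one link
    per blog page when the user shows bookmarks in a site-wide sidebar.
    """
    service_link = 1
    sidebar_links = blog_pages if in_sidebar else 0
    return service_link + sidebar_links


# Figures reported in the post.
INLINKS = 353
LINKING_PAGES = 212
links_per_page = INLINKS / LINKING_PAGES

print(f"average inlinks per linking page: {links_per_page:.2f}")

# Hypothetical blog with 50 pages and a bookmarks sidebar on every page.
print(inlinks_from_bookmark(blog_pages=50, in_sidebar=True))
```

Under that assumption, a single bookmark by one blogger with a sidebar can be worth far more links than the bookmark itself.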

Summary

Certain topics are especially well suited for baiting the technology-oriented crowds of social news and bookmarking sites. Stories focused on Apple, Firefox, Google, Nintendo, history of computers, top X lists, or the target social site itself are common baiting practices used to attract attention and place a new content node on the map. Opportunists will continue to jump into new networks of influence and promote their own sites, gathering search engine juice even when the brief blip of attention has passed and the crowd moves on to another story of the moment.

[Image: World of Warcraft female human with shovel]

I believe social media accounts are currently available for rent or for sale, rewarding active users with paid placements or account resales in much the same way as a World of Warcraft character might be resold on eBay. Social media sites and search engines need to stay on top of this new form of content creation, continually analyzing data and scrubbing out the dirt. Sites overrun with web spam quickly lose their utility and might be banned from search engines.

Social media sites continue to change the way we interact with data, but expect more activity and content shaping in the future from marketers targeting the social media space for a quick link injection.

27 comments

Commentary on "The Spam Farms of the Social Web":

  1. Kevin Burton wrote:

    Note that this story didn’t make it into Tailrank ;)

  2. Ted Rheingold wrote:

    Nice sleuthing!

  3. Ian Kennedy wrote:

    Reads like a thriller – excellent work digging this up. I would love to get even more details.

    I think you’ve coined a new term, “spam node”

    *sigh*

  4. Chris Schultz wrote:

    Niall, Great post, really informative. This is probably a wave of the future that we’re going to have to protect from. My question is, how did they get 913 unsuspecting Digg users to Digg this story up to the first page? I’m sure some were phantom accounts, but they had to have some real users.

  5. alan patrick wrote:

    Great article…I had blogged on this issue the day before, have linked to this as a postscript in my post about games (Social Media) people play.

    Question is, what to do about it?

    By the way, what do you think of the Google click to call pranking?

  6. Allen wrote:

    Niall – you are “Niall, the Digg Hunter” – ok fine, it’s lame, but it works :)

    Great digging… Spam is (and will become more of) a problem as these social sites hit the mainstream.

  7. Niall Kennedy wrote:

    Chris,
    After an initial seed group, geeks who want to lose weight might mod it up or bookmark it.

  8. Toni wrote:

    Great post. I’d love to know how much money these types of schemes make and how much of their operation is automated. Imagine how bad it’ll get if they figure out how to automate this to the level that email and comment spam have been taken.

  9. Dave Hodson wrote:

    Niall – Great sleuthing! 900 diggs sounds like this was a larger effort (spam-bot) etc

  10. Ravi wrote:

    I hope that social networking sites evolve to behave more like a “wikipedia,” in the sense that the crowd can become smart enough to quickly detect and “blacklist” sites that are obviously out there just to game the system.

    Great post and thanks for educating the community on this important subject. I think once people get educated that this is going on (and rampant) they will start to “ding” the abusers. I hope that social networking/bookmarking sites enable these types of “cross-check” features more prominently.

    This is the beauty of social networks, they have the power to “give” and “take away” :) now we just need the features to enable the “take away” piece!

    Ravi in Rainy Seattle

  11. Allen wrote:

    Toni – I am betting not very much – on my site, 1085 diggs = ~$12 in revenue. And people coming from Digg are not ad-clickers and so while this person received a ton of inbound links, he/she probably did not generate a lot (if any) direct revenue from it.

  12. bdeseattle wrote:

    Great post. I find it fascinating to watch Digg Swarm and actually see how readers gravitate from story to story in real time.

    I’d love to mine that data and have the ability to trace diggs user by user, story by story, and then look for common patterns for how users navigate in real time from story to story. Would also likely help with exposing spammers and others who are exploiting the social networks. Maybe we need to whip up some spambots that crawl the social networks and nuke all spam-related content/comments/etc.

    Your post underscores the importance of baking anti-spamming ninjas directly into socially-driven systems in the hopes of slaying the spammers.

    I’d also like to see a historical view of buzz.originalsignal.com and have the ability to visualize how stories rise, fall, appear, and disappear from the socially driven sites in real time.

    Right now, my biggest problem (aside from the issue you raise) with Digg and other socially-driven sites is the fact that it usually takes at least 3 clicks to get to the actual content item that I am trying to read/browse. Really starting to annoy me.

    I’m also concerned that we may need to call in the troops to help manage bloated tag clouds, and dirty metadata. IMHO, the socially-driven sites need to build better tag mgmt systems that are simple for the average user to manage. If not, the whole social media [r]evolution might implode before us techno-geeks are able to bring it to the masses.

  13. Lee Odden wrote:

    I agree with your assessment, Niall. I’ve written about social media spam a few times on my blog as well, but not quite in this detail.

    Whenever there’s opportunity for link manipulation, opportunists will take advantage. It’s bad enough now that the communities at most social news and bookmark sites are not capable of self-policing as I have previously, and too optimistically, believed.

    BTW, nice job on the RSS presentation last week.

  14. Maxpower wrote:

    Great detective work. The problems you have highlighted are directly related to the whole debate between sites that have some editorial control (e.g. slashdot & metafilter) vs the more popularity contest style (e.g. digg & reddit).

    Time will tell which way is the best.

  15. engtech wrote:

    Toni – I am betting not very much – on my site, 1085 diggs = ~$12 in revenue. And people coming from Digg are not ad-clickers and so while this person received a ton of inbound links, he/she probably did not generate a lot (if any) direct revenue from it.

    Much disagreement. It isn’t about the traffic from digg, it’s about the organic inbound links. You can continue to get 600+ hits a day on a dugg post with lots of organic links. (my experience)

  16. MikeOK wrote:

    I see this social manipulation as a way to hide PPC fraud. By increasing the spam site’s exposure, they are less likely to get fingered for large scale PPC assaults. After increasing traffic, they could allow bot networks to intermingle with legit traffic which hides the fraud. I wrote about this scenario earlier this year. Here is a quote and a link where I describe how I would build such a fraud network:

    The ultimate goal of the virus owner, for maximum sustained profit, is to build a semi-legitimate network of authority sites funded by click fraud revenue.

  17. Jeremiah Owyang wrote:

    Excellent research as always Niall

  18. Tony Obregon wrote:

    I’m with Ian, “spam node” is a new term and one that caught my eye.

    I am amazed that someone can be in the digg top 150 within a month…doesn’t seem right.

  19. zaibatsu wrote:

    As a top digg user, I try my best to Digg the stories I like or that I find interesting. But I have a strong network of friends on Digg and I try, and I hope they do, to keep this kind of crap off the front page. I don’t need the money and never will, so I will always fight hard to keep this kind of trash off of Digg.

  20. Mike D. wrote:

    A newly minted user posted to Reddit, posted to Newsvine, and posted to del.icio.us using the same name on each service. Seeding and voting up the content worked, as the blog post made its way to the top story listings on each social news service.

    Errrr, where did you get this information? I can’t speak for Reddit or del.icio.us, but on Newsvine, this user never even made it out of the Greenhouse. The Greenhouse is where all new users are put until they are promoted into the general population. Stories submitted by Greenhoused users do not appear *anywhere* across Newsvine, let alone on the front page. It’s one of the ways we prevent things like what you’re talking about in this article, and it works really well. If you visit the story you mentioned on Newsvine, you can see that it has only one vote (probably a self-vote) and never made it anywhere.

    But to your larger point, yes, social news sites are absolutely prone to systemic abuse and will continue to fight the good fight against it.

  21. Niall Kennedy wrote:

    Mike,
    The story was referenced by two distinct Newsvine users.

    User pindarev, referenced above, seeded the article on Saturday, November 18, at 3:16 a.m. That user joined this month and received one vote on the submission.

    User Anchorman seeded the story over 15 hours later. He has been a site member since January and received 6 votes on the submission.

  22. Mike D. wrote:

    Interesting. I just checked Anchorman’s article and it’s marked in the system as a dupe, meaning it wouldn’t (or at least shouldn’t) show up anywhere around the ‘Vine either (this can be confirmed by trying to hit any corresponding tag page like “newsvine.com/diet” for instance). It’s possible the 6 votes it got were from readers of Anchorman’s column but I can’t see how else anyone would be able to find it. Definitely possible we have a bug somewhere though.

  23. tomo wrote:

    The thing about this and the fake gaming console story is that nobody knows if the story was dugg because people thought it was real or dugg because people thought it was fake and wanted to put one over on digg. Whatever the reason the process works because it is a newsworthy story now by virtue of the firestorm it has created. haha – that was a joke :)

    It seems as though people associated digg with news. digg isnt about news at all. digg is an entertainment service, similar to an online game. digg provides users with a platform to interact with each other electronically. with all the pedophile shit the media has associated with “online,” you would think that people would have much more skepticism about placing so much faith that the community sentiment is something they should gracefully accept as gospel. i say faith because if it wasn’t faith this wouldn’t be a big deal. this whole situation shouldn’t be surprising because spam has been around for 15+ years. Somehow digg was above the fray but imho it can no longer be the de-facto trusted place because it is too popular and if i don’t have a direct relationship with someone how can i trust them? i need to trust them if i am going rely upon them as my source of news because that news, or the contents of it, will have a major influence on the decisions i make.

    digg has a lot going for it and is surely getting the brand recognition major companies dream of but it would appear to be worthwhile to get Kevin out of the spokesperson role because while he may generate buzz, the negative buzz certainly has outweighed the positive during the last 9 months.

  24. Vi wrote:

    I dig this story. This is what you call investigation. I remember reading about the Digg army in the SEO black hat site. And now this takes the cake. Thanks for bringing this to light.

  25. George P. wrote:

    Excellent research Niall. Your stimulating analysis leaves no doubt that some social sites are starting to be taken advantage of or are taking advantage of their users. Given the massive positive influence these sites have, the question that must be asked is: “Is this spam farming a terrible thing?”

    Given the upside of these sites, is it worth making a lot of noise over such a minor waste of time? I’m not entirely sure it is.

  26. Alessandro wrote:

    Niall, this is really very good research. I have learnt something today. I am wondering if it’s possible to automatically identify these kinds of schemes from the different social media tools.

  27. Mitch Wander wrote:

    It seems that the social bookmarking model will eventually need to transform itself. Agreed!

    I think the solution will be pretty straight-forward… allowing users to restrict or weight the link count to those provided by “trusted” sources within their professional or social network.

    LinkedIn + Digg???