Wednesday, March 22, 2006

Bikecircle -- scraped content

I just received a form letter asking for a link exchange with bikecircle dot com. To ensure I'm not linking to a bad neighborhood, I always do a little check on the domain requesting the link. Here's what I found about bikecircle.
  1. The domain was registered January 2006 to somebody in Spain.
  2. The site is a discussion forum about bicycling.
  3. In spite of the recent registration, there are hundreds of pages of discussions. There are two possible explanations: the forums used to live on an older site and simply switched domain names; or (more likely) the site is using "scraped" or copied content from other websites.
  4. To test for copied content, I selected a random discussion thread that looked fairly distinctive. It had several misspellings, which is a big giveaway that this is scraped content. The user profiles were all empty, which is also suspicious. The topic I selected also caught my attention because it was something I thought I'd seen before. I searched for similar content and, lo and behold, I had posted the article to Usenet in rec.bicycles.misc about three years ago!
I don't see any advertising on the site, so I suspect they're just seeding the forums to encourage further participation. I've seen other web forums do the same, but they at least keep the proper attribution. Bikecircle's tactic of slightly modifying the content through misspellings and occasional swapped words, and changing authorship to fictional users, is completely unethical.
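
For anyone who wants to automate the spot check I did by hand, here's a rough sketch in Python. It compares two pieces of text by their overlapping five-character chunks ("shingles"), so a few misspellings or swapped words don't hide a match. The function names and the sample sentences are made up for illustration (they are not the actual forum or Usenet text), and this is just one simple way to do it, not a polished detector.

```python
import re


def shingles(text, n=5):
    """Return the set of character n-grams in text, ignoring case and punctuation."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def similarity(a, b, n=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample text, invented for this example.
original = ("I replaced my worn chain and cassette last weekend "
            "and the shifting is smooth again.")
suspect = ("I replaced my worn chain and casette last week end "
           "and the shifting is smoothe again.")
unrelated = ("Our club meets every Tuesday evening for a relaxed ride "
             "along the river path.")

if __name__ == "__main__":
    print("original vs. suspect:  ", round(similarity(original, suspect), 2))
    print("original vs. unrelated:", round(similarity(original, unrelated), 2))
```

A high score between a forum post and something published years earlier is a hint worth following up, not proof. You'd still want to read both versions side by side.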

Tags: blackhat, seo

4 comments:

  1. Have you been cast for "CSI: Cyclelicio.us" yet? Or maybe it could be "CSI: Colorado"...

  2. Hi Fritz..

    Content scraping is really a touchy thing. I don't think it's right to re-publish w/o permission, but some will argue that any RSS feed (I'm not sure if the forums you're referring to have RSS feeds) is by nature meant to be repurposed.

    I guess if there's no feed, and it's true screen scraping, it's really bad form, and probably against copyright laws.

    What I find interesting is that they're scraping forum content, which is really bottom-of-the-barrel stuff. They must be pretty desperate for Google juice.
