Fighting Spam — All Kinds

How I deal with comment and pingback spam.

I start each morning pretty much the same way. I make myself a cup of coffee, make a scrambled egg for my parrot, and then sit down at the kitchen table and check the comments that came into my blog overnight.

About Spam

The main thing I’m checking for each morning is comment and pingback spam. These are similar but different.

  • Comment spam is a comment that exists solely to provide one or more links to another Web site, usually to promote that site or its services, but possibly just to get links to that site to improve its Google rankings. Comment spam adds nothing to the site’s value. It is sometimes disguised as a guest book entry or general positive comment (for example, “Great blog! I’ll be back!” accompanied by a link or two), but it simply isn’t something the average blogger should want on his or her site.
  • Pingback spam is a comment that appears as a result of a link on another blog pinging your blog. Although many pingbacks are legitimate (as many comments are legitimate), there appears to be a rise in pingbacks as a result of feed scraping, which I’ve discussed here and here. Pingback spam is usually pretty easy to spot; the software that scrapes the feeds isn’t very creative, so the excerpt is usually an exact quote from what’s been scraped. Sometimes, oddly enough, the quote is from the copyright notice that appears at the bottom of every feed item originating from this site. Pingbacks automate the linking of your site to someone else’s; in the case of pingback spam, that someone is likely to be a splogger.

Lucky me: I get both.
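For anyone who wants to automate the "exact quote" check I described for pingback spam, here is a rough sketch in Python. It is not part of Bad Behavior, Spam Karma, or WordPress; the function names, the 40-character threshold, and the sample footer text are all made up for illustration. It simply asks whether a pingback's excerpt is a word-for-word chunk of text I already published, such as the feed's copyright notice.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_scraped(excerpt: str, own_texts: Iterable[str], min_chars: int = 40) -> bool:
    """Return True when the excerpt is a verbatim chunk of any of our own texts.

    min_chars is an arbitrary, hypothetical threshold: very short excerpts
    match too easily to be meaningful.
    """
    # Excerpts are often wrapped in "[...]" or trailing dots; strip those markers.
    needle = normalize(excerpt).strip(" .…[]")
    if len(needle) < min_chars:
        return False
    return any(needle in normalize(text) for text in own_texts)


# Hypothetical feed footer and pingback excerpts, for illustration only.
FOOTER = ("Copyright 2007 by Example Author. All rights reserved. "
          "Republication without permission is prohibited.")
scraped = "[...] Copyright 2007 by Example Author.  All rights reserved. Republication without [...]"
genuine = "I read this post about spam and wanted to add my own thoughts on filtering pingbacks."
```

A legitimate blogger who quotes a long passage verbatim would also trip this check, so at best it flags pingbacks for a human to look at; it doesn't decide anything on its own.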

Tools to Fight Comment Spam

Fortunately, I use both Bad Behavior and Spam Karma 2 (many thanks again to Miraz for suggesting both of these), so the spam comments that get through their filters and are actually posted to the site are minimized. On a typical day, I might just have 3 to 5 of them. Compare that to 3,400 potential spam messages stopped by Bad Behavior in the past week and the 51,000 spam messages deleted after posting by Spam Karma in the past year since its installation. Without these two forms of protection, I’d be spending all day cleaning up spam.

Anyone who doesn’t use some kind of spam protection on a blog with open comments is, well, an idiot.

Neither program is very effective against pingback spam, although Spam Karma seems to be catching a few of them these days. Although I’m pretty sure I can set up WordPress to reject pingbacks, I like the idea of getting legitimate links from other blogs. It helps form a community. And it provides a service to my readers. For example, if I wrote an article about something and another blogger quoted my work and added his insight to it, his article might interest my readers. Having a link in my comments right to his related post is a good thing.

My Routine

So my morning routine consists of checking Spam Karma’s “Approved Comments” and marking the comments that are spam as spam. Then I go into WordPress’s Comments screen (Dashboard > Manage > Comments), mark pingback spam as spam, and delete it.

Why do it both ways? Well, I’m concerned that if I keep telling Spam Karma that pingback spam is spam, it’ll think all pingbacks are spam. I don’t want it to do that. So I manually delete them. It only takes a minute or two, so it isn’t a big deal. If I had hundreds of these a day, I might do things differently.

The other reason I delete the pingbacks manually is that I want to check each site that’s pinging mine. I collect URLs of splogging sites and submit them periodically to Google. These sites violate Google’s Terms of Service and I’m hoping Google will either cancel their AdSense accounts or remove them from Google’s search indexing (or, preferably, both). So I send the links to Google and Google supposedly looks at them.
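If you collect splog URLs the way I do, a few lines of Python can boil a pile of pingback links down to a tidy list of offending sites before you report them. This is only a sketch: the function name and the sample URLs are invented, and it says nothing about what format Google actually wants for a report.

```python
from urllib.parse import urlparse


def splog_domains(pingback_urls):
    """Return the distinct hostnames behind a list of pingback source URLs, sorted."""
    hosts = set()
    for url in pingback_urls:
        # hostname is lowercased by urlparse and is None when no host is present
        host = urlparse(url.strip()).hostname
        if host:
            # Treat "www.example.test" and "example.test" as the same site.
            hosts.add(host[4:] if host.startswith("www.") else host)
    return sorted(hosts)


# Made-up URLs for illustration only.
collected = [
    "http://www.Example-Splog.test/2007/04/some-post",
    "http://example-splog.test/2007/04/another-post",
    "https://other.test/x",
    "garbage",
]
report = splog_domains(collected)
```

Grouping by site rather than by individual page keeps the list short, which matters when one splogger pings you a dozen times.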

I’m working on a project to make creating a DMCA notice easier — almost automated — and would love to hear from anyone working on a project like that.

This morning was quiet. Only three spams to kill: one comment spam and two pingback spams. I’ll get a few more spams during the day and kill them as they arrive; WordPress notifies me via e-mail of all comments and pingbacks as they are received. (I don’t check my e-mail at the breakfast table anymore.)

Do you have a special way to deal with comment or pingback spam? Don’t keep it a secret. Leave a Comment below.

Google, Adsense, and Splogging

Reports of cancelled accounts while sploggers earn money by scraping honest bloggers’ content are troubling.

Jim Mitchell lost his AdSense account and Google won’t tell him why. He’s bitter about it. But what makes him more bitter is that he’s discovered that sploggers with AdSense accounts have been using his content to earn revenue.

From Is Google AdSense Really Fair? on JimMitchell.org:

Today, I found four different sites that have scraped my content to use as their own with AdSense ads on the page. This, according to the Google AdSense Terms of Service, is a huge violation. I promptly reported the abuse with hopes the sploggers who lifted my content get their income generating plug pulled pronto.

One of the commenters to Jim’s post claims his AdSense account was also cancelled for no reason.

Now I’ve had no trouble with Google or AdSense and hope I never do. My earnings are meager, but they do cover the cost of hosting, which is my primary goal for including AdSense ads on this site. (That’s one of the reasons I don’t plaster the site with advertising like so many other bloggers do.)

But I do have a serious problem with sploggers, especially if they’re using AdSense or other advertising programs to earn money by illegally using the content written by other bloggers.

I know my content is scraped. Every once in a while, I’ll get a pingback from a sloppy splogger that directs me to his site. The site is full of scraped content and not much else. Most of the ones I’ve seen seem to be link farms for some other purpose. I don’t know enough about this stuff to understand why my content is being scraped when there don’t appear to be any ads on the site my content is appearing on. (Perhaps someone reading this can explain or include a link to a good explanation.) But if these sloppy sploggers are stealing content in a way that can be easily traced, how many other sploggers are stealing content in a way that can’t be easily traced?

And do they all have Google AdSense accounts?

Which brings up a good question: how does Google determine who qualifies for an AdSense account? Is there a human who actually looks at the sites? I seriously doubt that. So that makes me wonder how effective their software is at determining whether a site is legitimate — full of fresh, legally obtained content — or a ripoff of other bloggers’ hard work.

And that also brings up the question of the effectiveness of an Adwords account. I was using Adwords for Flying M Air in an effort to sell my multi-day excursions. While I’m no Adwords expert, I think I had it set up well. I know I was paying for a ton of hits. But I also know that my phone didn’t ring. While this might mean that people don’t want the service I’m offering (chances are, they get sticker shock when they see the price), it also might mean that the clicks aren’t being made by serious customers, or even by humans.

But it also means that my Adwords payments might be going to sploggers who have built sites to draw in visitors who then click on my link. I probably wouldn’t mind so much if they were buying — one sale would pay my Adwords bill for a year — but they’re not. So I could be paying, through my Adwords account, for sploggers to steal content from honest bloggers, some of whom, according to Jim Mitchell, have had their AdSense accounts yanked for reasons never explained.

I guess what I want to know is this:

  • Why does Google cancel the AdSense accounts for certain bloggers who claim they have done nothing wrong, then refuse to explain why they were cancelled?
  • How does Google ensure that AdSense accounts are given only to legitimate sites — and not to sploggers or other copyright violators?
  • How can Google Adwords customers be assured that their ads are appearing on legitimate sites and are being clicked by humans who are genuinely interested in the products or services advertised?

I hope Jim gets his AdSense account back. And I hope that other bloggers do their best to report feed scraping and splogging activities to Google or other ad sourcers whenever they find them.