Fighting Spam — All Kinds

How I deal with comment and pingback spam.

I start each morning pretty much the same way. I make myself a cup of coffee, make a scrambled egg for my parrot, and then sit down at the kitchen table and check the comments that came into my blog overnight.

About Spam

The main thing I’m checking for each morning is comment and pingback spam. These are similar but different.

  • Comment spam is a comment that exists solely to provide one or more links to another Web site, usually to promote that site or its services, but possibly just to get links to that site to improve its Google rankings. Comment spam adds nothing to the site’s value. It is sometimes disguised as a guest book entry or general positive comment (for example, “Great blog! I’ll be back!” accompanied by a link or two), but it simply isn’t something the average blogger should want on his or her site.
  • Pingback spam is a comment that appears as a result of a link on another blog pinging your blog. Although many pingbacks are legitimate (as many comments are legitimate), there appears to be a rise in pingbacks as a result of feed scraping, which I’ve discussed here and here. Pingback spam is usually pretty easy to spot; the software that scrapes the feeds isn’t very creative, so the excerpt is usually an exact quote from what’s been scraped. Sometimes, oddly enough, the quote is from the copyright notice that appears at the bottom of every feed item originating from this site. Pingbacks automate the linking of your site to someone else’s; in the case of pingback spam, that someone is likely to be a splogger.

Lucky me: I get both.
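The “exact quote” giveaway in scraped pingbacks lends itself to a simple automated check. Here is a minimal sketch in Python, assuming you have the text of your own posts (and your feed’s copyright notice) available as plain strings. The function names, the 40-character threshold, and the bracketed-ellipsis excerpt format are all assumptions made for illustration; this is not how Spam Karma, Bad Behavior, or WordPress actually works.

```python
import re

# Matches the "[...]" or "[…]" markers that often wrap a pingback excerpt
# (an assumption about the excerpt format).
ELLIPSIS_MARKER = re.compile(r"\[\s*(?:\.\.\.|…)\s*\]")


def normalize(text: str) -> str:
    """Drop ellipsis markers, collapse whitespace, and lowercase."""
    text = ELLIPSIS_MARKER.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_scraped(excerpt: str, own_texts: list[str], min_chars: int = 40) -> bool:
    """Return True if the excerpt is a long verbatim quote from one of own_texts.

    own_texts is hypothetical input: your post bodies plus the copyright
    notice appended to each feed item. Short excerpts are ignored because
    a few matching words prove nothing.
    """
    cleaned = normalize(excerpt)
    if len(cleaned) < min_chars:
        return False
    return any(cleaned in normalize(text) for text in own_texts)
```

A check like this would only flag candidates for review. A legitimate blogger who quotes a full sentence and adds commentary elsewhere in the post would trip it too, so a human still has to look at the pinging site.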

Tools to Fight Comment Spam

Fortunately, I use both Bad Behavior and Spam Karma 2 (many thanks again to Miraz for suggesting both of these), so very few spam comments get through their filters and are actually posted to the site. On a typical day, I might have just 3 to 5 of them. Compare that to the 3,400 potential spam messages stopped by Bad Behavior in the past week and the 51,000 spam messages deleted after posting by Spam Karma in the year since I installed it. Without these two forms of protection, I’d be spending all day cleaning up spam.

Anyone who doesn’t use some kind of spam protection on a blog with open comments is, well, an idiot.

Neither program is very effective against pingback spam, although Spam Karma seems to be catching a few of them these days. Although I’m pretty sure I can set up WordPress to reject pingbacks, I like the idea of getting legitimate links from other blogs. It helps form a community. And it provides a service to my readers. For example, if I wrote an article about something and another blogger quoted my work and added his insight to it, his article might interest my readers. Having a link in my comments right to his related post is a good thing.

My Routine

So my morning routine consists of checking Spam Karma’s “Approved Comments” and marking the comments that are spam as spam. Then I go into WordPress’s Comments screen (Dashboard > Manage > Comments), mark pingback spam as spam, and delete it.

Why do it both ways? Well, I’m concerned that if I keep telling Spam Karma that pingback spam is spam, it’ll think all pingbacks are spam. I don’t want it to do that. So I manually delete them. It only takes a minute or two, so it isn’t a big deal. If I had hundreds of these a day, I might do things differently.

The other reason I delete the pingbacks manually is because I want to check each site that’s pinging mine. I collect URLs of splogging sites and submit them periodically to Google. These sites violate Google’s Terms of Service and I’m hoping Google will either cancel their AdSense accounts or remove them from Google’s search indexing (or, preferably, both). So I send the links to Google and Google supposedly looks at them.
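Keeping that list of splog URLs tidy is easy to script. Here is a minimal sketch in Python that boils a pile of pingback source URLs down to a sorted, de-duplicated list of host names, ready to paste into a report. The function name and the sample URLs are hypothetical, and the sketch assumes you have already decided by hand which pingbacks are spam.

```python
from urllib.parse import urlsplit


def splog_hosts(pingback_urls):
    """Return the unique host names behind a list of pingback source URLs."""
    hosts = set()
    for url in pingback_urls:
        host = urlsplit(url).hostname  # lowercased; None if the URL has no host
        if not host:
            continue  # skip anything that isn't a usable URL
        if host.startswith("www."):
            host = host[4:]  # treat www.example.test and example.test as one site
        hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    # Made-up URLs for illustration only.
    urls = [
        "http://www.example-splog.test/post/1",
        "http://example-splog.test/post/2",
        "https://Another.example/feed?x=1",
    ]
    for host in splog_hosts(urls):
        print(host)
```

Grouping by host makes it obvious when a single site is responsible for most of the pings, which is worth mentioning in a report.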

I’m working on a project to make creating a DMCA notice easier — almost automated — and would love to hear from anyone working on a project like that.

This morning was quiet. Only three spams to kill: one comment spam and two pingback spams. I’ll get a few more spams during the day and kill them as they arrive; WordPress notifies me via e-mail of all comments and pingbacks as they are received. (I don’t check my e-mail at the breakfast table anymore.)

Do you have a special way to deal with comment or pingback spam? Don’t keep it a secret. Leave a Comment below.

More on AdSense Splogs

Four more pingbacks from feedscrapers today!

After writing my two splogging-related posts this morning, I went for a flight with some clients. Just got back a while ago. And what do I find in my e-mail inbox? Four more splog pingbacks!

I also found a message from Jim Mitchell, responding to an e-mail I’d sent him earlier in the day. He included a link to a blog post on MaxPower that provides two good methods for stopping AdSense sploggers. I’m going through it now. But the thing I wanted to share was this explanation of what splogging is, for those of you who have no idea what I’ve been whining about all day.

From Stopping Adsense Splogs & Spammers: Methods that Work on MaxPower:

Imagine searching for something or other on the Internet and arriving at a webpage chocked full of ads and stuffed with the exact keyword you were searching for. The page is of no help because it contains no content of value. Some guy somewhere, created a website that sucks keywords / newstories / content from other websites using RSS, inserted the right keywords to maximize profit from Adsense, and waited for Google to index and rank it high enough for you stumble upon it. Once at the page, the spammer (or spamdexer) hopes that you will click on one of the Adsense ads that seem helpful compared to the rest of the useless random text. This practice of spamdexing wastes your time, its annoying, and you can fight back.

If you’ve seen this on your blog, follow the above link to the MaxPower article to see what you can do about it.

Reporting Google AdSense Policy Violations

A follow-up to my “Google, Adsense, and Splogging” post.

Moments after publishing my post about sploggers making money with AdSense, I got three pingbacks from a splogger’s site. I visited the site and saw an AdSense ad block at the top of the page.

Talk about timing!

I went to Google’s site and looked up the info to report violations. I found “Google AdSense Help Center: How do I report a policy violation?”:

How do I report a policy violation?

We regularly review sites in our program for compliance with our program policies. If you notice a site displaying Google ads that you believe is violating our program policies, please let us know and we can investigate the issue further.

The page then goes on to give clear instructions for reporting the problem.

Everyone reading this: please use this information to report sites that are scraping your feeds to generate content for their sites. The more we report this, the more likely Google will do something to prevent these people from getting AdSense accounts in the first place. Then maybe — just maybe — this feed scraping will stop.

Google, Adsense, and Splogging

Reports of cancelled accounts, while sploggers earn money by scraping honest bloggers’ content, are troubling.

Jim Mitchell lost his AdSense account and Google won’t tell him why. He’s bitter about it. But what makes him more bitter is that he’s discovered that sploggers with AdSense accounts have been using his content to earn revenue.

From Is Google AdSense Really Fair? on JimMitchell.org:

Today, I found four different sites that have scraped my content to use as their own with AdSense ads on the page. This, according to the Google AdSense Terms of Service, is a huge violation. I promptly reported the abuse with hopes the sploggers who lifted my content get their income generating plug pulled pronto.

One of the commenters to Jim’s post claims his AdSense account was also cancelled for no reason.

Now I’ve had no trouble with Google or AdSense and hope I never do. My earnings are meager, but they do cover the cost of hosting, which is my primary goal for including AdSense ads on this site. (That’s one of the reasons I don’t plaster the site with advertising like so many other bloggers do.)

But I do have a serious problem with sploggers, especially if they’re using AdSense or other advertising programs to earn money by illegally using the content written by other bloggers.

I know my content is scraped. Every once in a while, I’ll get a pingback from a sloppy splogger that directs me to his site. The site is full of scraped content and not much else. Most of the ones I’ve seen seem to be link farms for some other purpose. I don’t know enough about this stuff to understand why my content is being scraped when there don’t appear to be any ads on the site where my content appears. (Perhaps someone reading this can explain or include a link to a good explanation.) But if these sloppy sploggers are stealing content in a way that can be easily traced, how many other sploggers are stealing content in a way that can’t be easily traced?

And do they all have Google AdSense accounts?

Which brings up a good question: how does Google determine who qualifies for an AdSense account? Is there a human who actually looks at the sites? I seriously doubt that. So that makes me wonder how effective their software is at determining whether a site is legitimate — full of fresh, legally obtained content — or a ripoff of other bloggers’ hard work.

And that also brings up the question of the effectiveness of an Adwords account. I was using Adwords for Flying M Air in an effort to sell my multi-day excursions. While I’m no Adwords expert, I think I had it set up well. I know I was paying for a ton of hits. But I also know that my phone didn’t ring. While this might mean that people don’t want the service I’m offering (chances are, they get sticker shock when they see the price), it also might mean that the clicks aren’t being made by serious customers, or even by humans.

But it also means that my Adwords payments might be going to sploggers who have built sites to draw in visitors who then click on my link. I probably wouldn’t mind so much if they were buying — one sale would pay my Adwords bill for a year — but they’re not. So I could be paying, through my Adwords account, for sploggers to steal content from honest bloggers, some of whom, according to Jim Mitchell, have had their AdSense accounts yanked for reasons never explained.

I guess what I want to know is this:

  • Why does Google cancel the AdSense accounts for certain bloggers who claim they have done nothing wrong, then refuse to explain why they were cancelled?
  • How does Google ensure that AdSense accounts are given only to legitimate sites — and not to sploggers or other copyright violators?
  • How can Google Adwords customers be assured that their ads are appearing on legitimate sites and are being clicked by humans who are genuinely interested in the products or services advertised?

I hope Jim gets his AdSense account back. And I hope that other bloggers do their best to report feed scraping and splogging activities to Google or other ad providers whenever they find them.