Feedback abuse

So, the comments system. Obviously, I’m concerned about comment spam and similar abuse, and want to make sure the comment code can’t be exploited once it’s online. This goes double since the JavaScript intro was posted to Reddit’s programming site, which is apparently where spammers get their list of sites to try: shortly afterwards, we got a bunch of attempted exploits of our feedback/contact page.

Now, our feedback form isn’t a standard one, so I was surprised they’d attack it right away. Perhaps naively, I had assumed spammers targeted specific, widely deployed implementations that they already knew how to exploit. But no. Still, I run quite a lot of paranoia checks on the form data before accepting it, so nothing bad happened, apart from some rather strange-looking entries in the log files.

Stop me if you’ve heard this one, but here’s what they were trying to do:

  • since many feedback forms send email, and…
  • since, one presumes, many sites just paste the form fields into a ‘template’ email and hand it off to sendmail or some other standard Unix utility…
  • the spammers embed \r\n sequences (CRLF, the line break that separates email headers) in their form fields, followed by a bunch of extra headers that they hope the email system will accept as real ones
  • these include To: addresses they hope will override the ones supplied by the web form
  • in other words, they hope to turn a contact form into an open SMTP relay.
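
To make the attack concrete, here is a minimal Python sketch of the kind of naive ‘template’ the spammers are hoping for. The function `naive_email` and all the addresses are hypothetical. The point is only that once a \r\n sneaks into a field, a standard parser reads whatever follows it as a genuine header.

```python
from email.parser import Parser


def naive_email(sender: str, subject: str, body: str) -> str:
    # Hypothetical vulnerable template: form fields pasted straight into the headers.
    return (
        "To: webmaster@example.com\r\n"
        f"From: {sender}\r\n"
        f"Subject: {subject}\r\n"
        "\r\n"
        f"{body}\r\n"
    )


# What a spammer might type into the "email address" field of the form.
injected = "visitor@example.org\r\nBcc: victim1@example.net, victim2@example.net"

raw = naive_email(injected, "Hello", "Nice site!")
msg = Parser().parsestr(raw)
print(msg["Bcc"])  # prints: victim1@example.net, victim2@example.net
```

The Bcc: line arrived inside the From field, yet the parser treats it exactly like a header the site wrote itself.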

Naturally, I strip out all control codes from form fields (what do you take me for?). I’m almost (but sadly, not) surprised that there are enough websites out there vulnerable to this simple attack to make it worth their while. Anyway, since seeing this behaviour, I also added early checks that reject anything that tries to spoof headers. It keeps the logfiles short if nothing else :P
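
The post doesn’t show the actual checks. The following is therefore only a Python sketch of what ‘strip control codes, and reject anything that tries to spoof headers’ might look like. The names `clean_header_field`, `clean_body_field` and `HeaderSpoofError`, and the list of suspicious header names, are my own assumptions, not the site’s code. Fields that end up in a header are rejected outright if they contain a line break or a header-like prefix. The message body keeps its newlines but loses every other control character.

```python
import re
import unicodedata


class HeaderSpoofError(ValueError):
    """Raised when a form field looks like an attempt to inject mail headers."""


# Assumption: these header names are worth rejecting in short, header-bound fields.
# The trade-off is occasional false positives, e.g. a subject containing "cc:".
SPOOF_PATTERN = re.compile(r"\b(?:to|cc|bcc|content-type|mime-version)\s*:", re.IGNORECASE)


def strip_controls(text: str, keep: str = "") -> str:
    """Remove every Unicode control character (category Cc) except those listed in keep."""
    return "".join(ch for ch in text if ch in keep or unicodedata.category(ch) != "Cc")


def clean_header_field(value: str) -> str:
    """For fields that get pasted into a header (name, email, subject)."""
    if "\r" in value or "\n" in value or SPOOF_PATTERN.search(value):
        # Reject early instead of silently repairing the value.
        raise HeaderSpoofError(f"rejected header field: {value!r}")
    return strip_controls(value)


def clean_body_field(value: str) -> str:
    """For the free-text message: normalise line endings, keep newlines and tabs."""
    normalised = value.replace("\r\n", "\n").replace("\r", "\n")
    return strip_controls(normalised, keep="\n\t")
```

Unicode category Cc covers the ASCII control codes (including \r and \n) as well as the C1 range. That is why the body filter has to whitelist \n and \t explicitly.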

The form data was somewhat scrambled, as the spammers didn’t seem to know which form field expected what data. Also, the email content itself was nonsense, even by spam standards: there was no advertising and no viral payload. And the attempts trailed off after a while. So what was the point of all that?

Spider sense tingling

Well, I think it’s some kind of black-hat “web form open-relay scanner”.

Firstly, I think they’re running a spider script that picks out new sites from Reddit (and presumably Digg and other social link-ranking sites), looks for pages with “comment”, “contact”, “feedback” etc. on them (hi!), picks out any forms, and then throws everything into every form field to see what sticks.

The emails were sent to a big list of recipients; I think this is a screen to hide the one address the spammers actually own. Merely accepting a form post is no indication that the spam actually got sent, you see. So I think they try spamming through every site that pops up on Reddit & friends, each time with slightly different random text, and wait to see which messages turn up in their inbox. This whole process is almost certainly automated. The end result is a database of websites known to be exploitable for spam.
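
If the theory is right, the bookkeeping involved is tiny. You tag each probe with a unique random token, remember which site it went to, and later match any tokens found in the inbox back to sites. Here is a hypothetical Python sketch of just that correlation step, mainly to show how little it takes. It sends nothing, and `make_probe` and `sites_that_relayed` are invented names.

```python
import secrets


def make_probe(site: str, registry: dict[str, str]) -> str:
    """Build probe text carrying a unique random token and record which site it targeted."""
    token = secrets.token_hex(8)  # 16 hex characters, so collisions are vanishingly unlikely
    registry[token] = site
    return f"nonsense text {token} more nonsense"


def sites_that_relayed(inbox: list[str], registry: dict[str, str]) -> set[str]:
    """Every token that shows up in a received message identifies a form that sent mail."""
    return {site for body in inbox for token, site in registry.items() if token in body}


registry: dict[str, str] = {}
probe_a = make_probe("site-a.example", registry)
probe_b = make_probe("site-b.example", registry)
# Suppose only site A's form actually relayed its message.
print(sites_that_relayed([probe_a], registry))  # {'site-a.example'}
```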

This is similar to the use of images in emails to verify working addresses. Whether they then spam the crap out of the open relays themselves or sell the database on to others, I don’t know.

If this theory is true, it might be interesting to consider ways to use it against them. Web client IP addresses might be useful, but they probably belong to zombie machines (if the spammers have any sense). Alternatively, and this is a little tenuous, gathering together all the email addresses across all the affected websites and seeing which keep cropping up might highlight the addresses genuinely owned by the spammers and allow them to be tracked down. After all, for this whole process to be useful to them, they must pick up email from those addresses eventually.
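
The ‘which addresses keep cropping up’ idea boils down to a count across sites. Below is a minimal Python sketch. It assumes each participating site can export the recipient addresses found in rejected form posts. The `recurring_addresses` function, the data layout and the example addresses are all hypothetical.

```python
from collections import Counter


def recurring_addresses(addresses_by_site: dict[str, list[str]], min_sites: int = 2) -> list[tuple[str, int]]:
    """Return addresses seen on at least min_sites different sites, most frequent first."""
    counts: Counter[str] = Counter()
    for addresses in addresses_by_site.values():
        # Assumption: treat addresses case-insensitively; count each at most once per site.
        counts.update({a.strip().lower() for a in addresses})
    return [(addr, n) for addr, n in counts.most_common() if n >= min_sites]


logs = {
    "site-a.example": ["alice@decoy.example", "probe@spammer.example"],
    "site-b.example": ["bob@decoy.example", "probe@spammer.example"],
    "site-c.example": ["carol@decoy.example", "PROBE@spammer.example "],
}
print(recurring_addresses(logs))  # [('probe@spammer.example', 3)]
```

This is tenuous in the way the post says. If every site receives the same decoy list, the decoys recur too, so a high count is a lead rather than proof.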

If ISPs prove uncooperative, but you’ve tracked down a known-spammer email address, sending a single carefully-crafted email to that address might allow you to insert data of your own into their database, perhaps by throwing up a honeypot site & script on Reddit and seeing who trundles along to visit…