Too Busy For Words - The PaulWay Weblog
16 02 2007

Fri, 16 Feb 2007

The touch of life
After my firewall fun last week, and my modem fun at the start of the week, I think I now have working replacement gear to run my home network. I'm gradually configuring up the two boxes - the nice red Yawarra box running pfSense and a new Netcomm modem purchased from Harris Technology while I get my refund from Orange Computers (who, after a bit of fiddling around getting case numbers from Netgear, have agreed to not worry about the '7 days or less for refund' clause on their receipt).

After a bit of poking around in various manuals, I found that it is possible to put the modem in bridge mode, effectively allowing the pfSense firewall to do the PPPoE connection. I prefer this method to having the modem be a firewall and the second firewall just act as a pass-through - it seems less flexible to me. But this raised a somewhat sacrilegious question in my head.

With the way modern modems run cut-down free operating systems to do their firewalling and administration, is it necessary for me to have a separate firewall running another free operating system in order to get the functionality I want? My old Belkin Wifi AP, for instance, allowed two WPA pass-phrases - one for 'full network access' and one that could only access the internet; pfSense has only one, which gives full network access. The NetComm's advanced setup was as sophisticated as anything I've seen from pfSense or Smoothwall, albeit without the neat graphs and SSH access. Should I be paying an extra $450 for a separate piece of kit when a $100 modem has the same functionality?

posted at: 17:12 | path: /tech | permanent link to this entry

Comment spam eradication, attempt 2
Dave's Web Of Lies allows people to submit new lies, a facility that is of course abused by comment spammers. These cretins seem to not notice the complete absence of any linkback generation and the proscription of any text including the magic phrase http://. Like most spammers, they don't care if 100% of their effort is blocked somewhere, because it won't be blocked somewhere else. And there's no penalty for them brutalising a server: their botnets are just trawling away spamming continuously, leaving the spammers free to exploit new markets. It is vital to understand these two factors when considering how to avoid and, ultimately, eradicate spam.

For a while now, I've done a certain amount of checking that the lie submitted meets certain sanity guidelines that also filter out a lot of comment spam. In each case, the user is greeted with a helpful yet not prescriptive error message: for instance, when the lie contains an exclamation point the user is told "Your lie is too enthusiastic". (We take lying seriously at Dave's Web Of Lies.) This should be enough for a person to read and deduce what they need to do to get a genuine lie submitted, but not enough for a spammer to work out quickly what characters to remove for their submission to get anywhere. Of course, this runs straight into the first factor above: spammers don't care if any number of messages get blocked, so long as one message gets through somehow.
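A minimal sketch of this kind of sanity check, in Python rather than whatever the site actually runs on. The function name check_lie and the wording of the URL message are made up for illustration; only the "too enthusiastic" message and the two rules (no http://, no exclamation points) come from the post.

```python
from typing import Optional


def check_lie(lie: str) -> Optional[str]:
    """Return a hint for the submitter if the lie is rejected, else None."""
    text = lie.strip()
    # Rule from the post: any text containing "http://" is proscribed.
    if "http://" in text.lower():
        # Hypothetical wording; the real site's message is not given.
        return "Your lie is too well connected"
    # Rule from the post: exclamation points are rejected.
    if "!" in text:
        return "Your lie is too enthusiastic"
    return None
```

The messages hint at the problem without naming the offending characters, which is the point of keeping them "helpful yet not prescriptive".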

This still left me with a healthy chunk of spam to wade through and mark as rejected. This also fills up my database (albeit slowly), and I object to this on principle. So I implemented a suggestion from someone's blog: include a hidden field called "website" that, when filled in, indicates that the submission is from a spammer (since it's ordinarily impossible for a real person to enter any text in the field). Then we silently ignore any submission that has it filled in. No false positives? Sounds good to me.
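The hidden-field trick can be sketched roughly as follows, assuming the form data arrives as a plain dictionary. The names is_honeypot_spam and handle_submission, the in-memory store and the reply text are all hypothetical; only the "website" field name comes from the post.

```python
from typing import Dict, List


def is_honeypot_spam(form: Dict[str, str], trap_field: str = "website") -> bool:
    """True if the trap field, which real users never see, has any text."""
    return bool(form.get(trap_field, "").strip())


def handle_submission(form: Dict[str, str], store: List[Dict[str, str]]) -> str:
    """Store genuine lies; drop trapped ones while giving the same reply."""
    reply = "Thank you for your lie."  # hypothetical wording
    if is_honeypot_spam(form):
        # Silently ignore: nothing is stored and the reply does not change,
        # so the sender gets no signal that the submission was dropped.
        return reply
    store.append({"lie": form.get("lie", ""), "liar": form.get("liar", "")})
    return reply
```

Returning the same reply either way is a design choice: a distinct error would tell a bot which field to leave alone.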

Initial indications, however, were that it was having no effect. I changed the field from being hidden to having the style property "display: none", which causes any modern browser to not display it. Since this property was set in the stylesheet, a spammer scraping only the submit page would have no real indication that this field was not, in fact, used. This, alas, also had no effect. I surmised that this was probably because the form previously had no 'website' field and spammers were merely remembering which fields to fill in on which forms, rather than re-scraping the form (though I have no evidence for this). Pity.

So my next step was to note that a lot of the remaining spam had a distinctive form. The 'lie' would be some random comment congratulating me on such an informative and helpful web site, the 'liar' would be a single-word name, and there was a random character or two tacked onto the lie to make it unlikely to be exactly the same as any previous submission. So I hand-crafted a 'badstarts.txt' file and, on lie submission, I read through this file and silently ignore the lie if it starts with a bad phrase. Since almost all of these phrases are crafted so that no sane or reasonable lie could also start with the same words, this reduces the number of false positives - important (in my opinion) when we don't tell people whether their submission has succeeded or failed.
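A rough sketch of the prefix check, assuming 'badstarts.txt' holds one phrase per line. The function names are invented, and matching case-insensitively is my assumption rather than something the post specifies.

```python
from typing import Iterable, List


def load_bad_starts(lines: Iterable[str]) -> List[str]:
    """Read one phrase per line, skipping blanks and lowercasing each."""
    return [line.strip().lower() for line in lines if line.strip()]


def starts_badly(lie: str, bad_starts: List[str]) -> bool:
    """True if the lie begins with any known bad phrase."""
    text = lie.strip().lower()
    # Prefix matching means the random characters tacked on the end
    # of each spam do nothing to disguise it.
    return any(text.startswith(phrase) for phrase in bad_starts)


# Typical use (file name from the post, phrases not shown here):
#     with open("badstarts.txt") as f:
#         bad_starts = load_bad_starts(f)
#     if starts_badly(submitted_lie, bad_starts):
#         pass  # silently ignore the submission
```

Matching only at the start of the text keeps the check narrow: a genuine lie that merely contains one of the phrases further in is not caught.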

Sure enough, now we started getting rejected spams. The file now contains about 36 different phrases. I don't have any statistics on how many got through versus how many got blocked, but that's just a matter of time... And I'm probably reinventing some wheel somewhere, but it's a simple thing and I didn't want to use a larger, more complex but generalised solution.

I'd be willing to share the list with people, but I won't post the link in case spammers find it.

I really want to avoid a captcha system on the Web Of Lies. I like keeping Dave's original simplistic design, even if there are better, all-text designs that I could (or perhaps should) be using.

posted at: 13:37 | path: /tech/web | permanent link to this entry


All posts licensed under the CC-BY-NC license. Author Paul Wayper.