So I took backups, copied files, wrote new code, and converted old Django 1.2 code - which still worked in Django 1.4 - up to the new standards of Django 1.6. Much of the site has been 404'ing for the last couple of days as I fix problems here and there. It's still a work in progress, especially fixing the issues with URL compatibility - trying to make sure URLs that worked in the old site, a Perl-based CGI system, still work in the new site implemented in Django with a changed database structure.
Still, so far so good. My thanks once again to Daniel and Neill at Ace Hosting for their help and support.
Last updated: | path: tech / web | permanent link to this entry
The final one is to allow you to sort the packs by some of the fields displayed. The critical ones are the cell type, pack weight, pack amp-hour rating, price, watt-hours stored and maximum amp delivery. After a bit of help from Khisanth in ##javascript to get the scripting working - basically, don't have a submit button named (or with an ID of) 'submit' or you override the form's natural submit() function - it works now.
I hope people find these improvements useful - I certainly am!
Last updated: | path: tech / web | permanent link to this entry
I decided to try to be good, so I found the GNOME bugzilla and tried to search for "directory", or "rhythmbox", or anything. Every time it would spend a lot of time waiting and then just finish with a blank page. Deciding that their Bugzilla was hosed, I went and got a Launchpad account and logged it there. Then, in a fit of "but I might have just got something wrong", I went back to the Bugzilla and tried to drill down instead of typing in a keyword.
Lo and behold, when I looked for bugs relating to "Rhythmbox", it turned up in the search bar as product:rhythmbox. Sure enough, if I typed in product:rhythmbox summary:directory then it came up with bugs that mentioned 'directory' in their summary line. If you don't get one of those keywords right, it just returns the blank screen as a mute way of saying "I don't know how to deal with your search terms".
So it would seem that the GNOME bugzilla has hit that classic problem: developer blindness. The developers all know how to use it, and therefore they don't believe anyone could possibly use it any differently. This extends to asserting that anyone using it wrong is "obviously" not worth listening to, and therefore the blank page serves as a neat way of excluding anyone who doesn't know the 'right' way to log a bug. And then they wonder why they get called iconoclastic, exclusive and annoying...
Sadly, the fix is easy. If you can't find any search terms you recognise, at least warn the user. Better still, assume that all terms that aren't tagged appropriately search the summary line. But maybe they're all waiting for a patch or something...
Last updated: | path: tech / web | permanent link to this entry
I came across the page http://www.aubatteries.com.au/laptop-batteries/dell-inspiron-6400.htm, which I refuse to link to directly. It looks good to start with, but as you study the actual text you notice two things. Firstly, it looks like no-one fluent in Australian English ever wrote it - while I don't mind the occasional bit of Chinglish, this seems more likely to have been fed through a cheap translator program. Secondly, it seems obvious that the words "Dell Inspiron 6400 laptop" have been dropped into a template without much concern for their context. Neither of these inspires confidence.
I was briefly tempted to write to the site contact and mention this, but as I looked at some of the other search results it became increasingly obvious that this was one of a number of very similar sites, all designed a bit differently but using the same text and offering the same prices. This set off a few more of my dodginess detectors and I decided to look elsewhere.
Last updated: | path: tech / web | permanent link to this entry
Recently I had to generate a new one, and found a couple of recipes quite quickly. The routine (in Python) is:
from random import choice
print ''.join([choice('abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)') for i in range(50)])

(Aside: Note how Python's idea of line breaks having grammatical meaning in the source code has meant that one-liners are now back in style? Wasn't this supposed to be the readable language? Weren't one-liners supposed to be a backward construction used in stupid languages? Is that the sound of a thousand Pythonistas hurriedly explaining that, yes, you can actually break that compound up into several lines, either on brackets or by \ characters or by partial construction? Oh, what a pity.)
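(To be fair, the broken-up version they'd offer does read better - something like this, with the variable names being my own invention:)

from random import choice

characters = 'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)'
# Pick 50 random characters and join them into one string.
secret_key = ''.join([choice(characters) for i in range(50)])
print secret_key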
Anyway. A friend of mine and I noted that it seemed a little odd that the upper case characters weren't included in the string. Maybe, we reasoned, there was some reason that they didn't include these characters (and the punctuation that isn't on the numeric keys). But, looking through the actual changeset that combined all the various salts and secrets into one thing, and looking at where the secret key was used in the code, it seems that it's always fed into the md5 hasher. This takes bytes, basically, so there was no reason to limit it to any particular character subset.
So my preferred snippet would be:
from random import choice
# Every printable ASCII character except the single quote (chr(39)).
s = [chr(i) for i in range(32, 39) + range(40, 127)]
print ''.join([choice(s) for i in range(50)])

So you can at least read your secret key, and it doesn't include the single quote character (ASCII 39) that would terminate the string early. The update to the original functionality is in ticket 9687, so let's see what the Django admins make of it.
Last updated: | path: tech / web | permanent link to this entry
The one thing I've found myself struggling with in the various web pages I've designed is how to do the sort of general 'side bar menu' and 'pages list' - showing you a list of which applications (as Django calls them) are available and highlighting which you're currently in - without hard coding the templates. Not only do you have to override the base template in each application to get its page list to display correctly, but when you add a new application you then have to go through all your other base templates and add the new application in. This smacks to me of repeating oneself, so I decided that there had to be a better way.
Django's settings has an INSTALLED_APPS tuple listing all the installed applications. However, a friend pointed out that some things listed therein aren't actually to be displayed. Furthermore, the relationship between the application name and how you want it displayed is not obvious - likewise the URL you want to go to for the application. And I didn't want a separate list maintained somewhere that listed what applications needed to be displayed (Don't Repeat Yourself). I'm also not a hard-core Django hacker, so there may be some much better way of doing this that I haven't yet discovered. So my solution is a little complicated but basically goes like this:
First, you do actually need some settings for your 'shown' applications that are different from the 'silent' ones. For me this looks like:
SILENT_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
)

SHOWN_APPS = (
    ('portal', {
        'display_name': 'Info',
        'url_name': 'index',
    }),
    ('portal.kb', {
        'display_name': 'KB',
        'url_name': 'kb_index',
    }),
    ('portal.provision', {
        'display_name': 'Provision',
        'url_name': 'provision_index',
    }),
)

INSTALLED_APPS = SILENT_APPS + tuple(map(lambda x: x[0], SHOWN_APPS))

We build the INSTALLED_APPS tuple that Django expects out of the silent and shown apps, although I imagine a few Python purists are wishing me dead for the map/lambda construct. My excellent defence is a good grounding in functional programming. When my site supports Python 3000 and its pythonisations of these kinds of concepts, I'll rewrite it.
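For what it's worth, the comprehension the purists would prefer is a one-liner too, and does exactly the same thing:

# Equivalent to the map/lambda line above: take the first element
# (the application path) out of each SHOWN_APPS entry.
INSTALLED_APPS = SILENT_APPS + tuple(app for app, info in SHOWN_APPS)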
So SHOWN_APPS is a tuple of tuples containing application paths and dictionaries with their parameters. In particular, each shown application can have a display_name and a url_name. The latter relates to a named URL in the URLs definition, so you then need to make sure that your index pages are listed in your application's urls.py file as:
url(r'^$', 'kb.views.vIndex', name = 'kb_index'),

Note the 'name' parameter there, and the use of the url() constructor function.
You then need a 'context processor' to set up the information that can go to your template. This is a piece of code that gets called before the template gets rendered - it takes the request and returns a dictionary which is added to the dictionary going to the template. At the moment mine is the file app_name_context.py:
from django.conf import settings
from django.core.urlresolvers import reverse

def app_names(request):
    """
    Get the current application name and the list of all installed
    applications.
    """
    dict = {}
    app_list = []
    project_name = None
    for app, info in settings.SHOWN_APPS:
        if '.' in app:
            name = app.split('.')[1]  # remove project name
        else:
            name = app
            project_name = name
        app_data = {
            'name': name,
        }
        # Display name - override or title from name
        if 'display_name' in info:
            app_data['display_name'] = info['display_name']
        else:
            app_data['display_name'] = name.title()
        # URL name - override or derive from name
        if 'url_name' in info:
            app_data['url'] = reverse(info['url_name'])
        else:
            app_data['url'] = reverse(name + '_index')
        app_list.append(app_data)
    dict['app_names'] = app_list
    app_name = request.META['PATH_INFO'].split('/')[1]
    if app_name == '':
        app_name = project_name
    dict['this_app'] = app_name
    return dict

Note the use of reverse. This takes a URL name and returns the actual defined URL for that name. This locks in with the named URL in the urls.py snippet. This is the Don't Repeat Yourself principle once again: you've already defined how that URL looks in your urls.py, and you just look it up from there. Seriously, if you're not using reverse and get_absolute_url() in your Django templates, stop now and go and fix your code.
We also try to do the Django thing of not needing to override behaviour that is already more or less correct. So we get display names that are title-cased from the application name, and URL names which are the application name with '_index' appended. You now need to include this context processor in the list of template context processors that are called for every page. You do this with the TEMPLATE_CONTEXT_PROCESSORS setting; unfortunately, if you override it (it isn't listed by default) you lose the four very useful default context processors, so you have to include them all explicitly as well. So in your settings.py file you need to further add:
TEMPLATE_CONTEXT_PROCESSORS = (
    "django.core.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
    "django.core.context_processors.media",
    "portal.app_name_context.app_names",
)

The most inconvenient part of the whole lot is that you now have to use a specific subclass of the Context class in every template you render in order to get these context processors working. You need to do this anyway if you're writing a site that uses permissions, so there is good justification for doing it. For every render_to_response call you make, you now have to add a third argument - a RequestContext object. These calls will now look like:

return render_to_response('template_file.html', {
    # dictionary of stuff to pass to the template
}, context_instance=RequestContext(request))

The last line is the one that's essentially new.
Finally, you have to get your template to show it! This looks like:
<ul>
{% for app in app_names %}
  <li><a class="{% ifequal app.name this_app %}menu_selected{% else %}menu{% endifequal %}"
         href="{{ app.url }}">{{ app.display_name }}</a></li>
{% endfor %}
</ul>

With the appropriate amount of CSS styling, you now get a list of applications with the current one selected, and whenever you add an application this list will automatically change to include it. Yes, of course, the solution may be more complicated in the short term - but the long-term benefits quite make up for it in my opinion. And (again in my opinion) we haven't done anything that is too outrageous or made...
Last updated: | path: tech / web | permanent link to this entry
I urge every Australian to write to Senator Conroy and/or their local Member of Parliament on this issue - it is one we cannot afford to be complacent about!

Instead, the Government should either put the money toward the National Broadband Network programme, or run their own ISP with the clean feed technology to compete with the regular ISPs.
- 1% false positive rate is way too high to be usable.
- A 75% reduction in speed is too slow to be usable, and the faster filters have a higher false positive rate.
- It only blocks standard web traffic, not file sharing, chat or other protocols.
- If you filter HTTPS, you cripple internet shopping, banking, and the handling of personal information (e.g. tax returns).
- If the Government ignores who's requesting filtered content, then those wishing to circumvent it can keep on looking with no punishment. If the Government does record who requests filtered content, then even ASIO will have a hard time searching through the mountain of false positives.
- We already have filtering solutions for those that want it, at no cost.
- Mandatory filtering leads to state-run censorship and gives an in for the Big Media Corporations to 'protect their assets' by blocking anything they like.
- The whole thing is morally indefensible: it doesn't prevent a majority of online abuse such as chat bullying or file trading, and it relies on the tired old 'think of the children' argument which is beneath contempt.
- People who assume that their children are safe under such a system, and therefore don't use other protection mechanisms such as supervising their children or providing appropriate support, are being lulled into a false sense of security.
Regards, Paul.
Last updated: | path: tech / web | permanent link to this entry
Needless to say they raised the bar this year. Up until 2007 it was just a hidden field in the form. In 2008 they added a checksum - this delayed me a good five minutes while I worked out how they'd generated it. This year they've upped the ante, including both a different checksum and adding a salt to it. Another five minutes' playing with bash revealed the exact combination of timestamp, delimiter, and phrase necessary to get a correct checksum. I am also made of cheese.
Naturally, don't bother emailing me to find out how I did it; the fun is in the discovery!
Last updated: | path: tech / web | permanent link to this entry
I blog this because I've just been struggling with a problem in Django for the last day or so, and after much experimentation I've finally discovered what the error really means. Django, being written in Python, of course comes with huge backtraces, verbose error messages, and neat formatting of all the data in the hopes that it will give you more to work with when solving your problem. Unfortunately, this error message was both wrong - in that the error it was complaining about was not actually correct - and misleading - in that the real cause of the error was something else entirely.
Django has a urls.py file which defines a set of regular expressions for URLs, and the appropriate action to take when receiving each one. So you can set up r'/poll/(?P<poll_id>\d+)/$' to point at the view that displays a single poll. Better still, you can give the URL a name:

url(r'/poll/(?P<poll_id>\d+)/$', 'polls.views.view_one', name='poll_view_one'),
And then in your templates you can say:
<a href="{{ url poll_view_one poll_id=poll.id }}">{{ poll.name }}</a>
Django will then find the URL with that name, feed the poll ID in at the appropriate place in the expression, and there you are - you don't have to go rewriting all your links when your site structure changes. This, to me, is a great idea.
The problem was that Django was reporting "Reverse for 'portal.address_new_in_street' not found." even though that name was clearly listed in an otherwise working urls.py file. Finally, I started playing around with the expression, experimenting with what would and wouldn't match. In this case, the pattern was:
new/in/(?P<suburb_id>\d+)/$
When I changed this to:
new/in/(?P<suburb_id>\w+)/$
It suddenly came good. And then I discovered that the thing being fed into 'suburb_id' was not a number, but a string. So what that error message really means is "the pattern you tried to use didn't match, because the parameters you supplied don't fit its regular expression." Maybe it's done that way so that you can have several patterns with the same name, and the reverse lookup uses the first one whose parameters match. But until I know for sure, I'll remember this; and hopefully someone else trying to figure out this problem won't butt their head against a wall for a day like I did.
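For anyone wanting to see the behaviour in isolation, here's a rough sketch of the kind of thing that bit me, run from a Django shell - the URL name, pattern and suburb values are made up, but the principle is the same: reverse() only succeeds if the arguments actually match the pattern's regular expression.

# Hypothetical urls.py entry - digits only:
#   url(r'^new/in/(?P<suburb_id>\d+)/$', 'portal.views.new_in_street',
#       name='portal.address_new_in_street'),

from django.core.urlresolvers import reverse, NoReverseMatch

try:
    # A numeric suburb_id matches \d+ and reverses happily.
    print reverse('portal.address_new_in_street', kwargs={'suburb_id': 42})
except NoReverseMatch:
    print "numeric id failed (it shouldn't)"

try:
    # A string doesn't match \d+, so Django reports the whole name as
    # "not found" rather than "found, but the arguments don't fit".
    print reverse('portal.address_new_in_street', kwargs={'suburb_id': 'dickson'})
except NoReverseMatch:
    print "string id failed - this is the misleading error"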
Last updated: | path: tech / web | permanent link to this entry
So how do I feel as a Perl programmer writing Python? Pretty good too. There are obvious differences, and traps for new players, but the fact that I can dive into something and fairly quickly be fixing bugs and implementing new features is pretty nice too. Overall, I think that once you get beyond the relatively trivial details of code structure, how variables work and so on, what really makes a language strong is its libraries and interfaces. This is where Perl stands out with its overwhelmingly successful CPAN; Python, while slightly less organised from what I've seen so far, still has a similar level of power.
About the only criticism I have is the way command line option processing is implemented - Python has tried one way (getopt), which is clearly thinking just like a C programmer, and another (optparse), which is more object oriented but hugely cumbersome to use in its attempt to be flexible. Neither of them holds a candle to Perl's Getopt::Long module.
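To show what I mean, here's roughly what each approach looks like for a script taking a single --file option - a toy example of my own, not lifted from any real code, with the two alternatives shown side by side:

import sys

# The C-programmer way: getopt hands you raw (flag, value) pairs
# and leaves all the bookkeeping to you.
import getopt
opts, args = getopt.getopt(sys.argv[1:], "f:", ["file="])
filename = None
for flag, value in opts:
    if flag in ("-f", "--file"):
        filename = value

# The object-oriented way: optparse does more for you, but look at
# how much scaffolding a single option needs.
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="file to operate on", metavar="FILE")
(options, args) = parser.parse_args()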
Last updated: | path: tech / web | permanent link to this entry
I solved this problem rather neatly by getting my code to write out the HDF object to a file, rsync'ing that file back to my own machine, and then testing the template locally.
I knew that ClearSilver's Perl library had a 'readFile' method to slurp an HDF file directly into the HDF object, and a quick check of the C library said that it had an equivalent 'writeFile' call. So happily I found that they'd also provided this call in Perl. My 'site library' module provided the $hdf object and a Render function which took a template name; it was relatively simple to write to a file derived from the template name. That way I had a one-to-one correspondence between template file and data file.
Then I can run ClearSilver's cstest program to test the template - it takes two parameters, the template file and the HDF file. You either get the page rendered, or a backtrace to where the syntax error in your template occurred. I can also browse through the HDF file - which is just a text file - to work out what data is being sent to the template, which solves the problem of "why isn't that data being shown" fairly quickly.
Another possibility I haven't explored is to run a test suite against the entire site using standard HDF files each time I do a change to make sure there aren't any regressions before uploading.
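That suite wouldn't need to be much more than a loop over the saved files - something like this Python sketch, where the directory layout and the .cst/.hdf naming are just my assumptions, and I'm assuming cstest exits non-zero when a render fails:

import glob
import os
import subprocess

devnull = open(os.devnull, 'w')
failures = 0

# For every saved template/data pair, ask cstest to render it and
# complain if it reports a syntax error or other failure.
for template in glob.glob('templates/*.cst'):
    hdf_file = os.path.join('testdata', os.path.basename(template) + '.hdf')
    if not os.path.exists(hdf_file):
        continue  # no captured data for this template yet
    status = subprocess.call(['cstest', template, hdf_file], stdout=devnull)
    if status != 0:
        print "FAIL:", template
        failures += 1

print failures, "template(s) failed to render"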
Hopefully I've piqued a few people's interest in ClearSilver, because I'm going to be talking more about it in upcoming posts.
Last updated: | path: tech / web | permanent link to this entry
I wonder if there was anyone in the Microsoft Internet Explorer development team around the time they were producing 5.0 who was saying, "No, we can't ship this until it complies with the standard; that way we know we'll have less work to do in the future." If so, I feel doubly sorry for you: you've been proved right, but you're still stuck.
However, this is not a new problem to us software engineers. We've invented various test-based coding methodologies that ensure that the software probably obeys the standard, or at least can be proven to obey some standard (as opposed to being random). We've also seen the nifty XSLT macro that takes the OpenFormula specification and produces an OpenDocument Spreadsheet that tests the formula - I can't find any live links to it but I saved a copy and put it here. So it shouldn't actually be that hard to go through and implement, if not all, then a good portion of the HTML standard as rigorous tests and then use browser scripting to test its actual output. Tell me that someone isn't doing this already.
But the problem isn't really with making software obey the standard - although obviously Microsoft has had some problems with that in the past, and therefore I don't feel we can trust them in the future. The problem is that those pieces of broken software have formed a de facto standard that isn't mapped by a document. In fact, they form several inconsistent and conflicting standards. If you want another problem, it's that people writing web site code to detect browser type in the past have written something like:
if ($browser eq 'IE') {
    if ($version <= 5.0) {
        write_IE_5_0_HTML();
    } elsif ($version <= 5.5) {
        write_IE_5_5_HTML();
    } else {
        write_IE_HTML();
    }
    ...
}

When IE 7 came along and broke new stuff, they added:
} elsif ($version <= 6.0) {
    write_IE_6_0_HTML();

It doesn't take much of a genius to work out that you can't just assume that the current version is the last version of IE, or that new versions of IE are going to be bug-for-bug compatible with the last one. So really the people writing the websites are to blame.
Joel doesn't identify Microsoft's correct response in this situation. The reason for this is that we're all small coders reading Joel's blog and we just don't have the power of Microsoft. It should be relatively easy for them to write a program that goes out and checks web sites to see whether they render correctly in IE 8, and then they should work together with the web site owners whose web sites don't render correctly to fix this. Microsoft does a big publicity campaign about how it's cleaning up the web to make sure it's all standard compliant for its new standards-compliant browser, they call it a big win, everyone goes back to work without an extra headache. Instead, they're carrying on like it's not their fault that the problem exists in the first place.
Microsoft's talking big about how it's this nice friendly corporate citizen that plays nice these days - let's see it start fixing up some of its past mistakes.
Last updated: | path: tech / web | permanent link to this entry
Suddenly I realised that I should do what wikis and most other good content management systems have done for ages - made URLs which reference things by name rather than number and let the software work it out in the background. Take the name for the set, flatten it into lower case and replace spaces with underscores; it would also be easily reversible. CDs might be a bit more challenging but there are only one or two that have a repeated name, and I'd have to handle such conflicts anyway at some point.
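The flattening itself is trivial - something like this (illustrative only, with names of my own choosing; the real code will live in the site's own language):

def name_to_slug(name):
    """Flatten a set name into a URL-friendly form."""
    return name.lower().replace(' ', '_')

def slug_to_name(slug):
    """Reverse the flattening - good enough while names are unique."""
    return slug.replace('_', ' ').title()

# e.g. 'Plain Set' <-> 'plain_set'
print name_to_slug('Plain Set')
print slug_to_name('plain_set')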
That combined with my planned rewrite of the site to use some sane HTML templating language - my current choice is ClearSilver - so that it's not all ugly HTML-in-the-code has given me another project for a good week or so of coding. Pity I'm at LCA and have to absorb all those other great ideas...
Last updated: | path: tech / web | permanent link to this entry
I had originally considered writing a Perl LWP [1] program that performed a request to edit the page, with my credentials, but I figured that was a ghastly kludge and would cause some sort of modern day wiki-equivalent of upsetting the bonk/oif ratio (even though MediaWiki obviously doesn't try to track who's editing what document when). But then I discovered MediaWiki's Special:Export page and realised I could hack it together with this.
The question, however, really comes down to: how does one go about taking a manual written in something like MediaWiki and producing some more static, less infrastructure-dependent, page or set of pages that contains the documentation while still preserving its links and cross-referencing? What tools are there for converting Wiki manuals into other formats? I know that toby has written the one I mentioned above; the author of this ghastly piece of giving-Perl-a-bad-name obviously thought it was useful enough to have another in the same vein. CPAN even has a library specifically for wikitext conversion.
This requires more research.
[1] - There's something very odd about using a PHP script on phpman.info to get the manual of a Perl module. But it's the first one I found. And it's better than search.cpan.org, which requires you to know the author name in order to list the documentation of the module. I want something with a URL like http://search.cpan.org/modules/LWP.
Last updated: | path: tech / web | permanent link to this entry
Of course, there were still obstacles. CGI::Ajax's natural way of doing things is for you to feed all your HTML in and have it check for the javascript call and handle it, or mangle the script headers to include the javascript, and spit out the result by itself. All of my scripts are written so that the HTML is output progressively by print statements. This may be primitive to some and alien to others, but I'm not going to start rewriting all my scripts to pass gigantic strings of HTML around. So I started probing.
Internally this build_html function basically does:
if ($cgi->param('fname')) {
    print $ajax->handle_request;
} else {
    # Add the <script> tags into your HTML here
}

For me this equates to:
if ($cgi->param('fname')) {
    print $ajax->handle_request;
} else {
    print $cgi->header,
          $cgi->start_html( -script => $ajax->show_javascript ),
          # Output your HTML here
          ;
}

I had to make one change to the CGI::Ajax module, which I duly made up as a patch and sent upstream: both CGI's start_html -script handler and CGI::Ajax's show_javascript method put your javascript in a <script> tag and then a CDATA tag to protect it against being read as XML. I added an option to the show_javascript method so that you say:
$cgi->start_html( -script => $ajax->show_javascript({'no-script-tags' => 1}) ),

and it doesn't output a second set of tags for you.
So, a few little tricks to using this module if you're not going to do things exactly the way it expects. But it can be done, and that will probably mean, for the most of us, that we don't have to extensively rewrite our scripts in order to get started into AJAX. And I can see the limitations of the CGI::Ajax module already, chief amongst them that it generates all the Javascript on the fly and puts it into every page, thus not allowing browsers to cache a javascript file. I'm going to have a further poke around and see if I can write a method for CGI::Ajax that allows you to place all the standard 'behind-the-scenes' Javascript it writes into a common file, thus cutting down on the page size and generate/transmit time. This really should only have to be done once per time you install or upgrade the CGI::Ajax module.
Now to find something actually useful to do with Ajax. The main trap to avoid, IMO, is to cause the page's URL to not display what you expect after the Javascript has been at work. For instance, if your AJAX is updating product details, then you want the URL to follow the product's page. It should always be possible to bookmark a page and come back to that exact page - if nothing else it makes it easier for people to find your pages in search engines.
Last updated: | path: tech / web | permanent link to this entry
On reading their article I get the impression that they think that this is both a hitherto-unknown phenomenon and one which is still baffling web developers. This puzzles me, as even a relative neophyte such as myself knows how to make these documents available to search engines: indexes. All you need is a linked-to page somewhere which then lists all of the documents available. This page doesn't have to be as obvious as my Set Dance Music Database index - it can be tucked away in a 'site map' page somewhere so that it doesn't confuse too many people into thinking that that's the correct way to get access to their documents. However, don't try to hide it so that only search engines can see it, or you'll fall afoul of the regular 'link-farming' detection and elimination mechanisms most modern search engines employ.
Of course, being a traditionalist (as you can see from both the content and design of the Set Dance Music Database) I tend to think that lists are still useful, at least if kept small. And I do need to put in some mechanisms for searching on the SDMDB, as well as a few other drill-down methods. So giving people just a search form alone may not cater to all the methods they employ when finding content. Wikis realised this years ago - people like interlinking. And given that these 'deep web' documents are still accessible via a simple URL, if you really need to you can assist the search engines by creating your own index page to their documents, basically by scripting up a search on their website that then puts the links into your index, avoiding listing duplicates.
So the real question is: why are the owners of these web sites not doing this? We may just need to suggest it to them if they haven't thought of it themselves. The benefits of having their documents listed on Google are many - what downsides are there? I'm sure the various criticisms of such indexing are mainly due to organisational bias and narrow-mindedness, and can either be solved or routed around.
There are two variants of this that annoy me. One is the various websites where the only way to get to what you want is by clicking - no direct link is ever provided and your entire navigation is all done through javascript, flash or unspeakable black magic. These people are making it purposefully hard for you to get straight to what you want, either because they want to show you a bunch of advertising on the way or because they want to know exactly what you're up to on their site for some insidious purpose. There is already one Irish music CD store online that I've basically had to completely ignore (except for cross-checking with material on other sites) because there is no way for me to refer people directly to a CD. I refuse outright to give instructions such as "go to http://example.com and type in the words 'Tulla Ceili Band' in the search box", because that's not good navigation.
The other type of annoyance I find ties in with this: it is the practice of making a hidden index, or a privileged level of access, available to search engines that normal people don't see. I've seen a few computing and engineering websites do this, and Experts Exchange is particularly annoying for it: you can google your query and see an excerpt from the page with the question but when you go there you find out that access to the answers requires membership and/or payment. This, as far as I'm concerned, is just a blatant money-grabbing exercise and should be anathema. Either your results are free to access, or they're not - search engines should not be privileged in that respect.
Last updated: | path: tech / web | permanent link to this entry
Dear people,
On Wednesday the 28th of February, a user from your address dsl.dynamic81213236104.ttnet.net.tr made two edits to our Wiki. You can see the page as changed at http://mabula.net/rugbypilg/index.cgi?action=browse&id=HomePage&revision=18, including the above address as the editor. Your client is obviously defacing our site and others like it, which is probably against your terms of service. In addition, they are too lame to be on the internet. Please take them off it so that they do not do any further damage to themselves and others.
We have reversed their changes and our site is back to normal.
Yours sincerely,
Paul Wayper
Last updated: | path: tech / web | permanent link to this entry
For a while now, I've done a certain amount of checking that the lie submitted meets certain sanity guidelines that also filter out a lot of comment spam. In each case, the user is greeted with a helpful yet not prescriptive error message: for instance, when the lie contains an exclamation point the user is told "Your lie is too enthusiastic". (We take lying seriously at Dave's Web Of Lies.) This should be enough for a person to read and deduce what they need to do to get a genuine lie submitted, but not enough for a spammer to work out quickly what characters to remove for their submission to get anywhere. Of course, this is violating rule 1 above: spammers don't care if any number of messages get blocked, so long as one message gets through somehow.
This still left me with a healthy chunk of spam to wade through and mark as rejected. This also fills up my database (albeit slowly), and I object to this on principle. So I implemented a suggestion from someone's blog: include a hidden field called "website" which, when filled in, indicates that the submission is from a spammer (since it's ordinarily impossible for a real person to fill in any text in the field). We then silently ignore any submission that fills it in. No false positives? Sounds good to me.
Initial indications, however, were that it was having no effect. I changed the field from being hidden to having the style property "display: none", which causes any modern browser to not display it, but since this was in the stylesheet a spammer would have no real indication just by scraping the submit page that this field was not, in fact, used. This, alas, also had no effect. I surmised that this was probably because the form previously had no 'website' field and spammers were merely remembering what forms to fill in where, rather than re-scraping the form (though I have no evidence for this). Pity.
So my next step was to note that a lot of the remaining spam had a distinctive form. The 'lie' would be some random comment congratulating me on such an informative and helpful web site, the 'liar' would be a single-word name, and there would be a random character or two tacked on the lie to make it unlikely to be exactly the same as any previous submission. So I hand-crafted a 'badstarts.txt' file and, on lie submission, I read through this file and silently ignore the lie if it starts with a bad phrase. Since almost all of these phrases are crafted so that no sane or reasonable lie could start with the same words, this keeps the number of false positives down - important (in my opinion) when we don't tell people whether their submission has succeeded or failed.
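For the curious, the check is about as simple as it sounds - this Python sketch captures the idea (the real thing is a few lines in the submission handler, and the function name here is my own):

def is_spam_lie(lie, badstarts_path='badstarts.txt'):
    """Return True if the submitted lie starts with a known spam phrase."""
    lie = lie.strip().lower()
    for line in open(badstarts_path):
        phrase = line.strip().lower()
        if phrase and lie.startswith(phrase):
            return True
    return False

# In the submission handler this becomes something like:
#     if is_spam_lie(submitted_lie):
#         return   # drop it silently, no feedback to the spammer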
Sure enough, now we started getting rejected spams. The file now contains about 36 different phrases. I don't have any statistics on how many got through versus how many got blocked, but that's just a matter of time... And I'm probably reinventing some wheel somewhere, but it's a simple thing and I didn't want to use a larger, more complex but generalised solution.
I'd be willing to share the list with people, but I won't post the link in case spammers find it.
I really want to avoid a captcha system on the Web Of Lies. I like keeping Dave's original simplistic design, even if there are better, all-text designs that I could (or perhaps should) be using.
Last updated: | path: tech / web | permanent link to this entry
What INFURIATES me beyond measure is the way the people who run the domain registrars then cash in on any business's past success by installing a copy-cat templated redirector site that earns them a bit of money from the hapless people who mistake it for the real thing. They're getting good too: this one was so well laid out that it took me several moments to work out that there was nothing actually useful on the site. Previous attempts I've seen have been pretty much just a bunch of prepackaged searches on the keywords in your previous site listed down the page, with a generic picture of a woman holding a mouse or going windsurfing (or, for the more extreme sites, going windsurfing holding a mouse). Now it's getting nasty.
It's not good enough that these domain registrars take money for something they've been proven to lose, 'mistakenly' swap to another person, revoke without the slightest authority, and fraudulently bill for - something that costs them nothing to generate. Then they have to leech off the popularity of any site that goes under, not only scamming a quick few thousand bucks in the process but confusing anyone who wanted just a simple page saying "this company is no longer doing business". There must be something preventing this from happening in real life - businesses registering the name of a competitor as soon as it closed, buying up the office space and setting up a new branch, with some dodgy marketing exec handing them money for every person who wandered in and asked "Is this where I get my car repaired?". This sounds criminal to me.
Last updated: | path: tech / web | permanent link to this entry
Looking for information and found it in this great site... - Jimpson.
Thank you for your site. I have found here much useful information... - Jesus.
The irony is that I don't know whether to include them because they are, indeed, genuine lies. But, on principle, I reject them. It's not as if liars get linkbacks on DWOL anyway...
Last updated: | path: tech / web | permanent link to this entry
I know other people on campus use Ubuntu. I know about http://debian.anu.edu.au, although I haven't configured my Ubuntu installation to use it as a source. I personally think the Internet is a better place when you get your new and updated packages from the closest mirror you can. If your ISP has a mirror, then definitely use that, because it almost certainly won't count against your monthly download quota.
So imagine if there was a system whereby users could submit and update yum and apt-get configurations based on IP ranges. Then a simple package would be able to look up which configuration would apply to their IP address, and it would automatically be installed. Instantly you'd get the fastest, most cost-effective mirrors available. You could probably do the lookup as a DNS query, too. It'd even save bandwidth for the regular mirrors and encourage ISPs to set up mirrors and maintain the configurations, knowing that this bandwidth saving would be instantly felt in their network rather than relying on their customers to find their mirrors and know how to customise their configurations to suit.
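Just to sketch the DNS idea - this assumes the dnspython package, and the zone name and record layout are entirely my own invention (one TXT record per reversed client address, holding the mirror URL):

import dns.resolver  # assumes the dnspython package is installed

def mirror_for(ip, zone='mirrors.example.org'):
    """Look up a suggested mirror URL for this IP via a TXT record.

    e.g. for 150.203.1.2 we query 2.1.203.150.mirrors.example.org.
    """
    name = '.'.join(reversed(ip.split('.'))) + '.' + zone
    try:
        answers = dns.resolver.query(name, 'TXT')
    except dns.resolver.NXDOMAIN:
        return None  # no ISP-specific mirror known for this address
    for rdata in answers:
        return str(rdata).strip('"')
    return None

# print mirror_for('150.203.1.2')  # -> the ISP's mirror URL, if one is published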
Hmmmm.... Need to think about this.
Last updated: | path: tech / web | permanent link to this entry
But, the temptation to carry things too far has always been strong within me. So, of course, the flashing cursor at the end of the page wasn't good enough on its own: I had to have an appropriate command come up when you hovered over the link. After a fair amount of javascript abuse, and reading of tutorials, I finally got it working; I even got it so that the initial text (which has to be there for the javascript to work) doesn't get displayed when the document loads.
Score one for pointless javascript!
Last updated: | path: tech / web | permanent link to this entry
At the moment the site is more or less working. You can subscribe anew, or existing subscribers can log in, and see their details and change things as necessary. Email addresses will be confirmed before actually allowing a change. The only thing that doesn't exist yet is the ability to take money for the subscription. Enter the stumbling block.
Up until now I've been intending to use PayPal to collect money, but the key thing holding me back is the paucity of actual working code to do the whole online verification of payment thing. Maybe I'm more than usually thick, but I find the offerings on PayPal's website and on CPAN to be hard to understand - they seem to dive too quickly into the technical details and leave out how to actually integrate the modules into your workflow. I'd be more than glad to hear from anyone with experience programming websites in Perl using PayPal, especially PayPal IPN. I obviously need time to persevere.
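For the record, my understanding of the IPN flow so far - pieced together from the documentation, so treat this Python sketch as a guess at the shape of it rather than working code, and note that the handler and parameter names are hypothetical - is that PayPal POSTs the payment details to a URL you nominate, you post the identical data back with cmd=_notify-validate prepended, and PayPal replies VERIFIED or INVALID; only then do you update the subscription.

import urllib

def verify_ipn(ipn_params):
    """Echo an IPN message back to PayPal and check their verdict.

    ipn_params is the dict of POST parameters PayPal sent us.
    """
    data = urllib.urlencode([('cmd', '_notify-validate')] + ipn_params.items())
    response = urllib.urlopen('https://www.paypal.com/cgi-bin/webscr', data)
    return response.read() == 'VERIFIED'

# Then, in the IPN handler (names here are hypothetical):
#     if verify_ipn(params):
#         mark_subscription_paid(params['txn_id'])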
Now Google is offering their Checkout facility, and I'm wondering what their API is going to look like. Is it going to make it any easier to integrate into my website than PayPal? Is the convenience of single-sign-on billing going to be useful to me? Should I wait and see?
Last updated: | path: tech / web | permanent link to this entry
All posts licensed under the CC-BY-NC license. Author Paul Wayper.