Dave's Free Press: Journal

violence, pornography, and rude words for the web generation

Tue, 21 Oct 2008

Thanks, Yahoo!

[originally posted on Apr 3 2008]

I'd like to express my warm thanks to the lovely people at Yahoo and in particular to their bot-herders. Until quite recently, their web-crawling bots had most irritatingly obeyed robot exclusion rules in the robots.txt file that I have on CPANdeps. But in the last couple of weeks they've got rid of that niggling little exclusion so now they're indexing all of the CPAN's dependencies through my site! And for the benefit of their important customers, they're doing it nice and quickly - a request every few seconds instead of the pedestrian once every few minutes that gentler bots use.

Unfortunately, because generating a dependency tree takes more time than they were allowing between requests, they were filling up my process table, and all my memory, and eating all the CPU, and the only way to get back into the machine was by power-cycling it. So it is with the deepest of regrets that I have had to exclude them.

Cunts.

[update] For fuck's sake, they're doing it again from a different netblock!

Posted at 17:35:55 by David Cantrell
keywords: geeky | meta | perl | rant

Can't you set an Apache rewrite rule based on their User-Agent or something? Just redirect them to a blank page.

Posted by vegiVamp on Wed, 22 Oct 2008 at 09:29:20
