Dave's Free Press: Journal

violence, pornography, and rude words for the web generation


Tue, 21 Oct 2008

Thanks, Yahoo!

[originally posted on Apr 3 2008]

I'd like to express my warm thanks to the lovely people at Yahoo and in particular to their bot-herders. Until quite recently, their web-crawling bots had most irritatingly obeyed robot exclusion rules in the robots.txt file that I have on CPANdeps. But in the last couple of weeks they've got rid of that niggling little exclusion so now they're indexing all of the CPAN's dependencies through my site! And for the benefit of their important customers, they're doing it nice and quickly - a request every few seconds instead of the pedestrian once every few minutes that gentler bots use.

Unfortunately, because generating a dependency tree takes more time than they were allowing between requests, they were filling up my process table, and all my memory, and eating all the CPU, and the only way to get back into the machine was by power-cycling it. So it is with the deepest of regrets that I have had to exclude them.


[update] For fuck's sake, they're doing it again from a different netblock!

Posted at 17:35 by David Cantrell
keywords: geeky | meta | perl | rant

Can't you set an Apache rewrite rule based on their User-Agent or something? Just redirect them to a blank page.

Posted by vegiVamp on Wed, 22 Oct 2008 at 09:29:20

