Tag Archive for 'bugs'

29 Mar

obfuscated-openssh BUGZ!!!!

Just my luck.

I spent the last four hours hunting down a bug in obfuscated-openssh that prevented it from working properly WITH PROXIES!!!!

But since I’m a Fucking Genius I nailed it.  And I wouldn’t call it a “bug” per se.  It’s more of an omission on the author’s part.  An oopsie, so to speak.

I will post the fix after I’ve done some more testing and when I can get some free time.

08 Sep

Junkbusting

I have finally started clearing the junk out.  For example, since the beginning there have been about 20-30 Japanese entries in the list that were garbage.  They’re finally gone.

I also learned a lesson about wget that didn’t directly affect the list.  Under certain circumstances, if you get, say, a “403 Access Denied” response, wget will not save the page you would normally see in your browser.  This only affected the “Timeout” servers, but there is more junk to be found behind 302 redirects and 304 (Not Modified) responses.
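For anyone scripting this kind of check, here is a minimal illustrative sketch (Python, standard library only) of keeping that error page instead of throwing it away. `urlopen` raises `HTTPError` on a 403, but the exception still carries the server's response, so the page a browser would show can still be read. The name `fetch_keep_errors` is hypothetical. On the wget side, recent versions also offer a `--content-on-error` option that keeps the body of error responses.

```python
import urllib.error
import urllib.request


def fetch_keep_errors(url, timeout=30):
    """Fetch url and return (status, body), keeping the body on HTTP errors.

    urlopen raises HTTPError for 4xx/5xx answers; the exception object also
    acts as the response, so the page text is still available via read().
    Ordinary redirects such as 301 and 302 are followed automatically.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        body = err.read()  # the "Access Denied" page a browser would display
        err.close()
        return err.code, body
```

A 403 then comes back as `(403, <page bytes>)`, and the page can be grepped for junk like any other.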

I exported all the non-CoDeeN proxies and used SwitchProxy, a Firefox plug-in, to check the junk factor.  There’s still a fair amount in there, but the next purge should take care of most of it.

It seems that Interesting Sites 1 and 2 are gone for good.  No more 75,000+ proxy imports.  I’m glad I got those when I could.  Curious Site is still supplying proxies, and of course I still hit the other lists every night (but they have nothing).  I’m running the Google Hack on and off but not getting much live data.  I’m going to keep hitting it because that’s where the Interesting Sites came from in the first place.  Somewhere, there’s an IS-3 out there.

06 Aug

Check and Recheck

Today I implemented the proxy re-check code.  There are over 11,000 proxies in the “gold” table that used to be alive but didn’t respond during the daily purge.

No response can be due to a number of things, including the proxy judge system being down.

A lot of proxies are actually coming back to life.  These are very, very encouraging results.  The first results will show up in this evening’s 8PM run, which will probably run late since the code will re-re-check all the resurrected proxies.
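Since a failed check can mean the judge was down rather than the proxy, a two-stage re-check makes sense: re-test the dead ones, then re-test the survivors once more before resurrecting them. Here is a minimal sketch of that idea in Python; `recheck` and the pluggable `check` function are hypothetical names, and threads stand in for whatever concurrency the real code uses.

```python
from concurrent.futures import ThreadPoolExecutor


def recheck(dead, check, confirm_passes=1, workers=50):
    """Re-test previously dead proxies and return the ones that came back.

    A proxy is resurrected only if it passes the first re-check AND every
    confirmation pass after it (the "re-re-check"), which filters out
    proxies that answered once by luck.
    """
    candidates = list(dead)
    for _ in range(1 + confirm_passes):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(check, candidates))  # order is preserved
        candidates = [p for p, ok in zip(candidates, results) if ok]
        if not candidates:
            break
    return candidates
```

A proxy that answers on the first pass but not the second stays in the dead pile until the next scheduled re-check.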

I will be interested to see if the Bahraini proxies have resurfaced.

Earlier today, I “snuck in” to a very active Members Only proxy forum, using a password from www.bugment.com (highly recommended), and snatched a few thousand proxies from various recent postings.  The vast majority were already in my database.  Nothing new there.

While I was there I did some reading on proxy judges, and it turns out this is something of a cottage industry.  I had to laugh because I’ve been using free, absolutely unknown proxy judges for months now.  MONTHS!  Amazing.  These people are clueless.

True, they do disappear, but not as frequently as proxies, and when I throw the re-check into the schedule, that problem will take care of itself.

UPDATE 7:30PM

In typical Hinky Dink fashion I screwed up the speed ratings again.

So, to get things back to normal (-ish) I am running tomorrow’s purge early while I go back over the code.  By 8PM the first page should have the correct speed ratings.  It should finish running by 10PM.

Upside:  I added over 300 proxies with the re-checker.  Slightly over half the dead ones were checked, so there should be another 300 in there somewhere.

Downside: a lot of CoDeeN proxies have resurfaced.

03 Aug

Forum Mining

I found a nasty bug in the Google Hack.  I was going to call it “interesting” but I’ve been overusing the hell out of that word and I’m trying to strike it from my vocabulary.

I have been using links2 to get rid of HTML markup.  It works fine until you pipe it to a file.  All sorts of crazy, subtle things happen.

It will translate a “?” to %3f, “=” to %3d, etc., which is fine until you subsequently pipe that back to wget, which does not translate it back.  So if you have a URL like…

index.php%3fshowtopic%3d54476%26st%3d50

which should be

index.php?showtopic=54476&st=50

… wget then sends it verbatim and the remote site chokes with a 404 Not Found.

This behavior in links2 is not observed when it displays in a terminal, only when it’s piped to a file.
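One way to patch around it is to decode just those delimiters before handing the list to wget. A minimal sketch in Python, assuming the only damage is the encoded “?”, “=” and “&” (an ampersand percent-encodes as %26); `undo_links2_escaping` is a hypothetical helper name. It deliberately leaves everything else alone, so legitimately encoded data like %20 survives, whereas a blanket `urllib.parse.unquote` would decode it all.

```python
import re

# Delimiters that get percent-encoded when links2 output is piped to a file.
# Assumption: the forum URLs never legitimately contain an encoded ?, = or &.
DELIMITERS = {"%3f": "?", "%3d": "=", "%26": "&"}
_PATTERN = re.compile("|".join(DELIMITERS), re.IGNORECASE)


def undo_links2_escaping(url):
    """Turn %3f/%3d/%26 (any case) back into ?, = and &; leave the rest as-is."""
    return _PATTERN.sub(lambda m: DELIMITERS[m.group(0).lower()], url)
```

The cleaned URLs can then be written one per line and fed to wget with `wget -i urls.txt`.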

OK, nice catch.  It means there is a little life left in the Google Hack, since it has not been getting any forum data since it was hatched.  And there are tons of data in proxy forums (in fact, the operators of such forums hate being mined and you usually need to be registered – sucking pages out of Google’s cache can sometimes get around registration).
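Another way around the links2 round trip is to skip the text-mode browser and pull the links straight out of the cached HTML. Python's standard `html.parser` unescapes attribute values, so `&amp;` in an href comes back as a plain `&`. A minimal sketch; `LinkCollector` and `forum_links` are hypothetical names, and the `showtopic=` filter is an assumption based on the example URL above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags."""

    def __init__(self, base_url):
        super().__init__()  # convert_charrefs=True by default
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


def forum_links(html, base_url, marker="showtopic="):
    """Return the links in html whose URL contains marker (topic pages)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [url for url in parser.links if marker in url]
```

The URLs come out already decoded, so they can go to wget without any extra cleanup.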

I was sure we were going to top out here pretty soon, but the database may make it to 400,000 rows yet.

I have my doubts about half a million, though.

BTW, it’s Sunday.  Will Bahrain make another appearance or has that particular pooch been screwed?

29 Jul

Speed Still Screwed Up

Somewhere my speed calculations are messed up, but after the daily check-and-purge the numbers should be correct.  I’m having a heck of a time tracking this down and it happens whenever I start spreading code across multiple machines.

Bahrain Proxy Madness is spreading this week.  Cities other than Manama are showing up, the most interesting (to me at least) being “Isa Town”.

Interesting because “ISA” is Microsoft’s Internet Security and Acceleration (ISA) Server, which is a proxy server.  How fitting.

But according to the Via headers I’ve seen, they’re using NetCache servers, not ISA.  They must be refurbs, since Network Appliance sold that line off to Blue Coat back in 2006.

28 Jul

It must be Monday

Why?

Because the purge ran and one third of the proxies are gone.

Also, this is the first run with the fixed speed calculations, and those are finally back to normal.  You may recall I upped the TIMEOUT to 45 seconds (from 30) last week.  Besides screwing up all the speed values, it also helped add to the total proxy count.  There are a few agonizingly slow proxies listed in there, but they’re not the majority and they might be of use to somebody, somewhere.

I have been using the DualCore AMD64 (my Mythbuntu system) all weekend for the Google Runs because it’s just so darned fast.  I can run about 70 database checks per second on it, even with the database on the other end of the network and on a VM.  I may in fact turn the TIMEOUT up to 60 seconds and start retesting some old data. 

You may well ask “How does slowing down a fast machine get more work done?”

It’s all in the forks, boys and girls.  The system can fork more processes even though they’re only waiting for a TIMEOUT.  The AMD64x2 has more RAM and more cycles to dedicate to that.  The VM can’t touch it in that regard. 
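The reason is that a check stuck waiting on a dead proxy uses almost no CPU, so running many of them at once means the worst-case wall-clock time is roughly ceil(proxies / workers) × TIMEOUT rather than proxies × TIMEOUT. A minimal sketch of that in Python; it swaps threads in for forked processes as a lighter stand-in, and `tcp_check`, `check_all` and the worker count are hypothetical, chosen only for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

TIMEOUT = 45  # seconds, the value mentioned in the post (up from 30)


def tcp_check(target, timeout=TIMEOUT):
    """True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_all(targets, check=tcp_check, workers=200):
    """Run check() on every target concurrently and return {target: result}.

    Each worker spends most of its life blocked on the network, so more
    workers (like more forks) means more checks finished per TIMEOUT window.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(targets, pool.map(check, targets)))
```

Raising TIMEOUT from 45 to 60 seconds only costs throughput if the worker count stays the same; a machine with more RAM can simply run more waiters.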

In fact, I’m just about Googled out.  With the faster machine doing all the Google Hacking I’m getting more and more dry runs.  Of course, this whole business is cyclical (look at Bahrain for instance), so just taking a break for a day or two is probably a good thing.

At least one of my “Proxy Judge” sites decided it was a REAL proxy judge after all and changed its format to be helpful.  In so doing it turned itself into a worthless proxy judge, at least as far as I’m concerned.  As a result, there are more “Undefined” servers than usual.  That site is going out of the proxy judge rotation permanently.

This week I will be taking a closer look at the “Undefined” sites to see if I can get rid of them once and for all.
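To show why a judge that changes its output format produces “Undefined” servers, here is a minimal classification sketch in Python. It assumes a judge page that prints one `NAME = value` line per variable, as many CGI environment dumps do; `classify` is a hypothetical name, and the Transparent/Anonymous/Elite labels are common proxy-list conventions rather than necessarily this site's categories. Only “Undefined” comes from the post.

```python
import re

# Assumed judge format: one "NAME = value" line per environment variable.
LINE = re.compile(r"^\s*([A-Z_]+)\s*=\s*(.*?)\s*$", re.MULTILINE)
PROXY_HEADERS = ("HTTP_VIA", "HTTP_X_FORWARDED_FOR", "HTTP_FORWARDED")


def parse_judge(page):
    """Return {NAME: value} for every line that matches the assumed format."""
    return dict(LINE.findall(page))


def classify(page, my_ip):
    """Classify a proxy from the judge page it fetched on our behalf."""
    env = parse_judge(page)
    if "REMOTE_ADDR" not in env:
        return "Undefined"  # page not in the expected format (or not a judge)
    if any(my_ip in re.split(r"[\s,]+", value) for value in env.values()):
        return "Transparent"  # our real address leaked through
    if any(name in env for name in PROXY_HEADERS):
        return "Anonymous"  # admits to being a proxy but hides our address
    return "Elite"
```

If a judge starts wrapping its variables in HTML tables, `parse_judge` comes back empty and every proxy it sees lands in “Undefined”, which is exactly why such a judge has to leave the rotation.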