Archive for July, 2008

29
Jul

Speed Still Screwed Up

Somewhere my speed calculations are messed up, but after the daily check-and-purge the numbers should be correct.  I’m having a heck of a time tracking this down and it happens whenever I start spreading code across multiple machines.

Bahrain Proxy Madness is spreading this week.  Cities other than Manama are showing up, the most interesting (to me at least) being “Isa Town”.

Interesting because “ISA” is Microsoft’s Internet Security and Acceleration (ISA) Server, which is a proxy server.  How fitting.

But according to the Via headers I’ve seen they’re using NetCache servers, not ISA.  They must be refurbs, since Network Appliance sold that line off to Blue Coat back in 2006.
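For anyone curious how a product name gets read out of a Via header, here is a minimal Python sketch.  It assumes the proxy identifies itself in the optional parenthesised comment, which is common but not guaranteed.  The sample header value is made up for illustration and `via_products` is a hypothetical helper name, not anything from the real scripts.

```python
import re

def via_products(via_value):
    """Return the parenthesised comments found in a Via header value.

    Each Via entry has the shape: protocol-version received-by [(comment)]
    and entries are separated by commas. Proxy software often names itself
    in the comment, but the comment is optional.
    """
    return [c.strip() for c in re.findall(r"\(([^()]*)\)", via_value)]

if __name__ == "__main__":
    # Illustrative value only, not copied from a real response.
    sample = "1.1 cache01.example.net (NetCache NetApp/6.0.1), 1.0 gw.example.org"
    print(via_products(sample))
```

An entry without a comment simply contributes nothing to the list, so an empty result means "no product advertised", not "no proxy".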

28
Jul

It must be Monday

Why?

Because the purge ran and one third of the proxies are gone.

Also, this is the first run with the fixed speed calculations.  Those are finally back to normal.  You may recall I upped the TIMEOUT to 45 seconds (from 30) last week.  Besides screwing up all the speed values it helped to add to the total proxy count.  There are a few agonizingly slow proxies listed in there but they’re not the majority and they might be of use to somebody, somewhere.

I have been using the DualCore AMD64 (my Mythbuntu system) all weekend for the Google Runs because it’s just so darned fast.  I can run about 70 database checks per second on it, even with the database on the other end of the network and on a VM.  I may in fact turn the TIMEOUT up to 60 seconds and start retesting some old data. 

You may well ask “How does slowing down a fast machine get more work done?”

It’s all in the forks, boys and girls.  The system can fork more processes even though they’re only waiting for a TIMEOUT.  The AMD64x2 has more RAM and more cycles to dedicate to that.  The VM can’t touch it in that regard. 
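The effect is easy to see in a toy version.  The sketch below is an assumption-laden stand-in: it uses Python threads rather than forked processes, and `slow_check` just sleeps for a tenth of a second to imitate a check that is waiting on a timeout.  The point is only that waiting overlaps, so more concurrent waiters finish the same batch in less wall-clock time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_check(_):
    """Stand-in for one proxy check that only waits, like a pending timeout."""
    time.sleep(0.1)
    return True

def run_checks(n_checks, n_workers):
    """Run n_checks waits on n_workers threads and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(slow_check, range(n_checks)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("1 worker :", round(run_checks(8, 1), 2), "s")
    print("8 workers:", round(run_checks(8, 8), 2), "s")
```

A machine with more RAM and spare cycles can keep more of these waiters alive at once, which is why a longer TIMEOUT costs it less.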

In fact, I’m just about Googled out.  With the faster machine doing all the Google Hacking I’m getting more and more dry runs.  Of course, this whole business is cyclical (look at Bahrain for instance), so just taking a break for a day or two is probably a good thing.

At least one of my “Proxy Judge” sites decided it was a REAL proxy judge after all and changed its format to be helpful.  In so doing it turned itself into a worthless proxy judge, at least as far as I’m concerned.  As a result, there are more “Undefined” servers than is usual.  That site is going out of the proxy judge rotation permanently.

This week I will be taking a closer look at the “Undefined” sites to see if I can get rid of them once and for all.

27
Jul

It must be Sunday

Why?

Because Bahrain’s back again.

They’ll all be gone by Friday and then the cycle will start all over.

27
Jul

460K Results

The 460K Random Run has completed – faster than I anticipated – and the results are in:

  • CLOSED PORTS: 441,497
  • DUPE ENTRIES: 5,532
  • NEW PROXIES: 35

Is that pathetic or what?  Of the new proxies most were end-user type DSL or cable systems in South America, Poland, or Spain (judging by the FQDN).

Here is the interesting part: the 441K hosts with “CLOSED” ports are live hosts.  Maybe they were proxies last week.  Maybe they’ll be proxies next week.  Maybe they are simply IP addresses that have changed hands via DHCP.

This is also the reason it ran faster than I expected.  It was programmed to bypass any testing on closed ports and just go to the next one.
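The skip-if-closed idea can be sketched in a few lines of Python.  This is not the script that did the run (that one leaned on nmap); it is a plain TCP connect test, and the function names and the 5 second default are assumptions for illustration.

```python
import socket

def port_is_open(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, timed out, unreachable, or unresolvable: treat as closed.
        return False

def candidates_worth_testing(entries, timeout_s=5.0):
    """Yield only the (host, port) pairs that accept a connection.

    Anything that fails the cheap connect test is skipped, so the
    expensive proxy test never runs against a closed port.
    """
    for host, port in entries:
        if port_is_open(host, port, timeout_s):
            yield host, port
```

Most closed ports refuse the connection almost immediately, so a list that is mostly closed gets through quickly.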

I did a random sampling (nmap) of a few addresses and found – I hate to say it again – “interesting” results.  One address was 100% filtered.  The next had a single (non-proxy) port open.  One had MySQL, VNC, NetBIOS, and HTTP ports open.  That one smelled like a honeypot.

Very curious.  And someone went to a lot of trouble to compile that list.

26
Jul

The 460K Random Run

I ran across this unsort utility earlier in the week and it was perfect for the 460K Run (the proxy list from the “Interesting Site”).

Perfect because there were thousands of dupes and in order to get rid of them the file had to be sorted for unique entries.  After it was sorted it was, well… sorted.  Testing all those ports sequentially is simply bad form.  It sends warning signs to both my ISP and the remote ISP that Something’s Up.  Randomized, it’s just so much background noise to the remote ISP.
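The dedupe-then-randomize step looks roughly like this in Python.  It is a sketch of the idea, not the unsort utility itself, and `unique_shuffled` is a hypothetical name.

```python
import random

def unique_shuffled(lines, seed=None):
    """Drop blank and duplicate lines, then return them in random order.

    Sorting first gives a stable starting point, so passing the same
    seed reproduces the same shuffle.
    """
    unique = sorted({ln.strip() for ln in lines if ln.strip()})
    random.Random(seed).shuffle(unique)
    return unique

if __name__ == "__main__":
    sample = ["1.2.3.4:80\n", "5.6.7.8:8080\n", "1.2.3.4:80\n", "\n"]
    for entry in unique_shuffled(sample):
        print(entry)
```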

Locally it doesn’t really help with my ISP, but I’ve been doing this for three months and they don’t seem to care, although it is an exponential increase in activity.

I figure this should take about 5000-7500 seconds running on the DualCore AMD64 box.  It’s been running for about ten minutes and the vast majority of the ports are closed.  The ones that are “open|filtered” (per nmap) are already in the database (whether they’re active or not, regardless of how they’re listed in the database, will be determined in a future run).  So far there are no open ports I don’t already have, but this is just the beginning.
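As a back-of-the-envelope check on that estimate, finishing 460,000 entries in 5000 to 7500 seconds implies roughly 61 to 92 entries per second.  The sketch below just does that division.  Whether port tests really run at the same pace as database checks is an assumption.

```python
ENTRIES = 460_000  # size of the list, per the post

def implied_rate(entries, seconds):
    """Entries per second needed to finish `entries` in `seconds`."""
    return entries / seconds

if __name__ == "__main__":
    for seconds in (5000, 7500):
        rate = implied_rate(ENTRIES, seconds)
        print(f"{seconds} s -> about {rate:.0f} entries/s")
```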

I have a feeling this is a meaningless exercise in futility, but I have to get this list behind me.

It’s an obsession.

Speaking of which I have recently registered the domain names proxyobsession.com and proxyobsession.net.  If you go to either one you will end up at The List for now.

***UPDATE 10:40AM***

The 460K Random Run has been running smoothly for three hours and I’m getting approximately 3-4 new live, open proxies per hour.  That may not seem like a lot (and it isn’t) but it’s about what I expected.

26
Jul

Trashed Pages

If you dropped by last night you may have noticed some of the pages were broken.  This is a recurring problem I am having with GoDaddy’s hosting service.  In fact, if you Google…

426 connection closed godaddy

You can read all about it.  The first two hits will be me.

25
Jul

Bahrain

Easy come, easy go.

After this morning’s purge there wasn’t a single Bahrainian flag left in The List.  Not one.  

There was a bit of a bug in the page code and a lot of proxies added since last night showed up with a negative speed.  I upped the timeout by 50%, from 30 seconds to 45 maximum, but missed one calculation.  Every run after 10AM today is correct.
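One plausible way a missed calculation produces a negative speed, sketched in Python.  The real formula is not shown here, so the "headroom before timeout" score below is purely an assumption for illustration.  The idea is that if one spot still subtracts from the old limit of 30, any fetch that takes longer than 30 seconds goes negative.

```python
TIMEOUT_S = 45  # one shared constant, so no calculation can lag behind

def speed_score(elapsed_s, timeout_s=TIMEOUT_S):
    """Seconds of headroom left before the timeout (higher is faster).

    Hypothetical scoring, used only to show the failure mode.
    """
    return timeout_s - elapsed_s

if __name__ == "__main__":
    print("stale limit of 30:", speed_score(38, timeout_s=30))
    print("shared constant  :", speed_score(38))
```

Keeping the limit in a single constant is the usual guard against this kind of slip.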

Why increase the timeout in the first place?  Because it’s an international list.  It may take the system here in the USA 38 seconds to get a page from a proxy in Zimbabwe, but a user in Kenya may get it in 5.  You never know.  Plus, it boosts the proxy count and since the daily purge is so damned effective these days I need all the data I can get, even though I’m getting a lot of data.

I have enlisted my Mythbuntu system for some grunt work.  AMD64 DualCore, 2G of RAM, and lots of cycles to spare when I’m not watching TV (plus, after I upgraded to Ubuntu v8.04 MythTV is broken anyway… I need to work on that).  It is a lot more capable than the VM that has been running the show and I can get a lot more done.

24
Jul

300K Milestone!

The Google Hack took us to a quarter million and has now kicked us up to more than 320,000 IP:ports in the database.

That’s 100K in a week.  And without dipping into the 460K proxies from the “Interesting Site” I found earlier this week (I’m still trying to figure out how to crack that nut without pissing off my ISP).

As usual, the good proxies go as fast as they come.  Some kind of cosmic proxy equilibrium going on there.  If things go as usual by the end of tomorrow’s purge we’ll be back down to 150 working proxies.

Why do they light up and go dark so fast?  Good question.  A lot of it has to do with the Bahrainian proxies, since they’re the biggest block.  Once Bahrain Telecom gets their act together we may get a better picture of what’s going on in the wild.

23
Jul

Interesting Site Part 2

In total, the site I mentioned yesterday had over 450,000 proxies tucked away in text files. I thought a long time about adding that stuff to the database, but on closer inspection it looked like mostly junk.

I know. I’ve said it a hundred times. The database has mostly junk in it already. I just can’t see tripling the size of it with this particular junk. There are far too many oddball ports for my taste. And there are IPs with 4 or more different ports listed. No, it just doesn’t look right.

I had an idea to just run through it and find any and all open ports in that list and to Hell with the rest, so I cooked up some quick bash kiddie scripts and ran with it… for about five minutes. It simply ran too fast. That kind of activity throws up red flags, so I shut it down and backed off. But still… it’s tempting. If the numbers I’ve run across are any indication, there could be anywhere from 300 to 600 live proxies in all that mess. I may chop it down into smaller files and give it another whack sometime. A slow, leisurely, measured whack. Or rather, whacks. Spread out over a few months. Sounds like a weekend project.
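The slow, measured version could be sketched like this in Python (the originals were quick bash scripts, so this is a swapped-in illustration).  The chunk size and the delay are whatever feels polite.  Both helper names are hypothetical.

```python
import time

def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def paced(items, delay_s):
    """Yield items one at a time, sleeping delay_s seconds between them."""
    for index, item in enumerate(items):
        if index:
            time.sleep(delay_s)
        yield item

if __name__ == "__main__":
    entries = ["10.0.0.%d:8080" % n for n in range(1, 6)]  # made-up addresses
    for batch in chunked(entries, 2):
        for entry in paced(batch, 0.5):
            print("would test", entry)
```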

The other interesting thing about that site was 14 million email addresses stuffed into .RAR archives (I didn’t count them but the filenames themselves indicated the total numbers).

OK, so we have:

An “abandoned” Web site with…

  • 14 million email addresses, and…
  • nearly half a million proxy addresses

Hmmm… ya think maybe there was some spamming going on here?

Those half a million proxies could have been a rented bot army, which would account for the oddball port numbers. The bot theory is good because I randomly tested a handful and found live hosts with closed ports. And the ones I tested all had ISP type DNS names.

You certainly can find some peculiar things on the Intertubes!

22
Jul

Interesting Site

Once upon a time, when I was doing manual, ad hoc Google grazing, I ran across a list – actually a single text file – with over 70,000 proxies in it. Well, those went into the database long ago but today’s Google Hack hit it again, so I had to kill the run and twiddle the hack so that it will ignore the site from now on.
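Making a crawl ignore a site can be as simple as a hostname check before fetching.  The Python sketch below assumes the hack produces a list of result URLs.  The hostname in `IGNORED_HOSTS` is a placeholder, not the actual site.

```python
from urllib.parse import urlsplit

# Placeholder hostname; the real site is deliberately not named here.
IGNORED_HOSTS = {"big-list.example"}

def keep_url(url, ignored=IGNORED_HOSTS):
    """Return False for URLs whose hostname is on the ignore list."""
    host = (urlsplit(url).hostname or "").lower()
    return host not in ignored

if __name__ == "__main__":
    hits = [
        "http://big-list.example/proxies.txt",
        "http://other.example/list.txt",
    ]
    print([u for u in hits if keep_url(u)])
```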

70,000 proxies, at this point, is over a quarter of the total database and to process them would just be so much wheel-spinning, even with the recent performance enhancements (I didn’t say… ugh… tweaks).

But this time I took a closer look at the site. I can only say it’s… very, very interesting.

It looks like an abandoned Web site, but there are relatively fresh files just sitting there and most of them are text files full of proxies.

I think I will pull down the entire site and put the other VM to work on them to see what happens.

BTW, the Google Hack snarfed down about 120 Bahrainian proxies before I killed it.