Archive for August, 2009

29
Aug

My Bad

Last week when I shit out a bunch of new code I made a teensy weensy mistake – I forgot to chmod a key script as executable.  None of the newly discovered (“pending”) proxies were processed the entire week.
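A forgotten executable bit is easy to catch with a small pre-flight check. This is a minimal sketch, not the code that runs the list: the function name `ensure_executable` is made up for illustration, and it only touches the owner-execute bit.

```python
import os
import stat

def ensure_executable(path):
    """Return True if the owner-execute bit was already set.

    If it was missing, set it (the equivalent of chmod u+x) and return False
    so the caller can log that something had been left non-executable.
    """
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return True
    os.chmod(path, mode | stat.S_IXUSR)
    return False
```

Calling something like this on each script before the scheduled run would turn a week of silently skipped work into a single log line.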

DOH!

That was why the number of pages (proxy count divided by the page size) was wrong.

Fuck.

So simple.  So foul.

Had I not been hammered with actual WORK at my little security cubicle job all week, I would have found it sooner (it took all of five minutes to debug).

That in itself is a whole ‘nother tale of woe (thankfully not a security incident – if we had one of those we’d probably go into serious meltdown mode).

23
Aug

HTTP_VIA 1.0 cache-mex-popocatepetl-1

Apparently, nearly every open proxy in Mexico goes through this box.  Maybe that’s why they named it after a volcano.

Yes, I admit it.  Since yesterday I’ve been resurrecting dead proxies right and left, entertaining myself by watching the HTTP headers fly by (I am easily amused).  And going through the list I have noticed a few things.

For one, the “Proxies of 2008” are dying off, which is to be expected.  There are only about 60 or so left.

Second, this month, August 2009, has been a great month for proxies!  Twelve full pages worth.  August has also been a great month for Mexican proxies, with most of them discovered on the 15th.  Most of them are going through the server in the title of this article.  It’s a shame I never bothered to put the VIA header in the database.  There’s a lot of good information there, lost forever.  Oh, well.

August 2008 was also a good month for proxies.  It was when we hit our first 1,000,000 boxes.  And if anyone besides me remembers back then, September 2008 was extremely dry.  Things did not pick up again until February of 2009.

Thirdly, the sites I mentioned yesterday as running “TeamViewer” all turned out to be a single IP address with 200 open ports (and probably more, since I started dropping any port less than 80 over a year ago).  And although they were running TeamViewer yesterday, all of them are open proxies today (so far, 40 out of the 200 open ports are now reporting as Transparent proxies in the list – I just haven’t hit them all yet).  Which means they’ll probably be dead tomorrow.

Here is the whois information on that particular IP:

[Image: whois record for the TeamViewer IP address]

Note they assigned a netname to a single IP address.  This seems unusual to me, but it could be a common practice (in fact this ISP does it a lot, if you look here and search for “SXTY”).  This probably at least partially explains the “here today, gone tomorrow” nature of Chinese proxies.  But the tie-in with “This site is running TeamViewer” is still a head-scratcher.  WTF is up with that?

I did solve my issue with the proxy judges that don’t return an Http-Referer header.  They still all return the User-Agent header, so I use that as well.  Still, I have a nagging suspicion that some proxy judges are lying to me.  I would use my own (yes, I have one of my very own design), but I have found that a lot of servers in foreign countries have difficulties resolving mrhinkydink.com.  Perhaps it may be useful as a backup of last resort for High Anon proxies (all the judges I use identify Transparent and Anonymous proxies faithfully).
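The fallback described above can be sketched roughly as follows. This assumes the judge page simply prints the request headers it received somewhere in its body. The function names are hypothetical, and the real check may look at more than a plain substring match.

```python
def judge_echoes_markers(body, referer, user_agent):
    """Report which of the values we sent show up in the judge's page."""
    return {
        "referer": referer in body,
        "user_agent": user_agent in body,
    }

def looks_like_proxy(body, referer, user_agent):
    """Accept the result if either marker came back.

    Some judges never print the Referer, so the User-Agent acts as the
    second witness.
    """
    seen = judge_echoes_markers(body, referer, user_agent)
    return seen["referer"] or seen["user_agent"]
```

A login page or a "This site is running..." banner echoes neither value, so it fails both tests and gets rejected.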

Web site/false proxy detection (“Offline/WEB” in the database) is now rock solid.  I used to depend on the headers returned, but you will often get an “HTTP 200 OK” result for a login page instead of the expected (some RFC dweebs would say “correct”) “HTTP 403 Forbidden” result, followed by an “HTTP 302 Object Moved” to the login page.  You can’t expect Web developers to play by the rules, since they’re morons.

Other Tidbits

I keep checking the Israeli obfuscator site manually, although it has been removed from the code.  The name still resolves but the site times out and the nmap still shows port 80 as “filtered”.

I added “Cameroon’s favorite proxy list” under the page title, just for the Hell of it.

We added another 100,000 proxies to the database, hitting the 2.1 million mark this week.  And there are more than 107,000 proxies in the “gold” table (address/ports that are or were open proxies since March 2008).

Something isn’t right with the code.  Right now there are 1140 proxies, but only 19 pages.  At fifty per page there should be 23 pages.  Come to think of it, this probably explains the missing proxies from 2008.
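The arithmetic is just a ceiling division. A quick sketch, where `page_count` is an illustrative name and not a function from the site's code:

```python
import math

def page_count(proxy_count, page_size=50):
    """Number of pages needed to show proxy_count rows at page_size per page."""
    return math.ceil(proxy_count / page_size)

print(page_count(1140))  # 23
```

Working backwards, 19 pages at fifty per page covers at most 950 rows, so whatever query builds the page count is missing at least 190 of the 1140 proxies.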

Unfortunately, I haven’t looked at the forum spammer’s reporting site since I started mirroring it.

Remember the Canadian hospital proxies?  They’re all gone now.  Someone figured it out and fixed it.  Good for them!

22
Aug

Slightly Overboard

Since the recheck, resurrect, and discover code is all pretty much in line after today’s hacks, I launched a HUGE resurrection batch.

Right now there are over 1200 checked proxies in the list and they keep coming back from the dead.

That’s 23 pages worth. The CoDeeN’s topped out at about 350 (there were just over 80 this morning).

It will all get knocked back before tomorrow morning, but the “short list” probably won’t be so short this time around.

22
Aug

“This site is running TeamViewer” and Other CRAP

The code is getting much better at detecting non-proxies with the Referer hack, but as my code gets better I keep stumbling across crud in other people’s code.

Since twiddling the CoDeeN detection, I have been making small resurrection runs on the “gold” database, pulling 500 “dead” proxies at a time and checking the results. Interestingly, I’m getting somewhere between 1% and 2% worth of good hits, which is very typical of even the crappiest proxy lists found in the wild (with the possible exception of ProxyCemetary, which must be the worst proxy list of all time).

I discovered that two of my proxy judges (so far) don’t bother to return the Http-Referer header, which is fine for Transparent and Anonymous proxies, but sucks ass when it comes to High Anon servers. So, out they go. All of the other judges seem fine.

I’m not sure why you would bother not returning that field, although most of these pages don’t know they’re proxy judges.

Then there’s shit like this. Don’t worry, it won’t bite.

If you Google the IP, it’s in proxy lists everywhere, over 1500 hits worth.

How the Hell does that get into proxy lists? Maybe it is a proxy if you have a login (I haven’t tried brute forcing a login), but it’s useless if you don’t.

Then there’s “This site is running TeamViewer.” This is usually on a variety of ports on the same IP address. Pure crap. There are hundreds of them. Check out this nonsense from their “Security Statement”:

International top corporations from all kinds of industries (including such highly sensitive sectors as banks and other financial institutions) are successfully using TeamViewer.

And their IP addresses are on proxy lists?

DOH!

22
Aug

Stupid CoDeeN Tricks

I bumped into an unintended side effect of using the Http-Referer as a definitive proxy marker. It seems it has killed off most of the CoDeeN proxies, which do not return an Http-Referer.

Of course, they do, but the only thing I use to detect CoDeeN servers is the infamous “(Not Really) Welcome” page they shove in your face when you first connect. After ten seconds they give you the page you requested (which, being a proxy judge page, does have the Referer), but by then I’ve closed the connection and moved on to the next proxy anyway.

And here’s an interesting tidbit about CoDeeN: they keep track of who they think you are by looking at your IP address and User-Agent. If they have seen your IP+UA before within a certain timeout period, you don’t get the welcome page.

For some reason they seem to think the IP+UA is a unique identifier (probably a hash of some sort). Since I can’t change my IP, I guarantee the User-Agent is always unique by stuffing it with random numbers. Microsoft has made this a piece of cake by adding the “.NET CLR [1-3].[0-9].xxxxx.[0-9]” extensions to the User-Agent field of Internet Explorer. Since there are no less than three versions of dot-NET compatibility (this week) plus multiple versions of IE and Windows platforms there’s a lot of room for randomness in those numbers.
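A rough sketch of the User-Agent stuffing, following the “.NET CLR [1-3].[0-9].xxxxx.[0-9]” pattern above. The function name and the particular lists of IE and Windows NT versions are assumptions picked for illustration, not the values the real code uses.

```python
import random

def random_msie_user_agent(rng=random):
    """Build an IE-style User-Agent with a randomized .NET CLR token.

    rng can be the random module or a random.Random instance.
    """
    ie_version = rng.choice(["6.0", "7.0", "8.0"])   # assumed IE versions
    nt_version = rng.choice(["5.1", "5.2", "6.0"])   # assumed Windows NT versions
    clr = ".NET CLR {}.{}.{:05d}.{}".format(
        rng.randint(1, 3),      # major: 1-3
        rng.randint(0, 9),      # minor: 0-9
        rng.randint(0, 99999),  # five-digit build number
        rng.randint(0, 9),      # revision: 0-9
    )
    return "Mozilla/4.0 (compatible; MSIE {}; Windows NT {}; {})".format(
        ie_version, nt_version, clr
    )
```

The five-digit build field alone gives 100,000 possibilities, so two checks landing on the same string within one timeout window is unlikely, though not impossible.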

This also gives you a modicum of plausible deniability with CoDeeN when they call the cops on you, since it’s highly improbable that any of those User-Agent values are actually valid (and they obviously don’t check).

Since there are often multiple ports on any CoDeeN server and servers at the same location with different IP addresses probably use the same backend IP+UA database, you have to change the User-Agent every time you test or you will get an invalid result (that is, it will not be an obvious CoDeeN server).

Even though no one gives a rat’s ass about CoDeeN servers, I thought it was important enough to fix. You really should be aware of them, especially if you want to avoid them. That’s why it’s important to detect them properly.

Still, I can’t help wondering if there’s some server room Trevor at PlanetLab scratching his head over all those weird MSIE User-Agents that have shown up in the database over the last year…

16
Aug

Another Dead Proxy List

The latest javascript-obfuscated proxy list has apparently died, which is kind of a shame because it was good for a few proxies a day while it was up.

Plus, de-obfuscating it was good for lulz.

I’m not quite certain when that happened, but the other day we had a power blip and as a result the 3PM harvest didn’t run.  I noticed it was gone after I ran the script manually.  Every page timed out.

I did quite a bit of Googling to see if it was really down or whether they had simply blocked my IP (it happens, and when it does I just use a proxy).  It looks like it was heavily promoted from December of ’08 to May of this year.  In fact a few “respectable” (relatively speaking) sites have praised (trolled?) it since the beginning of the year.  The domain name still resolves, guerrilladns.com says it’s the only DNS name on that IP, and nmap says port 80 is “filtered” but I’m getting nothing from it.

So out it goes.  One positive side effect of it going down was I found another site with a good list.  This site also had a very nice proxy-harvesting shell script on it.  In fact he hits the same sites I hit.  And it’s enlightening to see how someone else would do it.  I get a little down and dirty when it comes to sites that require cookies, but he had a much more elegant approach.

My vacation is over.  The List has been running on full auto for two weeks now.  The purge is working right and the resurrection kicks in as expected when the list gets too short.  However, I still need to work on the Referer hack.

Which is good because I’ll need something to do at work this week.


05
Aug

Forum SPAMMER Site Mirrored

I used GNU wget to mirror the forum SPAMMER’s site and ended up with about 3G of data. I’m going to mirror it every day to see if anything new pops up. After the first run it gets easier, since wget only downloads new and changed files.

As far as the proxies go, it was just as I suspected. Out of 2879 unique proxies (from 268M of log data), not including SOCKS, there was nothing but TIMEOUT and CLOSED proxies. Not one single live box. Probably 85% were already in the database, but that’s just an eyeball estimate.
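Pulling unique proxies out of a pile of logs can be sketched like this. It assumes the proxies appear as plain `ip:port` text somewhere in each line, which is a guess about the log format. The names `PROXY_RE` and `unique_proxies` are illustrative only.

```python
import re

# Dotted quad, a colon, then a port number.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")

def unique_proxies(lines):
    """Return the set of (ip, port) pairs found in an iterable of log lines."""
    found = set()
    for line in lines:
        for ip, port in PROXY_RE.findall(line):
            octets_ok = all(int(octet) <= 255 for octet in ip.split("."))
            port_ok = 0 < int(port) <= 65535
            if octets_ok and port_ok:
                found.add((ip, int(port)))
    return found
```

Feeding it an open file handle works line by line, so a few hundred megabytes of logs never has to sit in memory at once.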

There were some hints at the scams this guy is running SEO on, along with some telling graphics. It was a good education on these guys, but I’ve barely scratched the surface. I have a lot going on today and I’m going to be very busy until next week, so I’ll save my findings for later.

04
Aug

Google Hack: Not Dead Yet

No sooner do I declare the Google Hack played out than it turns me into a liar.

Shortly after my last post, I ran the Hack one more time before putting it out to pasture. A few minutes into the run, it came up with one of “those” sites. It seems to have every kind of scam on the Net (lose weight, make money from Google/Craigslist, payday loans, etc.) and some type of blackhat SEO angle. There are dozens of text files listing “users” of various online forums, the accounts they use, URLs for their profiles, and…

… the proxies they use to post.

I ran a few Google searches on some of these names and they are absolutely everywhere. There’s hardly an online forum that hasn’t been hit by these accounts so it’s definitely a forum or comment SPAM operation of some sort.

I’m not sure if this is good news or bad news. I’ve done a few runs against what they’ve got and so far it appears they don’t have anything that isn’t already in my database, although I am getting a few new TIMEOUTs and port CLOSED results. In other words, dead crap.

Anyway, there is so much stuff here that I’m going to mirror the site and bang at it at my leisure. It’s well-indexed by Google, so it’s no Big Secret, but the last time I ran across a site like this it didn’t last more than a couple of months. And the information is so… varied… that an offline study is the best way to mine it.

Check back in to see how it goes!

04
Aug

Proxy List Going On Full Auto

Seventeen months and two million proxies later, I think I finally have this proxy business sorted out.

I’m pretty sure I have everything I can possibly get out of the Google Hack and I’m hitting every proxy list that has anything to offer. The recent hack of the proxy recheck code (with the addition of an HTTP Referer header – from a selection of over 40,000 random URLs) made a big, BIG difference. The last recheck only carved the list down by a third, so it’s finally going into the daily cron schedule.
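For the curious, a recheck request with a random Referer might be put together something like this. It is only a sketch using Python's standard `urllib.request`. The function name, the judge URL and the contents of the Referer pool are placeholders, not the real ones.

```python
import random
import urllib.request

def build_judge_request(judge_url, proxy_hostport, referer_pool, user_agent, rng=random):
    """Prepare (but do not send) a request to a proxy judge through one proxy.

    The Referer is picked at random from referer_pool, so a real proxy
    passes it along to the judge and a plain web server never shows it.
    """
    request = urllib.request.Request(judge_url, headers={
        "Referer": rng.choice(referer_pool),
        "User-Agent": user_agent,
    })
    # Route the request through the candidate proxy (host:port).
    request.set_proxy(proxy_hostport, "http")
    return request
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=...)` and checking whether the chosen Referer shows up in what comes back.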

Previously, the recheck hacked the list down by two-thirds, to about 250 or so non-CoDeeN servers (and usually taking a chunk of CoDeeNs with it, but who cares about those anyway?). Running the resurrection code on the dead proxies generally doubled that number.

Now, at 11:15AM EST every day, whenever there are more than 925 “live” proxies, the recheck code will kick in and purge the dead ones. At 12:15PM, if there are less than 275 in the list, the proxy resurrection code will trigger to beef up the list.

The net result should be a decent equilibrium with about 400-650 LIVE proxies at any given time.
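The schedule boils down to two threshold checks. Here is a minimal sketch, assuming cron (or anything else) calls in with the current time and the live count. The constant and function names are invented for illustration.

```python
RECHECK_ABOVE = 925    # purge dead proxies when the list grows past this
RESURRECT_BELOW = 275  # top the list up when it shrinks below this

def scheduled_action(hour, minute, live_count):
    """Return "recheck", "resurrect", or None for a given time and list size."""
    if (hour, minute) == (11, 15) and live_count > RECHECK_ABOVE:
        return "recheck"
    if (hour, minute) == (12, 15) and live_count < RESURRECT_BELOW:
        return "resurrect"
    return None
```

The gap between the two thresholds is what keeps the jobs from fighting each other: a list sitting anywhere from 275 to 925 is left alone by both.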

Although the Google Hack has run its course, there is still the possibility that new proxy lists will pop up and old lists will die, so I’m going to run it about once a month. Right now I run it every other week and I’m getting one or two new proxies for the trouble.

After that I may move it from the mrhinkydink.com domain and put it here on ProxyObsession for good. I’m already looking into my hosting options to put everything here anyway. That may happen next week.

What happens after that? IPv6 (Internet Protocol version 6) is coming sooner or later, with a long period of IPv4 backward compatibility required. 4-to-6 proxies are going to be with us for a long time.

04
Aug

Hinky’s Linkbait Policy

OK, listen up. Blog SPAM like this is VERBOTEN:

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Give me a break. I’m not that stupid.

If you’re going to linkbait me, at least try to stay on topic, like this guy, who is obviously a master linkbaiter.