Archive for October, 2009

25
Oct

User-Agent: FuckYou 2.0

It has been one God-Awful slow month.

This morning I thought I’d hack around a bit with the code base.  I ran across a site that has apparently evaded my proxy harvesting for quite some time.  This time it wasn’t a JavaScript jockey, which I can handle, or an Ajax Asshole, which I can’t.

It was fucking Unicode.  Ugh.

All my fine Linux utilities barf all over Unicode, except for Links2, which I have generally abandoned in favor of curl and wget for reasons I can’t remember (you may recall html2text was trashed due to its insistence on adding character-backspace-character sequences that make no sense at all).  So I dusted it off and made a script just for this new site.

Surprisingly, there were three good proxies in that list.  Plus there were quite a few that were dead, but not in the database.

I’ll take what I can get, but this site pissed me off so bad I just had to add a “User-Agent: FuckYou 2.0” header to the script before I threw it into the 4AM daily run.
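For the curious, here is a rough sketch of what a one-off harvester like this could look like in Python. This is not the actual script (that one leans on Links2), and the UTF-16 default, the ip:port pattern, and every function name here are assumptions made up for illustration. The site and its real encoding stay unnamed.

```python
import re
import urllib.request

# Hypothetical pattern: dotted-quad address, a colon, then a port number.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")


def build_request(url, user_agent="FuckYou 2.0"):
    """Build a request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_proxies(raw, encoding="utf-16"):
    """Decode the raw page bytes first, then pull out ip:port pairs.

    The encoding is a guess; pass whatever the page really uses.
    """
    text = raw.decode(encoding, errors="replace")
    return [f"{ip}:{port}" for ip, port in PROXY_RE.findall(text)]


def harvest(url, encoding="utf-16"):
    """Fetch one page and return the proxies found on it."""
    with urllib.request.urlopen(build_request(url), timeout=45) as resp:
        return extract_proxies(resp.read(), encoding)
```

The point is that decoding happens before any pattern matching, which is exactly the step the byte-oriented tools trip over.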

Now I’m spot-checking all my code for new tricks like this one.

15
Oct

Passive Proxy Lists In The Wild

You learn something new every day.  If not, you’re not doing it right.

Consider this Web page (it won’t bite – I think).

Isn’t this the most special thing you’ve ever seen?  It seems this Webmaster (or, more likely, Web “developer”) has decided to collect proxy data and he probably doesn’t even realize it.

This kind of shit happens all the time.  Some Web clown finds something kewl on SourceForge and decides he’s going to put it on every stinking two-bit PHP/MySQL site he touches.  In a week or so, he gets bored with it and forgets about it completely.  But he doesn’t remove it.  Ever. It runs and runs, collecting data forever.

This is not necessarily a Bad Thing™.  I’d never run it and I wouldn’t recommend that you run it, but these people are supplying us with some very valuable (and up to date) proxy information.

This is exactly how the proxy judges I use came into being.  A cool page somebody found,  ten fucking years ago. Copied, pasted, and deployed damn near everywhere, but not like this beauty.

This thing is BIG.  It is HUGE.  Do a Google search on it and you will get 60,000+ hits.  A lot of those hits are articles about an SQL injection bug or two in this little gem, but most will be hits on sites that are running it for site statistics.

Some of those sites aren’t places you want to click on, so tread carefully.  Some are just plain deadwebs that nobody ever hits, except for your occasional wandering GoogleBot.

I considered scraping these sites, but there’s hardly any point.  I can almost guarantee that any open proxies in all that data are already in my database.  Still, for the active sites running this crap, the proxy data is probably fresher than 99% of the proxy lists on the Web.  The trick would be finding those high traffic sites.

In case you’re wondering, I didn’t mention the name of the software on purpose.

11
Oct

Sloooooooooow Week

Normally we get about two purges per week, but there was only one last week and it’s been a slow, slow trudge getting the numbers back up ever since.

And now the Cameroonians are bitching that there aren’t any new UK proxies.

I keep trying to tell them I have no control over where the proxies come from.  After all, I don’t scan.  I harvest lists.  What goes around, ends up in the List.  If I actively scanned for proxies my ISP would shut me down overnight (I do all this work from the comfort of my 500 room mansion, you know).  In fact, as nasty as ISPs have gotten lately, I don’t think I could have started the list this year.  With 2.2 million proxies in the database, most of the grunt work involves checking to see if a “new” proxy listing is really new.  The days of checking 25,000+ addresses in a single run are long gone.
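The “is it really new” check boils down to something like the sketch below. A plain Python set stands in for the database here, and the function name and the “ip:port” string format are assumptions, not the real code.

```python
def split_new(harvested, known):
    """Split harvested 'ip:port' strings into (new, seen).

    `known` is a set standing in for the database; it is updated in place
    so a proxy repeated within the same harvest only counts as new once.
    """
    new, seen = [], []
    for entry in harvested:
        key = entry.strip().lower()
        if not key:
            continue  # skip blank lines in the harvested list
        if key in known:
            seen.append(key)
        else:
            new.append(key)
            known.add(key)
    return new, seen
```

Only the entries that land in `new` need an actual connection test, which is why a harvest rarely means thousands of checks any more.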

I ran the new extensions to the Top Ten List for a very short time.  I found some situations I had not anticipated and live proxies were being marked as dead.  This was not a huge problem since they were all Chinese proxies.  So I put that on the back burner and started frying other fish.

Most of those fishies are in my own little pond (at work), but one is very large indeed.

On Thursday I disclosed a minor vulnerability I discovered to the vendor involved.  I’m beginning to think I might have jumped the gun, but the wheels are in motion and I’m committed.  As usual, I’m doing some hard core research after the fact. One of two products is the culprit and I think I picked the right vendor, but I may be getting smacked down in the near future.

We’ll see how that little drama plays out.  I hate being wrong, but sometimes I just can’t keep my mouth shut.  If I am right, I will publish at the UT blog on November 8th.  If not, I’ll probably never mention it again!

06
Oct

Top Ten List Adds Possibilities

I have been going over the emails generated by the Top Ten List’s cron jobs and it’s getting me thinking.  So I’m just going to mull over a few ideas here.

A successful catch should update the “Last Checked” timestamp and the speed value in the database.  That’s easy enough.

Since I’m going for a faster proxy than usual, a “Timeout” doesn’t really count.  It may still reply given the full 45 seconds I normally use.  Do nothing with those, just wait for the next garbage collect (list purge).

Then there are those “connection refused” results.  Update timestamp, zero speed, and mark as “Offline”.  It will fall off the list next time it’s published.

I’m getting 403s (“Forbidden”) and 407s (“Proxy Authentication Required”) here and there, which, regardless of the timeout, indicates a worthless proxy – unless you like to pass the time hacking passwords.  I don’t.  So in those cases, update the “Last Checked” timestamp, set the speed to zero, and mark the proxy “Offline”.  The next time the list is generated, it’s gone.

I get an occasional 404 (“Not Found”).  This means nothing, since a lot of foreign (to me) proxies have a hard time resolving my domain name the first time anyway.   Do nothing at all with those since a purge will eventually take care of them.
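Put together, the rules so far could be sketched as one small decision function. This is Python for illustration only; the outcome labels, the `Action` record, and the “do nothing” fallback for cases not covered above are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    update_timestamp: bool      # touch "Last Checked"?
    speed: Optional[float]      # None means leave the stored speed alone
    mark_offline: bool


LEAVE_ALONE = Action(False, None, False)


def decide(outcome, status=None, elapsed=None):
    """Map one check result to a database action.

    outcome: "ok", "timeout", or "refused" (hypothetical labels)
    status:  HTTP status code, if a response came back
    elapsed: response time in seconds for a successful check
    """
    if outcome == "timeout":
        return LEAVE_ALONE                  # the next purge sorts these out
    if outcome == "refused":
        return Action(True, 0.0, True)      # falls off the next published list
    if status in (403, 407):
        return Action(True, 0.0, True)      # locked down, worthless
    if status == 404:
        return LEAVE_ALONE                  # means nothing; wait for a purge
    if outcome == "ok" and status == 200:
        return Action(True, elapsed, False) # a successful catch
    return LEAVE_ALONE                      # not covered above: assume do nothing
```

Keeping the decision separate from the database write makes it easy to eyeball the rules against the cron job emails before anything gets marked dead.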

I could reset the time requirement to the list’s default of 45 seconds, but that would take away from the intrinsic value of the Top Ten.  After all, they’re supposed to be good and fast.

Also, limit the search to proxies whose “Last Checked” value is less than curdate(), that is, proxies that haven’t been checked today.  That would cause the list to be searched more aggressively.  But the problem there is purge days, since they will have all been checked “today”.  And, “today’s” proxies would never make it to the Top Ten.
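Here is a small sketch of that filter and the purge-day problem, using Python with an in-memory SQLite table instead of the real database. The table layout and names are invented, and passing today’s date as a parameter stands in for curdate().

```python
import sqlite3
from datetime import date, timedelta


def candidates(conn, today):
    """Proxies whose last check predates today (the curdate() filter)."""
    rows = conn.execute(
        "SELECT addr FROM proxies WHERE date(last_checked) < ? ORDER BY addr",
        (today.isoformat(),),
    )
    return [r[0] for r in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE proxies (addr TEXT PRIMARY KEY, last_checked TEXT)")
    today = date(2009, 10, 6)
    yesterday = today - timedelta(days=1)
    conn.executemany(
        "INSERT INTO proxies VALUES (?, ?)",
        [
            ("10.0.0.1:8080", yesterday.isoformat() + " 23:10:00"),
            ("10.0.0.2:3128", today.isoformat() + " 04:00:00"),
        ],
    )
    before = candidates(conn, today)
    # Simulate a purge day: every proxy gets rechecked "today".
    conn.execute(
        "UPDATE proxies SET last_checked = ?",
        (today.isoformat() + " 12:00:00",),
    )
    after = candidates(conn, today)
    return before, after
```

In this toy run only the proxy checked yesterday is a candidate before the purge, and nothing is a candidate after it, which is the hole described above.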

There’s probably a way around that, but I’d have to think about it for a while.

These are tiny little changes you’ll never notice, but it will make the list a bit more reliable.  And we want reliable proxies!

04
Oct

Hinky’s Top Ten In Production

The link to Hinky’s Top Ten Proxies will appear for the first time on the next page run, which will be at 4PM EDT.

Right now there’s a duplicate in the list but it should roll out by the time the full page is published.  Once that rolls out they should always be unique.  The code has been jiggered so that it will never test a proxy that’s already in the Top Ten list.  That doesn’t mean they’ll always be fresh.  It only means they’ll be unique.

Since it is a .txt file, you may be prompted to download it.  Or not.  It depends on your browser.

The most recently tested proxy will be at the bottom of the list, the oldest at the top, under the timestamp.  I’m not sure if it will work with SwitchProxy or not with that timestamp.  Apparently that little motherfucker gave up on supporting it.  It still doesn’t work with Firefox 3.5.
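The uniqueness and ordering rules for the Top Ten file could be sketched roughly like this. It is a Python illustration, not the production code; the class, its method names, and the exact file layout (timestamp line, then one proxy per line) are assumptions.

```python
from collections import deque


class TopTen:
    """Fixed-size list: oldest entry at the top, newest at the bottom."""

    def __init__(self, size=10):
        self.entries = deque(maxlen=size)  # left = oldest, right = newest

    def should_test(self, addr):
        """Never bother testing a proxy that is already in the list."""
        return addr not in self.entries

    def add(self, addr):
        """Append a freshly tested proxy; returns False for a duplicate."""
        if not self.should_test(addr):
            return False
        self.entries.append(addr)  # a full deque drops the oldest from the left
        return True

    def render(self, timestamp):
        """Text file body: timestamp first, then proxies oldest to newest."""
        return "\n".join([timestamp, *self.entries]) + "\n"
```

As noted above, unique does not mean fresh: an entry only leaves when enough newer ones push it off the top.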

I hate depending on other people’s code.  They always end up letting you down.

04
Oct

Eating My Own Dog Food, Part II

I have a silly expansion on a couple of my UT99 servers that forwards player conversation to Twitter.  I realize that Makes No Sense, but I was bored and needed to practice my Scriddie Skillz.  It’s been running since August and has probably spit out about a quarter of a million tweets since then.

UT99 players are not that talkative, so I have a script that injects conversation into the game.  That is where 99.9% of the tweets come from.  With all that mindless jabbering, I hit Twitter’s hourly limits whenever the UT99 servers are busy.

Yesterday I decided to leverage the list to get around that.  Once every 15 minutes, a script queries the database and gets a proxy that:

  • Is not in the USA
  • Is not a CoDeeN proxy
  • Responds in less than 20 seconds

Then, the Twitter scripts use that proxy to send tweets.  It has been working well.  I check the proxies against My Own Personal Proxy Judge for functionality only (it doesn’t care about Transparent, Anonymous, etc.).  As a result I get a private list of working proxies.
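The proxy selection could be sketched as a single query. The example below uses Python with an in-memory SQLite table; the schema, the column names, the “zero speed means dead” check, and picking the fastest match are all assumptions for illustration, not the real database.

```python
import sqlite3

# Hypothetical schema: addr, two-letter country code, CoDeeN flag, speed in seconds.
QUERY = """
SELECT addr FROM proxies
WHERE country <> 'US'
  AND is_codeen = 0
  AND speed > 0 AND speed < 20
ORDER BY speed
LIMIT 1
"""


def pick_proxy(conn):
    """Return one qualifying proxy address, or None if nothing matches."""
    row = conn.execute(QUERY).fetchone()
    return row[0] if row else None


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE proxies (addr TEXT, country TEXT, is_codeen INTEGER, speed REAL)"
    )
    conn.executemany(
        "INSERT INTO proxies VALUES (?, ?, ?, ?)",
        [
            ("10.0.0.1:8080", "US", 0, 2.0),   # rejected: in the USA
            ("10.0.0.2:3128", "BR", 1, 3.0),   # rejected: CoDeeN
            ("10.0.0.3:80", "DE", 0, 25.0),    # rejected: too slow
            ("10.0.0.4:8080", "ID", 0, 7.5),   # qualifies
        ],
    )
    return pick_proxy(conn)
```

Whatever comes back still has to pass the proxy judge before the Twitter scripts get to use it.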

I’m thinking I can leverage this private list as “Hinky’s Top Ten List”.  It should be fairly dynamic, since it finds a working server every 15 minutes.  Of course, since the list is only refreshed once every 120 minutes, it could get stale quickly, but I’m mulling that problem over and it should be a simple hack to keep it fresh.  Some may balk that it has no U.S. proxies, but fuck them anyway.  They won’t get pissed if I don’t mention it.

One drawback: it may get fairly repetitive.  Especially immediately following a purge, which tends to happen about twice a week.

Look for a link soon.  It will probably be along the lines of the CoDeeN lists, which are simple text files.

03
Oct

2.2 Million Proxies

Time marches on.  The List contracts and expands.  Proxies come and proxies go.  Hinky’s machines blink and churn, scouring the Web, witness to the progress of the Great Circle of Life.

My Saudi proxy came back to life, so I went back to it after the Polish proxy admins bought a clue.  I’ve never had a problem with their restrictions against “non-Islamic” Web sites and abhorrence of everything Russian.  They’re clueless about the non-Islamic sites I visit and it doesn’t seem like they’ll be getting a clue anytime soon.

It looks like the port 8085 Koobface proxies are all shut down now.  I’m sure another variant will come along sooner or later.

Today, the list is dominated by Brazil, with the US in second place, followed by Indonesia, India, and Germany.  Even though the Red Flag of China seems to dominate Page One most of the time, they are at a distant 21st place.

The reason is that I don’t like Chinese proxies much and continue to double-check them constantly.  This is why the list expands and contracts during the day.

Every now and then I’ve been running the old Google Hack, but nothing new shows up.  I haven’t seen a decent new proxy list since the last JavaScript Joker came along.  The pendulum has swung heavily to the CGI proxy side, with everyone and his brother trying to make a dime with a proxy site.  From what I have read, that market is heavily saturated with noobz and retards.

Perhaps the looming Christmas Season will bring something new.

02
Oct

Technical Difficulties

The damned cable went out last night, so the list hasn’t been updated in over six hours. Rather than play catch-up, I’m just going to go with the flow.