Archive for August, 2008

31
Aug

New Code On Deck

I spent all day hacking at the new page refresh code.

It’s going to be a winner.

I have one more page to make the old-fashioned way and then I can switch over.

I did all my development on an old, reliable Ubuntu 6.06 LTS VM.  Since I usually develop on the AMD64x2, which uses special credentials on the production database, I had to make sure I didn’t screw it up.  I edited all the scripts to point to the VM’s copy of the database (it’s 10 days old) but just to be sure I didn’t miss anything I added some firewall rules to prevent the VM from talking to the production database.

And sure enough I didn’t get them all.  In fact what happened was I had a copy of everything in my /home folder, but I sudo’d into root without realizing I wasn’t in root’s folder.  I also neglected to give the VM a decent amount of memory and left the query limits at the level of the AMD64x2.

Double whammy.

I’ve never seen a session crash quite like that.  The OS killed all my processes including the root sessions when it maxed out.

I got the resource issue squared away and removed all the script copies in my /home folder and hammered it out.  There were a few hair-pulling bugs but by the third or fourth run of the page code it ran slicker’n shit.

The last old-fashioned run just wrapped up.  The 4PM run will be on the new code.

31
Aug

Back On Schedule

We now have 1,089,613 proxies in the database, which is astounding considering two days ago there were 890,000.

That’s slightly less than 100,000 proxies per day for the last two days, or about 4000+ per hour.  And as usual most of them were dead, but there were enough live ones to slow down publishing the list.

In fact there were only two page refreshes yesterday since it was taking so long to go through them all.

But we’re all caught up now and back on the usual publishing schedule.

Meanwhile I’m working on the page refresh to make it faster so days like yesterday don’t happen again.  Proxies come and go so fast you need the freshest possible data.  A twice-a-day refresh is not going to cut it.

Instead of moving the newly found proxies from the main table into the “gold” table before every page run, I will be putting them in both the main and “gold” table.  This way I can run a modified version of the resurrection code on the new proxies, which will run much faster than the old sequential method.
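A rough sketch of the dual-insert idea, for illustration only.  It uses Python's built-in sqlite3 as a stand-in for the real database, and the table names (proxies, gold), the columns, and the helper names are all hypothetical, not the actual schema.

```python
import sqlite3

def make_db():
    """Create an in-memory stand-in with a main table and a "gold" table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per ip/port pair in each table.
    conn.execute("CREATE TABLE proxies (ip TEXT, port INTEGER, PRIMARY KEY (ip, port))")
    conn.execute("CREATE TABLE gold (ip TEXT, port INTEGER, PRIMARY KEY (ip, port))")
    return conn

def add_new_proxy(conn, ip, port):
    """Write a newly found proxy to both tables in a single transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT OR IGNORE INTO proxies (ip, port) VALUES (?, ?)", (ip, port))
        conn.execute("INSERT OR IGNORE INTO gold (ip, port) VALUES (?, ?)", (ip, port))
```

Writing both rows at discovery time means there is no separate "move" step before a page run, and the transaction keeps the two tables from drifting apart if one insert fails.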

For now we’re on the old method because things have died down a bit.  The file at IS-1 stopped refreshing last night.  If it starts back up you will notice some page delays if I don’t get this code working today.

30
Aug

Million Mark Hit!

We hit it, but the page doesn’t show it yet.

I did a very ugly thing to compensate for GeoLite City’s tendency to do nothing with an address it can’t find.  I ran an SQL statement on the entire database to fix the blank data.  These days it’s taking a long, long time.

It was a cheap hack, what can I say?  In the end it took less time to alter the GeoLite City code.

So I rewrote the test-geo-city.c program that comes with the binary version of their database to spit out the values I want.  One more “clean” of the database and I can stop doing it.

Great, but right now it’s almost 4PM and the 2PM run hasn’t finished yet.

Also, I got a call from GoDaddy and they’re moving me to another server.  There may be some disruption in service.

The IS-1 suck is awesome.  A few bugs to work out but it’s running fine.  It appears they reset the file every now and then so I have to hack around that.

I plan to rewrite the main page on the Web site to reflect the fact that most of the data no longer comes from proxy lists.  The majority of all proxies in the database came from the Google Hack and the “Interesting Sites” found using it.

I am convinced now more than ever that all online proxy lists, with the exception of the Dinkster’s, are PURE CRAP.  They have nothing on me.  I am the Proxy King.

30
Aug

Can't Keep Up With IS-1

I woke up this morning to find a new file on IS-1.  I downloaded it and started banging on it.

An hour later I refreshed the page and the same file’s timestamp had changed.  I never noticed this before so I’m starting to wonder whether it hasn’t done this all along.  If so, this site has the richest supply of proxies on the Internet.

I’m at the limit of my processing power importing three files simultaneously on the AMD64x2 box, so I may have to enlist another VM if the file updates again today.  Or I can just start stockpiling data and catch-as-catch-can.

-= UPDATE 12:00PM =-

I have implemented a check once every 15 minutes on this file and it appears it is refreshed every 30 minutes, like clockwork.  It’s not a new file, but an update.  The file always has about 250,000 proxies so I’ll need to hack out a diff to make this manageable.

-= UPDATE 1:15PM =-

I hacked out the diff.  Using – surprise – diff!
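For anyone who would rather not shell out to diff, the same idea can be sketched in Python as a set difference between two snapshots of the file.  This is not the approach used here (that was plain old diff); the one-entry-per-line "ip:port" format and the function name are assumptions for illustration.

```python
def new_entries(previous_lines, current_lines):
    """Return entries in the current snapshot that were not in the previous one.

    Order follows the current snapshot; blank lines and repeats are skipped.
    """
    seen = set(line.strip() for line in previous_lines)
    fresh = []
    for line in current_lines:
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)  # also suppresses duplicates within the new snapshot
            fresh.append(entry)
    return fresh
```

With a file of roughly 250,000 entries that is refreshed in place, only the handful of lines returned by something like this would need to be imported and checked on each pass.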

This site just may max out my processing capabilities.  Right now the page says we have 995,000 proxies, but we’ve probably already gone over a million.

The page updates are taking almost an hour with the extra data.  The twelve o’clock run didn’t make it to the server until 12:46.  I may have to look at that code.  It checks the new proxies sequentially and with a 45 second timeout that can slow things down considerably.  There must be some multitasking opportunities in there somewhere.
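One possible shape for that multitasking, sketched in Python with a thread pool.  This is an assumption about how it could be done, not the code that runs the site: the check here is only a bare TCP connect (a real check would go through a proxy judge), and the function names and worker count are made up for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def is_port_open(host, port, timeout=45.0):
    """Crude liveness check: can a TCP connection be opened within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, unreachable hosts and timeouts
        return False

def check_all(proxies, check=is_port_open, workers=50):
    """Run check(host, port) for every (host, port) pair concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with proxies.
        results = list(pool.map(lambda p: check(*p), proxies))
    return dict(zip(proxies, results))
```

The point is that a dead proxy only ties up one worker for its 45 seconds instead of stalling the whole run, so the wall-clock time is driven by the number of workers rather than the number of dead entries.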

29
Aug

890,000+ Proxies and Going Strong

While I was playing catch-up with IS-1 (Interesting Site #1), they sent another update.  I’m cranking on it right now.  The One Million Mark is in sight and approaching faster than I thought.

This project is, in fact, turning into IS-1’s proxy list.  That is, if they had a proxy list (they seem to be in the SPAM business).  I’m not really keeping track, but at least two thirds of the database came from that site.

Thanks guys, whoever you are.  I don’t agree with what you do but your data is primo!

In other news, I got my first Unknownian site.  Turns out it’s in Argentina in a /15 CIDR block.

The system is now stable.  No surprises in the morning, no sudden lockups.  Life is good.

I have been playing around with a new version of nmap that is very slick.  You can read about it here if you are so inclined.  You’ll have to compile it yourself if you want to check it out, but it’s worth the effort (you’ll need Subversion).

25
Aug

IS-1 At It Again

I woke up this morning and decided to check up on Interesting Site I (IS-1).  Sure enough, there was a new file, dated today!

I downloaded it and let the AMD64x2 have at it. 

It’s still running.

So far it’s added ~50,000 proxies to the database (with ~200 good ones so far).  Even proxies with “weird” ports are turning out to be OK, so I may revisit my decision not to add the other 400,000 proxies in the other files from IS-1.  Plus there are a lot of port 1080 systems in there, so that could be more grist for the SOCKS mill.

If I decide to do it, we should hit the million mark by the weekend or early next week.

Yesterday I was hacking away like a madman on the code all morning.  I was on a serious roll, boys and girls.  Then, just about 1PM, the power went out (yes, I do this all from the comfort of my own home) and stayed out until 5PM.

Extremely aggravating.

And to top it off the lease expired on the IP address I’ve had since… well… since the last power outage, whenever that was (I have my gateway box on a UPS, but it’s only good for about 90 minutes).  So I lost half a day and then another hour and a half getting everything on the new IP address.

The good news is the 1G of RAM I took out seems to have stabilized the box running the system’s VM.  If it lasts through the weekend I’ll probably put the 80G drive back in.

24
Aug

Unknownia

It never fails. You fix one problem, three more show up.

With the GeoLite City database issue fixed, a few dozen proxies that had been dropped because of SQL errors every time they showed up over the past five months finally went into the database.

With NULL column values. ARGH!

This screwed up a number of the page updates for the 23rd & 24th.

I hacked together a fix to update the database with non-NULL values and came up with a new problem: there was no flag icon for “unknown country”. So I made one.

Right now I’m playing catch-up with yesterday’s runs. Halfway there so far.

23
Aug

Quick Fixes

A couple of my proxy judges fell off the face of the planet so I took them off the list and ran the resurrection code against the database.

Since I moved to VMware Server, the time has been all screwed up.  I just noticed this today.  I forgot to run the VMware tools and sync the guest with the host.

Why was this never a problem with VMware Player?

I have had a longstanding problem with IPs that are not in the GeoLite City database.  For some reason, long ago, I put the latitude & longitude of the IP addresses in the database.  Well, if GeoLite City doesn’t have the IP it returns NULL values for latitude & longitude and the SQL errors out (yes, I neglected to specify a default value).  That’s fixed now but I suspect some RFC 1918 addresses may find their way into the database (10.x.x.x and 192.168.x.x are hammered by the scripts but the 172.16-31.x.x addresses aren’t in there yet).  Also, some IANA reserved and unallocated IPv4 addresses may have the same issue.

These will never make it to the Web page, but I don’t like them in the database.
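A small sketch of what a catch-all RFC 1918 filter could look like, using Python's standard ipaddress module.  The function name is hypothetical and this is not the filter the scripts actually use; it only shows the three private blocks, with 172.16.0.0/12 covering 172.16.x.x through 172.31.x.x.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 through 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(ip_text):
    """Return True if ip_text is an IPv4 address inside an RFC 1918 block."""
    try:
        ip = ipaddress.IPv4Address(ip_text)
    except ipaddress.AddressValueError:
        return False  # not a well-formed IPv4 address at all
    return any(ip in net for net in RFC1918_NETWORKS)
```

The same module also exposes broader attributes such as is_private and is_reserved on address objects, which might be a starting point for the IANA reserved ranges as well.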

This hardware is still driving me nuts.  Still hanging.  Today I ripped out the two odd 512M sticks and took the box down to 2G of RAM (from 3G).  If it hangs again, the add-in USB 2.0 card goes (the mobo only supports USB 1.1).  If it hangs after that, the video card gets replaced.

I suspect the video card (ATI Radeon X1300 re-branded) because even if I shut down the box normally, when it reboots the event log says it rebooted after a bugcheck (STOP 0x0000007f).

Every single time.  Epic FAIL.

Whenever I’ve seen a STOP 0x0000007f it’s always been video related.  Very rarely is it anything else.  MS really fucked things up after they moved the GDI out of user space and into kernel space starting with NT4 (to get it ready for “DirectX”).  This shit never happened in NT 3.51.  Oh, BSODs happened now and then in the 3.51 days but never with the frequency they did on NT4 (or beyond).  In fact, NT4 was a great shock and a tremendous disappointment to customers who had learned to love the stability of NT 3.51.  It wasn’t until SP3 that NT4 ran worth a diddly damn.

Yes, I go back that far.  Mr. HinkyDink has been around the farm once or twice.  How time flies, boys and girls!

ATI isn’t helping with their Driver of the Fucking Month either.  Forty five fucking minutes of downtime whenever they put out a new driver.  Ugh.

Regardless of the host system’s problems, the stability of the VM and MySQL still amazes me (knock on wood).  It’s been mistreated, barfed on, run on a disk with bad sectors, and it still keeps kickin’.  But there’s no way in Hell I’m stopping the backups!

21
Aug

Hard Drive Problems

The 80G hard drive turned out to be a corker.

The weekend after I installed it I bought a 500G drive. Sheer coincidence. After the 80G died every night for about 3 or 4 days I moved everything over, diddled the drive letters and now we’re back in business.

That was yesterday.

Tuesday, it died while I was at work. I spent most of the day trying out my Disaster Recovery Plan (don’t tell my boss), which, as it turns out, leaves much to be desired, although as luck would have it the database backup ran just before the system crashed. But I was missing a few core utilities and only managed to run about four updates. When I switched back into production I didn’t bother bringing the updated database over assuming I’d get the data again, but we were down to less than 500 proxies so I ran the resurrection script and brought it back up to over twelve hundred.

That huge increase in the number of proxies prompted me to take a look at the recheck code. I think I have fixed that issue but we’ll just have to see how it goes.

Meanwhile, as I was going over my DRP, GoDaddy decided to migrate the Web site to a new server, so everything sort of worked out.

So… back in business and back in maintenance mode.

16
Aug

Half a Million Dead Proxies

The Half Mill mark came much sooner than I thought it would but we hit it and kept on going, up to 512,000+ dead proxies, five months to the day after the start of the project.

The data (some 86,000 rows) that put us over that mark came from Interesting Site I. You may recall there is more data from that site that has never been entered, primarily because of the oddball ports listed in it. I may lift the ban on these in the near future, because the last batch had some hits on those oddball ports.

I finally upgraded the system to VMware Server. In addition I added an 80G hard drive and copied the VM files over, so now I have a complete backup of the system at the Half Mill mark, which is a good place to be.