Tag Archive for 'cs-1'

27
Sep

CS-1 Back in Production

CS-1 rose from the dead on Wednesday.

I didn’t notice until just moments ago. For the past week I’ve been putting most of my effort into refining the Google Hack using CoDeeN proxies.

Turns out, they’re wise to Google harvesting.

You can only get ~500-1000 search results from any one CoDeeN server. That was hardly enough for my traditional method, which basically just searched for port numbers.

CoDeeN’s restrictions taught me to maximize my results by subtracting certain search terms, like “-guestbook” and “-mp3” and even “-SOCKS”. You can get completely different results with the same ports and different “minus” terms.
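Here is a rough sketch of the “minus” term idea in Python. It is not the actual harvesting script; the function name `build_queries` and the quoted `":port"` query format are assumptions made up for illustration. It simply pairs each port with each combination of excluded terms so that the same port produces several different queries.

```python
from itertools import combinations


def build_queries(ports, minus_terms, max_excluded=1):
    """Return one query string per (port, excluded-term combination).

    The '":<port>"' query shape is a placeholder assumption; the real
    searches may have looked quite different.
    """
    queries = []
    for port in ports:
        # n = 0 gives the plain query with nothing subtracted.
        for n in range(max_excluded + 1):
            for combo in combinations(minus_terms, n):
                parts = [f'":{port}"'] + [f"-{term}" for term in combo]
                queries.append(" ".join(parts))
    return queries


if __name__ == "__main__":
    for query in build_queries([8080, 3128], ["guestbook", "mp3", "SOCKS"]):
        print(query)
```

Raising `max_excluded` multiplies the number of queries quickly, which matters when each server only gives up a limited number of results.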

I don’t know why that never occurred to me before, but it has been an excellent learning opportunity.

While I was learning all these wonderful things, Google lifted my ban, so I applied all this newfound logic to the original hack.

The result? Thousands of new (DEAD) proxies and a smattering of active ones.

So the list goes on. I have backed off on the purge to keep the numbers up, but there is still a high percentage of good proxies in there.

31
Aug

New Code On Deck

I spent all day hacking at the new page refresh code.

It’s going to be a winner.

I have one more page to make the old-fashioned way and then I can switch over.

I did all my development on an old, reliable Ubuntu 6.06 LTS VM.  Since I usually develop on the AMD64x2, which uses special credentials on the production database, I had to make sure I didn’t screw up production.  I edited all the scripts to point to the VM’s copy of the database (it’s 10 days old), but just to be sure I didn’t miss anything, I added some firewall rules to prevent the VM from talking to the production database.

And sure enough I didn’t get them all.  In fact what happened was I had a copy of everything in my /home folder, but I sudo’d into root without realizing I wasn’t in root’s folder.  I also neglected to give the VM a decent amount of memory and left the query limits at the level of the AMD64x2.

Double whammy.

I’ve never seen a session crash quite like that.  The OS killed all my processes including the root sessions when it maxed out.

I got the resource issue squared away and removed all the script copies in my /home folder and hammered it out.  There were a few hair-pulling bugs but by the third or fourth run of the page code it ran slicker’n shit.

The last old-fashioned run just wrapped up.  The 4PM run will be on the new code.

16
Aug

Half a Million Dead Proxies

The Half Mill mark came much sooner than I thought it would but we hit it and kept on going, up to 512,000+ dead proxies, five months to the day after the start of the project.

The data (some 86,000 rows) that put us over that mark came from Interesting Site I. You may recall there is more data from that site that has never been entered, primarily because of the oddball ports listed in it. I may lift the ban on these in the near future, because the last batch had some hits on those oddball ports.

I finally upgraded the system to VMware Server. In addition I added an 80 GB hard drive and copied the VM files over, so now I have a complete backup of the system at the Half Mill mark, which is a good place to be.

15
Aug

Curious Site

Notice I didn’t say “interesting”?

The other day, a run to Interesting Site II (IS-2) barfed while I wasn’t looking. It was not a disaster and there was no great loss of data. In fact I wouldn’t have noticed if I hadn’t looked at the log for that event.

I send all the cron job email to my Yahoo account so I can look at it from anywhere, and on this particular run, in between hundreds of MySQL errors, there was a URL. This in itself is curious, since anything from that site should have been chewed down to only IP:port strings.
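For what it’s worth, “chewing down” a page to IP:port strings can be sketched in a few lines of Python. This is only an illustration, not the code that runs in the cron job; the names `PROXY_RE` and `extract_proxies` are made up here. Anything that doesn’t look like a valid IPv4 address followed by a port, such as a stray URL, should simply fall through.

```python
import re

# Four dot-separated groups of 1-3 digits, a colon, then 1-5 digits.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(text):
    """Return unique, range-checked 'ip:port' strings in order of appearance."""
    found = []
    seen = set()
    for match in PROXY_RE.finditer(text):
        ip, port = match.group(1), int(match.group(2))
        octets_ok = all(0 <= int(octet) <= 255 for octet in ip.split("."))
        if octets_ok and 1 <= port <= 65535:
            entry = f"{ip}:{port}"
            if entry not in seen:
                seen.add(entry)
                found.append(entry)
    return found


if __name__ == "__main__":
    sample = "junk 10.0.0.1:8080 http://example.com/list 999.1.1.1:80"
    print(extract_proxies(sample))
```

A filter like this only keeps what matches the pattern, so a bare URL showing up in the output means something upstream skipped the filtering step.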

Intrigued, I pasted it into a browser and was greeted with a few thousand proxies arranged end-to-end in a continuous string. I ran a one-time pass at the site and got a handful of new proxies.

I went back to the browser and refreshed the page.

It changed.

There was new data on the page. I refreshed it again and there was more. New data was coming in every few seconds to this site.
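With a page that changes on every refresh, the useful part is whatever wasn’t there last time. A minimal Python sketch of that bookkeeping follows, assuming each refresh has already been reduced to a list of IP:port strings; `new_entries` is a made-up name and the real system keeps its history in the database rather than in memory.

```python
def new_entries(seen, batch):
    """Return entries from batch not seen before, and remember them.

    `seen` is a set that is updated in place across refreshes.
    """
    fresh = []
    for entry in batch:
        if entry not in seen:
            seen.add(entry)
            fresh.append(entry)
    return fresh


if __name__ == "__main__":
    seen = set()
    print(new_entries(seen, ["10.0.0.1:8080", "10.0.0.2:3128"]))
    print(new_entries(seen, ["10.0.0.2:3128", "10.0.0.3:80"]))
```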

Of course, I immediately put it into the rotation and dubbed the place “Curious Site I” (CS-1).

Upon investigation, it turned out to be a type of subscription-only proxy list. There is no login; your account is in the URL. The account had to be that of the operator of IS-2.

Whether CS-1 is the sole source of proxies for IS-2 remains to be seen, but now I’m not so sure that IS-2 is as “evil” as I assumed it to be in the beginning. Watching this play out should be educational.

IS-1, the original Interesting Site, had fresh data yesterday: a list with 115,000+ proxies in it. I’ve been processing it since yesterday and there is some good stuff in there.

This project is becoming less and less about the Other Lists. As a source of new data, they dried up weeks ago. With this new data from IS-1 we should hit the half million mark in the database before noon today.