<< Prev  |  TOC  |  Front Page  |  Talkback  |  FAQ  |  Next >>
LINUX GAZETTE
...making Linux just a little more fun!
Perl One-Liner of the Month: The Mystery of the Red Worm
By Ben Okopnik

- "It's just a little further along... right through here, Frink."

Woomert and Frink had had a long walk along the tunneled VPN connection, their footsteps echoing against the titanium walls; once they reached the comfortable spacious environment of their destination system, Frink had wanted to pull up a comfortable variable and rest his tired feet, but Woomert had insisted on pushing on. Now, Woomert turned and entered the room marked "/var/log/apache".

"Ah, here we are - there's `/var/log/apache/access.log' - and just in time from the looks of it. The poor thing is up to 400MB and it's nearly filling up the partition, and it's only been a few days since it was rolled over!"

Frink, having dropped onto the sparkling clean floor (the system had recently been swept by "cruft") and arranging himself in a tailor's seat, stared at the bulging log file in fascination.

- "What happened here, Woomert? I just came in to tell you about the latest story in the newspapers, praising you to the sky for your solution to the Missing Databases Mystery [1] at the Bigrich Bank, and you dragged me off without a word. Not that I mind, but..."

The famous detective smiled self-deprecatingly.

- "I do tend to get a bit concentrated while on the job, don't I? Oh well - there are worse things. All right, here is what's happening: the client, a small company that specializes in making horseshoe welding sprockets for accountants working in the napkin-fringing industry, has become suspicious of a few odd things happening with their web site. For example, their response time often spikes right through the roof, and they've been returning the `Server busy' message much too often as compared to normal operation. There hasn't been any huge jump in the amount of business they do - less, since the economy these days doesn't permit too many luxuries like their product - so..."

Frink nodded.

- "It sounds like a DoS (Denial of Service) attack, Woomert."

- "Indeed." Woomert, deep in thought, pulled on his typing gloves and approached the local terminal. "This is a highly competitive industry, you know. This company maintains its lead by ferret-polishing the final product, but it's a narrow margin; the competitors would all love to get an advantage, and DoSing their web site just might do it. We've been hired to look around and report anything unusual, so this is just a statistics gathering mission.

Here, let's test a few things. First, though, lets make a couple of copies of this file where it won't cram things quite as badly... There, I've put them both in `/home/woomert'. We don't really want to lose any of the data if we should accidentally damage or destroy one file, do we? Now, let's zero out the actual log and restart the server... excellent. Now - on to exploring the files. Given that you suspect a DoS - I do, as well - what would you look for, Frink?"

Frink scratched his head and frowned in concentration.

- "I'm not sure, Woomert. I think I'd like to figure out the average hits per IP, and then maybe look at the sorted list of the same. That would tell us if someone is really slamming this server and from where, don't you think?"

Woomert smiled happily.

- "Why, Frink, that sounds like an excellent idea! Yes, let's take a look at the average:


perl -wlne'/^(\S+)/;$h{$1}++}{print"$h{$_}\t$_"for sort{$h{$a}<=>$h{$b}}keys%h' access.log

12.30830039525692

- "Hmm, interesting. Taking into account that the number is going to be higher due to the very large DoS entries - we're still assuming those, but it's a fair bet - that's not an unreasonable number. Most people will probably examine a few models before making their decision to buy; after all, it is a once-in-a-lifetime purchase. In fact, this company led the rest of the pack in offering lifetime warranties... All right - now let's look at that sorted list:


perl -wlne'/^(\S+)/;$h{$1}++}{print"$h{$}\t$"for sort{$h{$a}<=>$h{$b}}keys%h' access.log

...
22 users.osceola.k12.fl.us
26 152.31.2.221
26 modem-140.nyc-tc01a.fcc.net
28 62.84.228.7
31 209.106.1.124
103 bdsl.66.13.44.110.gte.net
112 24-164-141-122.si.rr.com
611 nyny01hsiapat.everestbroadband.com
1085 162.66.50.6
2817 web-05.segfl.ifl.net
55055 wsip66-210-242-2.ph.ph.cox.net
71031 205.213.111.53
85120 pc-80-193-117-84-cw.blueyonder.co.uk
97000 151.138.254.21
111092 168.11.225.251
122101 syr-24-92-242-3.twcny.rr.com
155017 212.85.1.1
175990 pool-68-161-90-99.ny325.east.verizon.net
181222 1cust185.tnt15.nyc9.da.uu.net
315078 pool-141-155-115-168.ny5030.east.verizon.net
- "Well, well; would you look at that! What's your estimate, Frink?"

Frink stared at the screen for a moment, then nodded. When he spoke, there was a confident note in his voice.

- "It's a DoS. I'm willing to believe that someone would spend a day or so browsing this site, so the 103 and the 112 are border cases, but - 315 thousand hits? I don't know that I'd call it a DDoS (a distributed DoS, where many machines at ones are attempting to flood a given network or host) because the number of machines is fairly small, but it should definitely be an issue for further investigation - perhaps contacting the ISPs for those domains - and a temporary block at the firewall. Woomert, could we look at a sample entry for the different hits? I have a theory about this. If it's a long `GET' string, then... I wonder."

Woomert looked thoughtful, then nodded.

- "I see where you're going, Frink, and it's a reasonable possibility. Here, this will show the longest entry for a given IP:


perl -lne'/^(\S+).*?"(.*?)"/;length$h{$1}>length$2or$h{$1}=$2}{print"@a"while@a=each%h' access.log

... 
pool-68-161-90-99.ny325.east.verizon.net GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u685
8%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a
HTTP/1.0
syr-24-92-242-3.twcny.rr.com GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%
ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a
HTTP/1.0
1cust185.tnt15.nyc9.da.uu.net GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6
858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a
HTTP/1.0
212.85.1.1 GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u90
90%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0
...

Woomert and Frink looked at the screen, at each other, and exchanged a high-five salute, with Frink adding his "double twist with finger snaps" variation.

- "Woo-hoo! I called it right, Woomert. What do you think?"

- "You did indeed, Frink, It looks like a modified version of a common worm, Code Red. The good news is that we're not dealing with a particularly sophisticated attacker, though: a Code Red infection attempt, which is what this is, is not the same thing as a Code Red DoS, which is just a network slam of a specific IP - and it only works against legacy operating systems, certainly nothing as modern as Linux - which is what this site runs. All these guys have going for them is bandwidth, and that's not particularly bad - and once the client blocks those IPs and notifies the relevant ISPs, it won't be an issue at all. In fact, there are analysis and response utilities that can track this sort of thing and do it automatically, and I'll recommend them to the client. Here - "

Woomert quickly fired off the results and his comments to the client by piping them into the "mail" program, and turned to Frink.

- "...shall we? I have a Paglia e Fieno con Pollo e Funghi that should only take a few minutes to finish preparing, a tiramisu made to my own recipe for desert, and a really great '97 Rosso di Cerbaiona wine that should go well with it all. My girlfriend, the lovely Priority Interrupt, is going to join us."

- "Well, if you're sure that I won't be in the way..."

- "Nonsense, Frink; we'd love your company."


After dinner, Frink lounged in the big armchair and Priority curled up in Woomert's lap and lit a huge Toscano cigar, which she used to produce beautuful double and sometimes triple rings of smoke. At Woomert's inquiring glance, she reached up and placed the cigar between his lips.

- "Aaah - luxury." Woomert leaned back and blew several rings that interlaced with Priority's. They smiled at each other. "So, Frink - questions? Answers? Guesses? Lay it on, my friend."

Frink grinned at the two of them across the room.

- "Sure. I'm getting pretty good at reading those one-liners, though - I might need a little help, but I think I'm getting there. What was it you typed, now... ah, here it is - I copied it into my Zaurus:


perl -wlne'/^(\S+)/;$h{$1}++}{$a=@a=values%h;map{$b+=$_}@a;print$b/$a' access.log
OK, so - the `-wlne' switches enable warnings; read and print everything in `line mode', which strips EOL characters before and adds them back after; loops over the entire file one line at a time; and executes the code that follows. That's the easy part - I've been studying ``perldoc perlrun'' lately. Now, the code -

/^(\S+)/

is a regex that captures all non-space characters starting from the beginning of the line. If we look at some typical lines from `access.log' -

127.0.0.1 - - [09/Mar/2003:22:14:46 -0500] "GET / HTTP/1.0" 200 50000 "http://localhost/" "Lynx/2.8.4rel.1 libwww-FM/2.14" webcache-01.segfl.ifl.net - - [01/Apr/2003:05:45:27 -0500] "GET / HTTP/1.0" "-" 200 5238

we'll see that it's going to catch the IP or the hostname, either of which is terminated by a space. Next, I see something you've done before:

$h{$1}++

That's a frequency count, isn't it?"

At Woomert's encouraging nod and smile, Frink went on.

- "OK. '$1' is a variable created by Perl which holds the first capture - that is, the contents of the first pair of parentheses in a regex. In this case, that's the IP. So, you use the IP as a key in the `%h' hash - and increment the value associated with that IP every time you see it. If it's a new IP, you get a new key.

Next... um. Next, there's a closing brace all by itself... and I don't understandand what it does - or even why the code works. Shouldn't that fail with a syntax error?"

Woomert grinned.

"Normally, it would. However - go ahead and pull up ``perldoc perlrun'' again, and take a look at the entry for `-p':

# From ``perldoc perlrun''
   while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
Note the ``your program goes here'' comment. What happens if you insert a closing brace there?"

Frink concentrated on the code. Suddenly, his face lit up.

- "Oh! I see it, I see it! A closing brace would terminate the `while' statement... and an opening brace after it would create a block just past it. What you've done is get out of the `while' loop; everything after the curly brace only gets executed once. This is almost the same as using an `END{}' block. Wonderful! [2]

All right, since we have that, the rest isn't too tough. Let's see:

$a=@a=values%h;

All right, you extract the list of values - all the counts - from the hash and set `$a' to the number of values returned; that's what you get when you look at a list in a scalar context (it's a bit more complex than that, but that's the part that's important right now.) Next, you sum up all those values -

map{$b+=$_}@a;

The `map' function iterates over `@a' and increments `$b' by the value of each of the elements. Last but not least -

print$b/$a

you print out the ratio of that sum over the count of the elements - thus dividing the total hits by the number of IPs. How's that?"

Woomert and Priority clapped and cheered as Frink turned pink and bowed, smiling.

- "Thank you, thank you... I guess spending all that time studying under Woomert's direction is starting to pay off - thanks, Woomert! The rest of them are somewhat similar:


perl -wlne'/^(\S+)/;$h{$1}++}{print"$h{$_}\t$_"for sort{$h{$a}<=>$h{$b}}keys%h' access.log

The first part we already know - do a frequency count of the IPs. In the end block, however, you do something different; we'll parse it right to left, just as Woomert taught me:

sort{$h{$a}<=>$h{$b}}keys%h

OK - this time, you extract the keys, and... oh, I see. You want to sort the hash by value, but plain old ``for ( values %h ){ ... }'' won't work - there's no way to retrieve a key given a value, since values aren't necessarily unique. So, you change the `sort' routine - just as ``perldoc -f sort'' explains - to sort the keys based on the value. This is done by using `$a' and `$b' which are the variables that Perl uses to hold the elements that `sort' is currently comparing. In return, you get a list that's sorted by value and still allows you to look at keys - slick! Next, you take that list and print it with a bit of formatting:

print"$h{$_}\t$"for ...

You loop over the list of returned keys with the "for" operator. The default variable in the loop, `$_', contains each key in turn, and `$h{$_}' return its associated value. You then print a tab and the key - which is the IP or the hostname. This gives us our list of IPs - and the associated number of hits.

Last but not least, we have this:


perl -lne'/^(\S+).*?"(.*?)"/;length$h{$1}>length$2or$h{$1}=$2}{print"@a"while@a=each%h' access.log

Whew. It's a tough one. Let's see: the regex isn't too bad -

/^(\S+).*?"(.*?)"/

It captures the IP as before, but now it also matches any character up to the first double-quote - the '?' modifier following the '*' quantifier makes the expression non-greedy so that it is the first one - and captures everything until the next double-quote, with the same non-greedy method. The first double-quotes... oh, that would be the HTTP request string, just what we wanted to see. Next... ooops. Woomert - help?"

Woomert lazily extracted a laser pointer from his shirt pocket and pointed.

- "I assume you mean this?

length$h{$1}>length$2or$h{$1}=$2

What I needed to do here is save the longest string as the value. In order to do that, I had to compare the current value for a given IP with the next value for that IP that came along. However, the initial value for a new key is undefined - and Perl would give us an error message if we compared something to an `undef'. That, as well as the interpolation of "@a" at the end are both things that would cause Perl to generate a non-fatal warning - so I turned off the warnings by skipping the `-w' switch, something you should not do unless you understand all the effects of doing so (read ``perldoc perllexwarn'' for more.) The method itself is fairly simple: I compare the length of the value currently assigned to the key; if it is greater, I replace the old value with the current one (contained in `$2'). Note that I'm using the soft `or' operator: a logical `or' (||) would not work here, since it would bind too tightly to the surrounding elements.

Can you do the rest?"

Frink nodded.

- "Yes; it looks fairly easy.

print"@a"while@a=each%h

I've seen you do this before... oh yes. It's a ``while each'' loop that retrieves a key-value pair from a hash; you're assigning them to an array and printing the array. Since you've interpolated it by using double quotes around the array name, you'll get a space between the elements - which makes it nicely readable. All together, this prints out our hash - in more-or-less random order, but we don't really care since we just want to see what's in it. Right?"

- "Very good, Frink; you've done very well. I'll be relying on you to provide some backup in our further adventures, then. Are you ready for it?"

- "I... I hope so, Woomert." Frink looked up, proud as can be. "I believe so. I'll certainly do my best. I'll head off for home then, and leave you two alone. Have a great night."

As the door closed behind the inordinately proud Frink, Priority smiled at Woomert.

- "You've made Frink's week, you know. That's quite a compliment."

- "He deserves it; he's learned quite a lot. I'm very pleased with him, and quite proud of him. And now, Priority," Woomert's laser pointer fired a beam at the stereo, which softly began to play an Ella Fitzgerald/Luis Armstrong rendition of ``Can Anyone Explain'', "we have far more important things to discuss than Frink or programming..."


[1] In regard to this, my mysterious correspondent notes: "This is a case where continued secrecy is necessary to the Bank's security arrangements. Perhaps one day, the world will be apprised of the brilliant, decisive, and above all courageous actions of the Great Detective and his assistant."

[2] Woomert, as my correspondent noted, does not take credit for this particular Perl hack; it was created by Abigail in
comp.lang.perl.misc and seems to have become an idiom, at least to a degree. In fact, Abigail's brilliant one-liners have been known to stump Woomert on occasion...

 

Ben is a Contributing Editor for Linux Gazette and a member of The Answer Gang.

picture Ben was born in Moscow, Russia in 1962. He became interested in electricity at age six--promptly demonstrating it by sticking a fork into a socket and starting a fire--and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the resulting nightmares.

Ben's subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. Having recently completed a seven-year Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD, where he works as a technical instructor for Sun Microsystems.

Ben has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.


Copyright © 2003, Ben Okopnik. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 90 of Linux Gazette, May 2003

<< Prev  |  TOC  |  Front Page  |  Talkback  |  FAQ  |  Next >>