Demonstrating the power of perl

I haven’t scripted in perl for quite some time (~~dis~~advantages of moving into management 🙂 ). Today, we had to analyze some log files at work and thought I would dust off my scripting skills..

The source data is Apache web logs and we had to find out the number of hits from a unique IP address for a particular scenario.

Pretty simple right, grep will do the job very well. As demonstrated in this blog post. But we had to analyze the data for a ton of servers and I really didn’t want to repeat the same command again and again. Did you know that laziness is the mother of invention :). So I wrote a simple perl script to do the job for me. The biggest advantage of writing this perl script was not that it helped reduce the copy/paste job, but the speed that the script took to run. Details of the comparison below

HOW 99% OF ENGINEERS WOULD DO IT

The analysis consisted of getting web logs for the last week (and some of these log files were already rotated/compressed). Concatenating them to create one large file and then getting the number of hits by IP for a certain condition. This can be done very simply by using a couple of commands that come standard with any *nix system

cp
cat
grep for each day we needed the data

The final grep command would look like this

Timing this command showed that it took ~1 min and 22 seconds to run it.

HOW THE 1% DO IT:)

I wrote this perl script (disclaimer : I am not a programmer :), so pls excuse the hack code).

[code]

#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;

# Variables
my $version = 0.1;

# Clear the screen
system $^O eq ‘MSWin32’ ? ‘cls’ : ‘clear’;

# Create one large file to parse
`cp /opt/apache/logs/access_log ~/access_log`;
`cp /opt/apache/logs/access_log.1.gz ~/access_log.1.gz`;
`cp /opt/apache/logs/access_log.2.gz ~/access_log.2.gz`;

`gunzip access_log.1.gz`;
`gunzip access_log.2.gz`;

`cat access_log.2 access_log.1 access_log > final_access_log`;

# Hostname
$hostName=`hostname`;
chomp($hostName);

print "The Hostname of the server is : $hostName \n";

# Process the log file file, one line at a time
open(INPUTFILE,"< final_access_log") || die "Couldn’t open log file, exiting $!\n";

while (defined ($line = <INPUTFILE>)) {
chomp $line;
if ($line =~ m/\[20\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_20_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[21\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_21_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[22\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_22_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[23\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_23_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[24\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_24_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[25\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_25_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}

if ($line =~ m/\[26\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_26_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[27\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_27_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[28\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_28_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
}

`rm final_access_log`;
`rm access_log`;
`rm access_log.1`;
`rm access_log.2`;

for ($day=0; $day < 9; $day++)
{
$outputLog = $hostName."_2_2".$day."_2012.txt";
$inputLog = "2_2".$day."_2012_log_file";

$dateString = "\\[2".$day."/Feb/2012";

print "Running the aggregator with following data\n";
print "Input File : $inputLog\n";
print "Output Log : $outputLog\n";
print "Date String: $dateString\n";

# Cleanup after yourself
`rm $inputLog`;
}

[/code]

I wrote a smaller script to do the same job as the command line hack that I tried earlier and compared the time. First, here is the smaller script

[code]

#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;

# Variables
my $version = 0.1;
# Clear the screen
system $^O eq ‘MSWin32’ ? ‘cls’ : ‘clear’;
open (TEMPFILE,"< final_log");

# Match date and write to another log file
while (defined ($line = <TEMPFILE>)) {
chomp $line;
if ($line =~ m/\[20\/Feb\/2012/)
{
open(OUTPUTFILE, ">> perl_speed_test_output.log");
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
}

[/code]

Timing this script, showed that it took 21 seconds to run it. > 300% improvement in speed and more importantly, less load (RAM utilization) on the system

One has to love technology :).

Ashok says:

March 2, 2012 at 12:03 am

I couldn’t understand the below part clearly, however this sounds to be a great script work. Amazed on you, for maintaining your scripting skills regardless of position you hold. Though am fully in Technical, still am finding hard to maintain or enhance my scripting skills 🙂

for ($day=0; $day < 9; $day++)
{
$outputLog = $hostName."_2_2".$day."_2012.txt";
$inputLog = "2_2".$day."_2012_log_file";

$dateString = "\[2".$day."/Feb/2012";

- Vinay says:
  
  March 2, 2012 at 12:23 am
  
  Ashok – I am using a for loop to generate a piece of string text. Instead of typing the file name 9 times, I just used the loop. I should have done the same in the main program, to reduce the size, but was feeling lazy :).
  
brice says:

March 2, 2012 at 10:32 am

or one line of bash! assuming you have a file “hostfile” in the current directory with a list of all the hosts you need to look at… like…
webserver1
webserver2
webserver3

This would save you having to move files around manually…

for H in `cat hostfile`; do ssh ${H} `cat /var/log/httpd/access.log* | grep -i “splash.do” | grep -i productcode | cut -d’ ‘ -f 1 – > ~/outputLog`; done ; cat ~/outputLog | sort |uniq -c | sort -rn

or if your logroll gz’s them after one day…
for H in `cat hostfile`; do ssh ${H} `cat /var/log/httpd/access.log | grep -i “splash.do” | grep -i productcode | cut -d’ ‘ -f 1 – > ~/outputLog`; do ssh ${H} `zcat /var/log/httpd/access.log.* | grep -i “splash.do” | grep -i productcode | cut -d’ ‘ -f 1 – > ~/outputLog`; done ; cat ~/outputLog | sort |uniq -c | sort -rn

😉 perl is still fun though!

HOW 99% OF ENGINEERS WOULD DO IT

HOW THE 1% DO IT:)

You may also like...

3 Responses

Leave a Reply Cancel reply

Recent Posts

Recent Comments

HOW 99% OF ENGINEERS WOULD DO IT

HOW THE 1% DO IT:)

You may also like...

Interesting (infrastructure) tidbits about Microsoft Azure

Cheesy Tollywood Dialogs

Overheard : Data and Insights

3 Responses

Leave a Reply Cancel reply

Recent Posts

Recent Comments