I haven’t scripted in Perl for quite some time (a disadvantage of moving into management 🙂). Today we had to analyze some log files at work, and I thought I would dust off my scripting skills.
The source data is Apache web logs, and we had to find the number of hits per unique IP address for a particular scenario.
Pretty simple, right? grep will do the job very well, as demonstrated in this blog post. But we had to analyze the data for a ton of servers, and I really didn’t want to repeat the same command again and again. Did you know that laziness is the mother of invention? :) So I wrote a simple Perl script to do the job for me. The biggest advantage of the script turned out to be not that it cut down on copy/paste work, but how fast it ran. Details of the comparison are below.
HOW 99% OF ENGINEERS WOULD DO IT
The analysis consisted of getting web logs for the last week (some of these log files were already rotated/compressed), concatenating them into one large file, and then getting the number of hits by IP for a certain condition. This can be done very simply with a few commands that come standard with any *nix system:
- cp
- cat
- grep for each day we needed the data
The final grep command would look like this
[code] grep -i "\[20/Feb/2012" final_log | grep -i "splash.do" | grep -i productcode | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/2_20_2012_ip_report.log [/code]
Timing this command showed that it took about 1 minute and 22 seconds to run.
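Since the same pipeline has to be repeated for every day, it can be wrapped in a small shell function that takes the day and the log file as arguments. This is only a sketch: the name `ip_report` is made up for illustration, and it assumes the client IP is the first space-separated field, as in the standard Apache log formats.

```shell
#!/bin/sh
# ip_report DAY LOGFILE   (hypothetical helper, not part of the original post)
# DAY is an Apache-style date such as 20/Feb/2012.
# Prints "count ip" lines, busiest IP first.
ip_report() {
    day=$1
    log=$2
    grep -F "[$day" "$log" \
        | grep -i 'splash\.do' \
        | grep -i 'productcode' \
        | cut -d' ' -f1 \
        | sort | uniq -c | sort -rn
}
```

Something like `ip_report 20/Feb/2012 final_log > ~/2_20_2012_ip_report.log` would then stand in for the long command above. `grep -F` treats the date as a fixed string, so the opening bracket needs no escaping.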
HOW THE 1% DO IT:)
I wrote this Perl script (disclaimer: I am not a programmer :), so please excuse the hacky code).
[code]
#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;
# Variables
my $version = 0.1;
# Clear the screen
system $^O eq 'MSWin32' ? 'cls' : 'clear';
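# Create one large file to parse. The copies land in the home directory,
# so work from there; the gunzip/cat/rm calls below use relative paths.
chdir $ENV{HOME} or die "Couldn't change to the home directory, exiting $!\n";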
`cp /opt/apache/logs/access_log ~/access_log`;
`cp /opt/apache/logs/access_log.1.gz ~/access_log.1.gz`;
`cp /opt/apache/logs/access_log.2.gz ~/access_log.2.gz`;
`gunzip access_log.1.gz`;
`gunzip access_log.2.gz`;
`cat access_log.2 access_log.1 access_log > final_access_log`;
# Hostname
$hostName=`hostname`;
chomp($hostName);
print "The Hostname of the server is : $hostName \n";
# Process the log file, one line at a time
open(INPUTFILE,"< final_access_log") || die "Couldn’t open log file, exiting $!\n";
while (defined ($line = <INPUTFILE>)) {
chomp $line;
if ($line =~ m/\[20\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_20_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[21\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_21_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[22\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_22_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[23\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_23_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[24\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_24_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[25\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_25_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[26\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_26_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[27\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_27_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
if ($line =~ m/\[28\/Feb\/2012/)
{
open(OUTPUTFILE, ">> 2_28_2012_log_file") || die "Couldn’t open log file, exiting $!\n";
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
}
`rm final_access_log`;
`rm access_log`;
`rm access_log.1`;
`rm access_log.2`;
for ($day=0; $day < 9; $day++)
{
$outputLog = $hostName."_2_2".$day."_2012.txt";
$inputLog = "2_2".$day."_2012_log_file";
$dateString = "\\[2".$day."/Feb/2012";
print "Running the aggregator with following data\n";
print "Input File : $inputLog\n";
print "Output Log : $outputLog\n";
print "Date String: $dateString\n";
# Count hits per IP for this day's file (grep needs the input file name)
`grep -i "splash.do" $inputLog | grep -i productcode | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/$outputLog`;
# Cleanup after yourself
`rm $inputLog`;
}
[/code]
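The nine `if` blocks above differ only in the day they match, so the split-by-day step and the count-by-IP step can also be folded into one pass over the file. The sketch below does that with awk rather than Perl, so it is a swapped-in technique and not the script I actually ran. The function name `count_hits_by_day` and the `*_ip_report.txt` file names are made up for illustration, and it assumes the timestamp is the fourth space-separated field, as in the common and combined log formats.

```shell
#!/bin/sh
# count_hits_by_day LOGFILE OUTDIR   (hypothetical helper)
# Writes one OUTDIR/<day>_ip_report.txt per day, e.g. 20_Feb_2012_ip_report.txt,
# containing "count ip" lines sorted by count, highest first.
count_hits_by_day() {
    log=$1
    outdir=$2
    awk -v outdir="$outdir" '
        {
            # Keep only requests that mention both strings, ignoring case
            line = tolower($0)
            if (index(line, "splash.do") == 0 || index(line, "productcode") == 0) next
            # $4 looks like [20/Feb/2012:10:00:00 ; keep just the date part
            split(substr($4, 2), t, ":")
            gsub("/", "_", t[1])
            count[t[1] SUBSEP $1]++
        }
        END {
            for (key in count) {
                split(key, p, SUBSEP)
                print count[key], p[2] > (outdir "/" p[1] "_ip_report.txt")
            }
        }
    ' "$log"
    # awk emits the counts in no particular order, so sort each report
    for f in "$outdir"/*_ip_report.txt; do
        [ -e "$f" ] || continue
        sort -rn -o "$f" "$f"
    done
}
```

The counts live in a single associative array keyed by day and IP, so no intermediate per-day log files are needed and nothing has to be cleaned up afterwards.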
I wrote a smaller script to do the same job as the command-line hack that I tried earlier, and compared the times. First, here is the smaller script:
[code]
#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;
# Variables
my $version = 0.1;
# Clear the screen
system $^O eq 'MSWin32' ? 'cls' : 'clear';
open(TEMPFILE, "< final_log") || die "Couldn't open log file, exiting $!\n";
# Match date and write to another log file
while (defined ($line = <TEMPFILE>)) {
chomp $line;
if ($line =~ m/\[20\/Feb\/2012/)
{
open(OUTPUTFILE, ">> perl_speed_test_output.log");
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
}
`grep -i "splash.do" perl_speed_test_output.log | grep -i productcode | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/perl_speed_test_output_ip.log`;
[/code]
Timing this script showed that it took 21 seconds to run. That is roughly 4 times as fast as the 1 minute 22 seconds of the grep pipeline and, more importantly, less load (RAM utilization) on the system.
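For anyone who wants to put a number on a comparison like this, the arithmetic is simple enough to script. The helper below is hypothetical (the name `speedup` is mine) and just turns two timings in seconds into a ratio and a percentage of time saved.

```shell
#!/bin/sh
# speedup OLD_SECONDS NEW_SECONDS   (hypothetical helper)
# Prints how many times as fast the new run is, and the share of time saved.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        printf "%.1fx as fast, %.0f%% less time\n", old / new, (old - new) * 100 / old
    }'
}
```

With the two timings from this post, `speedup 82 21` prints `3.9x as fast, 74% less time`.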
One has to love technology :).
I couldn’t understand the part below clearly, but this looks like great scripting work. I am amazed that you keep up your scripting skills regardless of the position you hold. Even though I am in a fully technical role, I still find it hard to maintain or enhance my scripting skills 🙂
for ($day=0; $day < 9; $day++)
{
$outputLog = $hostName."_2_2".$day."_2012.txt";
$inputLog = "2_2".$day."_2012_log_file";
$dateString = "\[2".$day."/Feb/2012";
Ashok – I am using a for loop to generate a piece of string text. Instead of typing the file name 9 times, I just used the loop. I should have done the same in the main program, to reduce the size, but was feeling lazy :).
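To make that concrete, here is the same string-building sketched in shell instead of Perl. The function name `day_names` is made up for illustration; it prints the output and input file names the loop produces for days 20 through 28.

```shell
#!/bin/sh
# day_names HOST   (hypothetical helper)
# Prints "<output log> <input log>" for each of the nine days, 20 to 28 Feb.
day_names() {
    host=$1
    day=0
    while [ "$day" -lt 9 ]; do
        # Same concatenation as the Perl loop: "2" followed by the loop counter
        echo "${host}_2_2${day}_2012.txt 2_2${day}_2012_log_file"
        day=$((day + 1))
    done
}
```

Calling `day_names webserver1` starts with `webserver1_2_20_2012.txt 2_20_2012_log_file` and ends with `webserver1_2_28_2012.txt 2_28_2012_log_file`.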
Or one line of bash! Assuming you have a file “hostfile” in the current directory with a list of all the hosts you need to look at, like:
webserver1
webserver2
webserver3
This would save you having to move files around manually…
for H in $(cat hostfile); do ssh "${H}" "cat /var/log/httpd/access.log* | grep -i 'splash.do' | grep -i productcode | cut -d' ' -f 1"; done > ~/outputLog; cat ~/outputLog | sort | uniq -c | sort -rn
Or, if your log rotation gzips them after one day:
for H in $(cat hostfile); do ssh "${H}" "(cat /var/log/httpd/access.log; zcat /var/log/httpd/access.log.*) | grep -i 'splash.do' | grep -i productcode | cut -d' ' -f 1"; done > ~/outputLog; cat ~/outputLog | sort | uniq -c | sort -rn
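The two variants differ only in whether the rotated logs are compressed. One way to avoid caring, sketched here under the assumption that gzip is installed on the host, is to read every file through `gzip -cdf`, which decompresses gzip files and passes anything else through unchanged. The name `cat_logs` is hypothetical.

```shell
#!/bin/sh
# cat_logs FILE...   (hypothetical helper)
# Prints plain and gzip-compressed log files alike, in the order given.
cat_logs() {
    gzip -cdf "$@"
}
```

Something like `cat_logs /var/log/httpd/access.log*` could then replace the separate `cat` and `zcat` calls.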
😉 perl is still fun though!