One of my friends, Karthick, is getting married and he put this innovative wedding invitation together.. And yes, pls do crash his wedding, I am sure he won’t mind :).

I haven’t scripted in Perl for quite some time (disadvantages of moving into management 🙂). Today we had to analyze some log files at work, and I thought I would dust off my scripting skills.
The source data is Apache web logs, and we had to find the number of hits from each unique IP address for a particular scenario.
Pretty simple, right? grep will do the job very well, as demonstrated in this blog post. But we had to analyze the data for a ton of servers, and I really didn’t want to repeat the same command again and again. Did you know that laziness is the mother of invention :)? So I wrote a simple Perl script to do the job for me. The biggest advantage of the script turned out to be not the copy/paste it saved, but how much faster it ran. Details of the comparison are below.
The analysis consisted of getting the web logs for the last week (some of these log files were already rotated/compressed), concatenating them into one large file, and then getting the number of hits by IP for a certain condition. This can be done very simply with a couple of commands that come standard with any *nix system.
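As a rough sketch of the "one large file" step, the rotated logs can be streamed straight into the combined file without unpacking them on disk first. The helper name `combine_logs` is made up for illustration; `gzip -dc` simply decompresses to standard output.

```shell
# Hypothetical helper: combine_logs OUTPUT FILE...
# Appends each input to OUTPUT in the order given, decompressing
# *.gz files on the fly so nothing has to be gunzipped on disk.
combine_logs() {
    out=$1
    shift
    : > "$out"                               # start with an empty output file
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" >> "$out" ;; # decompress to stdout
            *)    cat "$f" >> "$out" ;;
        esac
    done
}

# Example call, oldest log first (file names are assumptions):
# combine_logs final_log access_log.2.gz access_log.1.gz access_log
```

Passing the oldest file first keeps the combined log in chronological order.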
The final grep command would look like this
[code] grep -i "\[20/Feb/2012" final_log | grep -i "splash.do" | grep -i productcode | cut -d' ' -f1 | sort | uniq -c | sort -rn > ~/2_20_2012_ip_report.log [/code]
Timing this command showed that it took about 1 minute and 22 seconds to run.
I wrote this Perl script (disclaimer: I am not a programmer :), so pls excuse the hack code).
[code]
#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;
# Variables
my $version = 0.1;
# Clear the screen
system($^O eq 'MSWin32' ? 'cls' : 'clear');
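# Create one large file to parse, working from the home directory
# (the logs are copied to ~, so the relative paths below must resolve there)
chdir($ENV{HOME}) || die "Couldn't change to home directory, exiting $!\n";
`cp /opt/apache/logs/access_log ~/access_log`;
`cp /opt/apache/logs/access_log.1.gz ~/access_log.1.gz`;
`cp /opt/apache/logs/access_log.2.gz ~/access_log.2.gz`;
`gunzip access_log.1.gz`;
`gunzip access_log.2.gz`;
`cat access_log.2 access_log.1 access_log > final_access_log`;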
# Hostname
$hostName=`hostname`;
chomp($hostName);
print "The Hostname of the server is : $hostName \n";
# Process the log file, one line at a time
open(INPUTFILE, "< final_access_log") || die "Couldn't open log file, exiting $!\n";
my %outputFiles;   # one append-mode filehandle per day, opened on first use
while (defined (my $line = <INPUTFILE>)) {
    # Only keep entries dated 20-28 Feb 2012; capture the day of the month
    next unless $line =~ m/\[(2[0-8])\/Feb\/2012/;
    my $day = $1;
    if (!$outputFiles{$day}) {
        open($outputFiles{$day}, ">> 2_${day}_2012_log_file") || die "Couldn't open log file, exiting $!\n";
    }
    print { $outputFiles{$day} } $line;
}
close($_) for values %outputFiles;
close(INPUTFILE);
`rm final_access_log`;
`rm access_log`;
`rm access_log.1`;
`rm access_log.2`;
for ($day=0; $day < 9; $day++)
{
$outputLog = $hostName."_2_2".$day."_2012.txt";
$inputLog = "2_2".$day."_2012_log_file";
$dateString = "\\[2".$day."/Feb/2012";
print "Running the aggregator with following data\n";
print "Input File : $inputLog\n";
print "Output Log : $outputLog\n";
print "Date String: $dateString\n";
`grep -i "splash.do" $inputLog | grep -i productcode | cut -d' ' -f1 | sort | uniq -c | sort -rn > ~/$outputLog`;
# Cleanup after yourself
`rm $inputLog`;
}
[/code]
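For comparison, here is a minimal sketch of the same per-day split done in a single awk pass from the shell. The function name `split_by_day` and the output file naming are my own assumptions, not something the script above uses; it relies only on the `[dd/Mon/yyyy` timestamp that Apache writes in each log line.

```shell
# Hypothetical helper: split_by_day LOGFILE PREFIX
# Writes every line to PREFIX<dd_Mon_yyyy>.log based on its Apache timestamp.
split_by_day() {
    awk -v prefix="$2" '
        # Find the "[20/Feb/2012" style date at the start of the timestamp
        match($0, /\[[0-9][0-9]\/[A-Za-z]+\/[0-9]+/) {
            day = substr($0, RSTART + 1, RLENGTH - 1)  # drop the leading "["
            gsub("/", "_", day)                        # 20/Feb/2012 -> 20_Feb_2012
            print > (prefix day ".log")
        }' "$1"
}

# Example call (paths are assumptions):
# split_by_day final_access_log ./day_
```

Lines without a recognizable date are skipped, and each output file is opened once rather than once per line.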
I wrote a smaller script to do the same job as the command line hack that I tried earlier and compared the time. First, here is the smaller script
[code]
#!/usr/bin/perl
# Modules to load
# use strict;
use warnings;
# Variables
my $version = 0.1;
# Clear the screen
system($^O eq 'MSWin32' ? 'cls' : 'clear');
open (TEMPFILE,"< final_log") || die "Couldn't open log file, exiting $!\n";
# Match date and write to another log file
while (defined ($line = <TEMPFILE>)) {
chomp $line;
if ($line =~ m/\[20\/Feb\/2012/)
{
open(OUTPUTFILE, ">> perl_speed_test_output.log");
print OUTPUTFILE "$line\n";
close(OUTPUTFILE);
next;
}
}
`grep -i "splash.do" perl_speed_test_output.log | grep -i productcode | cut -d' ' -f1 | sort | uniq -c | sort -rn > ~/perl_speed_test_output_ip.log`;
[/code]
Timing this script showed that it took 21 seconds to run, versus 82 seconds for the grep pipeline. That is roughly 3.9 times faster and, more importantly, less load (RAM utilization) on the system.
One has to love technology :).
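Say you want to find out how many hits you are getting to a specific page from a particular source IP. You can use this quick collection of Linux tools to get this data:
[code]grep -i "URL_TO_CHECK" PATH_TO_APACHE_ACCESS_LOG | cut -d' ' -f1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]
You are using grep to pick out the matching requests, cut to keep the first field (the client IP), sort and uniq -c to count the hits per IP, and a final sort -rn to list the busiest IPs first.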
Example:
[code]grep -i "GET / " /opt/apache/logs/access_log | cut -d' ' -f1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]
gets you the report of hits to the index page. Note the trailing space in the pattern; without it, "GET /" matches every GET request.
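If you run this often, the pipeline can be wrapped in a small shell function. This is only a sketch; the name `ip_report` and its two arguments are made up for illustration.

```shell
# Hypothetical helper: ip_report PATTERN LOGFILE
# Prints "count ip" lines for requests matching PATTERN, busiest IP first.
ip_report() {
    grep -i -- "$1" "$2" |   # keep only the matching requests
        cut -d' ' -f1 |      # first field of an Apache log line is the client IP
        sort | uniq -c |     # count identical IPs
        sort -rn             # highest count first
}

# Example call (log path is an assumption):
# ip_report "GET / " /opt/apache/logs/access_log > ~/ip_report.txt
```

Because the log file is a parameter, the same function can be looped over the logs from several servers.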
I ran across a video of Peter Hurley, as a guest post on Scott Kelby’s blog, in which he discusses the importance of positioning the jaw lines of your subjects when taking portraits. Sri stopped by to check on Virat yesterday along with his family and I thought I should try out this tip. The effect was amazing.
BEFORE (as I would normally take portraits) 
AFTER (Shebang!!) 🙂 
Quick howto on finding the list of files being accessed by a process in Linux. I needed to find this for troubleshooting an issue where a particular process was using an abnormally high percentage of CPU. I wanted to find out what this particular process was doing and accessing.
For example, if I wanted to find the list of files being accessed by mysql, the process would look as such
[code] ps -ef | grep mysqld [/code]
which would show the output as
[code]samurai@samurai:~$ ps -ef | grep mysqld
mysql 3304 1 0 Feb04 ? 00:00:23 /usr/sbin/mysqld
samurai 23389 23374 0 14:57 pts/0 00:00:00 grep --color=auto mysqld
[/code]
I can then find the list of files being used by mysql by running
[code] sudo ls -l /proc/3304/fd [/code]
which would give me
[code]
lrwx------ 1 root root 64 Feb 7 15:00 0 -> /dev/null
lrwx------ 1 root root 64 Feb 7 15:00 1 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb 7 15:00 10 -> socket:[4958]
lrwx------ 1 root root 64 Feb 7 15:00 11 -> /tmp/ibdu9WRh (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 12 -> socket:[4959]
lrwx------ 1 root root 64 Feb 7 15:00 14 -> /var/lib/mysql/blog/wp_term_relationships.MYI
lrwx------ 1 root root 64 Feb 7 15:00 15 -> /var/lib/mysql/blog/wp_postmeta.MYI
lrwx------ 1 root root 64 Feb 7 15:00 17 -> /var/lib/mysql/blog/wp_term_relationships.MYD
lrwx------ 1 root root 64 Feb 7 15:00 18 -> /var/lib/mysql/blog/wp_term_taxonomy.MYI
lrwx------ 1 root root 64 Feb 7 15:00 2 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb 7 15:00 20 -> /var/lib/mysql/blog/wp_postmeta.MYD
lrwx------ 1 root root 64 Feb 7 15:00 21 -> /var/lib/mysql/blog/wp_term_taxonomy.MYD
lrwx------ 1 root root 64 Feb 7 15:00 22 -> /var/lib/mysql/blog/wp_terms.MYI
lrwx------ 1 root root 64 Feb 7 15:00 23 -> /var/lib/mysql/blog/wp_terms.MYD
lrwx------ 1 root root 64 Feb 7 15:00 3 -> /var/lib/mysql/ibdata1
lrwx------ 1 root root 64 Feb 7 15:00 4 -> /tmp/ibvANyz7 (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 5 -> /tmp/ibonS0mU (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 6 -> /tmp/ibcKctaH (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 7 -> /tmp/ibB5DS5t (deleted)
lrwx------ 1 root root 64 Feb 7 15:00 8 -> /var/lib/mysql/ib_logfile0
lrwx------ 1 root root 64 Feb 7 15:00 9 -> /var/lib/mysql/ib_logfile1
[/code]
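Each entry under /proc/PID/fd is a symlink to whatever the descriptor points at, so the same listing can be produced in a more script-friendly form with readlink. The function name `open_files` below is hypothetical, and this is a Linux-only sketch; for another user's process you would still need to run it with sudo.

```shell
# Hypothetical helper: open_files PID
# Prints "fd -> target" for every file descriptor the process holds.
open_files() {
    for fd in /proc/"$1"/fd/*; do
        # readlink resolves the symlink to the file, socket or pipe behind the fd;
        # skip entries we cannot read (or that closed in the meantime)
        target=$(readlink "$fd") || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}

# Example call with the PID from the ps output in this post:
# open_files 3304
```

The `lsof -p PID` command reports similar information if lsof is installed.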
A very thought-provoking comment on trust and security by Mark Boyle, the Moneyless Man, on a recent episode of PRI’s To the Best of Our Knowledge program (I personally transcribed this.. so pls overlook any minor typos 🙂 )
What money has become is.. a substitute for trust. It has now become our primary source of security in the world and what I am trying to do personally is to find my primary source of security in the friendships I have and in my local community and my relationship with earth. Because most countries, such as Argentina and Indonesia and currently Zimbabwe have experienced this hyperinflation and you can have a million dollars in the bank. One day with devaluation, it can almost be worthless. No matter how badly I behave, my friends don’t devalue me that quickly. And I think real security comes in our relationships, whether it be with our planet or with our local community. I think what we all can do is build a bit more diversity in how we meet our needs and to not be so reliant on cash.
You can get the full interview at http://feedproxy.google.com/~r/TTBOOK/~3/X009WjbiqB0/tbk120205a.mp3. Segment with Mark starts at ~42 min.
I was standing in line to get into a plane yesterday and heard this comment made by a gentleman to his friend
You know.. funny thing about work, it has to get done!!
The guys were discussing how their wives don’t understand the pressures of work :).
Unless you are living under a rock or outside the US :).. you have probably heard about the crazy legislation that the US House and Senate are proposing to help protect content creators (AKA Hollywood) from piracy. While I personally don’t have any issue with giving protection to content creators, it should not come at the cost of freedom for the rest of the world. Go to http://americancensorship.org/ to get more information about why this proposed legislation is bad.
Today (1/18/2012) has been designated as “Protest SOPA/PIPA day” by the technology world. I believe in the old adage, put your money where your mouth is :).. so I checked the top 25 US sites (according to Alexa) to see how many of them are supporting this protest in a visible manner. Only 4 out of the 25 sites put visible content on their websites regarding the protest. I think Google’s message was the most effective: they did not reduce the functionality of the website, but still gave the protest a lot of visibility. I know which companies I am going to support/use moving forward :). I was very happy to see that three of the sites that I use on a regular basis (Google, Amazon and Wikipedia) are supporting this protest. Here are screenshots of the protest from the 4 sites that are in the top 25 visited sites in the US
Screenshots of some other sites that I visit on a regular basis and are supporting the protest
I started using Google Plus last November and I should say that, even though I am a big proponent of keeping control over your digital avatar, it has been much easier to post quick updates on Google Plus than on this blog. Plus my friends and family don’t have to come to this site specially to get updates. They get the G+ updates as part of their regular email and/or when they log into their G+ stream. It is less work on everyone’s part.
That is one of the reasons I believe G+ will be one of the first real contenders to Facebook. Even though Facebook boasts more than 800 million users, it is still a “separate” site that folks have to log into, unlike Google Plus, which is fast becoming part of the regular Google experience. Especially with the tweaks that Google made last week to incorporate G+ data into the search results, the line between a Google search and using Google Plus gets blurrier.
So the question (for me) is not if it is Facebook or G+.. but if it is the blog or G+..
New year… new addition to the family :).
Virat Kudithipudi was born on 1/9/2012 at 22:22.
We went through a lot at the end of the pregnancy. First Jhanvi got shingles and then I contracted chickenpox from her. At one point, I didn’t even know if I would be able to be at his delivery. But thankfully everything worked out and I was cured by the time Virat decided to arrive on planet earth. When the pediatrician checked him the day after his birth and announced that he was perfectly “healthy”, I choked up. I understand you better now mom :)..
BTW.. if you didn’t get chickenpox as a kid.. run to the drugstore and get vaccinated for it. You don’t want to go through what I went through 🙂
Welcome to the world kiddo..

Collection of pictures of the first few days..
http://www.flickr.com/photos/kudithipudi/collections/72157628830321025/