Technology

Project : Uptime

The uptime of this blog has been really bad recently. I switched to hosting it on a Rackspace virtual server last year and went with the cheapest option: a 256MB Linux virtual server that was costing me ~$12/month. I never got around to tuning the OS, so the server was always using swap and would go down pretty much every day. Last week, I upgraded the plan and moved to a 512MB server, but the uptime hasn't been any better. Here's a report from Pingdom (which, by the way, is a great service to track the uptime and responsiveness of your website) showing the availability of the site over the last year: 96%! For someone who has been working in the operations and infrastructure world, that is unacceptable :). So my new goal is to maintain at least 99.5% uptime. Here is my plan to achieve it:

  1. Move to a fresh VM with the latest kernel
  2. Upgrade to the latest version of Apache. Initially, I wanted to move to nginx or lighttpd, but with the recent Apache upgrade, I hear good things about it working well in low-memory situations.
  3. Upgrade to the latest version of MySQL and tune it for memory usage (a rough sketch of the kind of settings I have in mind is below this list)
  4. Configure CloudFlare to serve a static version of the front page in case the server goes down. Design the static page to point people to my other digital presences (Google+, LinkedIn, Flickr etc)
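To give an idea of what I mean by tuning MySQL for memory usage, here is a minimal sketch of the kind of my.cnf settings involved. The values are assumptions for a 512MB server running the MySQL 5.x of that era, not a tested configuration:

[code]
# /etc/my.cnf - hypothetical low-memory settings for a 512MB VPS
[mysqld]
key_buffer_size         = 16M   # MyISAM index cache; keep small on a low-RAM box
innodb_buffer_pool_size = 64M   # main InnoDB cache, usually the biggest consumer
max_connections         = 30    # every connection costs per-thread buffers
query_cache_size        = 8M    # modest cache for a read-heavy blog
tmp_table_size          = 16M   # cap in-memory temporary tables
max_heap_table_size     = 16M   # effective temp-table limit is the smaller of these two
[/code]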

I plan to blog the progress and learnings as I implement this plan.

HOW TO : Search and Replace text in a file with Perl

There are tons of sites (and tons of different ways to do this) covering this information, but I wanted to note it down for my personal records. If you ever want to search for and replace certain text in a file, you can do it with Perl in this quick one-liner (-p loops over the input and prints each line, -i edits the file in place, and -e supplies the expression to run):

[code]perl -p -i -e 's/ORIGINAL_STRING/NEW_STRING/g' FILE_NAME[/code]
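For example, to point an Apache config at a new hostname (the file name and strings here are hypothetical; using -i.bak instead of -i keeps a backup copy of the original file):

[code]perl -p -i.bak -e 's/old\.example\.com/new\.example\.com/g' httpd.conf[/code]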

Demonstrating the power of perl

I haven't scripted in Perl for quite some time (one of the disadvantages of moving into management 🙂 ). Today, we had to analyze some log files at work, and I thought I would dust off my scripting skills.

The source data was Apache web logs, and we had to find the number of hits from each unique IP address for a particular scenario.

Pretty simple, right? grep will do the job very well, as demonstrated in this blog post. But we had to analyze the data for a ton of servers, and I really didn't want to repeat the same command again and again. Did you know that laziness is the mother of invention :)? So I wrote a simple Perl script to do the job for me. The biggest advantage of writing the script was not that it saved me the copy/paste work, but how much faster it ran. Details of the comparison are below.

HOW 99% OF ENGINEERS WOULD DO IT

The analysis consisted of getting the web logs for the last week (some of these log files were already rotated/compressed), concatenating them into one large file, and then getting the number of hits by IP for a certain condition. This can be done very simply using a couple of commands that come standard with any *nix system:

  • cp
  • cat
  • grep for each day we needed the data

The final grep command would look like this

[code] grep -i "\[20/Feb/2012" final_log | grep -i "splash.do" | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/2_20_2012_ip_report.log [/code]

Timing this command showed that it took ~1 min and 22 seconds to run it.

HOW THE 1% DO IT :)

I wrote this Perl script (disclaimer: I am not a programmer :), so please excuse the hack code).

[code]

#!/usr/bin/perl
# Modules to load
use strict;
use warnings;

# Variables
my $version = 0.1;

# Clear the screen
system $^O eq 'MSWin32' ? 'cls' : 'clear';

# Create one large file to parse (the script assumes it is run from ~)
`cp /opt/apache/logs/access_log ~/access_log`;
`cp /opt/apache/logs/access_log.1.gz ~/access_log.1.gz`;
`cp /opt/apache/logs/access_log.2.gz ~/access_log.2.gz`;

`gunzip access_log.1.gz`;
`gunzip access_log.2.gz`;

`cat access_log.2 access_log.1 access_log > final_access_log`;

# Hostname
my $hostName = `hostname`;
chomp($hostName);

print "The Hostname of the server is : $hostName \n";

# Process the log file, one line at a time
open(INPUTFILE, "< final_access_log") || die "Couldn't open log file, exiting $!\n";

while (defined (my $line = <INPUTFILE>)) {
    chomp $line;
    # Split the combined log into one file per day (20/Feb/2012 through 28/Feb/2012)
    if ($line =~ m/\[2([0-8])\/Feb\/2012/) {
        my $day = $1;
        open(OUTPUTFILE, ">> 2_2${day}_2012_log_file") || die "Couldn't open log file, exiting $!\n";
        print OUTPUTFILE "$line\n";
        close(OUTPUTFILE);
    }
}
close(INPUTFILE);

# Cleanup the intermediate copies
`rm final_access_log`;
`rm access_log`;
`rm access_log.1`;
`rm access_log.2`;

for (my $day = 0; $day < 9; $day++) {
    my $outputLog  = $hostName . "_2_2" . $day . "_2012.txt";
    my $inputLog   = "2_2" . $day . "_2012_log_file";
    my $dateString = "\\[2" . $day . "/Feb/2012";

    print "Running the aggregator with following data\n";
    print "Input File : $inputLog\n";
    print "Output Log : $outputLog\n";
    print "Date String: $dateString\n";

    # Count hits per unique IP for the matching requests
    `grep -i "splash.do" $inputLog | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/$outputLog`;

    # Cleanup after yourself
    `rm $inputLog`;
}

[/code]

I also wrote a smaller script to do the same job as the command line hack that I tried earlier and compared the times. First, here is the smaller script:

[code]

#!/usr/bin/perl
# Modules to load
use strict;
use warnings;

# Variables
my $version = 0.1;

# Clear the screen
system $^O eq 'MSWin32' ? 'cls' : 'clear';

open(TEMPFILE, "< final_log") || die "Couldn't open log file, exiting $!\n";

# Match the date and write the line to another log file
while (defined (my $line = <TEMPFILE>)) {
    chomp $line;
    if ($line =~ m/\[20\/Feb\/2012/) {
        open(OUTPUTFILE, ">> perl_speed_test_output.log") || die "Couldn't open output file, exiting $!\n";
        print OUTPUTFILE "$line\n";
        close(OUTPUTFILE);
    }
}
close(TEMPFILE);

# Count hits per unique IP for the matching requests
`grep -i "splash.do" perl_speed_test_output.log | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/perl_speed_test_output_ip.log`;

[/code]

Timing this script showed that it took 21 seconds to run: roughly a 4x speedup (82 seconds down to 21) and, more importantly, less load (RAM utilization) on the system. The speedup is most likely because the script reduces the log to the matching day's lines in a single pass, so the grep pipeline only has to work through a much smaller file.

One has to love technology :).

HOW TO : Sort Apache Web Logs for hits by Unique IP Addresses


Say you want to find out how many hits you are getting to a specific page from each source IP address. You can use this quick combination of Linux tools to get the data:

[code]grep -i "URL_TO_CHECK" PATH_TO_APACHE_ACCESS_LOG | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

You are using

  • grep to filter for the page you want the report on
  • cut to grab the first field of each log line, which is the IP address
  • sort and uniq -c to count the hits per unique IP address
  • and finally sort -rn to sort the data in descending order

Example :

[code]grep -i "GET /" /opt/apache/logs/access_log | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

gets you the report of hits to the index page.
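If you would rather skip the chain of pipes, here is a minimal Perl sketch of the same idea (URL_TO_CHECK is a placeholder, as above; pass the access log path as an argument):

[code]
#!/usr/bin/perl
# Usage: ./ip_report.pl PATH_TO_APACHE_ACCESS_LOG > ip_report.txt
use strict;
use warnings;

my %hits;
while (my $line = <>) {
    # Keep only the requests for the page we care about (case-insensitive)
    next unless $line =~ /URL_TO_CHECK/i;
    # The client IP is the first whitespace-delimited field of a combined log line
    my ($ip) = split ' ', $line;
    $hits{$ip}++;
}

# Print the counts in descending order, like uniq -c piped into sort -rn
for my $ip (sort { $hits{$b} <=> $hits{$a} } keys %hits) {
    printf "%7d %s\n", $hits{$ip}, $ip;
}
[/code]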

HOW TO : Find list of files used by a process in Linux

Quick howto on finding the list of files being accessed by a process in Linux. I needed this while troubleshooting an issue where a particular process was using an abnormally high percentage of CPU, and I wanted to find out what the process was doing and accessing.

  1. Find the process ID (pid) of the process you want to analyze by running [code] ps -ef | grep NAME_OF_PROCESS [/code]
  2. Find the files the process is accessing at a given time by running [code] sudo ls -l /proc/PROCESS_ID/fd [/code]

For example, if I wanted to find the list of files being accessed by MySQL, the process would look like this

[code] ps -ef | grep mysqld [/code]

which would show the output as

[code]samurai@samurai:~$ ps -ef | grep mysqld
mysql     3304     1  0 Feb04 ?        00:00:23 /usr/sbin/mysqld
samurai  23389 23374  0 14:57 pts/0    00:00:00 grep --color=auto mysqld
[/code]

I can then find the list of files being used by mysql by running

[code] sudo ls -l /proc/3304/fd [/code]

which would give me

[code]

lrwx------ 1 root root 64 Feb  7 15:00 0 -> /dev/null
lrwx------ 1 root root 64 Feb  7 15:00 1 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 10 -> socket:[4958]
lrwx------ 1 root root 64 Feb  7 15:00 11 -> /tmp/ibdu9WRh (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 12 -> socket:[4959]
lrwx------ 1 root root 64 Feb  7 15:00 14 -> /var/lib/mysql/blog/wp_term_relationships.MYI
lrwx------ 1 root root 64 Feb  7 15:00 15 -> /var/lib/mysql/blog/wp_postmeta.MYI
lrwx------ 1 root root 64 Feb  7 15:00 17 -> /var/lib/mysql/blog/wp_term_relationships.MYD
lrwx------ 1 root root 64 Feb  7 15:00 18 -> /var/lib/mysql/blog/wp_term_taxonomy.MYI
lrwx------ 1 root root 64 Feb  7 15:00 2 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 20 -> /var/lib/mysql/blog/wp_postmeta.MYD
lrwx------ 1 root root 64 Feb  7 15:00 21 -> /var/lib/mysql/blog/wp_term_taxonomy.MYD
lrwx------ 1 root root 64 Feb  7 15:00 22 -> /var/lib/mysql/blog/wp_terms.MYI
lrwx------ 1 root root 64 Feb  7 15:00 23 -> /var/lib/mysql/blog/wp_terms.MYD
lrwx------ 1 root root 64 Feb  7 15:00 3 -> /var/lib/mysql/ibdata1
lrwx------ 1 root root 64 Feb  7 15:00 4 -> /tmp/ibvANyz7 (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 5 -> /tmp/ibonS0mU (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 6 -> /tmp/ibcKctaH (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 7 -> /tmp/ibB5DS5t (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 8 -> /var/lib/mysql/ib_logfile0
lrwx------ 1 root root 64 Feb  7 15:00 9 -> /var/lib/mysql/ib_logfile1
[/code]
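As an aside, if the lsof utility is installed on the server, it reports the same information (and more) in one step. Using the pid from the example above:

[code] sudo lsof -p 3304 [/code]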

Protesting SOPA and PIPA

Unless you are living under a rock or outside the US :).. you have probably heard about the crazy legislation that the US Congress and Senate are proposing to help protect content creators (AKA Hollywood) from piracy. While I personally don't have any issues with giving protection to content creators, it should not come at the cost of freedom for the rest of the world. Go to http://americancensorship.org/ for more information about why this proposed legislation is bad.

Today (1/18/2012) has been designated as "Protest SOPA/PIPA day" by the technology world. I believe in the old adage, put your money where your mouth is :).. so I checked the top 25 US sites (according to Alexa) to see how many of them are supporting this protest in a visible manner. Only 4 of the 25 sites put visible content on their websites regarding the protest. I think Google's message was the most effective: they did not reduce the functionality of the website, but provided a lot of visibility to the protest. I know which companies I am going to support/use moving forward :). I was very happy to see that three of the sites I use on a regular basis (Google, Amazon and Wikipedia) are supporting this protest. Here are screenshots of the protest from the 4 sites that are in the top 25 visited sites in the US

Google.com

Amazon.com

Wikipedia.Org

WordPress.com

Screenshots of some other sites that I visit on a regular basis and are supporting the protest

Boingboing.net

Wired.com

Arstechnica.com

Reddit.com

DuckDuckGo.com

G+ or Blog

I started using Google Plus last November, and I should say that, even though I am a big proponent of keeping control over your digital avatar, it has been much easier to give quick updates on Google Plus than on this blog. Plus, my friends and family don't have to specially come to this site to get updates. They get the G+ updates as part of their regular email and/or when they log into their G+ stream. It is less work on everyone's part.

That is one of the reasons I believe G+ will be one of the first real contenders to Facebook. Even though Facebook boasts more than 800 million users, it is still a "separate" site that folks have to log into, unlike Google Plus, which is fast becoming part of the regular Google experience. Especially with the tweaks Google made last week, incorporating G+ data into the search results, the line between a Google search and using Google Plus gets blurrier.

So the question (for me) is not if it is Facebook or G+.. but if it is the blog or G+..


HOW TO : Fix Jboss startup script for CentOS

Quick note for myself on fixing the default startup script provided by JBoss to work on CentOS. Thanks to Shankar for finding the solution.

The default startup script ($JBOSS_HOME/bin/jboss_init_redhat.sh) that JBoss provides does not work properly on CentOS. The start option works fine, but when you try to stop JBoss, it gives you a "No JBossas is currently running" message and quits.

Here's a quick way to fix it. The sed at the end of the line below escapes the forward slashes in the script path, which appears to be what confuses the shutdown check on CentOS. Edit the jboss_init_redhat.sh file and replace

[code]JBOSSSCRIPT=$(echo $JBOSSSH | awk '{print $1}' | sed 's/\//\\\//g')[/code]

with

[code]JBOSSSCRIPT=$(echo $JBOSSSH | awk '{print $1}')[/code]
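With that change in place, a quick sanity check (assuming $JBOSS_HOME points at your JBoss install):

[code]
$JBOSS_HOME/bin/jboss_init_redhat.sh start
$JBOSS_HOME/bin/jboss_init_redhat.sh stop
[/code]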

HOW TO : Move your life into the cloud – Part 2

Following up on my previous post (https://kudithipudi.org/2011/11/13/how-to-move-your-life-into-the-cloud/) on the journey to using online services to stay productive, here are some more of the services that I use on a regular basis.

BOOKMARKING :

  • SERVICE : Ever switched computers and been frustrated at having to remember and re-bookmark all the links you saved on your old machine? I use Delicious to save me from this frustration. Delicious is the granddaddy of online bookmarking services, and even though there are several competing services with more social networking capabilities, Delicious works well for me. It allows you to store all of your bookmarks at an easily accessible URL (for example, all my bookmarks are at http://delicious.com/kudithipudi) and annotate them with tags.
  • COST : free
  • OTHER CHOICES : pinboard, StumbleUpon, Google Bookmarks (being discontinued from Dec 2011), Firefox Sync, EverNote

HOW TO : Move your life into the cloud

Nope… I am not too late to get on the "cloud" bandwagon :). I started writing this post in Dec 2009, and here's a screenshot of my drafts to prove it.

And I have finally decided that it is time to complete the post and publish it.

I change laptops every 6 months or so, and a lot of my friends ask me how I manage to swap them so quickly and yet stay productive. I am sure a lot of you can relate to this. It usually takes a month or so to get your workstation to a "state" where you feel comfortable and productive. Here are the tricks/tools I use to make switching a laptop/desktop a no-brainer activity. I utilize the "cloud" heavily for this.

I adhere to a couple of simple rules to make sure I can be productive anywhere, even in situations where I don't have my workstation with me.

  • Everything I produce should be searchable
  • Everything I produce should be available on the web
  • Everything I produce should be easy to share

With these principles in mind, here are the services I use:

PHOTOS :

  • SERVICE : I use Flickr to store all my pictures. I have taken ~40 thousand pictures since 2003, and every one of them is online at http://www.flickr.com/photos/kudithipudi. I wish Flickr had been around when I was a kid, so I had a place to store all the pictures from my childhood instead of letting them rot away in some old cardboard box. I like Flickr for its simplicity and ease of use. There are other sites that offer a lot more features, but the features offered by Flickr are more than enough for me.
  • COST : $24.95/year to upload/store unlimited number of pictures
  • OTHER CHOICES : There are plenty of photo storing/sharing sites. Some of the popular ones are picasa, photobucket, facebook

ON-LINE STORAGE :

  • SERVICE : I use Dropbox to store any digital content I create. This overlaps a bit with the service I use to store documents. Dropbox is a service that synchronizes files between the different computers you have the agent installed on, while also storing them online for you. They offer 2GB of free space by default, and you can earn more space by referring people to the service. (Note: the links to Dropbox are my referral links. If you sign up for the service, I get 250MB of free space. If you don't want to use the referral links, you can sign up for the service directly at www.dropbox.com.) You would think 2GB is not a lot of space, but once you remove the music, movies and photos, you really don't need a lot :). For example, I haven't crossed 1.8GB, even though I have an electronic record of all my important files going back to 2006. All I do when I switch to a new laptop is install the Dropbox agent, and voila, all my files are downloaded and synced with the latest copies.
  • COST : free. If you need more space, dropbox offers it for a cost.
  • OTHER CHOICES : There's plenty of competition for Dropbox, but I don't think any of them has come close to making sharing/storage as seamless as Dropbox does. Some of the popular ones are box.net, SugarSync, wuala, Amazon Cloud Drive.

DOCUMENTS :

  • SERVICE : I use Google Docs to create and store documents, spreadsheets and presentations. Since its inception in 2006 as a simple online editor and spreadsheet service, Google Docs has come a long way. There are few things you can do in a full-fledged productivity suite like Microsoft Office that you cannot do in Google Docs. Plus, it gives you the ability to collaborate with other people when creating documents.
  • COST : free.
  • OTHER CHOICES : The only other service that comes close to Google Docs is Zoho Suites. Microsoft has a competing product, Office Live, but I think they are confused about how to market it, because it would eat into their most profitable franchise (Microsoft Office).

EMAIL :

  • SERVICE : I use Gmail for my email. Although there is a standalone version, I use it as part of the services provided by Google Apps for my domain (kudithipudi.org). It offers free spam protection, 7GB of space and super fast search. What else can one ask for? 🙂
  • COST : free
  • OTHER CHOICES : There are several free email hosting providers. Some of the popular ones are hotmail, yahoo, aol

ONLINE PRESENCE :

  • SERVICE : I strongly believe that all of us have to manage our online presence. And I don't mean just the folks who work in technology, but everyone who uses the Internet, which is pretty much most of the people on planet earth :). There are several ways to do this (I think that is for another blog post), but the simplest is to ensure you have a place where you can broadcast your presence. I use this blog as a way to document my thoughts, share ideas and, in general, manage my online presence. I host this blog on a virtual server that I lease from Rackspace.
  • COST : $11/month
  • OTHER CHOICES : I would not recommend what I am doing for most people. There are several free platforms that you can host your blog on; I just do it this way because I like to tinker with technology. Some of the popular blogging platforms are tumblr, blogspot, wordpress, squarespace.