
RESOLUTIONS : 2011 : January Update

As I mentioned here, I have made some resolutions for 2011. As with any good task list, it is worthless unless you look at it periodically and update it :). I am going to publish an update on each of the resolutions every month. Here goes the first one.

  1. Lose weight (AKA lose gut)
    • I am practicing part of the diet proposed by Tim Ferriss in his book The 4-Hour Body. I am eating 2 egg whites for breakfast and then eating a small meal every 4 hours. I haven’t gone completely into the whole “white” carb diet he proposes though.
    • I also started tracking my weight and diet religiously on a daily basis. This is another of Tim’s ideas. He says that by tracking your weight every day, you subconsciously start making better choices in terms of the food you eat. I think it makes sense :). I am tracking the data in a Google spreadsheet. Here is a chart of my weight for the last month. I started out at 194 lbs and am now at 188 lbs. Hopefully I will be able to keep this downward trend.
    • I also started working out (thanks to Jhanvi). We are working out at least 2 times a week.
  2. Increase web traffic to kudithipudi.org
    • I started posting more content on the site. I posted 8 articles in January.
    • No particular strategy other than writing more content, which will hopefully bring more traffic.
    • Have the following topics to write on (and some of them have been pending for a long time)
      • Moving your life to the cloud
      • Setting up a virtual server on the Rackspace Cloud Infrastructure
      • Configuring syslog-ng
      • Configuring nginx to reduce resource utilization on a Linux server
  3. Achieve CISSP certification
    • No progress on this one at all.
  4. Go on a vacation
    • Jhanvi and I planned to go to the travel and adventure expo that was held in Rosemont last weekend, but we got too lazy :).
    • Our trip to India (and it doesn’t count as a vacation 🙁 ) is planned for April.

Progress on 2 out of the 4 resolutions!!.. Not bad :).

PS : Thanks for all the support I have been getting on the first resolution :). I didn’t realize the situation was so bad :).

HOW TO : Configure Cache Expiration in Apache

Cache servers depend on cache control headers provided by the web server. Essentially, the web server (based on its configuration) specifies what content is cacheable and for how long. (Note: Some cache servers might ignore this and have a default cache period for specific content. But that is for another post 🙂 )

Here is a quick and dirty way to configure an Apache 2.x server to enable cache control settings on all content in a directory:

[bash]
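# Needs mod_expires to be loaded
ExpiresActive On
<Directory "/var/www/html/static">
    Options FollowSymLinks MultiViews
    # Apache 2.2 access syntax; on 2.4 the equivalent is "Require all granted"
    Order allow,deny
    Allow from all
    # Content expires 1 hour after the file was last modified
    ExpiresDefault "modification plus 1 hour"
</Directory>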
[/bash]

This configuration tells Apache to send cache headers for all content in the /var/www/html/static folder. The content is set to expire 1 hour after its modification time.
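To get a feel for what "modification plus 1 hour" works out to for a given file, here is a rough shell sketch. The expires_for function is a name I made up for illustration (it is not part of Apache), and it assumes GNU stat and date as found on most Linux boxes. It only mimics the arithmetic; it does not talk to the web server.

```shell
# Hypothetical helper (not part of Apache): print the Expires value that
# "modification plus N seconds" implies for a file.
# Assumes GNU stat and GNU date.
expires_for() {
    file=$1
    ttl=${2:-3600}                  # default: 1 hour, as in the config above
    mtime=$(stat -c %Y "$file")     # last modification time in epoch seconds
    # Format mtime + ttl as an HTTP date in GMT
    LC_ALL=C date -u -d "@$((mtime + ttl))" '+%a, %d %b %Y %H:%M:%S GMT'
}
```

Running it as expires_for /var/www/html/static/SOME_FILE makes one thing obvious: a file that has not been touched for more than an hour gets an expiry time in the past. If you want the hour to count from the time of the request instead, mod_expires also accepts "access plus 1 hour".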

Analytics in the Cloud : Not there yet

I attended a webinar hosted by Deepak Singh from the Amazon Web Services group on analytics in the cloud. He made a very compelling case for utilizing the cloud to build out your analytics infrastructure. Especially with the growing data sizes that we deal with now, I think it makes absolute sense. You can utilize different software stacks and grow (and shrink) your hardware stack as required. Great stuff..

But there is a catch. Most of the data generated by current organizations is “inside” their perimeters. Whether it is the OLAP database collecting all your data or that application that spews gigabytes of logs, most of the data is housed in your infrastructure. So if you want to use the cloud to perform analytics on this data, you first have to transfer this data to the cloud. And therein lies the problem. As Deepak mentioned in the webinar, human beings have yet to conquer the limitations of physics :). You need a pretty big pipe to the Internet just to transfer this data.

Amazon has come up with various means to help with this issue. They are creating copies of publicly available data sets within their cloud so that customers don’t have to transfer them. They are also working with companies to keep private data sets in the cloud for other customers to use. So similar to how you would be able to spin up a Red Hat AMI by paying a license fee to Red Hat, I believe they are looking at providing customers access to these private data sets for a fee paid to the company providing the data set. It is a win-win-win situation 🙂 for Amazon, the company providing the private data set and Amazon’s web services customers. They also support a one-time import of data from physical disk or tape.

Coming back to the title of this post :). I think this field is still in its infancy. Once companies start migrating their infrastructure to the cloud (And yes, it will happen. It is only a matter of time :).), it will be a lot easier to leverage the cloud to perform your analytics. All your data will be in the cloud and you can start leveraging the hardware and software stacks in the cloud.

LinkedIn Network Map

LinkedIn (professional networking site) is providing a way to map your networks to see where you have your strongest connections. Here is a map of my networks. You can click on the image to get to the live map.

My strongest connections so far are at

I wish they came up with a map showing the location of my network too. That way, I can find out if I can get a job in New Zealand through my network :).

HOW TO : Combining Perl and Zoho to produce reports

This HOW TO is more for my notes. We had a request at work, where we had to parse some log files and create a graph from the data in the log files.

The log files looked like this

[bash]
0m0.107s
0m0.022s
0m0.015s
2011-01-05_02_22
0m0.102s
0m0.024s
0m0.014s
2011-01-05_02_23
[/bash]

I wrote a Perl script to reshape the log file so that it looks like this

[bash]
| 0m0.107s| 0m0.022s| 0m0.015s| 2011-01-05 | 02:22

| 0m0.102s| 0m0.024s| 0m0.014s| 2011-01-05 | 02:23
[/bash]

Perl script

[perl]
#!/usr/bin/perl
# Reshape the timing log into pipe-delimited rows
# Modules to load
use strict;
use warnings;

# Variables
my $inputFile = 'input.txt';
my $version = 0.1;

my $logFile = 'parsed_input.csv';

# Clear the screen
system($^O eq 'MSWin32' ? 'cls' : 'clear');

# Open the output log file
open(my $out, '>', $logFile) or die "Could not open $logFile, exiting $!\n";

# Open the input file
open(my $in, '<', $inputFile) or die "Could not open $inputFile, exiting $!\n";

# Process the input file, one line at a time
while (defined(my $line = <$in>)) {
    chomp $line;
    if ($line =~ /^$/) {
        # Blank line: start a new line in the output
        print $out "\n";
    }
    elsif ($line =~ /2011/) {
        # Timestamp line: split into date, hour and minute
        my @date = split(/_/, $line);
        print $out "| $date[0] | $date[1]:$date[2]";
    }
    else {
        # Timing value: write it to the output as is
        print $out "| $line";
    }
}

# Close both files so the output is flushed to disk
close($in);
close($out) or die "Could not close $logFile, exiting $!\n";
[/perl]
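For a quick one-off, roughly the same reshaping can be sketched with awk from the shell. This is a variation rather than a port of the script: reshape_log is a name made up for illustration, it treats any line containing an underscore as the timestamp line (an assumption based on the sample log above), and it ends each record at the timestamp instead of waiting for a blank line.

```shell
# Hypothetical helper: reshape the timing log into pipe-delimited rows.
# Reads the named files, or stdin when no file is given.
reshape_log() {
    awk '
        NF == 0 { next }                      # skip blank lines
        /_/ {                                 # timestamp line, e.g. 2011-01-05_02_22
            split($0, d, "_")
            printf "| %s | %s:%s\n", d[1], d[2], d[3]
            next
        }
        { printf "| %s", $0 }                 # timing value, stays on the current row
    ' "$@"
}
```

Usage would be something like reshape_log input.txt > parsed_input.csv, reusing the file names from the Perl script.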
I then took the parsed log files and imported them into the cloud based reporting engine provided by Zoho at http://reports.zoho.com

The final results are these reports

SERVER1

SERVER2

Did I say, I love technology? 🙂

HOW TO : Find out which network port a program is using in Linux

A quick way to figure out which ports a particular program is using in Linux

[bash] netstat -plan | grep -i PROGRAM_NAME [/bash]

Example : Check which ports SSH is listening on

[bash]

samurai@samurai:~$ sudo /bin/netstat -plan | grep sshd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      5257/sshd
tcp        0     52 123.123.123.123:22      124.124.124.124:32846     ESTABLISHED 3551/sshd: samurai
tcp6       0      0 :::22                   :::*                    LISTEN      5257/sshd
unix  3      [ ]         STREAM     CONNECTED     5893     3551/sshd: samurai
unix  2      [ ]         DGRAM                    5849     3551/sshd: samurai

[/bash]
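If you only care about the port numbers, the netstat output can be filtered a bit further. Here is a rough sketch; listening_ports is a name made up for illustration, and it assumes the column layout shown in the example above (state in column 6, PID/program in column 7).

```shell
# Hypothetical helper: read "netstat -plan" style output on stdin and print
# the unique TCP ports on which the given program is listening.
listening_ports() {
    awk -v prog="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            split($7, p, "/")                 # column 7 looks like PID/program
            if (p[2] == prog) {
                n = split($4, a, ":")         # port is the last ":" separated field
                print a[n]
            }
        }
    ' | sort -un
}
```

Usage would be something like sudo /bin/netstat -plan | listening_ports sshd.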

HOW TO : Manage startup services in Ubuntu

Most Red Hat/Fedora users are used to chkconfig and service for controlling the services/programs that start up at boot time. Here is how you do it in Ubuntu

  • Check status of a particular service

[bash] sudo service SERVICE_NAME status [/bash]

Example : Check the status of Apache Web Service

[bash]samurai@samurai:~$ sudo service apache2 status
Apache is running (pid 3496).[/bash]

  • Add a service to start on bootup

[bash] sudo update-rc.d SERVICE_NAME defaults [/bash]

Example : Configure squid to start on bootup

[bash] sudo update-rc.d squid defaults [/bash]

  • Stop a service from starting on bootup

[bash] sudo update-rc.d -f SERVICE_NAME remove [/bash]

Example : Configure squid to NOT start on bootup

[bash] sudo update-rc.d -f squid remove [/bash]

NOTE : You need to have a startup script in /etc/init.d for the service to ensure update-rc.d works fine.
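The check from the note above can be sketched as a small dry-run helper. boot_cmd is a name made up for illustration; it only prints the update-rc.d command (using the defaults action) when an executable init script exists, and does not change anything on the system. The second argument is an assumption added so the directory can be overridden.

```shell
# Hypothetical dry-run helper: print the command that would enable a service
# at boot, but only if its init script exists and is executable.
boot_cmd() {
    svc=$1
    initdir=${2:-/etc/init.d}       # directory holding the startup scripts
    if [ ! -x "$initdir/$svc" ]; then
        echo "No startup script found at $initdir/$svc" >&2
        return 1
    fi
    echo "sudo update-rc.d $svc defaults"
}
```

Usage would be something like boot_cmd squid, and then running the printed command once you are happy with it.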

New lens and my mom..

Picture of my mom taken with my new toy (Canon 50mm f/1.8 lens). My mom is usually a shy person and this is probably the first time I saw her being comfortable in front of a camera.

I took a photography class recently and am finally enlightened to what a good lens can do :).  Prior to the class, I was always of the opinion that a zoom lens is the way to go. But my instructor opened my eyes to the world of fixed (and expensive) lenses :).

More pictures from this photo shoot at http://www.flickr.com/photos/kudithipudi/sets/72157625773287053/with/5385052525/