
HOW TO : Find size of directories in current directory

Quick note for self. A simple bash loop to find out the size of each directory in the current directory. This is useful if you are running out of disk space and want to quickly find the offending directory.

[code]find ./ -maxdepth 1 -type d | while IFS= read -r dir; do echo "${dir}"; du -ch "${dir}" | grep -i 'total$'; done [/code]

Breaking this down

  • The find command prints out a list of directories. You can make the lookup recursive by removing the -maxdepth option. This output is fed into the bash loop.
  • du gets the size of all the files (and subdirectories) in the directory, and grepping its output for total gives you the total size of the directory.
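
If only the per-directory totals are needed, du can summarize each directory itself and sort can rank the results. This is a minimal sketch of that variation, not the loop above: dir_sizes is a hypothetical helper name, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do).

```shell
# dir_sizes is a hypothetical helper, not part of the original one-liner.
# Prints "SIZE_IN_KB<TAB>PATH" for each immediate subdirectory, largest first.
dir_sizes() {
    target="${1:-.}"
    # -mindepth 1 skips the target directory itself.
    # -exec ... {} + hands the names to du directly, so directory names
    # containing spaces are handled correctly.
    find "$target" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + | sort -rn
}
```

Running dir_sizes /var, for example, would list the subdirectories of /var with the largest at the top.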


Project : Uptime

The uptime of this blog has been really bad recently. I switched to hosting it on a Rackspace virtual server last year and went with the cheapest option: a 256MB Linux virtual server that was costing me ~$12/month. I never got around to tuning the OS, so the server was always using swap and would go down pretty much every day. Last week, I upgraded the plan and moved to a 512MB server, but the uptime hasn’t been any better. A report from Pingdom (which, by the way, is a great service for tracking the uptime and responsiveness of your website) shows the availability of the site over the last year at 96%!! For someone who has been working in the operations and infrastructure world, that is unacceptable :). So my new goal is to maintain at least 99.5% uptime. Here is my plan to achieve this:

  1. Move to a fresh VM with the latest kernel
  2. Upgrade to the latest version of Apache. Initially, I wanted to move to nginx or lighttpd, but with the recent Apache upgrade, I hear good things about Apache working well in low memory situations.
  3. Upgrade to the latest version of MySQL and tune it for memory usage
  4. Configure CloudFlare to serve a static version of the front page in case the server goes down. Design the static page to point people to my other digital presences (Google+, LinkedIn, Flickr etc)

I plan to blog the progress and learnings as I implement this plan.
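
To put the two availability numbers in perspective, a percentage can be converted into hours of downtime per year. A small sketch of that arithmetic, assuming a 365-day year (8760 hours); downtime_hours is a hypothetical helper name.

```shell
# downtime_hours is a hypothetical helper: given an availability percentage,
# it prints the matching hours of downtime in a 365-day year (8760 hours).
downtime_hours() {
    awk -v pct="$1" 'BEGIN { printf "%.1f\n", (100 - pct) / 100 * 8760 }'
}

downtime_hours 96     # prints 350.4 (roughly 14.6 days)
downtime_hours 99.5   # prints 43.8
```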

HOW TO : Search and Replace text in a file with Perl

There are tons of sites covering this (and tons of different ways to do it), but I wanted to note it down for my personal records. If you ever want to search for and replace certain text in a file, you can do it with this quick perl one-liner:

[code]perl -p -i -e 's/ORIGINAL_STRING/NEW_STRING/g' FILE_NAME [/code]
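
Because -i edits the file in place, a typo in the pattern can damage the file. A more cautious variant, sketched here, gives -i a suffix so that perl keeps a backup copy next to the original. replace_in_file is a hypothetical wrapper name. Unlike the one-liner above, it treats the search text literally (via \Q...\E) rather than as a regular expression, which is an assumption about what is wanted.

```shell
# replace_in_file is a hypothetical wrapper around the perl one-liner.
#   $1 = text to find (matched literally, not as a regex)
#   $2 = replacement text
#   $3 = file to edit; the original is kept as FILE.bak
replace_in_file() {
    # The strings are passed through the environment so that shell quoting
    # never ends up inside the perl code itself.
    FROM="$1" TO="$2" perl -p -i.bak -e 's/\Q$ENV{FROM}\E/$ENV{TO}/g' "$3"
}
```

If the result looks wrong, the untouched copy is still there as FILE_NAME.bak.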

Demonstrating the power of perl

I haven’t scripted in perl for quite some time (disadvantages of moving into management 🙂 ). Today, we had to analyze some log files at work and I thought I would dust off my scripting skills..

The source data is Apache web logs, and we had to find out the number of hits from each unique IP address for a particular scenario.

Pretty simple, right? grep will do the job very well, as demonstrated in this blog post. But we had to analyze the data for a ton of servers and I really didn’t want to repeat the same command again and again. Did you know that laziness is the mother of invention? :) So I wrote a simple perl script to do the job for me. The biggest advantage of the perl script turned out to be not that it cut down on the copy/paste work, but how much faster it ran. Details of the comparison below.


The analysis consisted of getting the web logs for the last week (some of these log files were already rotated/compressed), concatenating them to create one large file, and then getting the number of hits by IP for a certain condition. This can be done very simply with a couple of commands that come standard with any *nix system:

  • cp
  • cat
  • grep for each day we needed the data

The final grep command would look like this

[code] grep -i "\[20/Feb/2012" final_log | grep -i "splash.do" | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/2_20_2012_ip_report.log [/code]

Timing this command showed that it took ~1 min and 22 seconds to run.
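
The copy and concatenate steps can also be skipped by streaming the logs straight into the filter. A rough sketch, assuming GNU gzip (with -c and -d, its -f flag passes files that are not compressed through unchanged) and the usual Apache log layout with the client IP as the first field. hits_for_day is a hypothetical name, and this version was not part of the timing comparison.

```shell
# hits_for_day is a hypothetical helper.
#   $1    = date as it appears in the log, e.g. 20/Feb/2012
#   $2... = log files, plain or gzip-compressed, in any mix
hits_for_day() {
    day="$1"; shift
    # gzip -cdf decompresses .gz files and copies plain files as-is,
    # so no temporary copy of the logs is needed.
    gzip -cdf -- "$@" \
        | grep -F "[$day" \
        | grep -i 'splash\.do' \
        | grep -i 'productcode' \
        | cut -d' ' -f1 \
        | sort | uniq -c | sort -rn
}
```

For example, hits_for_day 20/Feb/2012 access_log.2.gz access_log.1.gz access_log would print the per-IP hit counts for that day.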


I wrote this perl script (disclaimer : I am not a programmer :), so pls excuse the hack code).


# Modules to load
use strict;
use warnings;
use Sys::Hostname;

# Variables
my $version = 0.1;

# Clear the screen
system($^O eq 'MSWin32' ? 'cls' : 'clear');

# Work from the home directory, since the logs are copied there
chdir($ENV{HOME}) || die "Couldn't change to home directory, exiting $!\n";

# Create one large file to parse
`cp /opt/apache/logs/access_log ~/access_log`;
`cp /opt/apache/logs/access_log.1.gz ~/access_log.1.gz`;
`cp /opt/apache/logs/access_log.2.gz ~/access_log.2.gz`;

`gunzip access_log.1.gz`;
`gunzip access_log.2.gz`;

`cat access_log.2 access_log.1 access_log > final_access_log`;

# Hostname
my $hostName = hostname();
print "The Hostname of the server is : $hostName \n";

# Process the log file, one line at a time
open(INPUTFILE, "< final_access_log") || die "Couldn't open log file, exiting $!\n";

while (defined (my $line = <INPUTFILE>)) {
    chomp $line;
    # Lines dated 20 Feb to 28 Feb 2012 go into one file per day
    if ($line =~ m/\[(2[0-8])\/Feb\/2012/) {
        my $logDay = $1;
        open(OUTPUTFILE, ">> 2_${logDay}_2012_log_file") || die "Couldn't open log file, exiting $!\n";
        print OUTPUTFILE "$line\n";
        close(OUTPUTFILE);
    }
}
close(INPUTFILE);

`rm final_access_log`;
`rm access_log`;
`rm access_log.1`;
`rm access_log.2`;

# Build the per-IP report for each day (20 to 28)
for (my $day = 0; $day < 9; $day++) {
    my $outputLog = $hostName."_2_2".$day."_2012.txt";
    my $inputLog = "2_2".$day."_2012_log_file";

    my $dateString = "\\[2".$day."/Feb/2012";

    print "Running the aggregator with following data\n";
    print "Input File : $inputLog\n";
    print "Output Log : $outputLog\n";
    print "Date String: $dateString\n";

    `grep -i "splash.do" $inputLog | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/$outputLog`;

    # Cleanup after yourself
    `rm $inputLog`;
}

I wrote a smaller script to do the same job as the command line hack that I tried earlier and compared the time. First, here is the smaller script


# Modules to load
use strict;
use warnings;

# Variables
my $version = 0.1;

# Clear the screen
system($^O eq 'MSWin32' ? 'cls' : 'clear');

open(TEMPFILE, "< final_log") || die "Couldn't open log file, exiting $!\n";
open(OUTPUTFILE, ">> perl_speed_test_output.log") || die "Couldn't open output file, exiting $!\n";

# Match date and write to another log file
while (defined (my $line = <TEMPFILE>)) {
    chomp $line;
    if ($line =~ m/\[20\/Feb\/2012/) {
        print OUTPUTFILE "$line\n";
    }
}

# Close (and flush) the output before grep reads it
close(OUTPUTFILE);
close(TEMPFILE);

`grep -i "splash.do" perl_speed_test_output.log | grep -i productcode | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/perl_speed_test_output_ip.log`;


Timing this script showed that it took 21 seconds to run, against ~82 seconds for the grep pipeline. That is nearly 4 times faster and, more importantly, less load (RAM utilization) on the system.

One has to love technology :).

HOW TO : Sort Apache Web Logs for hits by Unique IP Addresses


Say you want to find out how many hits you are getting to a specific page from each source IP. You can use this quick collection of Linux tools to get this data:

[code]grep -i "URL_TO_CHECK" PATH_TO_APACHE_ACCESS_LOG | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

You are using

  • grep to keep only the log lines for the page you want the report on
  • cut to get the IP address (the first field) from each log line
  • sort and uniq -c to group identical IP addresses and count the hits from each
  • and finally sort -rn to sort the counts in descending order

Example :

[code]grep -i "GET / " /opt/apache/logs/access_log | cut -d' ' -f 1 - | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

gets you the report of hits to the index page. The space after the slash matters: without it, the pattern matches every GET request.
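
Note that grep matches the pattern anywhere in the line, including the referrer field. Where that matters, awk can compare the request path exactly. A minimal sketch, assuming the default Apache common/combined log format (client IP in field 1, request path in field 7); hits_for_path is a hypothetical name.

```shell
# hits_for_path is a hypothetical helper.
#   $1 = exact request path, e.g. / or /about
#   $2 = Apache access log
hits_for_path() {
    path="$1"; log="$2"
    # Count hits per client IP, but only for lines whose request path
    # (field 7) is exactly the one asked for.
    awk -v path="$path" '$7 == path { count[$1]++ }
        END { for (ip in count) print count[ip], ip }' "$log" | sort -rn
}
```

For example, hits_for_path / /opt/apache/logs/access_log counts only requests for the index page itself.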

HOW TO : Find list of files used by a process in Linux

Quick howto on finding the list of files being accessed by a process in Linux. I needed to find this for troubleshooting an issue where a particular process was using an abnormally high percentage of CPU. I wanted to find out what this particular process was doing and accessing.

  1. Find the process ID (pid) of the process you want to analyze by running [code] ps -ef | grep NAME_OF_PROCESS [/code]
  2. Find the files the process is accessing at a given time by running [code]sudo ls -l /proc/PROCESS_ID/fd [/code]

For example, if I wanted to find the list of files being accessed by mysql, the steps would look like this

[code] ps -ef | grep mysqld [/code]

which would show the output as

[code]samurai@samurai:~$ ps -ef | grep mysqld
mysql     3304     1  0 Feb04 ?        00:00:23 /usr/sbin/mysqld
samurai  23389 23374  0 14:57 pts/0    00:00:00 grep --color=auto mysqld [/code]

I can then find the list of files being used by mysql by running

[code] sudo ls -l /proc/3304/fd [/code]

which would give me


lrwx------ 1 root root 64 Feb  7 15:00 0 -> /dev/null
lrwx------ 1 root root 64 Feb  7 15:00 1 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 10 -> socket:[4958]
lrwx------ 1 root root 64 Feb  7 15:00 11 -> /tmp/ibdu9WRh (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 12 -> socket:[4959]
lrwx------ 1 root root 64 Feb  7 15:00 14 -> /var/lib/mysql/blog/wp_term_relationships.MYI
lrwx------ 1 root root 64 Feb  7 15:00 15 -> /var/lib/mysql/blog/wp_postmeta.MYI
lrwx------ 1 root root 64 Feb  7 15:00 17 -> /var/lib/mysql/blog/wp_term_relationships.MYD
lrwx------ 1 root root 64 Feb  7 15:00 18 -> /var/lib/mysql/blog/wp_term_taxonomy.MYI
lrwx------ 1 root root 64 Feb  7 15:00 2 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 20 -> /var/lib/mysql/blog/wp_postmeta.MYD
lrwx------ 1 root root 64 Feb  7 15:00 21 -> /var/lib/mysql/blog/wp_term_taxonomy.MYD
lrwx------ 1 root root 64 Feb  7 15:00 22 -> /var/lib/mysql/blog/wp_terms.MYI
lrwx------ 1 root root 64 Feb  7 15:00 23 -> /var/lib/mysql/blog/wp_terms.MYD
lrwx------ 1 root root 64 Feb  7 15:00 3 -> /var/lib/mysql/ibdata1
lrwx------ 1 root root 64 Feb  7 15:00 4 -> /tmp/ibvANyz7 (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 5 -> /tmp/ibonS0mU (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 6 -> /tmp/ibcKctaH (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 7 -> /tmp/ibB5DS5t (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 8 -> /var/lib/mysql/ib_logfile0
lrwx------ 1 root root 64 Feb  7 15:00 9 -> /var/lib/mysql/ib_logfile1
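
The same listing can be produced by a small function that walks the fd directory and resolves each link. A rough sketch, assuming a Linux system with /proc mounted; open_files is a hypothetical helper name, and reading the descriptors of another user's process still needs root.

```shell
# open_files is a hypothetical helper.
#   $1 = process ID
# Prints one "FD -> TARGET" line per open file descriptor.
open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # readlink prints what the descriptor points at; skip entries
        # that cannot be read or that closed in the meantime.
        target=$(readlink "$fd" 2>/dev/null) || continue
        printf '%s -> %s\n' "${fd##*/}" "$target"
    done
}
```

From a root shell, open_files "$(pgrep -o -x mysqld)" would cover both steps at once; pgrep returns just the pid, so the extra grep line from the ps output never appears.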

Overheard : Comment about trust and security

Very thought-provoking comment on trust and security by Mark Boyle, the Moneyless Man, on a recent episode of PRI’s To the Best of Our Knowledge program (I personally transcribed this.. so pls overlook any minor typos 🙂 )

What money has become is.. a substitute for trust. It has now become our primary source of security in the world and what I am trying to do personally is to find my primary source of security in the friendships I have and in my local community and my relationship with earth. Because most countries, such as Argentina and Indonesia and currently Zimbabwe have experienced this hyperinflation and you can have a million dollars in the bank. One day with devaluation, it can almost be worthless. No matter how badly I behave, my friends don’t devalue me that quickly. And I think real security comes in our relationships, whether it be with our planet or with our local community. I think what we all can do is build a bit more diversity in how we meet our needs and to not be so reliant on cash.

You can get the full interview at http://feedproxy.google.com/~r/TTBOOK/~3/X009WjbiqB0/tbk120205a.mp3. Segment with Mark starts at ~42 min.

Overheard : Comment on Work

I was standing in line to get into a plane yesterday and heard this comment made by a gentleman to his friend

You know.. funny thing about work, it has to get done!!

The guys were discussing how their wives don’t understand the pressures of work :).