Technology

HOW TO : Combining Perl and Zoho to produce reports

This HOW TO is mostly for my own notes. At work, we got a request to parse some log files and create a graph from the data in them.

The log files looked like this:

[bash]
0m0.107s
0m0.022s
0m0.015s
2011-01-05_02_22
0m0.102s
0m0.024s
0m0.014s
2011-01-05_02_23
[/bash]

I wrote a Perl script to reformat the log file so that each record sits on one line, like this:

[bash]
| 0m0.107s| 0m0.022s| 0m0.015s| 2011-01-05 | 02:22
| 0m0.102s| 0m0.024s| 0m0.014s| 2011-01-05 | 02:23
[/bash]

The Perl script:

[perl]
#!/usr/bin/perl
# Modules to load
use strict;
use warnings;

# Variables
my $inputFile = 'input.txt';
my $version   = 0.1;
my $logFile   = 'parsed_input.csv';

# Clear the screen
system $^O eq 'MSWin32' ? 'cls' : 'clear';

# Open the output log file
open(my $out, '>', $logFile) or die "Couldn't open $logFile, exiting: $!\n";

# Open the input file
open(my $in, '<', $inputFile) or die "Couldn't open $inputFile, exiting: $!\n";

# Process the input file, one line at a time
while (defined(my $line = <$in>)) {
    chomp $line;

    # Skip blank lines; each record ends with its timestamp line instead
    next if $line =~ /^\s*$/;

    if ($line =~ /^\d{4}-\d{2}-\d{2}_/) {
        # Timestamp such as 2011-01-05_02_22: print date and HH:MM, then end the record
        my @date = split(/_/, $line);
        print $out "| $date[0] | $date[1]:$date[2]\n";
    }
    else {
        # Timing value such as 0m0.107s: append it to the current record
        print $out "| $line";
    }
}

close($in);
close($out);
[/perl]
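For quick one-off runs, the same reshaping can also be sketched in awk. This is a swap-in alternative, not the script I actually used; `reformat_timing_log` is a hypothetical helper name, and the sketch assumes every record ends with its `YYYY-MM-DD_HH_MM` timestamp line, as in the sample above.

```shell
# Hypothetical helper: reshape the timing log (stdin) into one "|"-separated record per line.
# Assumes each record ends with a YYYY-MM-DD_HH_MM timestamp line.
reformat_timing_log() {
    awk '
        # Ignore blank lines
        /^[ \t]*$/ { next }
        # A timestamp such as 2011-01-05_02_22 closes the record: print "| date | HH:MM" and a newline
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_/ {
            split($0, d, "_")
            printf "| %s | %s:%s\n", d[1], d[2], d[3]
            next
        }
        # Anything else is a timing value such as 0m0.107s: append it to the current record
        { printf "| %s", $0 }
    '
}
```

Usage: `reformat_timing_log < input.txt > parsed_input.csv`. Keying the end of a record on the timestamp line, rather than on blank lines, means the output stays one record per line even if the log has no blank separators.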
I then imported the parsed log files into Zoho Reports, the cloud-based reporting engine at http://reports.zoho.com.

The final results are these reports:

SERVER1

SERVER2

Did I mention that I love technology? 🙂

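HOW TO : Find out which network ports a program is using in Linux

A quick way to figure out which ports a particular program is using in Linux (run it with sudo so that netstat can show the PID and program name for sockets owned by other users):

[bash] sudo netstat -plan | grep -i PROGRAM_NAME [/bash]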

Example : Check which ports SSH is listening on

[bash]

samurai@samurai:~$ sudo /bin/netstat -plan | grep sshd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      5257/sshd
tcp        0     52 123.123.123.123:22      124.124.124.124:32846     ESTABLISHED 3551/sshd: samurai
tcp6       0      0 :::22                   :::*                    LISTEN      5257/sshd
unix  3      [ ]         STREAM     CONNECTED     5893     3551/sshd: samurai
unix  2      [ ]         DGRAM                    5849     3551/sshd: samurai

[/bash]
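If you only want the ports a program is listening on, rather than every line that mentions it, the netstat output can be filtered a little further. A minimal sketch: `listening_tcp_ports` is a hypothetical helper name, and it assumes the Linux netstat column layout shown above (local address in column 4, state in column 6). On newer systems, `ss -tlnp` gives a similar list of listening TCP sockets with their owning processes.

```shell
# Hypothetical helper: print the TCP ports PROGRAM is listening on,
# reading "netstat -plan" output from stdin.
listening_tcp_ports() {
    awk -v prog="/$1" '
        # tcp/tcp6 rows in LISTEN state whose PID/Program column mentions PROGRAM
        # (a substring match, like the grep above)
        $1 ~ /^tcp/ && $6 == "LISTEN" && index($0, prog) > 0 {
            # The port is the last ":"-separated part of the local address (column 4)
            n = split($4, addr, ":")
            print addr[n]
        }
    ' | sort -un
}
```

Usage: `sudo netstat -plan | listening_tcp_ports sshd`, which for the output above prints `22` once, even though sshd listens on both IPv4 and IPv6.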

HOW TO : Manage startup services in Ubuntu

Most Red Hat/Fedora users are used to chkconfig and service for controlling which services start at boot time. Here is how to do the same in Ubuntu:

  • Check status of a particular service

[bash] sudo service SERVICE_NAME status [/bash]

Example : Check the status of the Apache web server

[bash]samurai@samurai:~$ sudo service apache2 status
Apache is running (pid 3496).[/bash]

  • Add a service to start on bootup

[bash] sudo update-rc.d SERVICE_NAME defaults [/bash]

Example : Configure squid to start on bootup

[bash] sudo update-rc.d squid defaults [/bash]

  • Stop a service from starting on bootup

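[bash] sudo update-rc.d -f SERVICE_NAME remove [/bash]

The -f flag forces update-rc.d to remove the links even though the script is still present in /etc/init.d.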

Example : Configure squid to NOT start on bootup

[bash] sudo update-rc.d -f squid remove [/bash]

NOTE : update-rc.d only manages services that have a startup script in /etc/init.d, so make sure the script exists before running it.
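To get something close to `chkconfig --list` for a single service, you can look at the start links (named `S<NN><service>`) that update-rc.d creates in the /etc/rcN.d directories. The sketch below assumes that sysv-rc layout; `boot_runlevels` and `enable_at_boot` are hypothetical helper names, and the optional second argument to `boot_runlevels` (a replacement for /etc) is my own addition so the function can be tried on a test directory tree. `enable_at_boot` simply enforces the NOTE above before calling update-rc.d.

```shell
# Hypothetical helpers for sysv-rc style init scripts on Ubuntu.

# Print the runlevels in which SERVICE is started at boot, i.e. the /etc/rcN.d
# directories holding an S<NN>SERVICE link. The optional second argument replaces
# /etc as the root (an assumption, added so the function can be tried on a test tree).
boot_runlevels() {
    local service="$1" root="${2:-/etc}" levels="" dir link level
    for dir in "$root"/rc[0-6S].d; do
        for link in "$dir"/S[0-9][0-9]"$service"; do
            # When nothing matches, the shell leaves the pattern itself; skip it
            [ -e "$link" ] || [ -L "$link" ] || continue
            level="${dir##*/rc}"
            levels="$levels ${level%.d}"
        done
    done
    echo "${levels# }"
}

# Refuse to call update-rc.d unless /etc/init.d/SERVICE exists and is executable.
enable_at_boot() {
    if [ ! -x "/etc/init.d/$1" ]; then
        echo "enable_at_boot: no startup script /etc/init.d/$1" >&2
        return 1
    fi
    sudo update-rc.d "$1" defaults
}
```

Usage: `boot_runlevels squid` prints something like `2 3 4 5` after `update-rc.d squid defaults`, and an empty line once the links have been removed.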

HOW TO : Check IO speed on a Linux Machine

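For my notes: if you ever want to check the I/O throughput of a disk (local or network) on a Linux machine, use the following command:

[bash] dd if=/dev/zero of=test.file bs=4M count=1000 conv=fdatasync [/bash]

The command above copies zeros from /dev/zero into a file called test.file (put the file on the disk you want to measure) in 4 MiB blocks, 1000 times, for a total file size of 4000 MiB (about 4.2 GB). When it finishes, dd reports the elapsed time and throughput. The conv=fdatasync option (GNU dd) makes dd flush the data to disk before reporting, so the number reflects the disk rather than the page cache. Delete test.file when you are done.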
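If you run this check on several disks, it can be wrapped so the test file is always cleaned up and only dd's summary line is kept. A minimal sketch assuming GNU dd; `disk_write_test` is a hypothetical helper name, and the directory and block count are parameters I added (the count defaults to 1000, as above).

```shell
# Hypothetical helper: write COUNT x 4 MiB of zeros to DIR/test.file with GNU dd,
# flush it to disk (conv=fdatasync), delete the file, and print dd's summary line.
disk_write_test() {
    local dir="${1:-.}" count="${2:-1000}" file out
    file="$dir/test.file"
    # LC_ALL=C keeps dd's summary in English; dd writes it to stderr, hence 2>&1
    if ! out=$(LC_ALL=C dd if=/dev/zero of="$file" bs=4M count="$count" conv=fdatasync 2>&1); then
        rm -f "$file"
        printf '%s\n' "$out" >&2
        return 1
    fi
    rm -f "$file"
    # Last line: bytes copied, elapsed seconds and throughput
    printf '%s\n' "$out" | tail -n 1
}
```

Usage: `disk_write_test /path/to/mounted/disk` for the full 4000 MiB run, or `disk_write_test /path/to/mounted/disk 100` for a quicker 400 MiB sample (the path is a placeholder).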

Cloud Computing and your company's infrastructure

Bold forecast :) In 5 to 10 years, I predict that the majority of a company’s infrastructure will be hosted in a “cloud”. If you recall, around 2000 most companies hosted their “anti-spam” services in house. If anyone suggested outsourcing that service, you would get an “are-you-crazy” look :). Now you get the same look if anyone suggests running anti-spam in house. I believe the same thing is going to happen to infrastructure. You might still run some components in house, but that footprint will keep shrinking. Companies will be pushed to focus on their core competency rather than maintain an army of engineers to perform tasks that someone else can do a lot better.

Speaking of being visionary, Netflix apparently runs most of its infrastructure in the cloud. If Netflix can operate in the cloud, most of us can too :). Here are some links about the lessons they learned from the move:

http://blip.tv/file/4252897 (Video of Netflix Director of Engineering explaining their move to the cloud)

https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxwcmFjdGljYWxjbG91ZGNvbXB1dGluZ3xneDo2NDc2ODVjY2ExY2Y1Zjcz&pli=1 (Write-up by a Netflix engineer about the move to the cloud from a storage and DB perspective)

HOW TO : Check status of bond interface in Linux

For my notes: if you ever want to check the status of a bonded interface configured in Linux (especially RHEL), run the following command:

[bash]
[root@serverxyz bin]# cat /proc/net/bonding/bond0
[/bash]

This assumes your bond interface is named bond0.

Output from the command:

[bash]
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth3 (primary_reselect always)
Currently Active Slave: eth3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:10:18:6e:b8:1a

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:5e:11:34:32
[/bash]
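When checking several servers, it helps to boil that output down to the active slave and the MII status of each slave. A minimal sketch, assuming the /proc/net/bonding layout shown above (field names may differ between bonding driver versions); `bond_health` is a hypothetical helper name. It exits non-zero if any slave's MII status is not `up`, so it can be used in scripts or monitoring checks.

```shell
# Hypothetical helper: summarise a bonding status file (default /proc/net/bonding/bond0).
# Prints the active slave and each slave's MII status; exits non-zero if any slave is not "up".
bond_health() {
    awk -F': ' '
        /^Currently Active Slave:/ { print "active: " $2 }
        # Remember which slave section we are in
        /^Slave Interface:/ { slave = $2 }
        # The first "MII Status" belongs to the bond itself, so only count it once a slave is seen
        /^MII Status:/ && slave != "" {
            print slave ": " $2
            if ($2 != "up") bad = 1
        }
        END { exit bad }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Usage: `bond_health || echo "bond0 is degraded"`. For the output above it prints `active: eth3`, `eth3: up` and `eth0: up`.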

The configuration files involved are:

/etc/sysconfig/network-scripts/ifcfg-bond0 (Bond Interface)

[bash]
DEVICE=bond0
IPADDR=10.10.40.26
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
GATEWAY=10.10.40.1
NETWORK=10.10.40.0
BROADCAST=10.10.40.255
TYPE=Ethernet
[/bash]

/etc/sysconfig/network-scripts/ifcfg-eth3 (Primary Interface)

[bash]
DEVICE=eth3
BOOTPROTO=none
ONBOOT=yes
HWADDR=00:10:18:6e:b8:1a
MASTER=bond0
SLAVE=yes
TYPE=Ethernet
USERCTL=no
[/bash]

/etc/sysconfig/network-scripts/ifcfg-eth0 (Secondary Interface)

[bash]
DEVICE=eth0
HWADDR=00:21:5e:11:34:32
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=Ethernet
[/bash]

Lessons of the Trade : Purging Databases

We ran into an interesting issue at work recently. Documenting the solution for my records..

BACKGROUND : We had a table in one of our databases that served as a “hopping” point for some jobs. Data was inserted into this table, and jobs were kicked off at periodic intervals to “process” the data and then delete it.

CURRENT METHOD : Multiple jobs are launched to process the data, and each job deletes rows as soon as it has processed them. This causes locks on the table, because several delete operations run at the same time. As a result, the jobs cannot finish processing the data, and the table keeps growing.

PROPOSED METHOD : Add a new column called “PROCESSED_STATE” to the table, and modify the “processing” jobs to set it to “Y” as soon as a row has been processed. Create a new job that runs periodically and deletes every row whose PROCESSED_STATE is “Y”.

Moral of the story 🙂: many concurrent deletes on the same table are bad. A better approach is many updates and a single delete job.

What happens when you get busy (lazy)?

Your site goes down 🙂

And traffic to the site drops!!

Things have been a bit crazy at work recently, so I didn’t get a chance to fix the site as soon as it went down (due to an error I still haven’t figured out). As a result, traffic to the site dropped.

I finally took the opportunity to move the site to a dedicated server running on Rackspace Cloud. I am putting together a post on how I handled this migration and will publish it soon.

The bad news is that I have lost the traffic I had built up over time. The good news is that I am finally the master of my own house (website) :).