Technology

HOW TO : Trick to find out your IP address from a web server farm

This is a quick trick I came up with to find out the IP address of a client that is trying to access a farm of web servers that you have access to. The diagram below shows the network path for a typical web server.

You have a client that might be sitting behind one or more proxy servers, and there is a load balancer involved because you have multiple web servers for redundancy.

We were recently working on some rewrite rules for our web servers at work, and we needed to find out which IP address the web servers saw the client traffic coming from. There were a couple of challenges:

  • Which web server do you check? The load balancer can send your traffic to any server.
  • What IP address are you going to look for? Wait, that is the original problem, right :).

The web servers log every request, including the ones that end in a 404 error. So we can use that to figure out which web server you are hitting and what IP address that web server sees you as. Here’s the trick:

  • On the client side, go to http://WEBSITE_ADDRESS/Get_Me_My_IP (or some other URL that you know doesn’t exist on the web site)
  • On the server side, grep for “Get_Me_My_IP” in the access log of each web server (the error log usually records the 404 as well)

Here is an example I ran on this website (https://kudithipudi.org), using /what_is_my_ip as the non-existent URL:

[bash]
root@samurai:/var/log/apache2# grep -i what_is_my_ip access_kudithipudi.log
199.27.130.105 - - [04/Mar/2011:16:07:18 +0000] "GET /what_is_my_ip HTTP/1.0" 404 5495 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.14) Gecko/20110218 Firefox/3.6.14 ( .NET CLR 3.5.30729; .NET4.0E)"
[/bash]

  • From this entry I can figure out that my client is appearing as “199.27.130.105” to the web server.
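The lookup can be wrapped in a small helper so that it prints only the client address. This is a minimal sketch, assuming the client address is the first field of each log line (as in Apache's common and combined log formats); the function name `client_ips_for_marker` and the second sample line are made up for illustration.

```shell
#!/usr/bin/env bash
# Print each distinct client IP that requested the marker URL.
# Assumption: the first field of every log line is the client address.
client_ips_for_marker() {
    local marker="$1"
    grep -iF -- "$marker" | awk '{ print $1 }' | sort -u
}

# Sample access-log lines (the second one is invented for illustration).
sample_log='199.27.130.105 - - [04/Mar/2011:16:07:18 +0000] "GET /what_is_my_ip HTTP/1.0" 404 5495 "-" "Mozilla/5.0"
10.0.0.7 - - [04/Mar/2011:16:07:20 +0000] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/5.0"'

printf '%s\n' "$sample_log" | client_ips_for_marker what_is_my_ip
```

On a real farm you would run `client_ips_for_marker what_is_my_ip < access_kudithipudi.log` on each web server; the server that prints an address is the one the load balancer sent you to.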

HOW TO : Setup Global Redirect in Lighttpd

If you have ever managed a web application, you know you have to take it down at times :). And you usually want to show a simple page stating that you are down for maintenance. Here is a simple way to set up a “maintenance” splash page. The assumption is that you have a Linux server to host the maintenance page.

  • Configure lighttpd (HTTP Server) on the server using instructions from this article on Cyberciti.
  • Edit the lighttpd.conf file and add the following line to your site configuration (the value is a URL path relative to the document root)

[bash] server.error-handler-404   = "/index.html" [/bash]

  • Name your maintenance page as index.html and upload it to the document root (in this example, it is /var/www/html)

You are essentially telling the web server to display index.html whenever the user tries to access content that is not present on the server. And since there is no content on the server other than index.html, the web browser will always display the maintenance page.

HOW TO : Download content from Oracle Metalink (Support) using wget

The usual process for a DBA to download files from the Oracle Metalink (support) site is:

  • Log in to Metalink from your workstation
  • Download the file
  • Upload the file to the database server
  • Use the file

If your database is in a data center and your workstation doesn’t have high-speed connectivity to it, you can use the following trick to download content directly to a Linux (or Unix) server in the data center that has Internet connectivity (and hopefully it is not your database server 🙂 ).

  • Log into Metalink from your workstation
  • Grab the link to the file/content you want to download (for example, we recently tried to download clusterware for Oracle 11g, and the link was http://download.oracle.com/otn/linux/oracle11g/linux.x64_11gR2_clusterware.zip)
  • Log into a server in your data center (it should have connectivity to the Internet and also to your database server)
  • Download the file using wget

[bash]wget http://download.oracle.com/otn/linux/oracle11g/linux.x64_11gR2_clusterware.zip --user ORACLE_ID --password ORACLE_ID_PASSWORD[/bash]

  • Replace the link with the link to your content and use your Oracle ID and password.
  • The downloaded file will have a strange name, since wget appends the session ID to the end of the file name. In the example above, the file was named “linux.x64_11gR2_clusterware.zip\?e\=1297470492\&h\=a66b265cc967a68c611052cb8e54356f”
  • Rename the file and strip off the unnecessary data in the name using mv
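The rename step can be sketched with bash parameter expansion, which drops everything from the first “?” onward. This is only an illustration; the function name `strip_query` is made up, and the sample name is the one from the example above without the shell escape backslashes.

```shell
#!/usr/bin/env bash
# Strip everything from the first "?" onward in a downloaded file name.
strip_query() {
    local name="$1"
    # %% removes the longest suffix matching the pattern "?*"
    printf '%s\n' "${name%%\?*}"
}

downloaded='linux.x64_11gR2_clusterware.zip?e=1297470492&h=a66b265cc967a68c611052cb8e54356f'
clean="$(strip_query "$downloaded")"
echo "$clean"

# To rename the real file, you would then run:
#   mv -- "$downloaded" "$clean"
```

Alternatively, wget's -O option lets you choose the output file name at download time, which avoids the rename altogether.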

HOW TO : Capture HTTP Headers using tcpdump

Quick how to on capturing HTTP headers using tcpdump on a web server (running Linux).

    • On the web server, issue the following command

      [bash] tcpdump -s 1024 -C 1024000 -w /tmp/httpcapture dst port 80 [/bash]

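        • Stop the capture by issuing the break command (ctrl + c)
        • Open the capture file (httpcapture in this example) in Wireshark and check out the headers under the HTTP protocol section. Because the filter is “dst port 80”, the capture only contains traffic going to the web server, so you will see the request headers but not the responses.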
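If Wireshark is not handy, tcpdump itself can print the capture as text with `tcpdump -A -r /tmp/httpcapture`, and the output can be filtered down to the interesting lines. The helper below is a rough sketch: the function name `http_headers`, the choice of headers to keep, and the sample text are all assumptions made for illustration, not real capture data.

```shell
#!/usr/bin/env bash
# Keep only the request line and a few headers from an ASCII packet dump.
# The header names in the pattern are an arbitrary selection; adjust to taste.
http_headers() {
    grep -aE '(GET|POST|HEAD|PUT|DELETE) [^ ]+ HTTP/|^(Host|User-Agent|X-Forwarded-For): '
}

# Invented sample resembling one request in "tcpdump -A" output.
sample='E..4..@.@.....P.GET /what_is_my_ip HTTP/1.1
Host: kudithipudi.org
Accept: */*
X-Forwarded-For: 199.27.130.105'

printf '%s\n' "$sample" | http_headers
```

Against a real capture the usage would be `tcpdump -A -r /tmp/httpcapture | http_headers`. X-Forwarded-For is included because proxies and load balancers commonly use it to pass along the original client address.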

        HOW TO : Configure Cache Expiration in Apache

        Cache servers depend on cache control headers provided by the web server. Essentially, the web server (based on its configuration) specifies what content is cacheable and for how long. (Note: Some cache servers might ignore this and apply a default cache period for specific content. But that is for another post 🙂 )

        Here is a quick and dirty way to configure an Apache 2.x server (with mod_expires enabled) to set cache control headers on all content in a directory

        [bash]
        ExpiresActive On
        <Directory "/var/www/html/static">
        Options FollowSymLinks MultiViews
        Order allow,deny
        Allow from all
        ExpiresDefault "modification plus 1 hour"
        </Directory>
        [/bash]

        This configuration tells Apache to send cache headers for all content in the /var/www/html/static folder. Each item expires 1 hour after its last modification time.
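To make the arithmetic behind “modification plus 1 hour” concrete, here is a small sketch that computes the expiry time from a modification time. It assumes GNU date (for the `-d @EPOCH` syntax), and the function name `expires_header` is made up for illustration.

```shell
#!/usr/bin/env bash
# Compute the expiry time for "modification plus N seconds".
# Assumption: GNU date, which accepts -d @EPOCH.
expires_header() {
    local mtime_epoch="$1" ttl_seconds="$2"
    # LC_ALL=C keeps the day and month names in English
    LC_ALL=C date -u -d "@$((mtime_epoch + ttl_seconds))" '+%a, %d %b %Y %H:%M:%S GMT'
}

# A file last modified at the Unix epoch, with a 1 hour lifetime.
expires_header 0 3600

# For a real file (GNU stat; the path is hypothetical):
#   expires_header "$(stat -c %Y /var/www/html/static/logo.png)" 3600
```

One consequence worth keeping in mind: a file that was last modified more than an hour ago gets an expiry time that is already in the past. If you want the hour to count from when the client fetched the content, mod_expires also accepts "access plus 1 hour".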

        Analytics in the Cloud : Not there yet

        I attended a webinar hosted by Deepak Singh from Amazon’s Web Services group on analytics in the cloud. He made a very compelling case for utilizing the cloud to build out your analytics infrastructure. Especially with the growing data sizes that we deal with now, I think it makes absolute sense. You can utilize different software stacks and grow (and shrink) your hardware stack as required. Great stuff.

        But there is a catch. Most of the data generated by organizations today is “inside” their perimeters. Whether it is the OLAP database collecting all your data or that application that spews gigabytes of logs, most of the data is housed in your own infrastructure. So if you want to use the cloud to perform analytics on this data, you first have to transfer it to the cloud. And therein lies the problem. As Deepak mentioned in the webinar, human beings have yet to conquer the limitations of physics :). You need a pretty big pipe to the Internet just to transfer this data.

        Amazon has come up with various means to help with this issue. They are creating copies of publicly available data sets within their cloud so that customers don’t have to transfer them. They are also working with companies to keep private data sets in the cloud for other customers to use. So, similar to how you can spin up a Red Hat AMI by paying a license fee to Red Hat, I believe they are looking at giving customers access to these private data sets for a fee paid to the company providing the data set. It is a win-win-win situation 🙂 for Amazon, the company providing the private data set, and Amazon’s web services customers. They also support a one-time import of data from physical disk or tape.

        Coming back to the title of this post :). I think this field is still in its infancy. Once companies start migrating their infrastructure to the cloud (and yes, it will happen; it is only a matter of time :) ), it will be a lot easier to leverage the cloud to perform your analytics. All your data will already be in the cloud, and you can start leveraging the hardware and software stacks there.

        LinkedIn Network Map

        LinkedIn (professional networking site) is providing a way to map your networks to see where you have your strongest connections. Here is a map of my networks. You can click on the image to get to the live map.

        My strongest connections so far are at

        I wish they came up with a map showing the location of my network too. That way, I can find out if I can get a job in New Zealand through my network :).

        HOW TO : Combining Perl and Zoho to produce reports

        This HOW TO is more for my notes. We had a request at work, where we had to parse some log files and create a graph from the data in the log files.

        The log files looked like this

        [bash]
        0m0.107s
        0m0.022s
        0m0.015s
        2011-01-05_02_22
        0m0.102s
        0m0.024s
        0m0.014s
        2011-01-05_02_23
        [/bash]

        I wrote a Perl script to reshape the log file so that it looks like this

        [bash]| 0m0.107s| 0m0.022s| 0m0.015s| 2011-01-05 | 02:22

        | 0m0.102s| 0m0.024s| 0m0.014s| 2011-01-05 | 02:23 [/bash]

        Perl script

        [perl]
        #!/usr/bin/perl
        # Modules to load
        use strict;
        use warnings;

        # Variables
        my $inputFile = 'input.txt';
        my $logFile   = 'parsed_input.csv';

        # Clear the screen
        system($^O eq 'MSWin32' ? 'cls' : 'clear');

        # Open the output log file
        open(my $out, '>', $logFile) or die "Couldn't open $logFile, exiting: $!\n";

        # Open the input file
        open(my $in, '<', $inputFile) or die "Couldn't open $inputFile, exiting: $!\n";

        # Process the input file, one line at a time
        while (defined(my $line = <$in>)) {
            chomp $line;
            if ($line =~ /^$/) {
                # Blank line: start a new row in the output
                print $out "\n";
            }
            elsif ($line =~ /^\d{4}-\d{2}-\d{2}_/) {
                # Timestamp line (e.g. 2011-01-05_02_22): split the date and time
                my @date = split(/_/, $line);
                print $out "| $date[0] | $date[1]:$date[2]";
            }
            else {
                # Timing value: write it to the output as-is
                print $out "| $line";
            }
        }

        close($in);
        close($out) or die "Couldn't close $logFile: $!\n";
        [/perl]
        I then took the parsed log files and imported them into the cloud-based reporting engine provided by Zoho at http://reports.zoho.com

        The final results are these reports

        SERVER1

        SERVER2

        Did I say, I love technology? 🙂

        HOW TO : Find out which network port a program is using in linux

        Quick way to figure out which ports a particular program is using in Linux

        [bash] netstat -plan | grep -i PROGRAM_NAME [/bash]

        Example : Check which ports SSH is listening on

        [bash]

        samurai@samurai:~$ sudo /bin/netstat -plan | grep sshd
        tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      5257/sshd
        tcp        0     52 123.123.123.123:22      124.124.124.124:32846     ESTABLISHED 3551/sshd: samurai
        tcp6       0      0 :::22                   :::*                    LISTEN      5257/sshd
        unix  3      [ ]         STREAM     CONNECTED     5893     3551/sshd: samurai
        unix  2      [ ]         DGRAM                    5849     3551/sshd: samurai

        [/bash]
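If you only want the port numbers, the netstat output can be post-processed with awk. The sketch below assumes the Linux net-tools column layout shown above (field 4 is the local address, field 6 the state, field 7 the PID/program name); the function name `listening_ports` is made up for illustration, and the sample lines are taken from the example output.

```shell
#!/usr/bin/env bash
# List the distinct TCP ports a program is listening on.
# Reads "netstat -plan" output on stdin.
listening_ports() {
    local prog="$1"
    awk -v prog="$prog" '
        $6 == "LISTEN" {
            split($7, owner, "/")          # owner[2] is the program name
            if (owner[2] == prog) {
                n = split($4, addr, ":")   # the port is the last ":"-separated piece
                print addr[n]
            }
        }' | sort -un
}

# Sample lines from the netstat output above.
sample='tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      5257/sshd
tcp        0     52 123.123.123.123:22      124.124.124.124:32846     ESTABLISHED 3551/sshd: samurai
tcp6       0      0 :::22                   :::*                    LISTEN      5257/sshd'

printf '%s\n' "$sample" | listening_ports sshd
```

On a live system the usage would be `sudo netstat -plan | listening_ports sshd`. The IPv4 and IPv6 listeners both report port 22, so it is printed once.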

        HOW TO : Manage startup services in Ubuntu

        Most Redhat/Fedora users are used to chkconfig and service for controlling the services/programs that start up at boot time. Here is how you do it in Ubuntu:

        • Check status of a particular service

        [bash] sudo service SERVICE_NAME status [/bash]

        Example : Check the status of Apache Web Service

        [bash]samurai@samurai:~$ sudo service apache2 status
        Apache is running (pid 3496).[/bash]

        • Add a service to start on bootup

        [bash] sudo update-rc.d SERVICE_NAME defaults [/bash]

        Example : Configure squid to start on bootup

        [bash] sudo update-rc.d squid defaults [/bash]

        • Stop a service from starting on bootup

        [bash] sudo update-rc.d -f SERVICE_NAME remove [/bash]

        Example : Configure squid to NOT start on bootup

        [bash] sudo update-rc.d -f squid remove [/bash]

        NOTE : The service needs a startup script in /etc/init.d for update-rc.d to work.