Technology

Project Uptime : Progress Report 5 : Getting ready for Reddit and Hacker News

A very timely post on Hacker News by Ewan Leith about configuring a low end server to take ~11million hits/per month gave me some more ideas on optimizing the performance of this website. Ewan used a combination of nginx and varnish to get the server to respond to such traffic.

From my earlier post, you might recall, that I planned on checking out nginx as the web server, but then ended up using Apache. My earlier stack looked like this Based on the recommendations from Ewan’s article, I decided to add Varnish to the picture. So here is how the stack looks currently

And boy, did the performance improve or what. Here are some before and after performance charts based on a test run from blitz.io. The test lasted for 60 seconds and was for 250 simultaneous connections.

BEFORE

  • Screenshot of Response times and hit rates. Note that the server essentially stopped responding 25 minutes into the test.
  • Screenshot of the analysis summary. 84% error rate!!

AFTER

  • Screenshot of response times and hit rates
  • Screenshot of summary of Analysis. 99.98% success rate!!

 

What a difference!!.. The server in fact stopped responding after the first test and had to be hard rebooted.  So how did I achieve it? By mostly copying the ideas from Ewan :). The final configuration for serving the web pages looks like this on the server end

Varnish (listens on TCP 80) –> Apache (listens on TCP 8080)

NOTE : All the configuration guides (as with the previous entries of the posts in this series) are specific to Ubuntu.

  1. Configure Apache to listen on port 8080
    1. Stop Apache [code] sudo service apache2 stop [/code]
    2. Edit the following files to change the default port from 80 to 8080
      1. /etc/apache2/ports.conf
        1. Change [code]NameVirtualHost *:80
          Listen 80
          [/code]
        2. to [code]NameVirtualHost *:8080
          Listen 8080
          [/code]
      2. /etc/apache2/sites-available/default.conf (NOTE: This is the default sample site that comes with the package. You can create a new one for your site.  If you do so, you need to edit your site specific conf file)
        1. Change [code] <VirtualHost *:80> [/code]
        2. To [code]<VirtualHost *:8080> [/code]
    3. Restart apache and ensure that it is listening on port 8080 by using this trick.
  2. Install Varnish and configure it to listen on port 80
    1. Add the Varnish repository to the system and install the package[code]sudo curl http://repo.varnish-cache.org/debian/GPG-key.txt | apt-key add –
      sudo echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" >> /etc/apt/sources.list
      sudo apt-get update
      sudo apt-get install varnish
      [/code]
    2. Configure Varnish to listen on port 80 and use 64Mb of RAM for caching. (NOTE: Varnish uses port 8080 to get to the backend, in this case Apache, by default. So there is no need to configure it specifically).
      1. Edit the file /etc/default/varnish
        1. Change [code]DAEMON_OPTS="-a :6081 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,256m"
          [/code]
        2. To [code] DAEMON_OPTS="-a :80 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,64m"
          [/code]
    3. Restart Varnish [code]sudo service varnish restart[/code]

      and you are ready to rock and roll.

There are some issues with this setup in terms of logging. Unlike your typical web server logs, where every request is logged, I noticed that not all the requests were being logged. I guess, that is because varnish is serving the content from cache. I have to figure out how to get that working. But that is for another post :).

HOW TO : Perform OCR on PDF files for free

I had to convert a scanned PDF file into an editable document recently. You can do this using OCR and there is a ton of software out there, that does this. There are even web based services that do this. But each of them had limitations (either had to buy the software or limit in the number of pages that can be scanned). I didn’t want to buy the license, since this is not something I would be doing regularly and the document I had to convert was 61 pages, so none of the online services allowed me to do it. I remembered reading that Google Docs, added this (OCR) capability a while ago and since I have a Google Apps account, I decided to give it a try.

Google also has a limit of 2 pages per OCR conversion. So after some brainstorming, I came up with this quick hack to use Google Docs for converting large PDF files into editable content.

  1. Split the PDF file into two page documents using PDFsam (Open Source PDF Split and Merge Tool).
  2. Log into your Google Docs interface at http://docs.google.com . All you need is a Google Account to use this feature
  3. Create a folder (collection) to organize your files. This is not required, but it will make searching for the files a lot easier
  4. Check the settings to convert PDF files to editable
  5. Upload the PDF files you created in step 1.
  6. As you upload the files, Google creates an editable document with the text from the PDF files. You can then create a new document and copy/paste the content from all the smaller files.

I think someone with more programming chops than me can improve this by using the Google API to do the copy/paste from the smaller docs into the final document :).

Project Uptime : Progress Report – 4

Continuing to lock down the server as part of project uptime a bit more.. I highly recommend enabling and using iptables on every Linux server. I want to restrict inbound traffic to the server to only SSH (tcp port 22) and HTTP(S) (tcp port 80/443). Here’s the process

Check the current rules on the server

[code]sudo iptables -L [/code]

Add rules to allow SSH, HTTP and HTTPS traffic and all traffic from the loopback interface

[code]sudo iptables -I INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m conntrack –ctstate RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A INPUT -p tcp –dport ssh -j ACCEPT
sudo iptables -A INPUT -p tcp –dport http -j ACCEPT
sudo iptables -A INPUT -p tcp –dport https -j ACCEPT
[/code]

Drop any traffic that doesn’t match the above mentioned criteria

[code]sudo iptables -A INPUT -j DROP [/code]

save the config and create script for the rules to survive reboots by running

[code]sudo su –
iptables-save > /etc/firewall.rules[/code]

now create a simple script that will load these rules during startup. Ubuntu provides a pretty neat way to do this. You can write a simple script and place it in /etc/network/if-pre-up.d and the system will execute this before bringing up the interfaces. You can get pretty fancy with this, but here is a simple scrip that I use

[code]
samurai@samurai:/etc/network/if-pre-up.d$ cat startfirewall
#!/bin/bash

# Import iptables rules if the rules file exists

if [ -f /etc/firewall.rules ]; then
iptables-restore </etc/firewall.rules
fi

exit 0
[/code]

Now you can reboot the server and check if your firewall rules are still in effect by running

[code]sudo iptables -L [/code]

Project Uptime : Progress Report – 3

Things have been a bit hectic at work.. so didn’t get a lot of time to work on this project. Now that that the new server has been setup and the kernel updated, we get down to the mundane tasks of installing the software.

One of the first things I do, when configuring any new server is to restrict root user from logging into the server remotely. SSH is the default remote shell access method nowadays. Pls don’t tell me you are still using telnet :).

And before restricting the root user for remote access, add a new user that you want to use for regular activities, add the user to sudo group and ensure you can login and sudo to root as this user. Here are the steps I follow to do this on a Ubuntu server

Add a new user

[code]useradd xxxx [/code]

Add user to sudo group

[code]usermod -G sudo -a xxxx[/code]

Check user can sudo to gain root access

[code]sudo su – xxxx
su – [/code]

Now moving into the software installation part

Install Mysql

[code]sudo apt-get install mysql-server [/code]

you will be prompted to set the root user during this install. This is quite convenient, unlike the older installs, where you had to set the root password later on.

Install PHP

[code]sudo apt-get install php5-mysql [/code]

In addition to installing the PHP5-mysql, this will also install apache. I know, I mentioned, I would like to try out the new version of Apache. But it looks like Ubuntu, doesn’t have a package for it yet. And I am too lazy to compile from source :).

With this you have all the basic software for wordpress. Next, we will tweak this software to use less system resources.

DID YOU KNOW : Windows mobile and wildcard certs don't work together

Wildcard SSL certificates allow you to use one certificate for all sub domains (up to one level) of a host. Say I got a wildcard SSL certificate for *.kudithipudi.org, I would be able to use it to provide SSL on blah.kudithipudi.org, ssltest.kudithipudi.org, youcannotbeserious.kudithipudi.org and the clients won’t complaint about it.

For some reason though, Windows Mobile phones don’t like wildcard certs. So if you are ever scratching your head, why every other client works, but windows mobile devices don’t..stop scratching and get a regular SSL certificate for your website/application.

Apparently, this is the case with

  • Windows CE
  • Windows Mobile 5.0
  • Windows Mobile 6.0
  • Windows Mobile 7.0

Don’t you get the feeling that someone keeps using the same library and never bothered to check/fix it? And searching on MSDN or any other Microsoft resource won’t provide you this information. This is my own deduction after beating my head against the wall for more than 3 days :).

Project Uptime : Progress Report – 2

After I installed the new kernel as mentioned in this update, I was still seeing the server booting up with the older version. I tried to have the kernel install option (apt-get install linux-image-VERSION) to overwrite the grub config, but the server wouldn’t boot up after that.

So after a lot of head scratching and googling, I found the solution. Rather than have the install program automatically update the grub config file, you have to manually edit it.

So install the latest kernel using the commands as mentioned in my previous post (https://kudithipudi.org/2012/03/07/project-uptime-progress-report-1/) and when prompted about the having an existing grub menu and if you want to overwrite it, just say no. Then do the following

  • Check the new kernel image filename by running [code] ll /boot/vmlinuz* [/code]
  • Edit the grub boot menu by editing the file /boot/grub/menu.1st and add a section for the new kernel image.
  • Edit the default boot option to the new kernel image (NOTE.. the sequence starts from 0)
  • Reboot the server and enjoy the new kernel

Here’s my before and after comparisons for the server I am building

BEFORE  (this is after the line ## End Default Options ##)

[code]

title Ubuntu 11.10, kernel 3.0.0-12-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-12-virtual (recovery mode)
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-16-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-16-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-16-virtual
title Chainload into GRUB 2
root (hd0)
kernel /boot/grub/core.img

title Ubuntu 11.10, memtest86+
root (hd0)

[/code]

AFTER

[code]

title Ubuntu 11.10, kernel 3.0.0-12-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-12-virtual (recovery mode)
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-16-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-16-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-16-virtual
title Chainload into GRUB 2
root (hd0)
kernel /boot/grub/core.img

title Ubuntu 11.10, memtest86+
root (hd0)

[/code]

BTW.. Grub is the boot manager (and a lot more) in Linux.

HOW TO : Configure Jboss for writing web access logs

One of the capabilities of Jboss is that it can serve HTTP traffic. By default Jboss does not log any of the HTTP traffic in it’s log files. Here is a quick howto on enabling this logging. This post is specific to Jboss 4.x (ancient!!) and I will post another one soon on how do it in version 5.x and newer.

Edit the server.xml file located in $JBOSS_HOME/servers/$PROFILE/deploy/jboss-web.deployer and replace the commented out access logger section as such

FROM

[code]<!–
<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="localhost_access_log." suffix=".log"
pattern="common" directory="${jboss.server.log.dir}"
resolveHosts="false" />
–> [/code]

TO

[code]<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="localhost_access_log." suffix=".log"
pattern="common" directory="${jboss.server.log.dir}"
resolveHosts="false" /> [/code]

This will start creating a file with the format localhost_access_log.CURRENT_DATE.log in the $JBOSS_HOME/server/$PROFILE/log folder

But it isn’t fun if you just leave the default logging right :). The pattern formats of common and combined are similar to the standard apache logging options. But if you wanted to have certain content and format in the log files, you have a lot of options. Jboss community has documented all the data that is exposed through this valve at http://docs.jboss.org/jbossweb/latest/api/org/apache/catalina/valves/AccessLogValve.html

So say, I want to log the referrer header, user agent and the value of a cookie called JSESSONID and log all this data into a file called jboss_web_access_log, I setup the options as such

[code]<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="jboss_web_access_log." suffix=".log"
pattern="%h %p %l %u %t %r %s %b ‘%{Referer}i’ ‘%{User-Agent}i’ ‘%{JSESSIONID}c’"
directory="${jboss.server.log.dir}"
resolveHosts="false" /> [/code]