Linux

Project Uptime : Progress Report – 2

After I installed the new kernel as described in my previous update, the server was still booting the older version. I tried letting the kernel install (apt-get install linux-image-VERSION) overwrite the grub config, but the server wouldn’t boot at all after that.

After a lot of head scratching and googling, I found the solution: rather than letting the install program update the grub config file automatically, you have to edit it manually.

Install the latest kernel using the commands from my previous post (https://kudithipudi.org/2012/03/07/project-uptime-progress-report-1/). When the installer tells you that a grub menu already exists and asks whether to overwrite it, just say no. Then do the following:

  • Check the new kernel image filename by running [code] ls -l /boot/vmlinuz* [/code]
  • Edit the grub boot menu file /boot/grub/menu.lst and add a section for the new kernel image.
  • Change the default boot option to point to the new kernel image (NOTE: the sequence starts from 0)
  • Reboot the server and enjoy the new kernel

Here’s my before and after comparisons for the server I am building

BEFORE  (this is after the line ## End Default Options ##)

[code]

title Ubuntu 11.10, kernel 3.0.0-12-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-12-virtual (recovery mode)
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-16-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-16-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-16-virtual
title Chainload into GRUB 2
root (hd0)
kernel /boot/grub/core.img

title Ubuntu 11.10, memtest86+
root (hd0)

[/code]

AFTER

[code]

title Ubuntu 11.10, kernel 3.0.0-12-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-12-virtual (recovery mode)
root (hd0)
kernel /boot/vmlinuz-3.0.0-12-virtual root=/dev/xvda1 console=hvc0 ro single
initrd /boot/initrd.img-3.0.0-12-virtual

title Ubuntu 11.10, kernel 3.0.0-16-virtual
root (hd0)
kernel /boot/vmlinuz-3.0.0-16-virtual root=/dev/xvda1 console=hvc0 ro quiet splash
initrd /boot/initrd.img-3.0.0-16-virtual
title Chainload into GRUB 2
root (hd0)
kernel /boot/grub/core.img

title Ubuntu 11.10, memtest86+
root (hd0)

[/code]
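Counting menu entries by hand to work out the number for the default option is easy to get wrong. Here is a minimal sketch that prints each entry of a menu.lst style file with its zero-based position. The function name list_grub_entries is my own, and it assumes every boot entry starts with an uncommented title line, as in the listings above.

```shell
#!/bin/sh
# Sketch: print each boot entry of a GRUB legacy menu file together with
# the zero-based index that the "default" option expects.
# list_grub_entries is a hypothetical helper name, not a GRUB tool.
list_grub_entries() {
    # Only uncommented lines that start with "title" count as entries.
    awk '/^title/ {
        sub(/^title[ \t]+/, "")
        printf "%d: %s\n", n++, $0
    }' "${1:-/boot/grub/menu.lst}"
}

# Run against the real menu only if it exists on this machine.
if [ -r /boot/grub/menu.lst ]; then
    list_grub_entries /boot/grub/menu.lst
fi
```

For the entries shown above, the 3.0.0-16-virtual kernel would be listed as 2.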

BTW, GRUB is the boot loader (and a lot more) on most Linux systems.

HOW TO : Increasing number of processes that can be run by a user in Linux

By default, most of the Linux distros limit the number of processes that a user can spawn. This is put in place to limit (un)intended cases when a process might just fork off processes without a limit and bring down a server.

For RHEL (and CentOS), the default is 1024 processes per user. In some cases, you do need to increase the number of processes that a particular user can spawn. For example if you are running a database or an application server, you definitely want to tweak this number because these apps tend to create a lot of threads.

As a side note, if you run into this limitation on a machine running JBoss, you typically see an error with the following string in your server logs: [code]java.lang.OutOfMemoryError: unable to create new native thread[/code]

Looking at the error, one would think it is related to memory issues :).

OK.. back to the subject at hand. Here is the process for identifying your limits and then tweaking them as required in RHEL or CentOS.

  • Check the current limits on the number of processes a user can run by executing [code]ulimit -u[/code]
  • Edit the /etc/security/limits.conf file and add the required limits. You can get all the possible options by running man limits.conf. For example, if I wanted all the users to have a soft limit of 2000 and a hard limit of 4000, my limits.conf file would look like this [code]# Increase the number of processes (and threads) per user
    *       soft    nproc   2000
    *       hard    nproc   4000 [/code]
  • Edit the /etc/security/limits.d/90-nproc.conf file and update it to have the same soft limits. By default, it has 1024 as the limit. So an updated file with my new limits as in the example above would look like this [code]
    # Default limit for number of user's processes to prevent
    # accidental fork bombs.
    # See rhbz #432903 for reasoning.

    *          soft    nproc     2000[/code]

  • Restart the server. The updated settings won’t take effect until this is done.
  • Check if you have the new limits by running [code]ulimit -u[/code]

You can also check the limits of a particular user by finding a process ID being executed by that user and running [code]sudo cat /proc/PROCESS_ID/limits [/code]
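If you only want the process limit out of that file, a small helper can pick it out. This is a sketch: max_processes is a name I made up, and it assumes the usual layout of /proc/PROCESS_ID/limits, where the "Max processes" row lists the soft limit followed by the hard limit.

```shell
#!/bin/sh
# Sketch: print the soft and hard "Max processes" limits from a
# /proc/PID/limits style file. max_processes is a hypothetical helper.
max_processes() {
    # Row layout: "Max processes  <soft>  <hard>  processes"
    awk '/^Max processes/ { print "soft=" $3, "hard=" $4 }' "$1"
}

# Show the limits of the current shell, if /proc is available.
if [ -r "/proc/$$/limits" ]; then
    max_processes "/proc/$$/limits"
fi
```

To check another user's process, pass its limits file instead and run the function with sudo rights.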

HOW TO : Clear unused swap memory in Linux

Inspired by a G+ post by Thomas Weeks.

Swap is space the OS uses to move data back and forth when there is not enough main memory available. It is several times slower than RAM, since it lives on the hard disk, so if you are constantly swapping, your system performance is going to suffer quite a lot. You should always make sure your system is not swapping, by adding the required RAM and/or stopping your application(s) from using so much memory.

At times, because of a spike in utilization, the OS might briefly use swap. And when it does, it doesn’t release the memory from swap afterwards. So from an analysis perspective, it is difficult to check (quickly) whether your system is currently swapping or not. This is similar to errors on an interface in a router: unless you clear the counters and monitor them, you don’t know when the errors happened.

I was not aware that you could turn off swap devices while the OS is running and then enable them again. So here are the commands to do that in Linux

[code]swapoff -a[/code]

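This essentially disables swap on all devices configured for swap in /etc/fstab. Anything that was sitting in swap is moved back into RAM, so make sure there is enough free memory first.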

[code]swapon -a[/code]

This does the opposite of the first command. It enables swap on all devices that have swap configured.

Tom put this into a nice alias by doing the following

[code]alias unswap='sudo swapoff -a && sudo swapon -a'[/code]

Thx Tom…
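One thing to watch: swapoff has to pull whatever is sitting in swap back into RAM, so it is worth checking that it will fit before running the alias. Below is a rough sketch of such a check. The helper name swap_fits_in_ram is hypothetical, and it assumes /proc/meminfo has a MemAvailable line (kernels 3.14 and newer); on older kernels the value reads as 0, so the check only passes when swap is empty.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only if the swap currently in use would fit
# into available RAM. swap_fits_in_ram is a hypothetical helper name.
# Reads a meminfo style file; defaults to /proc/meminfo.
swap_fits_in_ram() {
    awk '
        /^MemAvailable:/ { avail = $2 }   # kB of RAM that can be used
        /^SwapTotal:/    { total = $2 }   # kB of swap configured
        /^SwapFree:/     { free  = $2 }   # kB of swap not in use
        END {
            used = total - free
            printf "swap used: %d kB, RAM available: %d kB\n", used, avail
            exit (used > avail)
        }' "${1:-/proc/meminfo}"
}

# Example (not run here because it needs root):
#   swap_fits_in_ram && sudo swapoff -a && sudo swapon -a
```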

Project Uptime : Progress Report – 1

Here is the first update on Project Uptime. I spun up a new server (doesn’t that sound so odd.. spun up a new server!! :)) with 512MB of RAM running Ubuntu 11.10 (Oneiric Ocelot). First order of business after spinning up the server?

  • Update to the latest and greatest patches

[code] sudo apt-get update && sudo apt-get upgrade [/code]

  • Update to the latest kernel.
    • First check the version of kernel you are running

[code] uname -r [/code]

    • Check the repository for latest version

[code] apt-cache search linux-image [/code]

    • Install latest version

[code] sudo apt-get install linux-image-LATEST-VERSION [/code]

    • Restart server

[code] sudo init 6 [/code]
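After the reboot it is worth confirming that the server actually came up on the new kernel. A minimal sketch, assuming kernel images live in /boot and are named vmlinuz-VERSION as on Ubuntu; newest_kernel is a helper name I made up, and sort -V needs GNU coreutils.

```shell
#!/bin/sh
# Sketch: compare the running kernel with the newest image in /boot.
# newest_kernel is a hypothetical helper; it relies on GNU "sort -V"
# (version sort) and on images being named vmlinuz-VERSION.
newest_kernel() {
    for image in "${1:-/boot}"/vmlinuz-*; do
        [ -e "$image" ] || continue      # glob matched nothing
        name=${image##*/}                # drop the directory part
        echo "${name#vmlinuz-}"          # drop the "vmlinuz-" prefix
    done | sort -V | tail -n 1
}

running=$(uname -r)
newest=$(newest_kernel /boot)
if [ "$running" = "$newest" ]; then
    echo "running the newest installed kernel: $running"
else
    echo "running $running, newest installed: ${newest:-none found}"
fi
```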

HOW TO : Find size of directories in current directory

Quick note for self. Simple bash loop to find out the size of each directory in the current directory. This script is useful if you are running out of disk space and want to quickly find out the offending directory.

[code]for dir in $(find ./ -maxdepth 1 -type d); do echo "${dir}"; du -ch "${dir}" | grep -i total; done[/code]

Breaking this down

  • The find command prints out a list of directories. You can modify it to do recursive lookups by just removing the -maxdepth option. This output is fed into the bash loop
  • du gets the size of all the files (and sub directories) in the directory and grepping it for total gives you the total size of the directory
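Directory names with spaces will trip up the loop above, because the shell splits the output of find on whitespace. A variant that lets find hand the names straight to du avoids that. This is a sketch under my own naming (dir_sizes); it reports sizes in kilobytes so that the list can be sorted with the biggest directory first.

```shell
#!/bin/sh
# Sketch: size of each immediate subdirectory, largest first.
# dir_sizes is a hypothetical helper name.
dir_sizes() {
    # -mindepth 1 skips the starting directory itself;
    # "-exec du -sk {} +" passes the names to du without word splitting.
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + | sort -rn
}

dir_sizes .
```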

 

Project : Uptime

The uptime of this blog has been really bad recently. I switched to hosting it on a Rackspace virtual server last year and went with the cheapest option: a 256MB Linux virtual server that was costing me ~$12/month. I never got around to tuning the OS, so the server was always using swap and would go down pretty much every day. Last week, I upgraded the plan and moved to a 512MB server, but the uptime hasn’t been any better. A report from Pingdom (which, by the way, is a great service for tracking the uptime and responsiveness of your website) shows the availability of the site over the last year at 96%!! For someone who has been working in the operations and infrastructure world, that is unacceptable :). So my new goal is to maintain at least 99.5% uptime. Here is my plan to achieve this:

  1. Move to a fresh VM with the latest kernel
  2. Upgrade to the latest version of Apache. Initially, I wanted to move to nginx or lighttpd, but with the recent Apache upgrade, I hear good things about Apache working well in low memory situations.
  3. Upgrade to latest version of MySQL and tune it for memory usage
  4. Configure CloudFlare to serve a static version of the front page in case the server goes down. Design the static page to point people to my other digital presences (Google+, LinkedIn, Flickr etc.)

I plan to blog the progress and learnings as I implement this plan.

HOW TO : Sort Apache Web Logs for hits by Unique IP Addresses

 

Say you want to find out how many hits you are getting to a specific page from each source IP. You can use this quick collection of Linux tools to get this data:

[code]grep -i "URL_TO_CHECK" PATH_TO_APACHE_ACCESS_LOG | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

You are using

  • grep to filter the string of the page you want the report on
  • cut to get the IP address from the log file
  • sort and uniq -c to group the IP addresses and count the hits from each one
  • and finally sort -rn to sort the data in descending order

Example :

[code]grep -i "GET / " /opt/apache/logs/access_log | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]

gets you the report of hits to the index page (note the space after the slash, which keeps the pattern from matching every other GET request).
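Keep in mind that grep matches the string anywhere in the line, so a pattern like /about also counts hits to /about-us. If your log is in the common or combined format, the request path is the seventh space-separated field, and awk can match it exactly. A sketch, with hits_by_ip being a name I picked:

```shell
#!/bin/sh
# Sketch: count hits per client IP for one exact request path.
# hits_by_ip is a hypothetical helper; it assumes the Apache common or
# combined log format, where field 1 is the IP and field 7 is the path.
hits_by_ip() {
    # $1 = exact request path (e.g. /), $2 = access log file
    awk -v path="$1" '$7 == path { print $1 }' "$2" | sort | uniq -c | sort -rn
}
```

Usage would look like hits_by_ip / /opt/apache/logs/access_log > ~/ip_report.txt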

HOW TO : Find list of files used by a process in Linux

Quick howto on finding the list of files being accessed by a process in Linux. I needed to find this for troubleshooting an issue where a particular process was using an abnormally high percentage of CPU. I wanted to find out what this particular process was doing and accessing.

  1. Find the process ID (pid) of the process you want to analyze by running [code] ps -ef | grep NAME_OF_PROCESS [/code]
  2. Find the files the process is accessing at a given time by running [code] sudo ls -l /proc/PROCESS_ID/fd [/code]

For example, if I wanted to find the list of files being accessed by mysql, the process would look as such

[code] ps -ef | grep mysqld [/code]

which would show the output as

[code]samurai@samurai:~$ ps -ef | grep mysqld
mysql     3304     1  0 Feb04 ?        00:00:23 /usr/sbin/mysqld
samurai  23389 23374  0 14:57 pts/0    00:00:00 grep --color=auto mysqld
[/code]

I can then find the list of files being used by mysql by running

[code] sudo ls -l /proc/3304/fd [/code]

which would give me

[code]

lrwx------ 1 root root 64 Feb  7 15:00 0 -> /dev/null
lrwx------ 1 root root 64 Feb  7 15:00 1 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 10 -> socket:[4958]
lrwx------ 1 root root 64 Feb  7 15:00 11 -> /tmp/ibdu9WRh (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 12 -> socket:[4959]
lrwx------ 1 root root 64 Feb  7 15:00 14 -> /var/lib/mysql/blog/wp_term_relationships.MYI
lrwx------ 1 root root 64 Feb  7 15:00 15 -> /var/lib/mysql/blog/wp_postmeta.MYI
lrwx------ 1 root root 64 Feb  7 15:00 17 -> /var/lib/mysql/blog/wp_term_relationships.MYD
lrwx------ 1 root root 64 Feb  7 15:00 18 -> /var/lib/mysql/blog/wp_term_taxonomy.MYI
lrwx------ 1 root root 64 Feb  7 15:00 2 -> /var/log/mysql/error.log
lrwx------ 1 root root 64 Feb  7 15:00 20 -> /var/lib/mysql/blog/wp_postmeta.MYD
lrwx------ 1 root root 64 Feb  7 15:00 21 -> /var/lib/mysql/blog/wp_term_taxonomy.MYD
lrwx------ 1 root root 64 Feb  7 15:00 22 -> /var/lib/mysql/blog/wp_terms.MYI
lrwx------ 1 root root 64 Feb  7 15:00 23 -> /var/lib/mysql/blog/wp_terms.MYD
lrwx------ 1 root root 64 Feb  7 15:00 3 -> /var/lib/mysql/ibdata1
lrwx------ 1 root root 64 Feb  7 15:00 4 -> /tmp/ibvANyz7 (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 5 -> /tmp/ibonS0mU (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 6 -> /tmp/ibcKctaH (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 7 -> /tmp/ibB5DS5t (deleted)
lrwx------ 1 root root 64 Feb  7 15:00 8 -> /var/lib/mysql/ib_logfile0
lrwx------ 1 root root 64 Feb  7 15:00 9 -> /var/lib/mysql/ib_logfile1
[/code]
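To get just the descriptor numbers and their targets without the ls columns, you can loop over the fd directory and resolve each link. A small sketch (open_files is my own name for it); without sudo it only works for processes you own, so the example runs it against the current shell.

```shell
#!/bin/sh
# Sketch: list "fd -> target" for every open descriptor of a process.
# open_files is a hypothetical helper; it needs a Linux /proc filesystem.
open_files() {
    for fd in /proc/"$1"/fd/*; do
        [ -L "$fd" ] || continue         # no such process or no access
        # readlink prints what the descriptor points at (file, socket, ...)
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# $$ is the PID of the current shell.
open_files "$$"
```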