Technology

Lessons of the Trade : Purging Databases

We ran into an interesting issue at work recently. Documenting the solution for my records.

BACKGROUND : We had a table in one of our databases that served as a “hopping” point for some jobs. Data was inserted into this table, and jobs were kicked off at periodic intervals to “process” the data and delete it.

CURRENT METHOD : Launch multiple jobs to process the data and delete the rows as soon as the data is processed. This causes lock contention on the table because multiple delete operations run at the same time, which in turn means the jobs cannot finish processing the data, so the table keeps growing.

PROPOSED METHOD : Add a new column to the table called “PROCESSED_STATE” and modify the “processing” jobs to set a flag “Y” in this column as soon as the data is processed. Create a new job that will be launched periodically, which checks the PROCESSED_STATE column and if the flag is set to “Y”, deletes the row.
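Here is a minimal sketch of the proposed method in Python, using an in-memory SQLite table as a stand-in for our database. The table name hopping_table, the id and payload columns, and both function names are made up for illustration; only the PROCESSED_STATE column and the “Y” flag come from the description above. The locking behaviour of a real database server will differ from SQLite, so treat this as an outline of the flow rather than a benchmark.

```python
import sqlite3


def mark_processed(conn, row_id):
    """Worker step: flag a row as processed instead of deleting it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE hopping_table SET processed_state = 'Y' WHERE id = ?",
            (row_id,),
        )


def purge_processed(conn):
    """Single periodic purge job: delete every row flagged 'Y'."""
    with conn:
        cur = conn.execute(
            "DELETE FROM hopping_table WHERE processed_state = 'Y'"
        )
    return cur.rowcount  # number of rows removed by this purge


# Hypothetical table layout, only for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hopping_table ("
    "id INTEGER PRIMARY KEY, payload TEXT, processed_state TEXT DEFAULT 'N')"
)
conn.executemany(
    "INSERT INTO hopping_table (payload) VALUES (?)",
    [("a",), ("b",), ("c",)],
)
conn.commit()

# Two "processing" jobs finish their rows; the purge job runs afterwards.
mark_processed(conn, 1)
mark_processed(conn, 3)
deleted = purge_processed(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM hopping_table")]
print(deleted, remaining)
```

The processing jobs only ever issue updates, and the delete happens in one place, on whatever schedule the purge job is given.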

Moral of the story :) .. Multiple concurrent deletes on a table are bad. A better way is to have multiple updates and one delete.

What happens when you get busy (lazy)?

Your site goes down :)

And traffic to the site drops!!

Things have been a bit crazy at work recently, so I didn’t get a chance to fix the site as soon as it went down (due to an error I still haven’t figured out). And as a result, the traffic to the site dropped.

I finally took the chance to move the site to a dedicated server running on the RackSpace Cloud services. I am putting together a post on how I handled this migration and will publish it soon.

The bad news is that I have lost traffic to the site that I have built over a period of time.. the good news is that I am the master of my own house (website) at last :).

I like to be in control of my destiny

I don’t have a Facebook or Twitter account, and that surprises a lot of my friends since I am such a geek :). The reason I keep (kept) giving is that I want to be in control of my destiny. In this case, destiny being content. While Facebook and Twitter provide you with an easy way to connect with friends/relatives/stalkers etc., I believe they give the companies running these applications a lot of leeway in controlling your content. I have all the means and ways to communicate with my friends and advertise what I need to the world. How do I do that? That is a blog post that I have been “drafting” for the last couple of months :).. Hope to publish it soon. And it looks like the wider audience is finally waking up to it.

Check out this article on ReadWriteWeb regarding how tech leaders are calling for a boycott of Facebook and advocating for an open social networking protocol (http://www.readwriteweb.com/archives/more_web_industry_leaders_quit_facebook_call_for_o.php).

Another article on the same website speaks about a study by the Advanced Institute of Science and Technology in Korea, which shows that Twitter is really not a social networking site, but more of a medium to broadcast your content (http://www.readwriteweb.com/archives/study_twitter_isnt_very_social.php). It doesn’t really support the argument I made earlier that Twitter is not going to make it.. but it certainly supports the notion that once the hype is gone, the influence of Twitter as a medium will decrease.

Express.com DNS outage

I am sure a lot of people shop on express.com, but I probably get the credit for being the first blogger to post that express.com has not been responding to DNS queries since ~7:00 PM CST (4/26). Looks like Qwest is hosting DNS for Express. The name servers (most probably global load balancers) are not responding to DNS requests.

Here’s what I got when I queried for www.express.com

Nameserver trace for www.express.com:

  • Looking for who is responsible for root zone and followed h.root-servers.net.
  • Looking for who is responsible for com and followed h.gtld-servers.net.
  • Looking for who is responsible for express.com and followed dca-ans-01.inet.qwest.net.

Nameservers for www.express.com:

  • dca-ans-01.inet.qwest.net returned (NORECORDS)
  • svl-ans-01.inet.qwest.net returned (NORECORDS)

I feel for the poor ops team scrambling around to bring up the service :). Another reason why you want diversity in your DNS hosting.

HOW TO : Configure MIME type mappings in JBoss

Instructions for configuring the MIME type mappings in JBoss. MIME types essentially tell the application processing the content (typically a browser) what the content is. More information here (http://en.wikipedia.org/wiki/Internet_media_type).

  • Locate the web.xml file for your JBoss instance. It is usually in $JBOSS_HOME/server/INSTANCE/deploy/jboss-web.deployer/conf/web.xml
  • Locate the <mime-mapping> settings and make the required edits. For example, the mapping that defines the MIME type for JavaScript looks like this

    <mime-mapping>
        <extension>js</extension>
        <mime-type>application/javascript</mime-type>
    </mime-mapping>

  • Restart JBoss

Burj Dubai is down..

Not the building :).. In fact, it just opened officially today. But it looks like the IT team of the Burj did not anticipate the traffic spike to its website, http://www.burjdubai.com/, when the building opened. The site has been down since early in the day (CST).

All the free publicity the site is getting from the media is wasted because the site is down (OK.. I am exaggerating things a bit :) ). If only the IT team at the Burj had thought about this and deployed the site on a CDN, they could have averted this downtime. Using a CDN to power your site is becoming more of a norm than a luxury nowadays. And with all the CDN options in the market, there is no excuse for any IT team to not implement this for a customer-facing website.

HOW TO : Improve JBoss startup times

We run multiple applications in JBoss at my work, and one of the applications used to take an inordinate amount of time to come up. A typical application would take < 1 minute to get deployed, and this particular application for some reason was taking ~7-8 minutes. We initially thought it was a bug in the code and gave hell to our development team :).. But on closer investigation, we found out that a feature we had enabled in the JBoss server settings, which allows content to be hosted on network storage, was causing the issue.

I blogged about the JBoss feature to follow sym links here (https://kudithipudi.org/2008/07/25/howto-configure-jboss-to-follow-symbolic-links/). So essentially, when JBoss was started, it was checking all the content in these network paths for applications to deploy. And traversing a network share with 1000s of directories isn’t fun :)..

We fixed it by making a simple edit to the startup script. Here’s the pseudocode for the script

  1. Remove soft links to network share
  2. Start Jboss
  3. Put soft links to network share
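The three steps above can be sketched in Python as follows. This is only an outline under a few assumptions: our real change was an edit to the existing startup script, the function name start_without_links is made up, and the start_server callable is a placeholder for whatever launches JBoss and returns once deployment has finished. The demo uses temporary directories in place of the network share and the deploy directory.

```python
import os
import tempfile


def start_without_links(links, start_server):
    """Remove the soft links, start the server, then put the links back.

    links: dict mapping each soft link path to its target.
    start_server: callable that starts the server and returns once it is up
                  (hypothetical placeholder for the real JBoss start command).
    """
    # 1. Remove soft links to the network share
    for link_path in links:
        if os.path.islink(link_path):
            os.unlink(link_path)
    try:
        # 2. Start JBoss
        start_server()
    finally:
        # 3. Put the soft links back, even if the start step failed
        for link_path, target in links.items():
            if not os.path.lexists(link_path):
                os.symlink(target, link_path)


# Demo: temp directories stand in for the network share and deploy directory.
share = tempfile.mkdtemp()
deploy = tempfile.mkdtemp()
link = os.path.join(deploy, "content")
os.symlink(share, link)

seen_during_start = []
start_without_links(
    {link: share},
    lambda: seen_during_start.append(os.path.lexists(link)),
)
print(seen_during_start, os.path.islink(link))
```

The try/finally is there so the links get restored even when the start step throws, which a plain three-line script would not guarantee.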

And now the application starts in less than a minute :).

I guess there might be other, more elegant ways to do this, e.g. configuring JBoss to deploy only certain applications, but this did the trick for us :).

HOW TO : Advanced search and replace in Notepad++

Jhanvi asked me to help with editing a text file recently. She had a file in the format


'512'
'345'
'876'

and needed to convert it into the format below


INSERT INTO BLAH VALUE ('512');
INSERT INTO BLAH VALUE ('345');
INSERT INTO BLAH VALUE ('876');

There are multiple ways one can do this. Here is how I did it using Notepad++, an open source text editor. I used the regular expression capability of its search and replace function.

  • Press “Ctrl + h” to bring up the search and replace window and select the “Regular expression” search mode.
  • Replace the single quote at the beginning of the line by searching for ^' and replacing it with INSERT INTO BLAH VALUE ('
  • Replace the single quote at the end of the line by searching for '$ and replacing it with ');
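For anyone who prefers a script, the same two replacements can be sketched with Python's re module. The function name to_insert_statements is made up for this example; the two patterns are the ones used in Notepad++ above, and re.MULTILINE makes ^ and $ match at the start and end of every line rather than of the whole text.

```python
import re


def to_insert_statements(text):
    """Apply the two search-and-replace passes described above."""
    # Pass 1: replace the single quote at the beginning of each line.
    text = re.sub(r"^'", "INSERT INTO BLAH VALUE ('", text, flags=re.MULTILINE)
    # Pass 2: replace the single quote at the end of each line.
    text = re.sub(r"'$", "');", text, flags=re.MULTILINE)
    return text


sample = "'512'\n'345'\n'876'"
result = to_insert_statements(sample)
print(result)
```

The order of the passes does not matter here, since the first pattern only touches the start of a line and the second only the end.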

Screenshots from the operation, in order

  • The data in its original format
  • Replacing the first quote mark
  • Data after the first search and replace operation
  • Replacing the second quote mark
  • Data in the final format

HOW TO : Load/Stress test a Linux based server

We ran into an issue at work recently, which prompted us to do some performance testing on some of our Linux servers. The requirement was to stress test the key components of the server (CPU, RAM, HDD, Network) and prove that different servers with the same configuration were capable of performing identically. Pretty simple, right :).. The challenge was to find tools that could be run to stress test each of the components. There were a lot of tools for CPU and memory (RAM) testing, but not a lot for network and hard drive (HDD) testing. After searching high and low, we found a couple of tools that I wanted to document here for future reference.

HDD Testing :

I found a pretty interesting tool called Iozone, written by William Norcott (Oracle) and Don Capps. You can get the source code and builds for major OSs at http://iozone.org . Even though we installed the program using RPM, we were not able to run it without specifying the complete path.

There are a ton of options for the program, but the easiest way to run it was in automated mode with the output going to an Excel spreadsheet (more like a glorified CSV file :) ). Here is the command we used

/opt/iozone/bin/iozone -a -Rb output_excel_file.xls

The “-a” option tells the program to run in automated mode, and “-Rb” tells it to generate an Excel-compatible report and write it to the named file. You can then open the spreadsheet in Excel and create 3D graphs to check and compare the output.

Network Testing :

Most of the advice out there for testing the network stack of a machine is to copy large files, either over a network share or via FTP. We found that this wasn’t enough to really max out a gigabit port, since protocol limitations kept us from saturating the network port. After some searching, we stumbled across a tool called “ettcp” on Sourceforge. ettcp itself is an offshoot of ttcp. ttcp (stands for test tcp) was created to test network performance between two nodes. I couldn’t find any place to download ttcp itself, but you can download ettcp at http://ettcp.sourceforge.net/.

We used one server to act as a common receiver for all the servers we intended to performance test. Here are the commands we used to run the test

RECEIVER (Common Server)
./ettcp -r -s -f M

The options are

  • “-r” for designating the machine as receiver
  • “-s” for discarding (sinking) the received data, going by the usage of the original ttcp
  • “-f M” for showing the output in megabytes.

TRANSMITTER (Test Servers)
./ettcp -t -s receiver_hostname -n 10000000 -f M

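The options are

  • “-t” for designating the machine as transmitter
  • “-s receiver_hostname” to send generated test data to the receiver (going by the usage of the original ttcp, “-s” makes the transmitter generate its own data, and the hostname that follows names the receiver)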
  • “-n” to define the number of packets to send to the receiver