Kudithipudi.Org

May 1, 2012

2012 May Project

Filed under: Technology,Web — Vinay @ 3:40 pm

Continuing my project-a-month theme, here is what I want to accomplish in May 2012:

  • Understand Google App Engine and Windows Azure Platforms
  • Write a simple application and deploy it to both platforms
  • The application that I am envisioning will display the “user agent” string of the client trying to access the application. I know there are tons of sites that already do this, but I think it is a useful tool to have in your bag of tricks :) . It is simple enough that I think I can program it in a month.
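To give a feel for how small this application could be, here is a minimal sketch using Python's standard WSGI interface. It is not tied to either platform (how it would be deployed to App Engine or Azure is left open), and the function name `user_agent_app` is hypothetical.

```python
def user_agent_app(environ, start_response):
    """Minimal WSGI app that echoes the client's User-Agent header as plain text."""
    # WSGI exposes request headers in environ as HTTP_* keys.
    agent = environ.get("HTTP_USER_AGENT", "(no User-Agent header sent)")
    body = f"Your user agent is: {agent}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, you could hand it to the standard library's test server, e.g. `wsgiref.simple_server.make_server("", 8000, user_agent_app).serve_forever()`, and browse to port 8000.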

Why am I doing this? I understand the IaaS area pretty well, but am not well versed in the PaaS arena. Hoping this adventure will teach me some new things. And yes, I do plan on documenting my journey :) .

Wish me luck :)

April 30, 2012

HOW TO : Configure Jboss to not show backend server name when proxying https (ssl) traffic

Filed under: HOWTO,Technology,Web — Vinay @ 7:32 pm

Phew.. that was a long title :) . I was running into an issue with the setup shown in the picture below.

When we accessed the website over HTTPS, the HTML content served back referenced the app server name instead of the website address.

So in this example, let’s say the web address was kudithipudi.org and the app server was app-server-kudithipudi, the HTML content was showing https://app-server-kudithipudi:8080 as the source.

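Here’s how we fixed it.

Edit the server.xml file found in $JBOSS_HOME/server/$JBOSS_PROFILE/deploy/jboss-web.deployer and update the HTTPS connector to use the web address (kudithipudi.org) as the proxyName and 443 as the proxyPort. These attributes tell the connector which host name and port to report to the application (through request.getServerName() and request.getServerPort()), so any absolute links it builds point at the public site instead of the app server.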

BEFORE

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="250" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS"
keystoreFile="/opt/jboss/jboss-as/server/kudithipudi/conf/ssl/kudithipudi.keystore"
keystorePass="xxxxxx" />

AFTER

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
maxThreads="250" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS"
proxyName="kudithipudi.org" proxyPort="443"
keystoreFile="/opt/jboss/jboss-as/server/kudithipudi/conf/ssl/kudithipudi.keystore"
keystorePass="xxxxxx" />
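To see why this fixes the links, here is a simplified Python model (not JBoss code) of how an application typically builds an absolute URL from the request's scheme, server name and port. `absolute_url` is a hypothetical helper; the assumption is that the app derives links from what the connector reports, which is what proxyName/proxyPort override.

```python
def absolute_url(scheme, server_name, server_port, path,
                 proxy_name=None, proxy_port=None):
    """Build an absolute URL from request data (simplified model).

    When proxyName/proxyPort are set on the connector, they replace
    the host and port that the application sees.
    """
    host = proxy_name if proxy_name is not None else server_name
    port = proxy_port if proxy_port is not None else server_port
    default_port = {"http": 80, "https": 443}[scheme]
    # Omit the port when it is the default for the scheme.
    port_part = "" if port == default_port else f":{port}"
    return f"{scheme}://{host}{port_part}{path}"
```

Without the proxy settings the model yields `https://app-server-kudithipudi:8443/`; with `proxy_name="kudithipudi.org", proxy_port=443` it yields `https://kudithipudi.org/`.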

April 18, 2012

Project Uptime : Progress Report 7 : Putting the finishing touches

Filed under: HOWTO,Technology,Web — Vinay @ 1:24 am

We finally come to one of the last posts of Project Uptime. Now that all the components have been set up, I finally copied the WordPress directory from my old server to the new one. The only changes I had to make after copying the files were:

  1. Configure Apache to use the WordPress folder as the default directory. I did this by changing the DocumentRoot option in the vhost.
  2. Change the permissions on the WordPress .htaccess file and wp-content directory (so that WordPress can make rewrite rule changes on the fly)
sudo chmod -v 664 $WORDPRESS_DIRECTORY/.htaccess

sudo chmod 755 $WORDPRESS_DIRECTORY/wp-content 
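If you script your server builds, the same two permission changes can be expressed in Python with `os.chmod`. This is a sketch: `fix_wordpress_permissions` is a hypothetical helper, and it assumes the process has the rights to change these files (the shell commands above use sudo for that).

```python
import os


def fix_wordpress_permissions(wordpress_dir):
    """Apply the same modes as the two chmod commands above."""
    # 664: owner and group can read/write .htaccess, everyone else can only read it.
    os.chmod(os.path.join(wordpress_dir, ".htaccess"), 0o664)
    # 755: owner has full access to wp-content, everyone else can read and traverse it.
    os.chmod(os.path.join(wordpress_dir, "wp-content"), 0o755)
```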

April 17, 2012

HOW TO : Configure Jboss to use hugepages in RHEL/CentOS

Filed under: HOWTO,Linux,Technology,Web — Vinay @ 5:23 pm

Most of us worry about paging to disk (swap), but if you are running a transaction-intensive application, the way memory is paged in RAM also starts to impact application performance. The CPU keeps a small cache (the TLB) of translations from virtual to physical memory pages, and with the default 4 KB page size a large heap needs far more translations than that cache can hold. Hugepages store the data in bigger blocks (2 MB on a typical x86_64 system), so far fewer translations are needed while the application works with its data.

Here is how you can enable hugepages and configure JBoss (actually any Java app) to use hugepages on a RHEL/CentOS system.

OS CONFIGURATION

  1. Check if your system is capable of supporting hugepages by running
    grep HUGETLB /boot/config-`uname -r`

    If you see the response below, you should be good

    CONFIG_HUGETLBFS=y
    CONFIG_HUGETLB_PAGE=y

  2. Next, check if hugepages are already being used by running
    cat /proc/sys/vm/nr_hugepages
    If the response is anything other than 0, hugepages have already been configured.
  3. Find the block size for hugepages by running
    cat /proc/meminfo | grep -i hugepagesize
  4. Calculate the amount of memory you want to dedicate to hugepages. (note: memory allocated to hugepages cannot be used by other processes on the system, unless they are configured to use it)
    For example, I want to dedicate 3GB of RAM to hugepages, and the hugepage size on my system is 2048 kB. So the number of hugepages would be
    (3*1024*1024)/2048 = 1536
  5. Configure the number of hugepages on the system by editing /etc/sysctl.conf and adding the option
    vm.nr_hugepages = 1536

    (note: I put in 1536 since that was the value I got from the above example)

  6. Restart the server and check if hugepages have been enabled by running
    cat /proc/meminfo | grep -i huge
    You should see something like this
    AnonHugePages:    839680 kB
    HugePages_Total:    1500
    HugePages_Free:     1500
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
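The arithmetic in the steps above can be sketched in Python: read the hugepage size from /proc/meminfo-style text and work out how many pages cover the memory you want to dedicate. `parse_meminfo` and `hugepages_needed` are hypothetical helpers, and the sketch rounds up so the pool is never smaller than requested.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into integers."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields = value.split()
        if fields:
            # Keep only the number; the optional 'kB' unit is dropped.
            info[key.strip()] = int(fields[0])
    return info


def hugepages_needed(dedicated_gb, hugepagesize_kb):
    """Number of hugepages needed to cover dedicated_gb of RAM (rounded up)."""
    dedicated_kb = dedicated_gb * 1024 * 1024
    return -(-dedicated_kb // hugepagesize_kb)
```

On a live system you would call `parse_meminfo(open("/proc/meminfo").read())["Hugepagesize"]`; with 2048 kB pages, `hugepages_needed(3, 2048)` gives the 1536 used above.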
    

JBOSS CONFIGURATION

  1. At this point your system is configured with hugepages, and any application that is configured to use them can leverage them. In this example, we want to configure JBoss to utilize these hugepages.
  2. Add the group ID (GID) of the user that JBoss runs under to the /etc/sysctl.conf file. In my case, the jboss user's group had a GID of 505, so I added this line to /etc/sysctl.conf
    vm.hugetlb_shm_group = 505
  3. Next, allow the JBoss user to lock that memory by editing /etc/security/limits.conf. The memlock value is specified in KB, so for the 1536 hugepages of 2048 KB each from the example above it works out to 1536*2048 = 3145728. For example, add the following to /etc/security/limits.conf
    # Allocate memory for Jboss user to take advantage of hugepages
    jboss   soft    memlock 3145728
    jboss   hard    memlock 3145728

  4. Finally, add the following option to the JBoss startup parameters. I edited the $JBOSS_HOME/bin/run.sh file (note: the startup file can be different based on your config).
     -XX:+UseLargePages
  5. Restart Jboss and you are good to go
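As a cross-check, this Python sketch generates the settings from the steps above for a given user, GID and pool size. `hugepage_settings` is a hypothetical helper; it assumes memlock in limits.conf is given in KB (as limits.conf documents) and sizes it to the whole hugepage pool.

```python
def hugepage_settings(user, gid, nr_hugepages, hugepagesize_kb):
    """Return the config lines that let `user` (in group `gid`) use the hugepage pool."""
    # limits.conf expresses memlock in KB, so size it to cover the entire pool.
    memlock_kb = nr_hugepages * hugepagesize_kb
    return {
        "/etc/sysctl.conf": [
            f"vm.nr_hugepages = {nr_hugepages}",
            f"vm.hugetlb_shm_group = {gid}",
        ],
        "/etc/security/limits.conf": [
            f"{user}   soft    memlock {memlock_kb}",
            f"{user}   hard    memlock {memlock_kb}",
        ],
        "JVM options": ["-XX:+UseLargePages"],
    }
```

For the example in this post, `hugepage_settings("jboss", 505, 1536, 2048)` produces a memlock of 3145728; you could loop over the dictionary and print each file's lines before pasting them in.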

note : A lot of articles that I read online say that hugepages are most effective when you are allocating large amounts of RAM to the application. The 3GB used above is just an example.

While I cannot personally vouch for it, a lot of users have reported more than a two-fold increase in performance.

April 14, 2012

Project Uptime : Progress Report 6 : Tweaking Varnish

Filed under: Technology,Web — Vinay @ 12:54 am

The server has held up pretty well since I installed Varnish. Based on this wiki post, I added the following to /etc/varnish/default.vcl

# Drop any cookies sent to WordPress.
sub vcl_recv {
        if (!(req.url ~ "wp-(login|admin)")) {
                unset req.http.cookie;
        }
}

# Drop any cookies WordPress tries to send back to the client.
sub vcl_fetch {
        if (!(req.url ~ "wp-(login|admin)")) {
                unset beresp.http.set-cookie;
        }
}

I think the comments are pretty self explanatory.
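To make the decision the VCL makes explicit, here is a small Python model of it (not Varnish code): cookies are dropped unless the URL matches `wp-(login|admin)`. The VCL `~` operator is an unanchored regex match, so `re.search` is used; `strip_cookies` is a hypothetical helper.

```python
import re

# Same pattern the VCL uses to spot WordPress login/admin URLs.
WP_ADMIN = re.compile(r"wp-(login|admin)")


def strip_cookies(url, headers, header_name):
    """Return a copy of headers without header_name unless url is a login/admin URL."""
    if WP_ADMIN.search(url):
        # Login/admin pages need their cookies, so leave the headers alone.
        return dict(headers)
    # Header names are case-insensitive, as in HTTP.
    return {k: v for k, v in headers.items() if k.lower() != header_name.lower()}
```

Use `header_name="cookie"` for the request side (vcl_recv) and `"set-cookie"` for the response side (vcl_fetch).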

April 11, 2012

Using Apache mod_rewrite for enhancing your application security

Filed under: HOWTO,Web — Vinay @ 8:50 am

For my own notes.. very nice post on perishablepress.com regarding using the different capabilities of mod_rewrite to secure your website (application)

http://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/

April 9, 2012

HOW TO : Redirect web traffic based on URL patterns in Apache

Filed under: HOWTO,Web — Vinay @ 6:27 pm

Here is an Apache configuration that redirects traffic to a particular URL based on a pattern in the URL (AKA the URI). In this example, I want any request whose URL does not start with /application or /content to be redirected to https://domain_name/application

  • Enable the rewrite module in Apache
  • Add the following directives to the conf file (RewriteEngine On turns on rewriting for that context)
    RewriteEngine On
    RewriteCond %{REQUEST_URI} !^/(application|content) [NC]
    RewriteRule ^/(.*) https://%{HTTP_HOST}/application [R,L]


Explanation of the rule

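  • ! negates the pattern, so the condition matches if the URI does NOT start with /application or /content
  • ^ anchors the match at the start of the string
  • | means OR
  • [NC] makes the match case-insensitive (no case)
  • The RewriteRule is applied only if the preceding RewriteCond matches
  • [R,L] means an external (client-side) redirect (a 302 by default), and that this is the last rule to process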
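The rule's logic can be modeled in a few lines of Python to check which URIs would be redirected. This is a sketch of the behavior, not mod_rewrite itself; `redirect_target` is a hypothetical helper and the host name in the usage is an assumption.

```python
import re

# RewriteCond %{REQUEST_URI} !^/(application|content) [NC]
ALLOWED = re.compile(r"^/(application|content)", re.IGNORECASE)


def redirect_target(request_uri, host):
    """Return the URL Apache would redirect to, or None if the request passes through."""
    if ALLOWED.search(request_uri):
        # The negated condition fails, so the RewriteRule is skipped.
        return None
    # RewriteRule ^/(.*) https://%{HTTP_HOST}/application [R,L]
    return f"https://{host}/application"
```

Note that because the pattern has no trailing anchor, a URI such as /applications would also be let through; add a `/` or `$` after the group if you need a stricter match.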

April 4, 2012

HOW TO : Configure Jboss to send log messages to syslog

Filed under: Technology,Web — Vinay @ 10:27 am

JBoss uses the log4j framework to provide logging services. log4j is a very flexible framework and can do a lot of things. One of its features is the ability to send log messages to multiple destinations. Here is a quick how-to on configuring JBoss to send log messages to a syslog server using the syslog protocol. This is pretty useful when you are trying to consolidate logs from multiple sources into a central location.

First, some background about how log4j is configured in Jboss

The log4j configuration in Jboss is managed by the file jboss-log4j.xml located at $JBOSS_HOME/server/$JBOSS_PROFILE/conf.

There are three parts to this configuration file

  1. Appenders
    • An appender is a way to define a particular logging method. By default, JBoss provides a bunch of appenders in this config file, but only the FILE and CONSOLE appenders are enabled. The FILE appender writes the log messages to a log file and rotates them based on the criteria in the appender. The CONSOLE appender just sends messages to the console. This comes into play when you are not running JBoss as a service. In addition, there are appenders for syslog, SNMP and email that are commented out.
  2. Categories
    • A category is where you define the class you want to log messages for and which appender it should use. If you don’t specify an appender or the threshold for the logging level, logging for this class will be done at the default log levels and by the appender specified by the default (root) category.
  3. Default (root) Category
    • As mentioned above, this is the catch-all for classes that are not listed explicitly in the categories section.

So pictorially, it would look like this

Getting back to the reason for this post, here is how you would enable the syslog appender and then configure a category to use this appender. For this example, we will use a category named org.kudithipudi

  1. Enable the syslog appender by un-commenting the following section in the jboss-log4j.xml file
    <!-- Syslog events -->
    <appender name="SYSLOG" class="org.apache.log4j.net.SyslogAppender">
       <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
       <param name="Threshold" value="ERROR"/>
       <param name="Facility" value="LOCAL7"/>
       <param name="FacilityPrinting" value="true"/>
       <param name="SyslogHost" value="localhost"/>
       <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="[%d{ABSOLUTE},%c{1}] %m%n"/>
       </layout>
    </appender>

  2. Add a new category that uses this appender
    <category name="org.kudithipudi">
       <priority value="INFO" />
       <appender-ref ref="SYSLOG"/>
    </category>
  3. Restart Jboss and you should see messages from Jboss being sent to the syslog server

Couple of notes..

  • Even though we specified a priority of INFO in the category, only messages of ERROR level or higher will be sent to the syslog server, because the appender has a Threshold of ERROR. This is actually pretty useful when you want to attach two appenders to a category and log at different levels: add another appender with an INFO threshold to this category, and it will log everything at INFO and higher, while the syslog appender will only process ERROR (and FATAL) messages.
  • The destination for the syslog messages is set by the SyslogHost parameter. In this example, I just used localhost.
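The interplay between the category priority and the appender Threshold described in the notes can be modeled in Python. This is a simplified model of log4j's filtering, not log4j code: it ignores additivity (by default the messages would also reach the root category's appenders) and the ALL/OFF levels, and `appenders_receiving` is a hypothetical helper.

```python
# log4j levels from least to most severe.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def appenders_receiving(message_level, category_priority, appender_thresholds):
    """Return the appenders (in the given order) that would write a message."""
    severity = LEVELS.index(message_level)
    # The category's priority filters first: anything below it is dropped.
    if severity < LEVELS.index(category_priority):
        return []
    # Each appender then applies its own Threshold.
    return [name for name, threshold in appender_thresholds.items()
            if severity >= LEVELS.index(threshold)]
```

With `{"SYSLOG": "ERROR", "FILE": "INFO"}` attached to an INFO category, an INFO message only reaches FILE, while an ERROR message reaches both.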

April 2, 2012

Project Uptime : Progress Report 5 : Getting ready for Reddit and Hacker News

Filed under: Databases,HOWTO,Linux,Technology,Web — Vinay @ 9:03 pm

A very timely post on Hacker News by Ewan Leith about configuring a low-end server to take ~11 million hits per month gave me some more ideas on optimizing the performance of this website. Ewan used a combination of nginx and Varnish to get the server to handle that much traffic.

From my earlier post, you might recall that I planned on checking out nginx as the web server, but then ended up using Apache. My earlier stack looked like this. Based on the recommendations from Ewan’s article, I decided to add Varnish to the picture. So here is how the stack looks currently

And boy, did the performance improve or what. Here are some before and after performance charts based on a test run from blitz.io. The test lasted for 60 seconds and was for 250 simultaneous connections.

BEFORE

  • Screenshot of response times and hit rates. Note that the server essentially stopped responding 25 seconds into the test.
  • Screenshot of the analysis summary. 84% error rate!!

AFTER

  • Screenshot of response times and hit rates
  • Screenshot of summary of Analysis. 99.98% success rate!!


What a difference!!.. The server in fact stopped responding after the first test and had to be hard rebooted.  So how did I achieve it? By mostly copying the ideas from Ewan :) . The final configuration for serving the web pages looks like this on the server end

Varnish (listens on TCP 80) –> Apache (listens on TCP 8080)

NOTE : All the configuration guides (as with the previous entries of the posts in this series) are specific to Ubuntu.

  1. Configure Apache to listen on port 8080
    1. Stop Apache
       sudo service apache2 stop 
    2. Edit the following files to change the default port from 80 to 8080
      1. /etc/apache2/ports.conf
        1. Change
          NameVirtualHost *:80
          Listen 80
          
        2. to
          NameVirtualHost *:8080
          Listen 8080
          
      2. /etc/apache2/sites-available/default.conf (NOTE: This is the default sample site that comes with the package. You can create a new one for your site.  If you do so, you need to edit your site specific conf file)
        1. Change
           <VirtualHost *:80> 
        2. To
          <VirtualHost *:8080> 
    3. Restart apache and ensure that it is listening on port 8080 by using this trick.
  2. Install Varnish and configure it to listen on port 80
    1. Add the Varnish repository to the system and install the package
      curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
      echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" | sudo tee -a /etc/apt/sources.list
      sudo apt-get update
      sudo apt-get install varnish
      
    2. Configure Varnish to listen on port 80 and use 64 MB of RAM for caching. (NOTE: By default, Varnish uses port 8080 to reach the backend, in this case Apache, so there is no need to configure it specifically.)
      1. Edit the file /etc/default/varnish
        1. Change
          DAEMON_OPTS="-a :6081 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,256m"
          
        2. To
           DAEMON_OPTS="-a :80 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,64m"
          
    3. Restart Varnish
      sudo service varnish restart

      and you are ready to rock and roll.
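To confirm the final layout (Varnish on TCP 80, Apache on TCP 8080), a quick Python check can try to open a TCP connection to each port. This is a sketch; `is_listening` is a hypothetical helper, and it only proves that something accepts connections on the port, not which program it is.

```python
import socket


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

On the server, both `is_listening(80)` and `is_listening(8080)` should return True once Apache and Varnish have been restarted.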

There are some issues with this setup in terms of logging. Unlike your typical web server logs, where every request is logged, I noticed that not all the requests were being logged. This is because Varnish serves cached content itself without passing the request on to Apache, so Apache never sees (or logs) those requests. I have to figure out how to get that working. But that is for another post :) .

March 31, 2012

HOW TO : Perform OCR on PDF files for free

Filed under: HOWTO,Technology,Web — Vinay @ 8:12 am

I had to convert a scanned PDF file into an editable document recently. You can do this using OCR, and there is a ton of software out there that does this. There are even web-based services that do it. But each of them had limitations (either I had to buy the software, or the number of pages that could be scanned was limited). I didn’t want to buy a license, since this is not something I would be doing regularly, and the document I had to convert was 61 pages, so none of the online services allowed me to do it. I remembered reading that Google Docs added this (OCR) capability a while ago, and since I have a Google Apps account, I decided to give it a try.

Google also has a limit of 2 pages per OCR conversion. So after some brainstorming, I came up with this quick hack to use Google Docs for converting large PDF files into editable content.

  1. Split the PDF file into two-page documents using PDFsam (Open Source PDF Split and Merge Tool).
  2. Log into your Google Docs interface at http://docs.google.com . All you need is a Google Account to use this feature
  3. Create a folder (collection) to organize your files. This is not required, but it will make searching for the files a lot easier
  4. Check the settings to convert PDF files to editable
  5. Upload the PDF files you created in step 1.
  6. As you upload the files, Google creates an editable document with the text from the PDF files. You can then create a new document and copy/paste the content from all the smaller files.
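To keep track of the pieces, it helps to know up front which pages each small file covers. The following Python sketch computes those ranges; `split_ranges` is a hypothetical helper and does not interact with PDFsam or Google Docs. For the 61-page document in this post it gives 31 files, the last one holding a single page.

```python
def split_ranges(total_pages, pages_per_file=2):
    """Return (first_page, last_page) pairs covering pages 1..total_pages."""
    return [(first, min(first + pages_per_file - 1, total_pages))
            for first in range(1, total_pages + 1, pages_per_file)]
```

The list doubles as a checklist when you copy the converted text back into a single document in order.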

I think someone with more programming chops than me can improve this by using the Google API to do the copy/paste from the smaller docs into the final document :) .
