For my own notes: a very nice post on perishablepress.com about using the different capabilities of mod_rewrite to secure your website (application).
http://perishablepress.com/eight-ways-to-blacklist-with-apaches-mod_rewrite/
Apache configuration to redirect traffic to a particular URL based on a pattern in the URL (AKA the URI). In this particular example, I want any traffic whose URL does not start with /application or /content to be redirected to https://domain_name/application.
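A minimal sketch of such a rule (domain_name is a placeholder for the real hostname; the ^ pattern matches any request, so this works in either the main config or an .htaccess file):
[code]# enable the rewrite engine (mod_rewrite must be loaded)
RewriteEngine On
# if the request does not start with /application ...
RewriteCond %{REQUEST_URI} !^/application [NC]
# ... and does not start with /content ...
RewriteCond %{REQUEST_URI} !^/content [NC]
# ... redirect it to https://domain_name/application
RewriteRule ^ https://domain_name/application [R=301,L][/code]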
Explanation of the rule
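The two RewriteCond lines are ANDed together, so the rule only fires when the request URI does not begin with /application and does not begin with /content ([NC] makes the matches case insensitive). When both conditions hold, the RewriteRule sends the client a permanent redirect ([R=301]) to https://domain_name/application, and [L] tells mod_rewrite to stop processing any further rules for this request.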
Jboss uses the log4j framework to provide logging services. log4j is a very flexible framework and can do a lot of things. One of the features provided by log4j is the ability to send log messages to multiple destinations. Here is a quick how-to on configuring Jboss to send log messages to a syslog server using the syslog protocol. This is pretty useful when you are trying to consolidate logs from multiple sources into a central location.
First, some background on how log4j is configured in Jboss.
The log4j configuration in Jboss is managed by the file jboss-log4j.xml located at $JBOSS_HOME/server/$JBOSS_PROFILE/conf.
There are three parts to this configuration file: appenders, which define the destinations log messages can be sent to (console, files, syslog and so on); categories, which map logger names to appenders and set their priority (log level); and the root logger, which acts as the catch-all for anything not matched by a category.
So pictorially, it would look roughly like this (a simplified text sketch):
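[code]log call in application code (e.g. org.kudithipudi.SomeClass)
        |
        v
category (org.kudithipudi) ......... sets the priority/level
        |
        v
appender(s) (CONSOLE, FILE, SYSLOG) ... define the destinations
        |
        v
console / log file / syslog server[/code]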
Getting back to the reason for this post, here is how you would enable the syslog appender and then configure a category to use this appender. For this example, we will use a category named org.kudithipudi.
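Here is a sketch of the relevant jboss-log4j.xml snippets. Jboss ships with a commented-out SYSLOG appender that you can adapt; the syslog host (syslog.example.com) and the LOCAL7 facility below are placeholder values, so adjust them for your environment:
[code]<!-- appender that ships log messages to a remote syslog server -->
<appender name="SYSLOG" class="org.apache.log4j.net.SyslogAppender">
   <errorHandler class="org.jboss.logging.util.OnlyOnceErrorHandler"/>
   <param name="Facility" value="LOCAL7"/>
   <param name="FacilityPrinting" value="true"/>
   <param name="SyslogHost" value="syslog.example.com"/>
   <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="[%d{ABSOLUTE},%c{1}] %m%n"/>
   </layout>
</appender>

<!-- route everything logged under org.kudithipudi to the SYSLOG appender -->
<category name="org.kudithipudi">
   <priority value="INFO"/>
   <appender-ref ref="SYSLOG"/>
</category>[/code]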
A couple of notes: the log4j SyslogAppender sends messages over UDP (port 514 by default), so the syslog server has to be configured to accept remote UDP messages and nothing in between can be blocking that port. Also, pick a facility (LOCAL0 through LOCAL7) that matches what the syslog server expects, so the messages get routed to the right destination.
A very timely post on Hacker News by Ewan Leith about configuring a low-end server to take ~11 million hits per month gave me some more ideas on optimizing the performance of this website. Ewan used a combination of nginx and varnish to get the server to respond to that much traffic.
From my earlier post, you might recall that I planned on checking out nginx as the web server, but then ended up using Apache. My earlier stack was simply Apache (listening on TCP 80) serving WordPress, with PHP and MySQL behind it. Based on the recommendations from Ewan's article, I decided to add Varnish to the picture; the current stack is shown further down.
And boy, did the performance improve or what! Here are some before and after performance charts, based on a test run from blitz.io. The test lasted 60 seconds and simulated 250 simultaneous connections.
BEFORE
AFTER
What a difference!! The server in fact stopped responding after the first test and had to be hard rebooted. So how did I achieve this? Mostly by copying the ideas from Ewan :). The final configuration for serving the web pages looks like this on the server end:
Varnish (listens on TCP 80) -> Apache (listens on TCP 8080)
NOTE : All the configuration guides (as with the previous entries of the posts in this series) are specific to Ubuntu.
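Here is a rough sketch of the steps, assuming a stock Ubuntu box that already has Apache answering on port 80 (the paths and the 256MB cache size are the Ubuntu defaults; adjust to taste):
[code]# install varnish from the ubuntu repositories
sudo apt-get install varnish

# 1) make varnish listen on port 80: in /etc/default/varnish set
#    DAEMON_OPTS="-a :80 \
#                 -T localhost:6082 \
#                 -f /etc/varnish/default.vcl \
#                 -S /etc/varnish/secret \
#                 -s malloc,256m"

# 2) point varnish at apache: in /etc/varnish/default.vcl set
#    backend default {
#        .host = "127.0.0.1";
#        .port = "8080";
#    }

# 3) move apache to port 8080: in /etc/apache2/ports.conf change
#    "Listen 80" (and "NameVirtualHost *:80") to 8080, and update the
#    <VirtualHost> entries under /etc/apache2/sites-enabled to match

# restart both services
sudo service apache2 restart
sudo service varnish restart[/code]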
and you are ready to rock and roll.
There are some issues with this setup in terms of logging. Unlike your typical web server logs, where every request is logged, I noticed that not all of the requests were showing up in the Apache logs. I guess that is because varnish serves most of the content straight from its cache, so those requests never reach Apache. I have to figure out how to get that working. But that is for another post :).
I had to convert a scanned PDF file into an editable document recently. You can do this using OCR, and there is a ton of software out there that does it. There are even web-based services that do this. But each of them had limitations: either I had to buy the software, or there was a limit on the number of pages that could be converted. I didn't want to buy a license, since this is not something I would be doing regularly, and the document I had to convert was 61 pages, so none of the online services would take it. I remembered reading that Google Docs added this OCR capability a while ago, and since I have a Google Apps account, I decided to give it a try.
Google also has a limit of 2 pages per OCR conversion. So after some brainstorming, I came up with this quick hack to use Google Docs for converting large PDF files into editable content: split the large PDF into 2-page chunks, upload each chunk to Google Docs with OCR enabled, and then copy/paste the converted text from the smaller docs into one final document.
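The splitting part is easy to script. Here is a minimal sketch using pdftk (you would need to install it first, e.g. sudo apt-get install pdftk); scanned.pdf and the page count are placeholders for your own document:
[code]#!/bin/bash
# split a scanned pdf into 2-page chunks that fit under the
# google docs ocr limit, using pdftk
PAGES=61            # total pages in the source document
for i in $(seq 1 2 $PAGES); do
   j=$((i + 1))
   # don't point past the last page when the page count is odd
   [ $j -gt $PAGES ] && j=$i
   pdftk scanned.pdf cat ${i}-${j} output part_$(printf '%02d' $i).pdf
done[/code]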
I think someone with more programming chops than me could improve this by using the Google Docs API to do the copy/paste from the smaller docs into the final document :).
Things have been a bit hectic at work, so I didn't get a lot of time to work on this project. Now that the new server has been set up and the kernel updated, we get down to the mundane tasks of installing the software.
One of the first things I do when configuring any new server is to restrict the root user from logging into the server remotely. SSH is the default remote shell access method nowadays. Please don't tell me you are still using telnet :).
Before restricting the root user's remote access, add a new user that you want to use for regular activities, add that user to the sudo group, and ensure you can log in and sudo to root as this user. Here are the steps I follow to do this on an Ubuntu server.
Add a new user
[code]# create the new user with a home directory (xxxx is a placeholder)
useradd -m xxxx[/code]
Add user to sudo group
[code]usermod -G sudo -a xxxx[/code]
Check user can sudo to gain root access
[code]# switch to the new user ...
su - xxxx
# ... and verify the new user can become root via sudo
sudo su -[/code]
Now, moving on to the software installation part.
Install MySQL
[code]sudo apt-get install mysql-server [/code]
You will be prompted to set the MySQL root user's password during this install. This is quite convenient; with the older installs, you had to set the root password separately later on.
Install PHP
[code]sudo apt-get install php5-mysql [/code]
In addition to php5-mysql, this will also pull in Apache as a dependency. I know, I mentioned I would like to try out the new version of Apache, but it looks like Ubuntu doesn't have a package for it yet. And I am too lazy to compile from source :).
With this, you have all the basic software for WordPress. Next, we will tweak this software to use less system resources.
If you want to check the SSL certificate of a site (expiry time, hostname match, self-signed, etc.) using curl, you can do it by running the command below. curl validates the server certificate against the system CA bundle by default, and the -v flag prints the certificate details (subject, issuer, validity dates) during the handshake; a certificate that fails validation makes curl exit with an SSL error.
[code]curl -v https://URL_ADDRESS[/code]
Example: to check the SSL certificate of GoDaddy
[code]curl -v https://www.godaddy.com[/code]
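As an aside, if all you want are the expiry dates, openssl can pull them directly. A quick sketch (the hostname and port are placeholders for your own endpoint):
[code]# print the notBefore/notAfter dates of the certificate served on port 443
echo | openssl s_client -servername www.godaddy.com -connect www.godaddy.com:443 2>/dev/null | openssl x509 -noout -dates[/code]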
One of the capabilities of Jboss is that it can serve HTTP traffic. By default, Jboss does not log any of this HTTP traffic in its log files. Here is a quick how-to on enabling this logging. This post is specific to Jboss 4.x (ancient!!), and I will post another one soon on how to do it in version 5.x and newer.
Edit the server.xml file located in $JBOSS_HOME/server/$PROFILE/deploy/jboss-web.deployer and replace the commented-out access logger section as follows:
FROM
[code]<!--
<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="localhost_access_log." suffix=".log"
pattern="common" directory="${jboss.server.log.dir}"
resolveHosts="false" />
-->[/code]
TO
[code]<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="localhost_access_log." suffix=".log"
pattern="common" directory="${jboss.server.log.dir}"
resolveHosts="false" /> [/code]
This will start creating a file named in the format localhost_access_log.CURRENT_DATE.log in the $JBOSS_HOME/server/$PROFILE/log folder.
But it isn't any fun if you just leave the logging at the defaults, right :). The common and combined pattern formats are similar to the standard Apache logging options. But if you want certain content and formatting in the log files, you have a lot of options. The Jboss community has documented all the data that is exposed through this valve at http://docs.jboss.org/jbossweb/latest/api/org/apache/catalina/valves/AccessLogValve.html
So say I want to log the referrer header, the user agent and the value of a cookie called JSESSIONID, and send all this data to a file called jboss_web_access_log; I set up the options as such:
[code]<Valve className="org.apache.catalina.valves.AccessLogValve"
prefix="jboss_web_access_log." suffix=".log"
pattern="%h %p %l %u %t %r %s %b '%{Referer}i' '%{User-Agent}i' '%{JSESSIONID}c'"
directory="${jboss.server.log.dir}"
resolveHosts="false" /> [/code]
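For reference, the pattern codes used above: %h is the remote host, %p the local port, %l the remote logical username from identd, %u the authenticated remote user, %t the timestamp, %r the first line of the request, %s the HTTP status code and %b the bytes sent (excluding headers); %{xxx}i logs the value of an incoming request header and %{xxx}c logs the value of a cookie.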
The uptime of this blog has been really bad recently. I switched to hosting it on a Rackspace virtual server last year and went with the cheapest option: a 256MB Linux virtual server that was costing me ~$12/month. I never got around to tuning the OS, so the server was always swapping and would go down pretty much every day. Last week, I upgraded the plan and moved to a 512MB server, but the uptime hasn't been any better. Here's a report from Pingdom (which, by the way, is a great service for tracking the uptime and responsiveness of your website) showing the availability of the site over the last year: 96%!! For someone that has been working in the operations and infrastructure world, that is unacceptable :). So my new goal is to maintain at least 99.5% uptime, and I have put together a plan to achieve it.
I plan to blog the progress and learnings as I implement this plan.
Say you want to find out how many hits you are getting to a specific page, broken down by source IP. You can use this quick combination of Linux tools to get that data:
[code]grep -i "URL_TO_CHECK" PATH_TO_APACHE_ACCESS_LOG | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]
You are using grep to filter the access log down to requests for the page you care about, cut to pull out the first space-separated field (the client IP in the standard Apache log format), sort and uniq -c to count the hits per IP, and the final sort -rn to rank the IPs by hit count.
Example:
[code]grep -i "GET / " /opt/apache/logs/access_log | cut -d' ' -f 1 | sort | uniq -c | sort -rn > ~/ip_report.txt[/code]
gets you the report of hits to the index page, broken down by IP. (Note the trailing space in "GET / "; without it, the pattern would match every request on the site.)