Databases

HOWTO : Query json data in SQLite

A note to self on querying JSON data in SQLite. BTW, I think SQLite is an underutilized and underappreciated Swiss Army knife for data storage and manipulation. And thanks to Richard Hipp, it is free.

If you have a column storing JSON in your SQLite database, the quickest way to search the data is json_extract. The full set of JSON functions is documented at https://www.sqlite.org/json1.html

Say you have a column named family_details in a table named family, with the following JSON in it as an example

{
	"father": {
		"name": "dad",
		"birthday": "1/1/2000",
		"pet_name": "daddy"
	},
	"mother": {
		"name": "mom",
		"birthday": "1/1/2001",
		"pet_name": "mommy"
	},
	"sons": [
		{
			"name": "son_one",
			"birthday": "1/2/2020",
			"pet_name": "sonny_one"
		},
		{
			"name": "son_two",
			"birthday": "1/2/2021",
			"pet_name": "sonny_two"
		}
	],
	"daughters": [
		{
			"name": "princess_one",
			"birthday": "1/2/2020",
			"pet_name": "princy_one"
		},
		{
			"name": "princess_two",
			"birthday": "1/2/2021",
			"pet_name": "princy_two"
		}
	]
}

and you want to print the name of the father, you can use

select json_extract(family_details, '$.father.name') as father_name
from family

json_extract takes the column name and a JSON path as parameters. In this case the path is $.father.name: $ denotes the root of the document, father is the object under the root, and name is the key under father.
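
Per the json1 documentation linked above, the same path syntax also reaches into arrays (indexes are zero-based), so pulling names out of the sons and daughters arrays in the example looks like this:

select json_extract(family_details, '$.sons[0].name') as first_son_name,
       json_extract(family_details, '$.daughters[1].pet_name') as second_daughter_pet_name
from family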

HOW TO : Search for a record in MongoDB based on length

Quick entry for my own records.

MongoDB is one of the popular open source document databases that are part of the NoSQL movement. One of the applications we deployed at work uses MongoDB as an internal storage engine. We ran into an issue where MongoDB was replicating data to MySQL and the replication stopped because of a size mismatch for an object between MongoDB and MySQL. Essentially, MongoDB was trying to insert a record into MySQL that was larger than the defined column length.

Here is the query we used to find the culprit objects. We used the awesome Robomongo client to connect to the MongoDB instance.

[code]db.some_table_to_search.find({$where:"this.some_column_to_search.length > 40"})[/code]

Breaking down the command

db -> Specifies the database you are trying to search

some_table_to_search -> Specifies the table (collection) you are trying to search

some_column_to_search -> Specifies the particular column (field) you are trying to search.

In this specific example, we were looking for entries longer than 40 characters for this column.
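
If you are on a newer MongoDB release (3.6 or later), the same search can also be written with aggregation operators instead of $where, which avoids evaluating JavaScript for every document. A sketch reusing the same made-up table and column names:

[code]db.some_table_to_search.find({
    $expr: { $gt: [ { $strLenCP: "$some_column_to_search" }, 40 ] }
})[/code]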

If you come from the traditional RDBMS world, here is a link from MongoDB comparing terminology between RDBMS and MongoDB.

http://docs.mongodb.org/manual/reference/sql-comparison/

Project Uptime : Progress Report 5 : Getting ready for Reddit and Hacker News

A very timely post on Hacker News by Ewan Leith about configuring a low end server to take ~11 million hits per month gave me some more ideas for optimizing the performance of this website. Ewan used a combination of nginx and varnish to get the server to handle that much traffic.

From my earlier post, you might recall that I planned on checking out nginx as the web server but ended up using Apache, so the earlier stack was simply Apache serving directly on port 80. Based on the recommendations from Ewan’s article, I decided to add Varnish to the picture; the current stack is laid out further down in this post.

And boy, did the performance improve or what. Here are some before and after performance charts based on a test run from blitz.io. The test ran for 60 seconds with 250 simultaneous connections.

BEFORE

  • Screenshot of response times and hit rates. Note that the server essentially stopped responding 25 seconds into the test.
  • Screenshot of the analysis summary. 84% error rate!!

AFTER

  • Screenshot of response times and hit rates
  • Screenshot of summary of Analysis. 99.98% success rate!!

 

What a difference!! The server in fact stopped responding after the first test and had to be hard rebooted. So how did I achieve this? Mostly by copying the ideas from Ewan :). The final configuration for serving the web pages looks like this on the server end

Varnish (listens on TCP 80) -> Apache (listens on TCP 8080)

NOTE : All the configuration guides (as with the previous entries of the posts in this series) are specific to Ubuntu.

  1. Configure Apache to listen on port 8080
    1. Stop Apache [code] sudo service apache2 stop [/code]
    2. Edit the following files to change the default port from 80 to 8080
      1. /etc/apache2/ports.conf
        1. Change [code]NameVirtualHost *:80
          Listen 80
          [/code]
        2. to [code]NameVirtualHost *:8080
          Listen 8080
          [/code]
      2. /etc/apache2/sites-available/default.conf (NOTE: This is the default sample site that comes with the package. You can create a new one for your site.  If you do so, you need to edit your site specific conf file)
        1. Change [code] <VirtualHost *:80> [/code]
        2. To [code]<VirtualHost *:8080> [/code]
    3. Restart Apache and ensure that it is listening on port 8080 by using this trick (or with a quick port check like the one sketched after this list).
  2. Install Varnish and configure it to listen on port 80
    1. Add the Varnish repository to the system and install the package[code]curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
      echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" | sudo tee -a /etc/apt/sources.list
      sudo apt-get update
      sudo apt-get install varnish
      [/code]
    2. Configure Varnish to listen on port 80 and use 64MB of RAM for caching. (NOTE: By default, Varnish looks for its backend, in this case Apache, on port 8080, so there is no need to configure that separately.)
      1. Edit the file /etc/default/varnish
        1. Change [code]DAEMON_OPTS="-a :6081 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,256m"
          [/code]
        2. To [code] DAEMON_OPTS="-a :80 \
          -T localhost:6082 \
          -f /etc/varnish/default.vcl \
          -S /etc/varnish/secret \
          -s malloc,64m"
          [/code]
    3. Restart Varnish [code]sudo service varnish restart[/code]

      and you are ready to rock and roll.
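
To double check that the stack is wired up the way you expect, a couple of quick commands help (nothing here is specific to this setup beyond the headers Varnish adds by default):

[code]# Apache should be on 8080 and Varnish on 80
sudo netstat -tlnp | grep -E ':(80|8080) '

# responses on port 80 should carry the Via / X-Varnish headers added by Varnish
curl -I http://localhost/[/code]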

There are some issues with this setup in terms of logging. Unlike a typical web server setup where every request is logged, I noticed that not all requests were showing up in the Apache logs. I guess that is because Varnish serves a lot of the content straight from cache, so those requests never reach Apache. I have to figure out how to get that working. But that is for another post :).
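
One likely avenue: the varnish package ships with varnishncsa, which can emit Apache-style access log lines for the requests Varnish itself serves. A rough sketch (flags as per the Varnish 3 docs, untested here):

[code]# print live, Apache-style log lines for requests hitting Varnish
varnishncsa

# or run it as a daemon appending to a log file
sudo varnishncsa -a -w /var/log/varnish/access.log -D[/code]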

Project Uptime : Progress Report – 3

Things have been a bit hectic at work, so I didn’t get a lot of time to work on this project. Now that the new server has been set up and the kernel updated, we get down to the mundane task of installing the software.

One of the first things I do when configuring any new server is to restrict the root user from logging into the server remotely. SSH is the default remote shell access method nowadays. Please don’t tell me you are still using telnet :).

And before restricting the root user’s remote access, add a new user that you want to use for regular activities, add the user to the sudo group, and ensure you can log in and sudo to root as this user. Here are the steps I follow to do this on an Ubuntu server

Add a new user

[code]useradd -m -s /bin/bash xxxx
passwd xxxx[/code]

Add user to sudo group

[code]usermod -G sudo -a xxxx[/code]

Check user can sudo to gain root access

[code]su - xxxx
sudo su -[/code]
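
Once the new user checks out, the remote root login restriction itself is a one line change in the OpenSSH daemon config (PermitRootLogin is a standard OpenSSH directive, so this applies to any Ubuntu release):

[code]# in /etc/ssh/sshd_config, set:
#   PermitRootLogin no
# then restart SSH for it to take effect
sudo service ssh restart[/code]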

Now moving into the software installation part

Install Mysql

[code]sudo apt-get install mysql-server [/code]

You will be prompted to set the root user’s password during this install. This is quite convenient; with the older installs you had to set the root password later on.

Install PHP

[code]sudo apt-get install php5-mysql [/code]

In addition to installing php5-mysql, this will also install Apache. I know I mentioned I would like to try out the new version of Apache, but it looks like Ubuntu doesn’t have a package for it yet. And I am too lazy to compile from source :).
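
A quick sanity check that both services came up after the installs (service names as the Ubuntu packages register them):

[code]sudo service mysql status
sudo service apache2 status[/code]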

With this, you have all the basic software needed for WordPress. Next, we will tweak this software to use fewer system resources.

HOW TO : Download content from Oracle Metalink (Support) using wget

The usual process for a DBA to download files from Oracle Metalink (support) site is

  • Login to Metalink from his/her workstation
  • Download the file
  • Upload the file to the database server
  • Use the file

Say your database is in a data center and your workstation doesn’t have high speed connectivity to the data center. You can use the following trick to download content to a Linux server in the data center that has Internet connectivity (and hopefully it is not your database server 🙂 ).

  • Log into Metalink from your workstation
  • Grab the link to the file/content you want to download (for example, we recently tried to download clusterware for Oracle 11G, and the link was http://download.oracle.com/otn/linux/oracle11g/linux.x64_11gR1_clusterware.zip)
  • Log into a server in your data center (it should have connectivity to the Internet and also to your database server)
  • Download the file using wget

[bash]wget http://download.oracle.com/otn/linux/oracle11g/linux.x64_11gR2_clusterware.zip --user=ORACLE_ID --password=ORACLE_ID_PASSWORD[/bash]

  • Replace the link with the link to your content and use your Oracle ID and password.
  • The file downloaded will have a strange name since wget appends the session ID to the end of the file name. In the example I used above, the name of the file was "linux.x64_11gR2_clusterware.zip?e=1297470492&h=a66b265cc967a68c611052cb8e54356f"
  • Rename the file and strip off the unnecessary data in the name using mv, as shown below.
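
Using the file name from the run above (the quotes keep the shell from interpreting the ? and & characters):

[bash]mv 'linux.x64_11gR2_clusterware.zip?e=1297470492&h=a66b265cc967a68c611052cb8e54356f' linux.x64_11gR2_clusterware.zip[/bash]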

Lessons of the Trade : Purging Databases

We ran into an interesting issue at work recently. Documenting the solution for my records..

BACKGROUND : We had a table in one of our databases that served as a "hopping" point for some jobs. Data was inserted into this table, and jobs were kicked off at periodic intervals to "process" the data and delete it.

CURRENT METHOD : Launch multiple jobs to process the data and delete the rows as soon as the data is processed. This causes locks on the table because multiple delete operations occur at the same time, which in turn means the jobs cannot finish processing the data, causing the table to grow in size.

PROPOSED METHOD : Add a new column to the table called “PROCESSED_STATE” and modify the “processing” jobs to set a flag “Y” in this column as soon as the data is processed. Create a new job that will be launched periodically, which checks the PROCESSED_STATE column and if the flag is set to “Y”, deletes the row.
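
A rough sketch of the proposed method in plain SQL (the table name, key column, and bind variable below are made up for illustration):

[code]-- one time schema change
ALTER TABLE hopping_table ADD processed_state CHAR(1) DEFAULT 'N';

-- each processing job flags the rows it has finished with
UPDATE hopping_table SET processed_state = 'Y' WHERE id = :just_processed_id;

-- the single periodic purge job only touches flagged rows
DELETE FROM hopping_table WHERE processed_state = 'Y';[/code]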

Moral of the story.. 🙂 .. multiple deletes on a table are bad. A better way is to have multiple updates and one delete.

HOW TO : Pass environment variables when using sudo

Say you need to sudo as a particular user to run a command, and at the same time you need to pass an environment variable to that command. You can do it by passing the command in double quotes.

For example, if I want to start Oracle while I am logged in as another user (vinay), I can start the database using dbstart by issuing

"sudo su - oracle -c "dbstart /$ORACLE_HOME"

$ORACLE_HOME is an environment variable defined in user oracle’s environment. Escaping the $ keeps your own shell from expanding the variable, so it gets expanded inside oracle’s login shell instead.

Needless to say, you need to ensure that you have sudo configured to allow your userID to su to oracle.
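
A quick way to see what the backslash buys you, assuming ORACLE_HOME is set in oracle’s profile but not in your own environment:

[code]# expands in YOUR shell before sudo runs (likely empty):
sudo su - oracle -c "echo $ORACLE_HOME"

# escaped, so it expands inside oracle's login shell:
sudo su - oracle -c "echo \$ORACLE_HOME"[/code]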

Lessons of the trade : Troubleshooting database performance

If you have ever worked in an IT shop, you will know that the one thing you cannot escape is issues related to database performance. Now, I am no DBA in any way, shape, or form, but I thought I should record some of the common issues and the ways they have been overcome in my career so far. More for my own records than to teach anyone :).

  • Network Related
    • Check if the NIC (network interface) on the DB server has any speed mismatch with the network device (most probably a switch) it is connected to (a quick check is sketched after this list).
    • Check the latency between the application and the DB (pertains to applications connecting to DB over WAN links)
  • System Related
    • Check if a rogue process is using all the system resources.
    • Check if the disk sub system is performing optimally
      • Recommend using RAID 10 for transactional systems.
      • Don’t forget to check those batteries on the RAID controllers 🙂
    • Check if the DB is just running out of gas.
      • Long term capacity trending records come in handy here.
  • Database Related
    • Views are evil.. if created the wrong way, especially on data that is accessed frequently. Remember that you are now making two calls to the database: one to create the view and one to read the data.
    • Ensure your logs are not being written to the same disk subsystem as your data files (thanks, Ray, for pointing out the typo).
    • Indexes are good.. But only to an extent. If the size of your indexes is twice the size of your data.. you have an issue.
    • Check for invalid objects. You will be surprised how many times, people overlook this.
    • Sometimes, it helps to flush the SGA (Oracle specific). Be aware that it will slow down the response time for a while (until the cache gets populated again).
    • Avoid excessive monitoring, especially with tools that query the system tables frequently. This has a negative impact on database performance.
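
For the first couple of network and system checks above, these are the commands I usually reach for (the interface name and host below are placeholders):

[code]# NIC speed/duplex on the DB server -- compare against the switch port settings
sudo ethtool eth0 | grep -E 'Speed|Duplex'

# round trip latency from the application server to the DB server
ping -c 10 db-server.example.com

# any rogue process hogging CPU or memory?
top -b -n 1 | head -20[/code]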

Did you run into any strange situations and figured out a solution? Please feel free to add your comments below…

Lessons of the trade : Data purge in databases..

Quick note to myself.. If you have a high volume transactional database and are looking to purge data from a table (or tables), make sure you purge the data in small chunks. If you purge the data in one large chunk, other processes trying to read those tables have to reconstruct a consistent view of the data from the undo logs, since the purge job holds locks for the duration of the big delete, and that obviously adds latency to their queries. So purge the data in smaller chunks (rows), committing after each chunk so the locks are released and the redo gets flushed regularly; a sketch of this follows.
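
A minimal sketch of what chunked purging can look like, here in Oracle-flavoured PL/SQL (the table, predicate, and batch size are made up for illustration):

[code]BEGIN
  LOOP
    -- purge at most 5000 rows per pass
    DELETE FROM audit_log
     WHERE created_at < SYSDATE - 90
       AND ROWNUM <= 5000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;  -- release locks and keep undo small between chunks
  END LOOP;
  COMMIT;
END;
/[/code]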