February 2013

HOW TO : Use grep and awk to find count of unique entries

I have used grep extensively to analyze data in log files. A good example is this post about using grep and sort to find the unique hits to a website. Here is another way to do it using grep and awk.

Say the log file you are analyzing is in the format below and you need to get a count for each unique BundleId:

[code]2013-02-25 12:00:06,684 ERROR [com.blahblah.sme.command.request.CustomCommand] Unable to execute AssignServiceCommand, request = '<AssignServiceToRequest><MemberId>123456</MemberId><OrderBundle><BundleId>5080</BundleId></OrderBundle></AssignServiceToRequest>'[/code]

You can use grep and awk to find the number of times each unique BundleId appears by running:

[code]grep -i bundleID LOG_FILE_NAME | awk '{ split($11,a,">"); print a[6] }' | sort | uniq -c | sort -rn[/code]

Breaking down the commands:

grep -i bundleID LOG_FILE_NAME : tells grep to show only the lines from the file (LOG_FILE_NAME) that contain the text bundleID. The -i option makes the search case insensitive

awk '{ split($11,a,">"); print a[6] }' : tells awk to read the output from grep, take the 11th field of each line (by default awk separates fields on whitespace) and split that string into an array (a) using > as the delimiter. Finally, it prints the sixth element of the array a, which for the sample line above is 5080</BundleId

sort : sorts the output from awk into ascending order, which places identical values next to each other (uniq only collapses adjacent duplicates)

uniq -c : takes the output from sort and prefixes each unique item with the number of times it occurs

sort -rn : takes the output from uniq and sorts it numerically in reverse order, so the most frequent items come first

The output looked like this:

[code]
173 5080</BundleId
12 5090</BundleId
8 2833</BundleId
1 2412</BundleId
1 2038</BundleId
1 1978</BundleId
1 1924</BundleId
[/code]
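The counts above still carry the trailing </BundleId fragment. As a rough sketch of one way to print only the numeric ID, awk can split each line on both angle brackets and print the field that follows the BundleId tag name. The function name count_bundle_ids and the sample log lines below are made up for illustration and are not from the original log.

```shell
#!/bin/sh
# count_bundle_ids is a hypothetical helper name used only for this sketch.
# It prints "count id" pairs, most frequent first.
count_bundle_ids() {
    # -F'[<>]' splits each line on either angle bracket, so the field right
    # after the tag name "BundleId" holds the bare ID.
    grep -i 'bundleid' "$1" |
        awk -F'[<>]' '{
            for (i = 1; i < NF; i++)
                if (tolower($i) == "bundleid") { print $(i + 1); break }
        }' |
        sort | uniq -c | sort -rn
}

# Made-up sample lines in the same format as the log entry shown earlier.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2013-02-25 12:00:06,684 ERROR [com.blahblah.sme.command.request.CustomCommand] Unable to execute AssignServiceCommand, request = '<AssignServiceToRequest><MemberId>123456</MemberId><OrderBundle><BundleId>5080</BundleId></OrderBundle></AssignServiceToRequest>'
2013-02-25 12:00:07,101 ERROR [com.blahblah.sme.command.request.CustomCommand] Unable to execute AssignServiceCommand, request = '<AssignServiceToRequest><MemberId>123457</MemberId><OrderBundle><BundleId>5080</BundleId></OrderBundle></AssignServiceToRequest>'
2013-02-25 12:00:08,250 ERROR [com.blahblah.sme.command.request.CustomCommand] Unable to execute AssignServiceCommand, request = '<AssignServiceToRequest><MemberId>123458</MemberId><OrderBundle><BundleId>5090</BundleId></OrderBundle></AssignServiceToRequest>'
2013-02-25 12:00:09,000 INFO [com.blahblah.sme.Heartbeat] still alive
EOF

count_bundle_ids "$sample_log"
rm -f "$sample_log"
```

With the sample lines above, this should report a count of 2 for 5080 and 1 for 5090. The INFO line is dropped by grep because it does not mention a bundle ID. Unlike the $11 approach, this variant does not depend on the BundleId element sitting at a fixed position in the line.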

HOW TO : Configure tcpdump to rotate capture files based on size

Quick note for self: if you are capturing traffic using tcpdump, you can rotate the capture files based on size.

[code]sudo tcpdump -i INTERFACE_TO_CAPTURE_TRAFFIC_ON -C 10 -s0 -W NO_OF_FILES_TO_ROTATE_THROUGH -w /PATH_TO_CAPTURE_FILE [/code]

Explanation of the options used:

-i : specify the interface you want to capture the traffic on. If not specified, tcpdump will listen on the lowest numbered interface, e.g. eth0

-C : specify the maximum size of each capture file in units of 1,000,000 bytes. In this example, each file would be about 10,000,000 bytes, or ~9.5 MiB

-s : specify the snapshot length, i.e. how many bytes of each packet to capture. 0 (zero) tells tcpdump to capture the entire packet

-W : specify the number of files to rotate through. Once a file reaches the size specified in -C, tcpdump moves on to the next file, and after the last file it starts overwriting from the first. The files keep rotating throughout the capture

-w : specify the path to the capture file. tcpdump appends an integer to the end of the file name to distinguish the files it rotates through.
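Since -C and -W together cap how much disk the capture can use, it can help to work out that budget before starting. The sketch below fills in the placeholders with example values (eth0, 5 files, /tmp/capture.pcap are assumptions, not values from the note above) and uses a hypothetical helper, capture_budget_bytes, to do the arithmetic. It only prints the tcpdump command rather than running it, since a real capture needs root and a live interface.

```shell
#!/bin/sh
# Example values only; substitute your own.
IFACE="eth0"                      # assumed interface name
FILE_SIZE_UNITS=10                # -C value, in units of 1,000,000 bytes
FILE_COUNT=5                      # -W value, number of files in the rotation
CAPTURE_FILE="/tmp/capture.pcap"  # assumed output path

# capture_budget_bytes is a hypothetical helper: approximate upper bound on
# the disk space used by the whole set of rotating files.
# $1 = -C value, $2 = -W value
capture_budget_bytes() {
    echo $(( $1 * 1000000 * $2 ))
}

echo "Approximate disk budget: $(capture_budget_bytes "$FILE_SIZE_UNITS" "$FILE_COUNT") bytes"

# Print the command instead of executing it.
echo "sudo tcpdump -i $IFACE -C $FILE_SIZE_UNITS -s0 -W $FILE_COUNT -w $CAPTURE_FILE"
```

With these example values the budget works out to 50,000,000 bytes. Treat it as an estimate: tcpdump checks the file size before writing each packet, so a file can end up slightly larger than the -C limit.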