Technology

HOW TO : Add commas (,) at the end of every line using Notepad++

Thanks to Sumama Waheed for this great nugget.

Many a time, you get some data as a CSV file and need to copy part of it into a SQL statement. For instance, one of the columns in the CSV was employee_id, in the format below

employee_id
1234
8765
9808
1235
8734
6723

And you need to put it in a SQL statement as below

SELECT * FROM employee_table
WHERE employee_table.employee_id IN (1234,
8765,
9808,
1235,
8734,
6723)

That’s a lot of commas (,) to add by hand at the end of every line. You can do it quickly in Notepad++ (or any editor that supports regex) with a regular-expression search and replace, using ($) as the search string and $, as the replace string. You then only need to wrap the list in parentheses and tidy up the last line.
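If you would rather script this than use an editor, the same end-of-line trick can be sketched in Python. This is only an illustration: the helper name add_trailing_commas is made up for this example, and the IDs are the ones from the sample above.

```python
import re


def add_trailing_commas(text: str) -> str:
    """Append a comma to every line except the last one."""
    # (?m) makes $ match at the end of each line; (?!\Z) skips the very end
    # of the text so the final value is not followed by a comma.
    return re.sub(r"(?m)$(?!\Z)", ",", text.strip())


ids = """1234
8765
9808
1235
8734
6723"""

# Wrap the comma-separated values in the IN (...) clause from the example.
query = (
    "SELECT * FROM employee_table\n"
    "WHERE employee_table.employee_id IN (" + add_trailing_commas(ids) + ")"
)
print(query)
```

Skipping the last line in the regex saves the manual cleanup of a trailing comma before the closing parenthesis.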

HOW TO : Configure nginx to use URI for modifying response content

That was a pretty long title for the post :). I love nginx for its flexibility and ease of use. It is like a Swiss Army knife: it can do a lot of things :).

We needed to serve some dynamic content for one of our use cases. If a user visits a site using a URL of the form http://example.com/23456789/678543, we want to respond with HTML content that is customized using the strings 23456789 and 678543.

A picture might help here

Here’s how this was achieved

  • Define a location section in the nginx config to respond to the URL path specified and direct it to substitute content
    location ~ "^/(?<param1>[0-9]{8})/(?<param2>[0-9]{6})" {

            root /var/www/html/test/;
            index template.html;
            sub_filter_once off;
            sub_filter '_first_param_' '$param1';
            sub_filter '_second_param_' '$param2';
            rewrite ^.*$ /template.html break;
    }

  • Create a file named template.html with the following content in /var/www/html/test

Breaking down the config one line at a time

location ~ "^/(?<param1>[0-9]{8})/(?<param2>[0-9]{6})" : The regex uses two named capture groups. The first group matches exactly 8 digits (each in the range 0-9) right after the leading / and stores them in the variable $param1. The second group matches exactly 6 digits after the next / and stores them in the variable $param2.

root /var/www/html/test/; : Sets the root directory for this location.

index template.html; : Sets the index (home) page for this location.

sub_filter_once off; : Tells the sub_filter module not to stop after the first match when replacing response content. By default it replaces only the first match and stops.

sub_filter '_first_param_' '$param1'; : Directs the sub_filter module to replace any text matching _first_param_ in the response HTML with the value of the variable $param1.

sub_filter '_second_param_' '$param2'; : Directs the sub_filter module to replace any text matching _second_param_ in the response HTML with the value of the variable $param2.

rewrite ^.*$ /template.html break; : Tells nginx to serve template.html regardless of the URI requested.
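To make the flow easier to follow, here is a rough Python sketch of what the location block does: match the URI, capture the two numbers, and substitute them into a template. This is not how nginx is implemented, and the TEMPLATE string is a made-up stand-in for template.html (the real file only needs to contain the _first_param_ and _second_param_ placeholders).

```python
import re

# Same pattern as the nginx location, written with Python's named-group syntax.
URI_PATTERN = re.compile(r"^/(?P<param1>[0-9]{8})/(?P<param2>[0-9]{6})")

# Hypothetical template content; any HTML with the two placeholders works.
TEMPLATE = (
    "<p>First: _first_param_</p>"
    "<p>Second: _second_param_</p>"
    "<p>Again: _first_param_</p>"
)


def render(uri, template=TEMPLATE):
    """Return the customized body for a matching URI, or None otherwise."""
    match = URI_PATTERN.search(uri)
    if match is None:
        return None  # nginx would fall through to another location here
    # str.replace substitutes every occurrence, like "sub_filter_once off".
    body = template.replace("_first_param_", match.group("param1"))
    body = body.replace("_second_param_", match.group("param2"))
    return body


print(render("/23456789/678543"))
```

Note that the pattern is not anchored at the end, so a URI with extra characters after the sixth digit of the second number still matches, just as it does in the nginx config.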

Big thanks to Igor for help with the configs!!

Why ADP?

ADP is a $70B+ company (by market cap as of August 2019) and yet cannot get a simple redirect right. If someone who is asked to use its employee performance management system types in tms.adp.com (like most people would), they get this nice friendly error

If, for some magical and mystical reason, they type in https://tms.adp.com, they get this login page

I find it mind-boggling that such a mature company cannot figure out:

  1. Customer experience
  2. 301/302 HTTP redirects
  3. HTTP Strict Transport Security (HSTS)

End Rant and sorry to all my friends that work at ADP 🙂

Optimizing cache infrastructure

I love it when engineering teams share their tricks of the trade for other organizations to benefit from. While this might seem counter-intuitive, sharing knowledge makes the entire ecosystem better.

Etsy's engineering team does a great job of publishing their architecture, methodologies and code at https://codeascraft.com.

This particular article on how they optimize their caching infrastructure (https://codeascraft.com/2017/11/30/how-etsy-caches/) is pretty enlightening. I always thought the best way to load balance objects (app hits, cache requests, queues etc.) across hosts was to use mod operations. In this blog post, Etsy's team talks about using consistent hashing instead of modulo hashing.

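At a high level, consistent hashing lets cache nodes fail without drastically affecting the overall performance of the application, and it makes it easy to scale the number of nodes. The reason is that when a node is added or removed, only the keys that belonged to that node's share get remapped, whereas with modulo hashing most keys move to a different node. This method is especially useful when you have a large number of cache nodes.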
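To see the difference, here is a minimal sketch comparing the two approaches when one of four cache nodes goes away. It is not Etsy's implementation: the node names, the use of MD5 as the hash, and the 100 virtual points per node are all assumptions picked for illustration.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_node(key, nodes):
    """Classic modulo hashing: hash the key and mod by the node count."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hash ring with several virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    """Fraction of keys that map to a different node after the change."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"key-{i}" for i in range(10000)]
nodes = ["cache-a", "cache-b", "cache-c", "cache-d"]  # hypothetical node names
survivors = nodes[:-1]  # simulate cache-d failing

ring_before, ring_after = HashRing(nodes), HashRing(survivors)
ring_moved = moved_fraction(ring_before.node_for, ring_after.node_for, keys)
mod_moved = moved_fraction(
    lambda k: modulo_node(k, nodes), lambda k: modulo_node(k, survivors), keys
)
print(f"consistent hashing moved {ring_moved:.0%} of keys")
print(f"modulo hashing moved {mod_moved:.0%} of keys")
```

With the ring, the only keys that move are the ones that lived on the failed node. With modulo hashing, a key moves whenever its hash gives a different remainder for the new node count, which is the case for most keys.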

More reference links

  • http://www.tom-e-white.com/2007/11/consistent-hashing.html
  • https://www.toptal.com/big-data/consistent-hashing
  • https://en.wikipedia.org/wiki/Consistent_hashing

 

DID YOU KNOW : Advanced Search in Windows File Explorer

I was trying to search for some files on my laptop today and wanted to filter the search to files modified in the last few weeks. Something like: show me all files that contain the word “American” and were modified in the last 2 weeks. Doing this on a Linux machine would have been a simple filter using find. But this is Microsoft :).

Thanks to some Googling, I ran across something called “Advanced Query Syntax” (AQS) that is a core part of Microsoft's ecosystem (OS, Office etc.).

So the same search ended up being

American datemodified:this month
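For comparison, here is a rough cross-platform equivalent as a small Python sketch. It is not AQS and does not use the Windows search index; the function name find_recent_files and the 14-day default are just choices for this example, and it only looks at file contents, not file names.

```python
import time
from pathlib import Path


def find_recent_files(root, word, days=14):
    """Return files under root that contain word and were modified recently."""
    cutoff = time.time() - days * 86400  # oldest modification time we accept
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            continue  # modified too long ago
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        if word.lower() in text.lower():
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for hit in find_recent_files(".", "American"):
        print(hit)
```

This reads every file in full, so it is much slower than an indexed search on a large directory tree.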

There are a lot of cool ways you can filter your queries using the other keywords in AQS.

How data streams work (AKA queue design)

Timothy Downs has a good blog post on how queues and data streams work, with a layman's example, at https://hackernoon.com/introduction-to-redis-streams-133f1c375cd3

Quoting the example here

We have a very long book which we would like many people to read. Some can read during their lunch hour, some read on Monday nights, others take it home for the weekend. The book is so long that at any point in time, we have hundreds of people reading it.

Readers of our book need to keep track of where they are up to in our book, so they keep track of their location by putting a bookmark in the book. Some readers read very slow, leaving their bookmark close to the beginning. Other readers give up halfway, leaving theirs in the middle and never coming back to it.

To make matters even worse, we are adding pages to this book every day. Nobody can actually finish this book.

Eventually our book fills up with bookmarks, until finally one day it is too heavy to carry and nobody can read it any more.

A very clever person then decided that readers should not be allowed to place bookmarks inside the book, and must instead write down the page they are up to on their diary.

This is the design of Apache Kafka, and it is a very resilient design. Readers are often not responsible citizens and often will not clean up after themselves, and the book may be the log of all the important events that happen in our company.
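The analogy maps fairly directly onto code. Below is a toy sketch of the idea, not Kafka's actual API: the class names Book and Reader are borrowed from the quote, and everything lives in memory. The point is that the log stores only pages, while each reader keeps its own position (the "diary").

```python
class Book:
    """Append-only log. It stores pages and knows nothing about readers."""

    def __init__(self):
        self._pages = []

    def append(self, page):
        """Add a page at the end and return its position (offset)."""
        self._pages.append(page)
        return len(self._pages) - 1

    def read_from(self, offset, max_pages=10):
        """Return up to max_pages pages starting at offset."""
        return self._pages[offset:offset + max_pages]


class Reader:
    """A consumer that tracks its own offset instead of marking the book."""

    def __init__(self, book):
        self._book = book
        self.offset = 0  # the page number written in the reader's diary

    def poll(self, max_pages=10):
        pages = self._book.read_from(self.offset, max_pages)
        self.offset += len(pages)  # advance only past what was actually read
        return pages


book = Book()
for page in ["page 1", "page 2", "page 3"]:
    book.append(page)

fast, slow = Reader(book), Reader(book)
print(fast.poll())             # reads everything written so far
print(slow.poll(max_pages=1))  # reads a single page and stops
book.append("page 4")          # new pages keep arriving
print(fast.poll())             # picks up only the new page
```

A reader that gives up halfway leaves nothing behind in the book; its offset simply stops advancing.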