Interesting (infrastructure) tidbits about Groupon

I am attending the Camp DevOps conference in Chicago this weekend, and one of the speakers was Zack Steinkamp. Zack manages the operations tools group, in addition to information security, at Groupon. He spoke about a custom configuration management tool called “roller” (http://steinkamp.us/campdevops.pdf) that is used at Groupon. He said the tool is scheduled to be open sourced soon. roller is very similar to Puppet, Chef, Bcfg2, etc. I am not sure we need yet another configuration management tool, but Zack made a good case for why there is a need for a simpler and more secure one.

Anyway, this post is not about roller, but rather about some tidbits that Zack shared about Groupon’s infrastructure in the talk:

  • Groupon started out with ~100 servers
    • The operations function was outsourced to a third party
    • No automation in place; all servers were “handcrafted”
  • Currently running ~1000 servers in 6 locations (globally)
    • Building their own data center
  • Running 4 different Linux distros in prod
  • Currently using Amazon and another cloud provider
  • Not a huge believer in the public cloud for future expansion
    • Zack spoke about how inconsistent I/O and CPU performance is an issue on public clouds
  • Does not heavily use virtualization in production
  • Uses Nagios for monitoring
  • SW Architecture
    • Started out as a “WordPress” blog
    • Then migrated to a Rails app
    • Currently the Rails app is huge
    • MySQL is the DB