Friday, July 24, 2015

Non-Production Environments: Anti-Patterns



We all know that clean, separated environments are critical for developing and testing our software.  And yet we often get it wrong.  The Continuous Delivery movement puts this at the forefront of its concerns, which is great.  So if you're headed down the CD path, chances are you are tackling, or are about to tackle, the "environment problem".  Because it's a tough thing to get right and because it demands both up-front and continuous investment, it tends to be de-emphasized in the feature-crazed Agile world.  But unless you're just doing a proof-of-concept, leaving the environment problem unsolved will slow you down, hurt quality and become harder to fix over time.  There are a few anti-patterns that I see over and over:
  1. Not giving proper attention to the data tier(s) of the stack (e.g. relational database, NoSQL database, text/search database, file systems used for storage, etc.)
  2. Not extending environments "all the way out".  Each environment should span every tier of the stack: CDN->Load Balancer->Web Server->App Server(s)->DB Server(s), etc.
  3. Assuming that the number of non-production environments that you need is fixed and can be predetermined.
So, let's talk about these a bit, one at a time...

Not giving proper attention to the data tier(s) of the stack


This gets skimped on because it's hard!  Some issues:
  • ORMs have encouraged ignorance of relational databases, and people don't work hard at things they do not understand.  If you don't understand a relational database, what are the chances you're going to work at getting a sync / replication / sub-setting process built?  Most developers who have come up in the last 10 years understand relational databases far less deeply than those who came up before them.  Before ORMs, you had to understand relational databases in detail to build a complex application.  ORMs have been both a blessing and a curse.  Many would say only a curse.  One of the "curse" aspects is that they encourage ignorance where ignorance is to our detriment.  Despite the emergence of other data stores, relational databases will survive for many years for certain use cases; their ACID and strong-consistency guarantees are very convenient for certain problems.  If distributed ACID emerges (e.g. FoundationDB and NuoDB attempt this), that could move some of those use cases away from the relational model.  But most (balanced) experts agree that for the foreseeable future, relational databases remain an essential aspect of storage even as NoSQL takes over certain use cases.  One notable exception is Michael Stonebraker, the creator of Postgres (which became PostgreSQL), co-founder of VoltDB, recent Turing Award winner and certainly a smarter man than I am.  He predicts that column stores will displace traditional row-oriented databases in the data warehouse arena and that distributed ACID systems like VoltDB will take over what remains of their share of the OLTP market.  However, VoltDB is an in-memory database, and he doesn't (as far as I can tell) talk about distributed ACID databases too big to fit in memory.  That's what FoundationDB and others are trying to address.  He is a guru but also extreme in his views.  I agree that relational databases can (and perhaps will) be completely overtaken, but large-scale distributed ACID databases need to be built for that to happen.  
It's interesting to see that Google has moved away from Bigtable and built a distributed ACID system (Spanner, F1) for its developers.  Unfortunately it depends on specialized hardware in the data center (atomic and GPS clocks) that is not yet available to the average public-cloud customer.  Maybe CaaS (clocks as a service) will be added by AWS someday.
  • Production systems keep getting bigger, so having any production-like data means copying around TBs of data.  To get it right you often must subset the data.  Both moving TBs and sub-setting TBs are hard work.  Sub-setting, for example, requires refinement and continuous maintenance as you add new types of data, because the rules have to preserve the relationships in your particular schema.  You can't approach the problem generically.
  • Given the explosion of database options ("polyglot persistence"), it's common to have disparate types of databases for different use cases.  Each of these has to be well understood to have good data environments, and that brings complexity.  For example, at LeadiD, we like Couchbase for a doc store and separately for a cache/key-value store, Elasticsearch for text/search and logging, PostgreSQL for relational, Azure and S3 for file storage, Kafka for messaging and Graphite for metrics.  That's a lot of things to get right, and we're fairly careful about bringing in another data store.
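
Here is a minimal sketch of schema-aware sub-setting, written in Python with the built-in sqlite3 module so it runs anywhere. The customers/orders schema, the "keep every Nth customer" rule and the subset() helper are hypothetical illustrations, not LeadiD's process. The point is that child rows are copied only when their parent was kept, so the subset has no dangling foreign keys. Real rules would have to be written per schema (and per data store) and maintained as the schema grows.

```python
import sqlite3

# Hypothetical schema: "customers" is the root table, "orders" references it.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL
);
"""


def subset(src: sqlite3.Connection, dst: sqlite3.Connection, keep_every: int) -> None:
    """Copy every `keep_every`-th customer, then only the orders that belong
    to a kept customer, so no copied order points at a missing parent."""
    customers = src.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (keep_every,)
    ).fetchall()
    dst.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    kept_ids = [row[0] for row in customers]
    if kept_ids:
        # Fine for a sketch; for very large subsets, stage the ids in a
        # temporary table instead (SQLite caps the number of bound parameters).
        placeholders = ",".join("?" * len(kept_ids))
        orders = src.execute(
            "SELECT id, customer_id, total FROM orders "
            f"WHERE customer_id IN ({placeholders})",
            kept_ids,
        ).fetchall()
        dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()


# Usage: a tiny stand-in "production" database, 10 customers with 2 orders each.
prod = sqlite3.connect(":memory:")
prod.executescript(SCHEMA)
prod.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 11)])
prod.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i + 1) // 2, 9.99) for i in range(1, 21)])

subset_db = sqlite3.connect(":memory:")
subset_db.executescript(SCHEMA)
subset(prod, subset_db, keep_every=5)
```

Here customers 5 and 10 are kept along with their four orders; every order in the subset still has its customer.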


Not extending environments "all the way out"


An environment should extend across all layers and tiers of a system: the CDN tier, the load-balancer tier, the web server tier, the app server tier(s), the data tiers, etc.  If you have access to a cloud infrastructure, this is fairly straightforward.  With REST APIs for everything needed (DNS management, CDN management, virtualization, Docker, etc.), the tools are there to get this right.  It takes foresight and discipline, but it is nowhere near as hard as getting the data stack right.
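
To make "all the way out" concrete, here is a minimal Python sketch that treats an environment as a name from which an address for every tier is derived, so no tier is quietly shared between environments. The tier list, the Environment class, the shared_endpoints() helper and the example.test zone are hypothetical; in practice each hostname would be provisioned through your DNS, CDN and virtualization APIs.

```python
from dataclasses import dataclass

# Hypothetical tier list, ordered from the edge inward.
TIERS = ("cdn", "lb", "web", "app", "db")


@dataclass(frozen=True)
class Environment:
    name: str    # e.g. "qa" or "search-perf"
    domain: str  # hypothetical base DNS zone, e.g. "example.test"

    def endpoints(self) -> dict[str, str]:
        """One hostname per tier, all scoped to this environment."""
        return {tier: f"{tier}.{self.name}.{self.domain}" for tier in TIERS}


def shared_endpoints(a: Environment, b: Environment) -> set[str]:
    """Hostnames two environments have in common; should always be empty."""
    return set(a.endpoints().values()) & set(b.endpoints().values())
```

For example, Environment("qa", "example.test").endpoints()["db"] yields "db.qa.example.test". Deriving every tier from the same name makes a half-extended environment (say, one that still points at a shared QA database) easy to detect.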


Assuming that the number of non-production environments that you need is fixed


You cannot know the number of non-production environments you need in advance and yet people make the big mistake of hard-coding and only configuring 3-4 environments total (e.g. Dev/QA/UAT, Dev/QA/UAT/Staging, Dev/QA/Staging).  There are two problems with this:
  1. You need a full set of these for every independent code path (e.g. a sub-team, or a long-lived branch when you have no choice).
  2. You need more than these for some code paths, and you can't always predict which ones you'll need.  For example, it's common to have a "performance" environment that has enough data to tune nasty queries.  It's common to have a "prod repro" environment with a database that is refreshed continually (maybe nightly) so you can reproduce transaction-specific bugs that were recently reported.  There are environments to deal with automated testing, environments to practice data loads and environments for a particularly invasive feature a developer might be working on.  You cannot predict the number of independent code paths.  And you cannot predict the number of environments you'll need to function optimally along a given code path.
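
Since the set of environments can't be fixed in advance, one option is to derive environment names on demand instead of choosing from a hard-coded list. A minimal Python sketch, assuming a hypothetical "<code path>-<purpose>" naming convention; environment_name() is illustrative, not a standard tool.

```python
import re

MAX_DNS_LABEL = 63  # RFC 1035 limit for a single DNS label


def environment_name(code_path: str, purpose: str) -> str:
    """Derive an environment name from a code path (a team or a branch) and
    a purpose such as "perf" or "prod-repro", rather than picking from a
    fixed list.  The result contains only lowercase letters, digits and
    hyphens so it can be used as a DNS label or resource prefix."""
    raw = f"{code_path}-{purpose}".lower()
    label = re.sub(r"[^a-z0-9-]+", "-", raw)         # replace unsafe runs with "-"
    label = re.sub(r"-{2,}", "-", label).strip("-")  # collapse and trim hyphens
    label = label[:MAX_DNS_LABEL].rstrip("-")
    if not label:
        raise ValueError("code path and purpose produce an empty name")
    return label
```

For instance, environment_name("feature/Search-Rewrite", "perf") gives "feature-search-rewrite-perf". With names derived this way, a new "perf" or "prod-repro" environment for a branch is a provisioning step, not a code change.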
One manifestation of this cardinal sin is code (even test code) that hard-codes knowledge of fixed non-production environments or assumes that only certain ones exist (e.g. "if (env == 'QA') then ...").  This is bad.  Ideally your code should be completely environment-agnostic.  If your back's up against the wall, you can code for detecting production vs. non-production.  But that's it.
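
A minimal sketch of environment-agnostic configuration in Python. The variable names DATABASE_URL, SEARCH_URL and IS_PRODUCTION, and the Settings/load_settings() names, are hypothetical. The idea is that every environment-specific value is injected from outside, and the only branch the code is allowed is the production flag.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    search_url: str
    is_production: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build settings purely from externally supplied values.

    Nothing here knows about "QA", "UAT" or any other environment name;
    whoever provisions an environment supplies its values.  The one
    distinction the code may make is production vs. non-production.
    """
    return Settings(
        database_url=environ["DATABASE_URL"],
        search_url=environ["SEARCH_URL"],
        is_production=environ.get("IS_PRODUCTION", "false").strip().lower() == "true",
    )
```

Because the code never names an environment, adding a tenth or twentieth environment requires no code change, only a different set of values.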




While every setting I've worked in has had variations of these problems, I have tried hard to at least deprecate these practices and make headway toward reversing them.  But of course there is no such thing as true greenfield unless you're employee number 1.  And if you're employee number 1, it's irresponsible to spend this much time thinking about these things, because who knows if you even have a product?  So it comes down to knowing up front what the best practices are, accepting that in the beginning you'll incur some tech debt, and moving to good environment practices as soon as you can tell that you have a product.  At LeadiD we have many of these problems just like anyone else.  We're tackling them now as we embark upon the journey of Continuous Delivery.  Fingers crossed...