Friday, July 24, 2015

Non-Production Environments: Anti-Patterns



We all know that clean, separated environments are critical for developing and testing our software.  And yet somehow we tend to get it wrong, often.  The Continuous Delivery movement puts this at the forefront of its concerns, which is great.  So if you're headed down the CD path, chances are you are tackling or are about to tackle the "environment problem".  Because it's a tough thing to get right and because it involves up-front and continuous investment, it tends to be de-emphasized in the feature-crazed Agile world.  But unless you're just doing a proof-of-concept, not solving the environment problem will slow you down, hurt quality and become harder to fix over time.  There are a few anti-patterns that I see over and over:
  1. Not giving proper attention to the data tier(s) of the stack (e.g. relational database, NoSQL database, text/search database, file systems used for storage, etc.)
  2. Not extending environments "all the way out".  You should be able to have environments that go up and down: CDN->Load Balancer->Web Server->App Server(s)->DB Server(s), etc.
  3. Assuming that the number of non-production environments that you need is fixed and can be predetermined.
So, let's talk about these a bit, one at a time...

Not giving proper attention to the data tier(s) of the stack


This gets skimped on because it's hard!  Some issues:
  • ORMs have encouraged ignorance of relational databases and people don't work hard at things they do not understand.  If you don't understand a relational database, what are the chances you're going to work at getting a sync / replication / sub-setting process built?  Most developers who have come up in the last 10 years understand relational databases far less well than those who came up before that.  Before ORMs, you had to understand relational databases in detail to build a complex application.  ORMs have been both a blessing and a curse in some ways.  Many would say only a curse.  One of the "curse" aspects is that they encourage ignorance where ignorance is to our detriment.  Despite the emergence of other data stores, relational databases will survive for many years for certain use cases.  The ACID and strong consistency guarantees are very convenient for certain problems.  If distributed ACID emerges (e.g. FoundationDB and NuoDB attempt this) then that could move some of those use cases away from the relational model.  But most (balanced) experts agree that for the foreseeable future, relational databases remain an essential aspect of storage even as NoSQL takes over certain use cases.  One notable exception is Michael Stonebraker, creator of Postgres (the ancestor of PostgreSQL), co-founder of VoltDB, recent Turing Award winner and certainly a smarter man than I am.  He predicts that column stores will obviate relational databases within the data warehouse arena and that distributed ACID systems like VoltDB will take over what remains of their share in the OLTP market.  However, VoltDB is an in-memory database and he doesn't (as far as I can tell) talk about distributed ACID for data sets that are too big to fit in memory.  That's what FoundationDB, etc. are trying to address.  He is a guru but also extreme in his views.  I agree that relational databases can (and perhaps will) be completely overtaken - but distributed ACID large-scale databases need to be built for that to happen.
It's interesting to see that Google has moved away from BigTable and built a distributed ACID system for its developers (Spanner, F1).  Unfortunately it depends upon specialized hardware at the data center (atomic & GPS clocks) that is not yet available to the average consumer of the public cloud.  Maybe CaaS (clocks as a service) will be added by AWS someday.
  • Increasingly, production systems are bigger and bigger.  So having any production-like data means having to copy around TBs of data.  To get it right you must often subset the data.  Both dealing with TBs and sub-setting TBs are hard work.  Sub-setting, for example, requires refinement and continuous maintenance as you add new types of data.  You can't approach the problem generically.
  • Given the explosion of database options ("polyglot persistence") it's common to have disparate types of databases for different use cases.  Each of these has to be well-understood to have good data environments.  This brings complexity.  For example, at LeadiD, we like Couchbase for a doc store and separately for a cache/key-value store, Elasticsearch for text/search and logging, PostgreSQL for relational, Azure and S3 for file storage, Kafka for messaging and Graphite for metrics.  That's a lot of things to get right and we're fairly careful about bringing in another data store.
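
To make the sub-setting point concrete, here is a minimal sketch in Python using an in-memory SQLite database.  The `customers`/`orders` schema, the table names and the `copy_subset` helper are all hypothetical, invented for illustration; a real schema has many more tables and relationships, which is exactly why this can't be done generically.  The idea is simply to pick a set of root rows and then copy every dependent row, so that the subset stays referentially intact.

```python
import sqlite3
from typing import Sequence

# Hypothetical two-table schema standing in for a production database.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL
);
"""


def build_source() -> sqlite3.Connection:
    """Create a small fake 'production' database: 10 customers, 30 orders."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 11)],
    )
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 10) + 1, 10.0 * i) for i in range(1, 31)],
    )
    src.commit()
    return src


def copy_subset(src: sqlite3.Connection, customer_ids: Sequence[int]) -> sqlite3.Connection:
    """Copy the chosen customers plus all of their orders into a new database."""
    dst = sqlite3.connect(":memory:")
    dst.executescript(SCHEMA)
    ids = list(customer_ids)
    marks = ",".join("?" for _ in ids)  # one placeholder per id
    customers = src.execute(
        f"SELECT id, name FROM customers WHERE id IN ({marks})", ids
    ).fetchall()
    orders = src.execute(
        f"SELECT id, customer_id, total FROM orders WHERE customer_id IN ({marks})", ids
    ).fetchall()
    # Parents first, then children, so no order ever points at a missing customer.
    dst.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    dst.commit()
    return dst
```

Calling `copy_subset(build_source(), [1, 2, 3])` yields a small database containing only those customers and their orders.  Every new table or relationship in the real system means another query like these, which is where the continuous maintenance comes from.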


Not extending environments "all the way out"


An environment should extend across all layers and tiers of a system.  For example, the CDN tier, the load-balancer tier, web server tier, app server(s) tier, data tiers, etc.  If you have access to a cloud infrastructure, this is fairly straightforward.  It just takes foresight and discipline.  With REST APIs for everything needed such as DNS management, CDN management, virtualization, Docker, etc., the tools are there to get this right.  It requires discipline but is nowhere near as hard as getting the data stack right.
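
One way to picture an environment that goes "all the way out" is to derive every tier's address from the environment name alone.  The sketch below is only an illustration: the tier list, the `Environment` class and the `example.test` domain are my own assumptions, not a description of any particular provider's API.  In practice each hostname would be created through whatever DNS, CDN and virtualization APIs you have available.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed tier names for illustration; a real stack may have more or fewer.
TIERS = ("cdn", "lb", "web", "app", "db")


@dataclass(frozen=True)
class Environment:
    """A named environment that covers every tier of the stack."""

    name: str
    base_domain: str = "example.test"  # hypothetical domain

    def hostname(self, tier: str) -> str:
        """Return the hostname for one tier of this environment."""
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier}")
        return f"{tier}.{self.name}.{self.base_domain}"

    def hostnames(self) -> Dict[str, str]:
        """Return the hostname of every tier, keyed by tier name."""
        return {tier: self.hostname(tier) for tier in TIERS}
```

With this shape, `Environment("perf").hostnames()` and `Environment("feature-x").hostnames()` each describe a complete stack, and nothing in the code needs to know in advance which environments will exist.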


Assuming that the number of non-production environments that you need is fixed


You cannot know the number of non-production environments you need in advance and yet people make the big mistake of hard-coding and only configuring 3-4 environments total (e.g. Dev/QA/UAT, Dev/QA/UAT/Staging, Dev/QA/Staging).  There are two problems with this:
  1. You need one of these for every independent code path (e.g. might be a sub-team or a long-lived branch [sometimes you have no choice])
  2. You need more than these for some code paths - and you can't always predict which ones you'll need.  For example, it's common to have a "performance" environment that has enough data to tune nasty queries.  It's common to have a "prod repro" environment that has a continually-upkept database (maybe nightly) to be able to reproduce transaction-specific bugs that were recently reported.  There are environments to deal with automated testing, environments to practice data loads and environments for a particularly invasive feature a developer might be working on.  You cannot predict the number of independent code paths.  And you cannot predict the number of environments you'll need to function optimally along a given code path.
One manifestation of this cardinal sin is code (even if test code) that hard-codes understanding of fixed non-production environments or assumes that only certain of these exist.  (e.g. "if (env == 'QA') then ...")  This is bad.  Ideally your code should be completely environment-agnostic.  If your back's up against the wall you can code for detecting production vs. non-production.  But that's it.
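
As a rough sketch of what environment-agnostic code can look like in Python: everything that differs between environments is read from configuration, and the only environment-related question the code is allowed to ask is "is this production?".  The variable names (`APP_DATABASE_URL`, `APP_SEARCH_URL`, `APP_IS_PRODUCTION`) and the `Settings` class are hypothetical, chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Everything the application needs to know about where it is running."""

    database_url: str
    search_url: str
    is_production: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (names are assumptions).

    There is no 'QA', 'UAT' or 'Staging' anywhere in here: a new environment
    is just a new set of values, not a new branch in the code.
    """
    return Settings(
        database_url=env["APP_DATABASE_URL"],
        search_url=env["APP_SEARCH_URL"],
        # The one permitted distinction: production vs. non-production.
        is_production=env.get("APP_IS_PRODUCTION", "false").lower() == "true",
    )
```

Passing a plain dictionary, e.g. `load_settings({"APP_DATABASE_URL": "...", "APP_SEARCH_URL": "..."})`, makes the same code usable from tests without touching the real process environment.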




While every setting I've worked in has had variations of these problems, I have tried hard to at least deprecate these practices and make headway toward reversing them.  But of course there is no such thing as true greenfield unless you're employee number 1.  And if you're employee number 1, it's irresponsible to be spending this much time thinking about these things because who knows if you even have a product?  And so it comes down to knowing up-front what the best practices are, accepting that in the beginning you'll incur some tech debt, and moving to good environment practices as soon as you can tell that you have a product.  At LeadiD we have many of these problems just like anyone else.  We're tackling them now as we embark upon the journey of Continuous Delivery.  Fingers crossed...

