Software engineering is becoming more complex by the day, and framework creators are trying to make adoption of their frameworks as easy as possible. This is a noble goal. However, as no good deed goes unpunished, they also hide a lot of configuration options from adopters to soften the complexity blow. As much as that helps the creators win over an audience, it does not help address engineering problems well as complexity rises (e.g. in performance-oriented systems).
Allured by the ease of use, software engineers (often even experienced ones!) get lulled by that illusion and fall into a trap of unknown unknowns or, better said, unknown defaults. What is worse, since third parties control the defaults, new versions of the software may change them over time, which may unexpectedly change your system's behaviour. In the worst-case scenario, such a change will destabilise the system in a hard-to-spot way.
After walking you through common examples that illustrate this thesis, I will offer a simple takeaway to help you navigate the treacherous waters of defaults safely and ultimately become a better software engineer.
Communication libraries are probably the most prominent example of where Zero Defaults Configuration should be applied.
HTTP client libraries allow you to configure a number of settings related to connection handling, such as:
- Connection Pool Size
- Connection Timeout
- Read Timeout
- SSL Timeout
- Max Idle Timeout
- Max Life Timeout
to name a few. It is not unusual for Read Timeout to default to infinity. With many connections in flight, this is bound to cause resource drain and cascading delays in your processing times and, in effect, breaches of your SLA targets.
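To make the settings listed above concrete, here is a minimal Python sketch of a settings holder in which nothing has a default, so forgetting a value fails loudly at construction time. The class name `HttpClientSettings`, its field names and the numbers are my own illustrative assumptions, not the API of any particular HTTP client. The sketch also reads Python's own process-wide socket timeout, which starts out as `None`, meaning "block forever".

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpClientSettings:
    """Hypothetical settings holder: every field is required, none has a default."""

    pool_size: int
    connect_timeout_s: float
    read_timeout_s: float
    tls_handshake_timeout_s: float
    max_idle_s: float
    max_lifetime_s: float

    def __post_init__(self):
        # Reject values that would quietly mean "unbounded" or "disabled".
        for name, value in vars(self).items():
            if value is None or value <= 0:
                raise ValueError(f"{name} must be a positive number, got {value!r}")


# Python's own hidden default: None (no timeout) unless some code in the
# process has already called socket.setdefaulttimeout().
stdlib_default_timeout = socket.getdefaulttimeout()

# Illustrative values only; the right numbers depend on your SLA.
settings = HttpClientSettings(
    pool_size=20,
    connect_timeout_s=1.0,
    read_timeout_s=2.0,
    tls_handshake_timeout_s=1.0,
    max_idle_s=30.0,
    max_lifetime_s=300.0,
)
```

Omitting any field raises a `TypeError`, and a zero or negative value raises a `ValueError`, so an unbounded timeout can only ever be a deliberate choice.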
Connection pooling is a great concept, since it relies on reusing existing connections to resources such as relational databases or HTTP targets. However, connection pools often don’t work in isolation. They often work with thread pools, and it is worth checking whether the thread pool in use is the same one your server uses, for example. Sharing thread pools is quite often the culprit behind nasty performance problems.
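As a rough illustration of keeping the pools apart, the Python sketch below creates one executor for request handling and a separate one for outbound calls, each with an explicit size. The pool names and sizes are assumptions made for the example. Note that when `max_workers` is omitted, `ThreadPoolExecutor` derives a size from the machine's CPU count, which is a hidden default in its own right.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Illustrative sizes, not recommendations.
SERVER_WORKERS = 8
OUTBOUND_WORKERS = 4

server_pool = ThreadPoolExecutor(
    max_workers=SERVER_WORKERS, thread_name_prefix="server"
)
outbound_pool = ThreadPoolExecutor(
    max_workers=OUTBOUND_WORKERS, thread_name_prefix="outbound"
)


def current_thread_name() -> str:
    """Stand-in for real work: report which pool's thread ran the task."""
    return threading.current_thread().name


# Each task runs on a thread owned by the pool it was submitted to, so slow
# outbound calls cannot use up the threads that serve incoming requests.
outbound_thread = outbound_pool.submit(current_thread_name).result()
server_thread = server_pool.submit(current_thread_name).result()

server_pool.shutdown()
outbound_pool.shutdown()
```

The thread name prefixes also make it much easier to tell from a thread dump which pool is saturated.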
Also, in the cloud era it is common to scale your applications at the platform level. Combined with connection pooling, this means the total number of connections your application instances can open may far exceed the capacity of your target resource.
The connection time-to-live and the criteria by which a connection is considered stale also matter: if they are not configured carefully, some of your calls may fail instantly because they attempt to reuse a connection that is already stale.
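The arithmetic behind this is simple enough to sketch in a few lines of Python. The function names and the numbers (12 instances, 20 connections per pool, a target that accepts 100 connections) are hypothetical and only serve to show the calculation.

```python
def total_possible_connections(instances: int, pool_size_per_instance: int) -> int:
    """Upper bound on connections if every instance fills its pool."""
    return instances * pool_size_per_instance


def max_safe_instances(
    target_capacity: int, pool_size_per_instance: int, reserved: int = 0
) -> int:
    """How many instances fit within the target's capacity, keeping
    `reserved` connections free for other clients (e.g. admin tools)."""
    usable = max(0, target_capacity - reserved)
    return usable // pool_size_per_instance


# Hypothetical deployment: the platform scaled out to 12 instances.
demand = total_possible_connections(instances=12, pool_size_per_instance=20)
limit = max_safe_instances(target_capacity=100, pool_size_per_instance=20, reserved=10)
```

Here `demand` is 240 against a capacity of 100, and only 4 instances fit once 10 connections are reserved, so the autoscaler limit and the pool size have to be chosen together.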
Resilience frameworks such as Resilience4J are great, as they give us the ability to protect our systems from transient failures. However, they may also lead to undesirable side effects. Imagine applying a default configuration and discovering that, within a synchronous interaction, it makes 3 attempts with an exponential back-off policy. Since the calling thread stays blocked throughout, that is unlikely to serve you well and can quickly lead to thread exhaustion and unresponsiveness.
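To see why this hurts, it helps to work out how long a single caller thread can stay blocked. The Python sketch below is plain arithmetic rather than Resilience4J code, and the per-attempt timeout, initial wait and multiplier are values I assumed for illustration.

```python
def backoff_delays(max_attempts: int, initial_wait_s: float, multiplier: float) -> list:
    """Waits between attempts; there is one wait fewer than there are attempts."""
    return [initial_wait_s * multiplier ** i for i in range(max_attempts - 1)]


def worst_case_blocking_s(
    max_attempts: int,
    per_attempt_timeout_s: float,
    initial_wait_s: float,
    multiplier: float,
) -> float:
    """Time a synchronous caller is blocked if every attempt times out."""
    waits = backoff_delays(max_attempts, initial_wait_s, multiplier)
    return max_attempts * per_attempt_timeout_s + sum(waits)


# Assumed policy: 3 attempts, 2 s timeout each, waits of 0.5 s then 1 s.
delays = backoff_delays(max_attempts=3, initial_wait_s=0.5, multiplier=2.0)
blocked_s = worst_case_blocking_s(
    max_attempts=3, per_attempt_timeout_s=2.0, initial_wait_s=0.5, multiplier=2.0
)
```

With these numbers a request thread can be held for 7.5 seconds on one downstream call, and if the per-attempt timeout is itself an unbounded default, no finite figure exists at all.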
A circuit breaker is another tool that will come in handy to avoid further strain on your system. However, is the default failure rate threshold of 50% suitable for your use case? Is 60 seconds a valid threshold for considering your interactions slow?
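The two questions above can be explored with a deliberately simplified Python model of the trip decision. It is not how any real circuit breaker library is implemented; the names `BreakerThresholds` and `should_open`, the 100% slow call rate and the sample window are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerThresholds:
    """All thresholds are required, so none is left to a library default."""

    failure_rate_pct: float
    slow_call_duration_s: float
    slow_call_rate_pct: float


def should_open(outcomes, thresholds: BreakerThresholds) -> bool:
    """Simplified trip decision over a window of (succeeded, duration_s) pairs."""
    if not outcomes:
        return False
    total = len(outcomes)
    failed = sum(1 for ok, _ in outcomes if not ok)
    slow = sum(1 for _, d in outcomes if d > thresholds.slow_call_duration_s)
    failure_rate = 100.0 * failed / total
    slow_rate = 100.0 * slow / total
    return (
        failure_rate >= thresholds.failure_rate_pct
        or slow_rate >= thresholds.slow_call_rate_pct
    )


# Ten calls that all succeed but take 5 seconds each.
window = [(True, 5.0)] * 10

lenient = BreakerThresholds(
    failure_rate_pct=50.0, slow_call_duration_s=60.0, slow_call_rate_pct=100.0
)
tuned = BreakerThresholds(
    failure_rate_pct=50.0, slow_call_duration_s=1.0, slow_call_rate_pct=50.0
)

opens_with_lenient = should_open(window, lenient)
opens_with_tuned = should_open(window, tuned)
```

With the 60-second threshold the breaker stays closed even though every call takes 5 seconds, while the tuned thresholds open it. Whether 5 seconds counts as slow is a property of your system, not of the library.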
With great power comes great responsibility. Study your resilience tooling well and it will pay you back.
Software engineering is nothing but continuous learning, and remaining humble is one of the most useful lessons one can learn. When approaching a framework, a software library or a web service, the first thing we should do is find out what configuration options it has and make our best effort to understand how each one affects behaviour. This can save a lot of stress and reduce the cost of software production by shortening the time-to-market. Sounds important, doesn’t it?
It turns out that as we get close to production, the ease of adding more and more components to the system proves to be nothing but an illusion. We come to realise that the amount of knowledge needed to get our system ready for launch is immense and that we need to understand a lot more concepts than we initially thought.
Experienced engineers will agree software should be built in a conscious manner. It turns out this process can be significantly improved by following the principle below:
Zero/Explicit Defaults Configuration — make sure you surface all default values so they can be configured at deployment or runtime.
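One possible way to apply the principle at deployment time is to read every value from the environment and refuse to start if any is missing, instead of silently falling back. The Python sketch below assumes three hypothetical variable names, which you would replace with the full list of options your libraries expose.

```python
import os

# Hypothetical keys, chosen for the example.
REQUIRED_KEYS = (
    "HTTP_CONNECT_TIMEOUT_S",
    "HTTP_READ_TIMEOUT_S",
    "HTTP_POOL_SIZE",
)


def load_explicit_config(env=None) -> dict:
    """Read all required settings; fail fast rather than fall back to a default."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if key not in source]
    if missing:
        raise RuntimeError("Missing explicit configuration: " + ", ".join(missing))
    return {
        "HTTP_CONNECT_TIMEOUT_S": float(source["HTTP_CONNECT_TIMEOUT_S"]),
        "HTTP_READ_TIMEOUT_S": float(source["HTTP_READ_TIMEOUT_S"]),
        "HTTP_POOL_SIZE": int(source["HTTP_POOL_SIZE"]),
    }


# A plain dict stands in for the process environment in this example.
config = load_explicit_config(
    {
        "HTTP_CONNECT_TIMEOUT_S": "1.0",
        "HTTP_READ_TIMEOUT_S": "2.0",
        "HTTP_POOL_SIZE": "20",
    }
)
```

A side benefit of this style is that the list of required keys doubles as documentation of every knob the system depends on.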
It is trivial, agreed. Remember though that most failures in the current world come from simple principles not being followed.
Diligence and discipline in pursuing it will make the system better documented and its contributors much more aware of the internals and of how their software works top to bottom. Moreover, and this really matters for long-lived systems, it will protect against knowledge eroding over time as contributors come and go.