Over time my career has moved from marketing services to financial services. The scope of the products, and how robust and secure they must be, varies greatly between the two. Marketing content does not typically need a secure ecosystem around it, aside from capturing information about leads. In the financial services world I am in now, the ecosystem of systems I need to interact with is much wider than anything I have dealt with before. The unifying characteristic of the systems I have enjoyed working with is a dependable and repeatable build and deployment pipeline.
The greatest frustrations and flaws I have repeatedly dealt with have been related to system environment configuration. Environment configuration can vary widely, but it is not typically related to the code of the programs being written. We have different environments for testing, such as development and staging. If your environments have diverged, you are better off testing in production. You might catch code-related bugs in the other environments, but there is a swath of environment issues that can completely hose your next release. If you cannot test in production, your systems had better have great monitoring and observability.
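One way to notice divergence before a release is to compare the settings two environments actually run with. The following is a minimal sketch in Python, not a finished tool; the function name `find_drift` and the setting names and values are hypothetical and chosen only for illustration.

```python
def find_drift(reference, candidate):
    """Return {setting: (reference_value, candidate_value)} for every setting that differs.

    A setting that is missing from one side is reported with None for that side.
    """
    drift = {}
    for key in sorted(reference.keys() | candidate.keys()):
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            drift[key] = (ref_value, cand_value)
    return drift


# Hypothetical settings captured from two environments.
production = {"upload_limit_mb": 50, "idle_timeout_min": 0, "max_ports": 4096}
staging = {"upload_limit_mb": 2, "idle_timeout_min": 0}

for key, (prod_value, staging_value) in find_drift(production, staging).items():
    print(f"{key}: production={prod_value!r} staging={staging_value!r}")
```

In practice the two dictionaries would be gathered from the real servers. The comparison itself stays this simple.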
Issues I have dealt with due to environment configuration:

* Webserver configuration
* Operating system configuration
* Deploying a code base from an old server to a new server
* Hard-coded, application-specific configuration files required by a code base
* Likely much more!
Over the lifetime of an application, webserver tweaks accumulate. Each website has its own concerns with operating efficiently. Whether it is Apache, Nginx, or IIS, there are loads of settings to configure, and they can greatly affect your applications. In Apache, memory settings and upload limits can reveal issues with code. In IIS, application pools have a default idle timeout of 20 minutes. Unless you are a shared hosting provider you are unlikely to need that default, and it can make the experience worse for a customer who ends up waiting on slow requests.
Though rare, there are default settings at the OS, application, and code levels that limit the number of network connections on Windows and Linux. If you are running a network service you may eventually hit a peak where requests stop being served or time out. On Windows, the .NET ServicePointManager has a default limit of 2 connections. On Linux there are various connection limits, ulimit being one of them. On one team we had a CouchDB server in production causing many timeouts on high-load days. After exhaustive searching we found that Erlang has a default port limit, ERL_MAX_PORTS, that needed to be raised.
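Limits like these are easier to live with when they are checked before the high-load day arrives. As a rough sketch, assuming a Unix-like system, Python's standard `resource` module can report the open file descriptor limit that `ulimit -n` controls, which also caps open sockets. The function name `open_file_headroom` and the figure of 1000 expected connections are hypothetical.

```python
import resource


def open_file_headroom(expected_connections):
    """Return how many descriptors are left under the soft limit at the expected load.

    Returns None when the soft limit is unlimited. A negative result means the
    expected load would not fit under the current limit.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - expected_connections


headroom = open_file_headroom(1000)  # hypothetical peak connection count
if headroom is None:
    print("No soft limit on open file descriptors.")
elif headroom < 0:
    print(f"Soft limit is too low by {-headroom} descriptors.")
else:
    print(f"{headroom} descriptors of headroom at the expected peak.")
```

This only covers one of the limits mentioned above. Application-level settings such as ERL_MAX_PORTS have to be checked in their own configuration.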
Moving a code base that has been sitting on one server for many years? Good luck! One such server I ran into was a beefy 16-core Windows Server 2008 machine. We moved a few services off this box onto several other servers behind a load balancer and saw worse performance! Years of tweaks and modifications had been made that never caused trouble on the huge box, but their absence was compounded across the distributed machines. We ultimately mitigated most of the issues, but a lot of knowledge about the individual tweaks was certainly lost.
Every time a new team member joined our team, they would attempt to run our applications and run into cryptic errors. Eventually someone would remember that a hard-coded text file needs to exist on any machine running specific applications. The same requirement had also made its way into various libraries used by other systems, so the issue would resurface at random times.
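Cryptic errors like these can at least be turned into clear ones by checking for required files at startup. Below is a minimal sketch of that idea in Python. The names `MissingConfigError` and `require_config_files` and the file path are hypothetical, not taken from the applications described above.

```python
from pathlib import Path


class MissingConfigError(RuntimeError):
    """Raised at startup when a required configuration file is absent."""


def require_config_files(paths):
    """Fail fast with a readable message if any required file is missing."""
    missing = [str(path) for path in paths if not Path(path).is_file()]
    if missing:
        raise MissingConfigError(
            "Required configuration file(s) not found: " + ", ".join(missing)
        )


# Hypothetical path; a real application would list its own requirements here.
try:
    require_config_files(["/etc/example-app/settings.txt"])
except MissingConfigError as error:
    print(error)
```

Calling a check like this first thing in an application or library means a new team member sees the name of the missing file instead of an unrelated stack trace.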
There are multiple types of technologies available that can help mitigate many of these issues:

* Configuration management software (Desired State Configuration, Puppet, Chef, Salt, Ansible, and others) can be a blessing to teams. All changes can be tracked and updated through version control.
* Build/release pipeline tools (TFS, Jenkins, Gradle, Bazel, and many more) record the exact steps and tasks required for build and deployment. Once the human element is taken out, far fewer bugs and much less downtime occur.
* Orchestration utilities (Service Fabric, Kubernetes) may involve moving around Docker containers, similar container technologies, or packaged actor services.

Scalability and growth are issues to be mindful of throughout the lifecycle of an application. At any time an application may need to be split out of a monolith or may need to scale greatly. Leveraging an orchestration tool can build in a base level of modularity and scalability that prevents future frustration with the limits of an application. For example, running stateless APIs on Azure Functions or AWS Lambda minimizes the need to maintain a server and lets you use the built-in scalability.
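The idea that makes configuration management repeatable is idempotence: you describe the state you want, and applying the description a second time changes nothing. The sketch below illustrates that idea in plain Python. It is not how any of the tools named above is implemented, and the function `ensure_line` and the setting `max_upload_mb=50` are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the desired state.
    """
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    config = Path(workdir) / "app.conf"
    print(ensure_line(config, "max_upload_mb=50"))  # True: the first run changes the file
    print(ensure_line(config, "max_upload_mb=50"))  # False: the second run is a no-op
```

Because a repeated run is harmless, the same description can be applied to every server, old or new, and kept in version control alongside the code.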
Two technologies that will have a huge impact on the future of safe, scaled systems are PowerShell and Desired State Configuration.
These technologies are open source, cross platform and highly accessible.
Developers, System Administrators, Quality Assurance Engineers and more can benefit by adding these tools to their toolbox.
The convergence on these technologies is going to benefit all three groups. Developers spend less time dealing with Ops details. Ops spends less time maintaining and digging around. QA does not have to deal with strange intermittent bugs that are environment specific.
I highly recommend that everyone take a look at the cross-platform direction PowerShell is going. Start looking at configuration management for your systems. It will save you, and those who inherit your application, a lot of pain.