
DevOps and SREs are the new Linux admins. But what the heck does this mean?

July 2, 2019

You have probably heard these buzzwords already. With the current boom in cloud computing, “Linux admins” are going out of fashion, while “DevOps Engineers” and “SRE specialists” are popping up all around the globe. But what do these terms, or positions, actually involve? Should employees with the above-mentioned titles work side by side, or should they be afraid of being replaced by one another? Let’s shed some light on this topic.

DevOps

Let’s start with DevOps: a community-driven approach to organisational structure, practices, and tools that:

Speeds up the delivery of applications and thus reduces time to market
Implements small, gradual changes rather than big rollouts
Puts the focus on tooling and automation

DevOps is an abstract concept, a culture, an ideology or a philosophy, if you will. It was developed as a reaction to typical organisational problems. One of the biggest obstacles in the traditional development flow was the so-called “siloed” teams. Imagine a giant wall with the Development and Operations folks sitting on either side of it. They can’t see each other, and they can’t communicate with each other. A typical deployment process looked like this: developers created a new version of a product and then threw it over the wall to the Ops folks to “take care of it”.

The main problem here (apart from the lack of communication and shared knowledge) is that these two groups have vastly different goals. Developers want to build new features and updates and ship them ASAP, whereas the Ops folks take an “if it ain’t broke, don’t fix it” approach. They care about stability more than anything else, and with each new feature or update, they can already see yet another sleepless night ahead.

The DevOps methodology transforms this model. It tears down the wall and brings both teams together, often even combining both roles in a single person. Hence the name: DevOps, Development + Operations working together to deploy, maintain, and automate things.

Nowadays, the trend is to bring other teams into direct cooperation as well. A good example would be connecting the sales and marketing departments, making the most of their different competencies and shared knowledge to create a less siloed, more fluid workflow. But the term DevOps is probably here to stay, as something like “DevPMOpSale” doesn’t sound quite as fancy, does it…

SRE

We’ve talked about DevOps as an abstract ideology. SRE, or Site Reliability Engineering, on the other hand, is a concrete, prescriptive way of implementing very similar principles. It was designed by Google for internal use and developed in parallel with, and at around the same time as, the DevOps principles. Later on, Google decided that the SRE principles should be shared with the public and become common practice. The company even published an SRE book that you can read online for free.

SRE actually covers most of the DevOps ideology with concrete principles. As Google folks like to say, “class SRE implements DevOps”. If you dig deep into the SRE books, you will find the principles of shared ownership (mixed teams), reducing toil through automation, and an emphasis on small increments.

SRE also brings additional terms into the mix, such as:

SLI: Service-Level Indicator
- A measurement of a service’s behaviour, e.g. is the latency of a request below 300 ms?
SLO: Service-Level Objective
- The target proportion (percentage) of SLI measurements that must be “healthy”, e.g. 99.9% of requests will have a latency below 300 ms.
SLA: Service-Level Agreement
- An agreement between the service provider and the client on what happens when the defined SLO is not met, e.g. compensation for lost profit.
PM: Post Mortem
- A blameless evaluation of a service incident, with an analysis of the root cause and the next steps to prevent it from happening again.
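To make the relationship between an SLI and an SLO more concrete, here is a minimal Python sketch that reuses the numbers from the examples above (a 300 ms latency threshold and a 99.9% target). The function names are hypothetical, and the sketch assumes you already have a list of per-request latencies; in practice these would come from your monitoring system rather than a Python list.

```python
# Assumed values, taken from the SLI/SLO examples in the text.
LATENCY_THRESHOLD_MS = 300.0
SLO_TARGET = 0.999  # 99.9% of requests must be "healthy"


def sli_good(latency_ms: float, threshold_ms: float = LATENCY_THRESHOLD_MS) -> bool:
    """SLI for a single request: is its latency below the threshold?"""
    return latency_ms < threshold_ms


def sli_ratio(latencies_ms: list[float], threshold_ms: float = LATENCY_THRESHOLD_MS) -> float:
    """Fraction of requests whose SLI was healthy."""
    if not latencies_ms:
        return 1.0  # no traffic: treated as fully compliant (an assumption)
    good = sum(1 for lat in latencies_ms if sli_good(lat, threshold_ms))
    return good / len(latencies_ms)


def slo_met(latencies_ms: list[float], target: float = SLO_TARGET,
            threshold_ms: float = LATENCY_THRESHOLD_MS) -> bool:
    """SLO check: is the healthy fraction at or above the target?"""
    return sli_ratio(latencies_ms, threshold_ms) >= target


# 1,000 requests, 2 of which were slower than 300 ms.
latencies = [120.0] * 998 + [450.0] * 2
print(sli_ratio(latencies))  # 0.998
print(slo_met(latencies))    # False: 99.8% is below the 99.9% objective
```

The SLI answers a yes/no question per request, while the SLO only looks at the aggregate ratio over a period, which is why two slow requests out of a thousand are already enough to miss a 99.9% objective.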

SRE relies on a method called the error budget. The error budget is the amount of unreliability your SLO allows: with a 99.9% SLO, up to 0.1% of requests may miss the SLI. Whenever you are about to break, or have already broken, your defined SLO, all deployments of new features, updates, and maintenance windows are postponed, and all work focuses on reliability until you have enough error budget again or the SLO period ends.
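The error budget can be sketched in a few lines of Python. This is a simplified illustration, not Google’s implementation: it assumes the budget is counted over the requests served so far in the SLO period, and the `reserve` parameter (freeze releases once less than 10% of the budget is left, to cover the “about to break” case) is a hypothetical policy choice. `Fraction` is used so that 99.9% is represented exactly instead of as a rounded float.

```python
from fractions import Fraction

# Assumed SLO from the example in the text: 99.9% of requests must be "good".
SLO_TARGET = Fraction("0.999")


def error_budget(total_requests: int, slo_target: Fraction = SLO_TARGET) -> Fraction:
    """Number of 'bad' requests the SLO tolerates in the SLO period."""
    return total_requests * (1 - slo_target)


def budget_remaining(total_requests: int, bad_requests: int,
                     slo_target: Fraction = SLO_TARGET) -> Fraction:
    """Unspent budget; zero or negative means the SLO is broken."""
    return error_budget(total_requests, slo_target) - bad_requests


def release_allowed(total_requests: int, bad_requests: int,
                    slo_target: Fraction = SLO_TARGET,
                    reserve: Fraction = Fraction(1, 10)) -> bool:
    """Hypothetical release gate: allow new features and maintenance only
    while more than `reserve` (10% by default) of the budget is left."""
    budget = error_budget(total_requests, slo_target)
    return budget_remaining(total_requests, bad_requests, slo_target) > reserve * budget


# 1,000,000 requests at 99.9% -> a budget of 1,000 bad requests.
print(error_budget(1_000_000))           # 1000
print(release_allowed(1_000_000, 400))   # True: 600 bad requests still allowed
print(release_allowed(1_000_000, 950))   # False: about to break the SLO
print(release_allowed(1_000_000, 1200))  # False: SLO already broken
```

Keeping a small reserve is one way to act before the SLO is actually breached; a stricter or looser policy only changes the `reserve` value, not the idea of spending and freezing on a shared budget.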

So, in summary, SRE and DevOps are not enemies; they are not even two friends complementing each other. It’s more like one guy describing a dream house and the other actually picking up the tools and starting to build it.