Catchpoint publishes the SRE Survey Report every year, and the latest one is the 2019 edition. SRE is the new buzzword and everybody is trying to jump on the bandwagon. Many are getting it right, and many more are missing the point.

Let us look at the key findings of the Catchpoint SRE Survey Report 2019.

Key Findings:

  • Site Reliability Engineering remains an emerging practice.
  • Incident resolution is a very large part of the SRE role.
  • Resolving incidents is quite stressful.
  • A supportive team helps reduce the stress of incident resolution.

Once we get into the details, we see the following:

On roles, 45% of respondents have “SRE” in their titles. This rises to 49% when the Manager and Director levels are included. 29% hold senior positions such as Lead Architect, and 16% are in leadership positions such as Director.

The number of SREs in a company ranges from 1 to 100+, with the most common size being between 2 and 10. This shows that SRE practice is still emerging. We can also see that the practice is usually less than 3 years old. My personal take is that organizations are trying to get into SRE but are not approaching it the right way or giving it due focus. I am all the more convinced of this because many organizations reported that they only renamed the team, or that the chief sponsor said they’re “doing SRE now”, and so on.

Toil is any operational task that is mundane, repetitive and can be automated. The SRE Report shows that 59% of respondents say there is a lot of toil in their organization and that not enough automation has been implemented. 48% disagreed or strongly disagreed that their organization has automated to reduce toil.

The top sources of toil are:

  • 30% said maintenance tasks.
  • 27% said non-urgent service-related messages.
  • 16% said releases.
  • 15% said on-call notifications.
  • 7% said non-service-related messages.

My take is two-fold. First, many organizations have done too little automation. Second, whatever automation has been done is not executed properly. Automation should come only after properly examining the process, fixing its problems and complexities, and then carefully architecting the toolchain. Product vendors still influence automation more than business requirements do. This trend needs to change, and people need to be aware of it.

SLO – the Service Level Objective is one of the key requirements of SRE. Setting SLOs and monitoring them to make sure they are met is a must-do aspect of SRE. It’s surprising to see that only 30% of the respondents agree or strongly agree that they have SLOs. The rest are neutral, disagree, or strongly disagree.

The SRE Survey shows the following as the areas their SLOs cover:

  • Availability – 72%
  • Response Time – 47%
  • Latency – 46%
  • Do not have SLOs – 27%

My take on this is that if there are no SLOs, the organization is not following SRE. It may have done some automation and nothing more. Such a half-hearted approach may be a waste of money and resources, and may not deliver the improvement the organization expects.

The following are the business impacts of missed SLOs that the survey shows:

  • 86% said a drop in Customer Satisfaction.
  • 70% said Loss of Revenue.
  • 57% said a drop in Employee Productivity.
  • 49% said Lost Customers.
  • 36% said Social Media Backlash.

These are all critical impacts that directly affect the business. Social Media Backlash is a major one, although its share is lower than the others, because it can amplify the other impacts. SLOs, Error Budgets, Error Budget Policies, and SLIs are very important aspects of SRE and need to be implemented properly and managed effectively for successful business outcomes.
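To make the relationship between an SLI, an SLO and an error budget a bit more concrete, here is a minimal sketch in Python. It is not taken from the survey report. The function name `error_budget_report`, its parameters and the sample numbers are all hypothetical, and it assumes a simple request-based availability SLI (successful requests divided by total requests).

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compare a request-based availability SLI against an SLO target.

    All names here are illustrative assumptions, not from the survey report.
    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests

    # Error budget: how many failures the SLO tolerates over this window.
    allowed_failures = (1 - slo_target) * total_requests

    # Fraction of the budget still unspent (negative means it is overspent).
    budget_remaining = 1 - failed_requests / allowed_failures

    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
    }


if __name__ == "__main__":
    # Hypothetical month: one million requests, 250 failures, 99.9% target.
    report = error_budget_report(0.999, 1_000_000, 250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

An error budget policy would then decide what happens as `budget_remaining` approaches zero, for example pausing feature releases until reliability work restores the budget. The thresholds for that are a team decision and are left out of this sketch.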

SRE has to be understood properly and implemented alongside DevOps and ITSM to achieve proper business outcomes. A large amount of automation will also be needed. Breaking down silos, making Dev and Ops work together, and giving equal focus to Operations and Development are all important. DevOps focuses more on the Dev side, while SRE focuses more on the Ops side, where the value is created.

We hope you get the most out of the 2019 SRE Survey Report!
