Site Reliability Engineering: Measuring and Managing Reliability

Course Modality

Instructor-led (classroom)

Course Level

Advanced

Course Time

13 hours

Course Language

English

Course Overview

This course teaches the theory of Service Level Objectives (SLOs), a principled way of describing and measuring the desired reliability of a service. Upon completion, learners should be able to apply these principles to develop the first SLOs for services they are familiar with in their own organizations.

Learners will also use Service Level Indicators (SLIs) to quantify reliability and Error Budgets to drive business decisions around engineering for greater reliability. They will come to understand the components of a meaningful SLI and walk through the process of developing SLIs and SLOs for an example service.
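As a rough illustration of how an SLI and an error budget relate, the sketch below treats an SLI as the ratio of good events to valid events and the error budget as the share of events an SLO allows to be bad. The function names and the sample numbers are hypothetical and are not taken from the course material.

```python
def sli_ratio(good_events: int, valid_events: int) -> float:
    """Return the SLI as the fraction of valid events that were good."""
    if valid_events == 0:
        # Assumption: a window with no valid events counts as fully reliable.
        return 1.0
    return good_events / valid_events


def error_budget_remaining(slo_target: float, good_events: int, valid_events: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    allowed_bad = (1 - slo_target) * valid_events  # bad events the SLO tolerates
    actual_bad = valid_events - good_events
    if allowed_bad == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 valid requests, 400 of them bad, 99.9% SLO.
    print(f"SLI: {sli_ratio(999_600, 1_000_000):.4%}")
    print(f"Budget remaining: {error_budget_remaining(0.999, 999_600, 1_000_000):.1%}")
```

In this hypothetical window a 99.9% SLO tolerates about 1,000 bad requests, so 400 bad requests leave roughly 60% of the budget unspent.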

Prerequisites

There are no prerequisites for this course.

Intended Audience

The primary audiences for this course are business decision makers and technical roles within the game development vertical.
Technical roles include, but are not limited to:

    • Game designers
    • Game developers
    • Game artists
    • Game producers
    • Game administrators

Why The DataTech Labs?

Self-Paced Online Video

A 360-degree learning approach that you can adapt to your learning style

Engage and learn more in live, highly interactive classes alongside your peers

24/7 Teaching Assistance

Keep engaged with integrated teaching

Online Practice Labs

Projects provide you with sample work to show prospective employers.

Applied Projects

Real-world projects relevant to what you’re learning throughout the program

Learner Social Forums

A support team focused on helping you succeed alongside a peer community

Skills Covered

In this course, students learn how to:

  • Make systems reliable
  • Understand SLIs, SLOs and SLAs
  • Quantify risks to SLOs and the consequences of missing them

Course Curriculum

Introduction to SRE

This module brings you up to speed on the concepts underpinning SRE, CRE, and SLOs. If you’re already familiar with these concepts, the module is optional, though you may still find new information and perspectives in it.

Targeting Reliability

This module covers how you measure the desired reliability of a service and what to consider when setting SLOs for your application within your organization. We’ll look at the three principles behind this: figuring out what you want to promise and to whom, identifying the metrics that determine whether your service’s reliability is “good”, and deciding how much reliability is good enough.

Choosing a Good SLI

This module starts with the characteristics that make monitoring metrics useful as SLIs and contrasts them with metrics that are less useful. Because where you measure an SLI is a key variable, we’ll cover the five main ways you can measure an SLI and compare their pros and cons.

Quantifying Risks to SLOs

In this module we’ll take a critical look at the availability risks for our example service. We want to answer the question: “Are our SLO targets and error budgets realistic?”
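One common way to frame that question, sketched here under stated assumptions, is to estimate the bad minutes each risk is expected to cost per year and compare the total with the yearly error budget implied by the SLO. The `Risk` fields, the helper names, and the two sample risks are hypothetical illustrations, not figures from the course.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


@dataclass
class Risk:
    name: str
    minutes_to_detect: float           # estimated time before anyone notices
    minutes_to_resolve: float          # estimated time to fix once detected
    fraction_of_users_affected: float  # 0.0 to 1.0
    incidents_per_year: float          # estimated frequency

    def bad_minutes_per_year(self) -> float:
        """Expected user-weighted minutes of unavailability per year."""
        outage_minutes = self.minutes_to_detect + self.minutes_to_resolve
        return outage_minutes * self.fraction_of_users_affected * self.incidents_per_year


def budget_minutes_per_year(slo_target: float) -> float:
    """Minutes of full unavailability per year that the SLO tolerates."""
    return (1 - slo_target) * MINUTES_PER_YEAR


def fits_budget(risks: list[Risk], slo_target: float) -> bool:
    """True when the summed expected cost of all risks stays within the budget."""
    total = sum(r.bad_minutes_per_year() for r in risks)
    return total <= budget_minutes_per_year(slo_target)


if __name__ == "__main__":
    # Hypothetical risk catalogue for an example service.
    risks = [
        Risk("bad deploy", 10, 50, 0.5, 12),
        Risk("database outage", 5, 115, 1.0, 1),
    ]
    for target in (0.999, 0.9995):
        print(target, fits_budget(risks, target))
```

With these made-up estimates the two risks add up to 480 bad minutes per year, which fits a 99.9% target (about 525.6 minutes) but not a 99.95% target (about 262.8 minutes).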

Consequences of SLO Misses

In this module, we’ll cover best practices for documenting your SLOs, the rationale behind a formal error budget policy, and how best to create one. Finally, we’ll look at an example error budget policy to understand the trade-offs and incentives that play out when negotiating such a policy.

Recommended Exams

Professional Cloud DevOps Engineer

A Professional Cloud DevOps Engineer is responsible for efficient development operations that can balance service reliability and delivery speed. They are skilled at using Google Cloud Platform to build software delivery pipelines, deploy and monitor services, and manage and learn from incidents.

© 2020 The Data Tech Labs Inc. All rights reserved.
