Site Reliability Engineering Training

Duration: 40-45 Hours

 

Introduction to Site Reliability Engineering

  • Develop a thorough understanding of what Site Reliability Engineering is
  • Understand the core principles of Site Reliability Engineering and how cloud computing enables them
  • DevOps vs SRE

Public Cloud Overview and Linux Basics

  • Public Cloud Overview – Compute, Containers, Storage and Observability
  • Characteristics of a good SRE and SRE Foundational Skillset
  • Linux, Automation, IP Address Subnetting, vi Editor
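
As an illustration of the IP address subnetting topic, here is a minimal Python sketch using the standard-library `ipaddress` module. The `10.0.0.0/24` range and the split into `/26` subnets are assumed example values chosen for illustration, not part of the course material.

```python
import ipaddress

# Assumed example network: a private /24 range.
network = ipaddress.ip_network("10.0.0.0/24")

# Split the /24 into equally sized /26 subnets.
subnets = list(network.subnets(new_prefix=26))

for subnet in subnets:
    # Subtract the network and broadcast addresses to get usable hosts.
    usable_hosts = subnet.num_addresses - 2
    print(f"{subnet}  netmask={subnet.netmask}  usable_hosts={usable_hosts}")
```

Borrowing two bits from the host portion yields four subnets of 64 addresses each, of which 62 are usable for hosts.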

Application Deployment

  • Set up a CI/CD pipeline
  • Infrastructure as Code using Terraform
  • Build infrastructure, deploy an application and implement observability
  • Deploy a simple microservice application

Application Monitoring and Performance Tuning

  • Install a monitoring solution to monitor cluster and application resources
  • Check for vulnerabilities in Terraform code and the Kubernetes cluster
  • Understand the concept of reliability and its significance in ensuring system stability and performance

Service Level Objectives (SLOs), Service Level Indicators (SLIs) and Error Budgeting

  • Identify different types of Service Level Indicators (SLIs) and their role in measuring system performance
  • Define Service Level Objectives (SLOs) and recognize various types along with best practices for setting them effectively
  • Gain proficiency in managing Error Budgets and implementing Error Budget Policies to maintain service reliability within defined thresholds
  • Differentiate between SLIs, SLOs, and Error Budget Policies, and articulate their importance in ensuring system resilience
  • Explore non-functional requirements and their impact on system design and performance
  • Discover the concept of observability and familiarize yourself with monitoring tools essential for maintaining system health
  • Apply theoretical knowledge to practical scenarios by analyzing examples of SLIs and SLOs in real-world contexts
  • Identify key roles that contribute significantly to ensuring system reliability and understand their responsibilities in fostering a culture of reliability
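
To make the relationship between SLIs, SLOs and error budgets concrete, here is a minimal Python sketch of the usual request-based arithmetic. The function name `error_budget`, the `ErrorBudget` fields, and the sample figures (a 99.9% target, 1,000,000 requests, 400 failures) are hypothetical values chosen for illustration, not course material.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float   # failures the SLO tolerates in the window
    remaining: float          # allowed failures not yet used (negative if overspent)
    consumed_fraction: float  # share of the budget already spent
    sli: float                # measured success ratio (the SLI)


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute a request-based error budget for one measurement window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the share of requests the SLO allows to fail.
    allowed = (1 - slo_target) * total_requests
    return ErrorBudget(
        allowed_failures=allowed,
        remaining=allowed - failed_requests,
        consumed_fraction=failed_requests / allowed,
        sli=(total_requests - failed_requests) / total_requests,
    )


# Assumed sample window: 99.9% SLO, one million requests, 400 failures.
budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"SLI: {budget.sli:.4%}")
print(f"Budget consumed: {budget.consumed_fraction:.1%}")
```

With these sample figures the SLO tolerates about 1,000 failed requests, so 400 failures consume roughly 40% of the budget and the measured SLI of 99.96% stays above the 99.9% target. An Error Budget Policy would typically define what happens as the consumed fraction approaches 100%.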

Prerequisites:

  • Understanding of cloud computing
  • Technical education

Suitable for:

  • Anyone wanting to kick-start a career in SRE
  • Software Engineers
  • Platform Engineers
  • System Admins
  • DevOps Engineers