AWS Site Reliability Engineer

Role Purpose

We are looking for a Site Reliability Engineer with strong AWS expertise to help design, build, and operate secure, scalable, and resilient cloud platforms. This role suits someone who enjoys working across infrastructure, software engineering, and operations, and who has a particular passion for automation.

As an SRE, you'll play a key role in shaping the cloud environment, mentoring colleagues, and ensuring workloads remain reliable, cost-efficient, and compliant.

Key Responsibilities

Reliability & Uptime

  • Design and maintain highly available AWS infrastructure.

  • Monitor system health, ensure SLA adherence, and resolve production incidents with root-cause fixes.

Automation & Scalability

  • Use Infrastructure as Code (Terraform, AWS CDK, CloudFormation) to automate deployments, scaling, and recovery.

  • Reduce manual operations by building automated workflows.

  • Maintain CI/CD pipelines for consistent delivery.

Monitoring & Observability

  • Implement observability tools (CloudWatch, Grafana, ELK/OpenSearch).

  • Define and measure SLIs/SLOs.

  • Develop proactive alerting and anomaly detection.

Security & Compliance

  • Apply AWS security best practices across IAM, secrets management, and encryption.

  • Support compliance with frameworks such as ISO 27001, SOC 2, PCI DSS, GDPR, and POPIA.

  • Conduct periodic security and compliance audits.

Performance & Cost Optimisation

  • Analyse usage and optimise resources for efficiency and cost.

  • Recommend Reserved Instances or Spot Instances and provide visibility into cloud spend.

Incident & Problem Management

  • Lead post-incident reviews and maintain runbooks.

  • Build fault-tolerant, self-healing systems.

Collaboration & Continuous Improvement

  • Partner with developers to embed reliability and observability into applications.

  • Mentor engineers and promote best practices in SRE and AWS operations.

Required Experience

  • 5+ years of experience with AWS (EC2, ECS/EKS, Lambda, RDS, DynamoDB, S3, CloudFront, VPC, Route 53, IAM).

  • Skilled in Infrastructure as Code (Terraform, AWS CDK, CloudFormation).

  • Proficient with observability tools (CloudWatch, Grafana, ELK/OpenSearch).

  • Experience with CI/CD pipelines (GitHub Actions, GitLab CI, AWS CodePipeline).

  • Solid knowledge of containers and orchestration (Docker, Kubernetes, ECS, EKS).

  • Strong coding and scripting skills (Python, Bash, Go).

  • Incident management and on-call support experience.

Qualifications

  • AWS Professional certifications (preferred).

  • Hands-on production experience with Kubernetes/EKS.

  • Knowledge of security and compliance frameworks.

Key Competencies

  • Strong problem-solving skills, with a focus on root cause and prevention.

  • Excellent communicator, able to collaborate across teams.

  • Prioritises reliability, scalability, and performance.

  • Continuous improvement mindset, with a drive for automation.