Cloud SRE Manager
Responsibilities:
Lead a team of Cloud SREs to maintain and optimize mission-critical AWS cloud infrastructure
Develop and implement strategies to improve reliability, efficiency, and cost-effectiveness of cloud platforms
Collaborate with development teams to ensure adoption of cloud-native architectures (infrastructure as code, containerization, automated deployments)
Establish and track key reliability metrics and SLIs/SLOs to monitor cloud service performance
Lead incident response and root cause analysis to minimize downtime and drive continuous improvement
Participate in capacity planning and scaling to support business growth and new product launches
Required Skills:
5+ years of experience in Site Reliability Engineering or DevOps, preferably in fintech
Expertise in cloud infrastructure (AWS is a must), containers (Docker), and container orchestration (Kubernetes)
Proficiency in scripting and automation tools (Terraform, Ansible, CI/CD pipelines)
Hands-on experience with monitoring tools like Prometheus, Grafana, and Datadog
Excellent problem-solving and analytical skills
Proven experience managing and mentoring a team of SREs