DevOps Engineer
About the job
Responsibilities
- Design, implement, and support scalable, reliable infrastructure to power production and development environments.
- Manage and enhance our container orchestration systems, primarily Kubernetes on AWS EKS, alongside other critical AWS services such as EC2, ALB, IAM, and VPC networking.
- Build and maintain automation for application and infrastructure deployment, scaling, and lifecycle management.
- Partner with software engineering teams to improve build, release, and deployment processes across CI/CD pipelines.
- Monitor and improve system availability, latency, and performance across the full stack from cloud infrastructure to backend services.
- Develop internal tools and scripts to enhance operational efficiency, resilience, and security.
- Play a key role in incident response efforts, including root cause analysis and long-term remediation.
- Participate in architecture reviews and help guide decisions on infrastructure design, resilience, and observability.
- Stay informed on industry trends in reliability engineering, cloud-native tooling, and DevOps practices, and integrate improvements into our operational playbook.
- Champion security, scalability, and cost-efficiency in all infrastructure decisions.
Requirements
- 5+ years of experience in a DevOps, SRE, or infrastructure engineering role supporting production systems at scale.
- Hands-on experience managing containerized applications using Kubernetes, preferably on AWS EKS, along with an understanding of the broader infrastructure ecosystem.
- Strong knowledge of AWS services and how they integrate to support modern cloud architectures.
- Proficiency with Infrastructure as Code (IaC) tools such as Terraform, and configuration management tools.
- Experience designing and supporting CI/CD pipelines (e.g., Jenkins, GitHub Actions, ArgoCD).
- Scripting or programming skills in Python, Go, or similar languages, used for automation and tooling.
- Deep understanding of systems observability, including logging, metrics, and tracing (e.g., Prometheus, Grafana, CloudWatch).
- Ability to diagnose and troubleshoot complex issues across distributed systems, including performance bottlenecks and availability challenges.
- Familiarity with security best practices for cloud and containerized environments.
- Clear and proactive communicator, comfortable working cross-functionally in a fast-paced environment.
Work setup: Remote
Shift: Night Shift
By applying, you consent to the collection, storage, and/or processing of your personal and/or sensitive information for the purposes of recruitment and employment, whether internal to Cobden & Carter International and/or for its clients.