Senior DevOps Engineer
About the Role
We are looking for a Senior DevOps Engineer with deep expertise in platform infrastructure to support a commercial SaaS product. You will be responsible for designing, building, and maintaining secure, scalable cloud infrastructure with a strong emphasis on Kubernetes orchestration, identity management, and security best practices. This role offers the opportunity to work on cutting-edge infrastructure supporting AI/ML workloads and GPU-accelerated computing.
Key Responsibilities
- Design, deploy, and manage production Kubernetes clusters across cloud environments (AWS EKS, GCP GKE, Azure AKS, and native Kubernetes)
- Implement and maintain Infrastructure as Code using Terraform
- Architect and implement authentication and authorization systems (OAuth 2.0, OIDC, SAML, RBAC)
- Design and enforce security policies, network segmentation, and zero-trust architecture
- Build and optimize CI/CD pipelines for automated testing, security scanning, and deployment
- Implement secrets management solutions
- Monitor infrastructure health, performance, and security using observability tools
- Manage cloud costs and optimize resource utilization
- Document infrastructure architecture, runbooks, and operational procedures
- Collaborate with development teams to ensure smooth deployments and platform reliability
Required Qualifications
- 6+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
- Strong hands-on experience with Kubernetes (cluster administration, Helm, operators, CRDs)
- Proficiency with at least one major cloud provider (AWS, GCP, or Azure) and associated services
- Experience implementing authentication/authorization systems (OAuth 2.0, OIDC, SAML, JWT)
- Solid understanding of cloud security principles, IAM policies, and network security
- Advanced proficiency with multiple AI coding assistants for developing software, using guardrails to ensure high quality
- Experience with Infrastructure as Code (Terraform strongly preferred)
- Proficiency in scripting languages (Bash, Python, Go)
- Experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
- Familiarity with container security, vulnerability scanning, and compliance frameworks
- Strong troubleshooting skills and experience with production incident management
Preferred Qualifications
- Experience with GPU workloads and AI/ML infrastructure (NVIDIA GPU Operator, CUDA, vGPU)
- Familiarity with ML platforms and model serving infrastructure (Kubeflow, MLflow, Ray, Triton)
- Experience supporting commercial SaaS products with high availability requirements
- Knowledge of service mesh technologies (Istio, Linkerd, Cilium)
- Experience with identity providers and SSO integration (Okta, Auth0, Keycloak)
- Familiarity with compliance frameworks (SOC 2, HIPAA, GDPR, FedRAMP)
- Experience with GitOps workflows and tools (ArgoCD, Flux)
- Cloud certifications (AWS Solutions Architect, CKA/CKAD, GCP Professional Cloud Architect)
- Multi-cloud or hybrid cloud architecture experience
Technical Environment
- Cloud Platforms: AWS (primary), GCP, Azure
- Orchestration: Kubernetes (EKS/GKE), Helm, Kustomize, ArgoCD
- Infrastructure as Code: Terraform, Pulumi, CloudFormation
- Security & Identity: OAuth 2.0, OIDC, Vault, AWS IAM, cert-manager
- Observability: Prometheus, Grafana, Datadog, ELK Stack, OpenTelemetry
- Networking: VPC, Load Balancers, Ingress Controllers, Service Mesh
- AI/ML (Preferred): NVIDIA GPU Operator, Kubeflow, Ray, Triton Inference Server