Senior DevOps Engineer

About the Role

We are looking for a Senior DevOps Engineer with deep expertise in platform infrastructure to support a commercial SaaS product. You will be responsible for designing, building, and maintaining secure, scalable cloud infrastructure with a strong emphasis on Kubernetes orchestration, identity management, and security best practices. This role offers the opportunity to work on cutting-edge infrastructure supporting AI/ML workloads and GPU-accelerated computing.

Key Responsibilities

  • Design, deploy, and manage production Kubernetes clusters across cloud environments (AWS EKS, GCP GKE, Azure AKS, and native Kubernetes)
  • Implement and maintain Infrastructure as Code using Terraform
  • Architect and implement authentication and authorization systems (OAuth 2.0, OIDC, SAML, RBAC)
  • Design and enforce security policies, network segmentation, and zero-trust architecture
  • Build and optimize CI/CD pipelines for automated testing, security scanning, and deployment
  • Implement secrets management solutions
  • Monitor infrastructure health, performance, and security using observability tools
  • Manage cloud costs and optimize resource utilization
  • Document infrastructure architecture, runbooks, and operational procedures
  • Collaborate with development teams to ensure smooth deployments and platform reliability

Required Qualifications

  • 6+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
  • Strong hands-on experience with Kubernetes (cluster administration, Helm, operators, CRDs)
  • Proficiency with at least one major cloud provider (AWS, GCP, or Azure) and associated services
  • Experience implementing authentication/authorization systems (OAuth 2.0, OIDC, SAML, JWT)
  • Solid understanding of cloud security principles, IAM policies, and network security
  • Advanced proficiency with multiple AI coding assistants, using guardrails to ensure high-quality software
  • Experience with Infrastructure as Code (Terraform strongly preferred)
  • Proficiency in scripting languages (Bash, Python, Go)
  • Experience with CI/CD platforms (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
  • Familiarity with container security, vulnerability scanning, and compliance frameworks
  • Strong troubleshooting skills and experience with production incident management

Preferred Qualifications

  • Experience with GPU workloads and AI/ML infrastructure (NVIDIA GPU Operator, CUDA, vGPU)
  • Familiarity with ML platforms and model serving infrastructure (Kubeflow, MLflow, Ray, Triton)
  • Experience supporting commercial SaaS products with high availability requirements
  • Knowledge of service mesh technologies (Istio, Linkerd, Cilium)
  • Experience with identity providers and SSO integration (Okta, Auth0, Keycloak)
  • Familiarity with compliance frameworks (SOC 2, HIPAA, GDPR, FedRAMP)
  • Experience with GitOps workflows and tools (ArgoCD, Flux)
  • Cloud certifications (AWS Solutions Architect, CKA/CKAD, GCP Professional Cloud Architect)
  • Multi-cloud or hybrid cloud architecture experience

Technical Environment

  • Cloud Platforms: AWS (primary), GCP, Azure
  • Orchestration: Kubernetes (EKS/GKE), Helm, Kustomize, ArgoCD
  • Infrastructure as Code: Terraform, Pulumi, CloudFormation
  • Security & Identity: OAuth 2.0, OIDC, Vault, AWS IAM, cert-manager
  • Observability: Prometheus, Grafana, Datadog, ELK Stack, OpenTelemetry
  • Networking: VPC, Load Balancers, Ingress Controllers, Service Mesh
  • AI/ML (Preferred): NVIDIA GPU Operator, Kubeflow, Ray, Triton Inference Server