DevSecOps / Platform Engineer (Kubernetes/Terraform), Full Remote Portugal
ABOUT THE OPPORTUNITY
Join a technically ambitious organisation operating at the intersection of infrastructure, security, and cutting-edge AI/ML workloads. This is a senior individual contributor role where your work directly shapes the reliability, security posture, and scalability of critical platform infrastructure. You'll be embedded in a high-trust engineering culture that values ownership, technical depth, and continuous improvement — not ticket-pushing.
The environment is complex, the challenges are real, and the impact is immediate. If you thrive in platforms where GPU-accelerated workloads, zero-trust security, and cloud-native tooling converge, this role was built for you.
PROJECT & CONTEXT
You will be working on a mature but evolving internal platform that supports advanced data and ML workloads running on Kubernetes across both cloud (AKS) and on-premises environments. The platform serves cross-functional engineering teams and demands the highest standards in security hardening, observability, and operational resilience.
The stack is modern, the team is senior, and the expectations are high — in the best possible way. You'll be operating in a GitOps-first, security-first culture where Infrastructure as Code is the norm and every change is deliberate and auditable.
WHAT WE'RE LOOKING FOR (Required)
Experience
- 7+ years in DevOps / Platform Engineering
- 2+ years in a dedicated DevSecOps capacity
Kubernetes & Infrastructure
- Deep expertise in Kubernetes — AKS, upstream K8s, or enterprise distributions
- Hands-on experience with on-premises Kubernetes: RKE2, K3s, or OpenShift
- IaC proficiency: Terraform, Helm, Kustomize, YAML, GitOps workflows
- CI/CD pipeline ownership: Azure DevOps and/or GitHub Actions
GPU & ML Workloads
- Experience deploying and managing GPU-accelerated workloads using NVIDIA operators, GPU device plugins, and/or Run:AI
Security (Non-Negotiable)
- CIS-hardened Kubernetes environments
- Zero Trust network architecture principles
- Container security tooling: Trivy, Aqua, Prisma, or equivalent
- SAST/DAST/SCA toolchain implementation
- RBAC, NetworkPolicies, and Pod Security Admission configuration
- Encryption at rest and in transit; certificate lifecycle management
- Secrets management: HashiCorp Vault and/or Azure Key Vault
- Keycloak configuration and identity management
Observability
- Prometheus and Grafana — deployment, configuration, and dashboard ownership
Networking & Linux
- Strong Linux fundamentals and networking depth: TLS, DNS, Ingress, OAuth/OIDC, VNets and VNet peering, and VPN/jump host configuration
Data & Storage
- Experience managing MinIO, MLflow, and PostgreSQL (HA configurations and backup strategies)
Scripting & Languages
- Strong scripting in Python and Bash (required)
NICE TO HAVE (Preferred)
- Experience supporting ML platforms: Kubeflow, MLflow, KServe
- Knowledge of distributed storage systems: Ceph or NetApp
- Background in regulated industries — automotive, aerospace, medical, energy, or railway
- Scripting experience in Go