ML/MLOps Engineer (Mid-Senior)

Cape Town, Western Cape, South Africa

Function: Data and AI Delivery

Reports to: Head of Data and AI Delivery

Type: Contract (12 months with possibility of extension)

Location: Cape Town, Northern Suburbs (Hybrid)

COMPANY:

Vito Solutions is a Data and AI Intelligence firm founded in 2013, with offices in Cape Town and New York. We build production-grade Data and AI systems for clients across South Africa, the UK, Europe, and the USA. Our team works on revenue-generating use cases including fraud detection, churn modelling, real-time analytics, AI agents, and full data platform builds. Data and AI Intelligence is what we do, and it is how we deliver on every client engagement.

THE ROLE:

We are hiring a Senior ML/MLOps Engineer to take technical ownership of the ML infrastructure underpinning our client engagements. This is a platform and systems role, not a research role. We want a software engineer who has chosen to specialise in ML systems, not a data scientist who has drifted into infrastructure.

You will be the person clients depend on to keep their ML platforms stable, scalable, and cost-efficient in production. You will set the engineering bar for how Vito delivers ML systems, and you will lift the delivery teams around you to hit that bar.

WHAT YOU WILL DO:

Own the architecture of ML platforms on client engagements, including API design, deployment topology, and cloud infrastructure on a major hyperscaler (GCP)
Automate provisioning and environment management using Infrastructure as Code (Terraform), and ship code through modern CI/CD pipelines
Design and build internal frameworks and tooling that let client data scientists and engineers move models into production safely and quickly
Hold the line on production reliability, security, and scale across every ML system Vito delivers
Run architectural reviews on client work, enforcing clean code, SOLID principles, and pragmatic engineering trade-offs
Identify and remove cost waste across ML, AI, and data workloads, and report savings back to the client as a measurable outcome

WHAT YOU NEED (MUST HAVE):

Bachelor's degree in Computer Science, Software Engineering, or a closely related field
At least 4 years in production software engineering, MLOps, or platform engineering
Strong hands-on architecture experience on at least one major cloud (GCP, AWS, or Azure), specifically running ML workloads in production
Production track record on a managed ML platform (Vertex AI, SageMaker, Bedrock, Azure ML, or Databricks)
Practical experience with managed container or serverless compute services (Cloud Run, AWS Fargate, ECS, Azure Container Apps, or equivalent)
Working knowledge of a cloud data warehouse (BigQuery, Snowflake, Redshift, Databricks SQL, MS Fabric, or Azure Synapse)
Solid CI/CD experience with one or more of GitLab CI, GitHub Actions, Jenkins, Azure DevOps, CircleCI, or Harness
Strong Infrastructure as Code experience with Terraform, Pulumi, CloudFormation, or Bicep
Advanced Python for backend and platform work, plus advanced SQL for data work
Production experience with Docker and Kubernetes
Track record of building and operating ML pipelines and observability tooling (Prometheus, Grafana, Datadog, OpenTelemetry, or similar)

BONUS POINTS FOR:

Direct production experience on Google Cloud Platform
Specific exposure to Vertex AI, Cloud Run, and BigQuery in production
CI/CD work with GitLab and Harness
Consulting or client-facing delivery background
Domain exposure to retail, banking, insurance, or asset management data

WHAT WE LOOK FOR:

Systems engineer by instinct. You think about ML in terms of contracts, interfaces, failure modes, and observability, not notebooks
Mentor by default. You raise the level of the engineers around you, especially through documentation and shared tooling
Calm under complexity. You can untangle messy production pipelines across unfamiliar cloud setups and explain what is broken in plain language
Tool-agnostic. You have a preferred stack but you can read a client's environment and adapt

Please note: If you have not heard from us within 2 weeks, please consider your application unsuccessful.
