MLOps Engineer / AI Infrastructure Specialist OC-29
Job Description:
Technologies: Kubernetes, AWS SageMaker, MLflow
Our partner is looking for talented professionals ready for the next step in their careers. This role offers a collaborative environment with meaningful challenges and rewarding growth opportunities.
As an MLOps Engineer / AI Infrastructure Specialist, you'll support multiple projects, collaborate with cross-functional teams, and communicate progress transparently. Ideal candidates enjoy solving complex problems, helping teams succeed, and pushing themselves to deliver high-impact infrastructure.
Job Summary
Join an advanced AI/ML team where you'll architect, automate, and scale machine learning infrastructure. This position is perfect for someone passionate about MLOps, DevOps, and production-grade AI systems.
Responsibilities
- Design, implement, and maintain scalable MLOps pipelines for training, evaluation, and deployment (see the sketch after this list).
- Automate workflows using CI/CD tools (GitLab, Jenkins, GitHub Actions).
- Manage containerized environments with Docker and orchestrate deployments via Kubernetes.
- Partner with data scientists and engineers to streamline experimentation and productionize ML models.
- Deploy, monitor, and manage models on cloud ML platforms (AWS SageMaker, Azure ML, Vertex AI).
- Ensure reliability, monitoring, versioning, and automated rollback of ML systems.
- Work reliably and punctually in a remote, distributed team environment.
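For illustration, here is a minimal sketch of the experiment-tracking and model-registration step such a pipeline typically automates, using MLflow (listed under Technologies). The tracking URI, experiment name, model, and metric values are hypothetical placeholders, not part of the actual stack.

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LogisticRegression

    mlflow.set_tracking_uri("http://mlflow.example.internal:5000")  # placeholder tracking server
    mlflow.set_experiment("demo-churn-model")                       # placeholder experiment name

    with mlflow.start_run() as run:
        # Stand-in for a real training job; a production pipeline would train on versioned data.
        model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
        mlflow.log_param("C", model.C)
        mlflow.log_metric("val_auc", 0.91)  # placeholder evaluation metric
        mlflow.sklearn.log_model(model, artifact_path="model")

    # Registering the run's model makes it visible to downstream deployment and rollback jobs.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-churn-model")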
Requirements
- English proficiency B2+ (written and spoken).
- 8+ years of experience as an MLOps Engineer / AI Infrastructure Specialist.
- Strong punctuality and reliability for meetings.
- Proficient in Python, with experience deploying models using TensorFlow and/or PyTorch.
- Hands-on experience with Docker and Kubernetes.
- Strong background in CI/CD pipeline implementation.
- Proven experience with cloud ML platforms (AWS SageMaker, Azure ML, or Vertex AI).
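As a concrete example of the cloud ML platform experience above, the sketch below deploys an already-trained PyTorch model artifact to a real-time endpoint with the AWS SageMaker Python SDK. The S3 path, IAM role, endpoint name, and instance type are hypothetical placeholders.

    import sagemaker
    from sagemaker.pytorch import PyTorchModel

    session = sagemaker.Session()

    # All identifiers below are placeholders; real values would come from the training pipeline.
    model = PyTorchModel(
        model_data="s3://example-bucket/models/churn/model.tar.gz",
        role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        entry_point="inference.py",          # custom inference handler shipped with the model
        framework_version="2.1",
        py_version="py310",
        sagemaker_session=session,
    )

    # Creates a managed real-time HTTPS endpoint serving the model.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
        endpoint_name="churn-model-endpoint",
    )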
Nice to Have
- Experience with workflow orchestration tools (Kubeflow, Airflow) or platforms like Databricks.
- Familiarity with monitoring and IaC tools (Prometheus, Grafana, Terraform).
- Experience with data versioning tools such as DVC or LakeFS.
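To illustrate the data-versioning item, here is a short sketch of how a pipeline can read a pinned dataset revision through DVC's Python API. The repository URL, file path, and revision tag are hypothetical placeholders.

    import dvc.api

    # Stream a specific, tagged version of a dataset from a DVC-tracked repository.
    with dvc.api.open(
        "data/train.csv",
        repo="https://github.com/example-org/example-data-registry",  # placeholder repo
        rev="v1.2.0",                                                  # placeholder data tag
        mode="r",
    ) as f:
        print(f.readline())  # e.g. inspect the header row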
Position Details
- Type: Full-time consultancy
- Hours: Up to 40 hrs/week
- Location: 100% remote (LATAM)
- Schedule: Flexible core hours
Required Skills:
Grafana, PyTorch, TensorFlow, Pipelines, CI/CD, Azure, GitLab, DevOps, Reliability, AWS, Machine Learning, Infrastructure, Kubernetes, Jenkins, GitHub, Docker, Design, Python, English, Training