Lima, Peru

MLOps Engineer / AI Infrastructure Specialist OC-29

Job Description:

Technologies: Kubernetes, AWS SageMaker, MLflow

Our partner is looking for talented professionals ready for the next step in their careers. This role offers a collaborative environment with meaningful challenges and rewarding growth opportunities.

As an MLOps Engineer / AI Infrastructure Specialist, you'll support multiple projects, collaborate with cross-functional teams, and communicate progress transparently. Ideal candidates enjoy solving complex problems, helping teams succeed, and pushing themselves to deliver high-impact infrastructure.

Job Summary

Join an advanced AI/ML team where you'll architect, automate, and scale machine learning infrastructure. This position is perfect for someone passionate about MLOps, DevOps, and production-grade AI systems.

Responsibilities

  • Design, implement, and maintain scalable MLOps pipelines for training, evaluation, and deployment.
  • Automate workflows using CI/CD tools (GitLab, Jenkins, GitHub Actions).
  • Manage containerized environments with Docker and orchestrate deployments via Kubernetes.
  • Partner with data scientists and engineers to streamline experimentation and productionize ML models.
  • Deploy, monitor, and manage models on cloud ML platforms (AWS SageMaker, Azure ML, Vertex AI).
  • Ensure reliability, monitoring, versioning, and automated rollback of ML systems.
  • Maintain reliability and punctuality in remote team environments.

Requirements

  • English proficiency B2+ (written and spoken).
  • 8+ years of experience as an MLOps Engineer / AI Infrastructure Specialist.
  • Punctuality and reliability for scheduled meetings.
  • Proficient in Python, with experience deploying models using TensorFlow and/or PyTorch.
  • Hands-on experience with Docker and Kubernetes.
  • Strong background in CI/CD pipeline implementation.
  • Proven experience with cloud ML platforms (AWS SageMaker, Azure ML, or Vertex AI).

Nice to Have

  • Experience with workflow orchestration tools (Kubeflow, Airflow) or platforms like Databricks.
  • Familiarity with monitoring and IaC tools (Prometheus, Grafana, Terraform).
  • Experience with data versioning tools such as DVC or LakeFS.

Position Details

  • Type: Full-time consultancy
  • Hours: Up to 40 hrs/week
  • Location: 100% remote (LATAM)
  • Schedule: Flexible core hours
