AI Operations Engineer
About BukuWarung
BukuWarung’s vision is to empower 60 million MSMEs in Indonesia to become financially aware and enable them to manage and grow their businesses using our technology platform, from bookkeeping and digital payments to AI-driven merchant operations.
As part of our next growth phase, we are expanding our AI Platform and Operations function to build scalable, intelligent systems that accelerate product development, automate operations, and make infrastructure self-optimizing.
We are looking for an AI Engineer (AI Operations Engineer) who can bridge the gap between AI product development and infrastructure management by designing, deploying, and maintaining AI systems that power both internal tools and production workloads.
Key Responsibilities
1) AI Product Development
Design, train, and deploy machine learning or LLM-based models that solve core operational and product problems (e.g., anomaly detection, classification, forecasting, and conversational AI).
Build modular APIs and microservices for inference, data processing, and automation.
Collaborate with product teams to prototype, test, and iterate on AI-first user experiences.
Convert experimental notebooks into production-grade pipelines and scalable services.
2) AI Infrastructure & Reliability
Design and maintain scalable ML infrastructure across training, deployment, and monitoring workflows.
Build CI/CD pipelines for model delivery, manage containerized inference systems, and ensure production reliability.
Implement observability for AI models, tracking drift, latency, performance, and cost.
Collaborate with DevOps and platform engineering to optimize compute utilization, GPU scheduling, and cost management.
3) Automation & AIOps
Automate workflows for model retraining, deployment, and validation.
Build systems for intelligent alerting, anomaly detection, and auto-remediation of AI services.
Integrate AI pipelines into existing DevOps and monitoring tools for proactive issue management.
4) Data Pipeline & Tooling
Develop robust ingestion and processing pipelines for structured and unstructured data.
Manage feature stores, vector databases, and embedding pipelines for retrieval-augmented generation (RAG) systems.
Build internal developer tools and utilities for faster experimentation and monitoring.
5) Collaboration & Governance
Partner closely with AI researchers, backend engineers, and product managers to translate business needs into reliable AI systems.
Contribute to MLOps best practices, documentation, and standardization.
Ensure compliance with BukuWarung’s data security, audit, and ethical AI frameworks.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 5+ years of hands-on experience in backend or ML engineering roles
- Strong programming skills in Python (FastAPI, Flask) and familiarity with microservice design
- Experience deploying and monitoring ML/LLM workloads in production (batch and real-time)
- Proficiency with:
  - ML/AI frameworks (PyTorch, TensorFlow, Hugging Face, LangChain)
  - Infrastructure tools (Docker, Kubernetes, Terraform, Airflow)
  - Cloud platforms (GCP, AWS, or Azure)
  - Observability stack (Prometheus, Grafana, ELK, OpenTelemetry)
- Experience managing GPU-based workloads and cost optimization
- Excellent problem-solving, debugging, and automation skills
- Familiarity with vector databases (Pinecone, Weaviate, FAISS) and RAG pipeline architecture
Preferred Experience
Experience building and deploying AI-powered automation systems or developer tools
Experience with LLM fine-tuning, embedding generation, or prompt engineering
Exposure to distributed systems and scalable API design
Understanding of data governance, security, and compliance in AI workflows
Previous experience in fintech, SaaS, or infrastructure-heavy products