ML Engineer — Runtime
Role Overview
You'll be a key part of the team building our Legal Intelligence runtime stack — helping us serve real-time speech recognition, retrieval, and summarization in low-bandwidth, resource-constrained environments.
This is a full-time position focused on making our ML models fast, lightweight, and deployable across thousands of Indian courtrooms — from remote district courts to the Supreme Court. As an early member of the team, you will:
- Collaborate closely with the founding team to enhance model performance, enabling seamless operation for judges and stenographers.
- Identify and implement innovative solutions to optimize machine learning models for various hardware architectures, including CPUs and GPUs.
- Work in close collaboration with cross-functional partners in design, backend, and frontend functions.
- Solve complex problems related to model efficiency and scalability.
- Build cost-effective and scalable systems that can operate efficiently in resource-constrained environments.
Key Responsibilities
- Design and optimize speech and text pipelines — especially for Indic languages.
- Implement compiler-aware workflows that reduce latency, memory, and energy usage.
- Apply compression techniques (quantization, pruning, distillation) to deploy models on diverse and constrained hardware.
- Collaborate with hardware teams to leverage new CPU/GPU/accelerator features via MLIR, LLVM, or ONNX.
- Benchmark, debug, and stress-test inference across thousands of hours of real-world audio and documents.
- Build infrastructure for scalable, cost-efficient inference under heavy workloads.
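To give a flavour of the compression work described above, here is a minimal sketch of post-training affine (asymmetric) int8 quantization in plain Python — an illustrative toy to show the core idea of mapping float weights to 8-bit codes via a scale and zero-point, not a description of our production pipeline or tooling.

```python
def quantize(weights, num_bits=8):
    """Map float weights to unsigned int codes with a scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Widen the range so that 0.0 is exactly representable (important for padding).
    w_min, w_max = min(w_min, 0.0), max(w_max, 0.0)
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # avoid zero scale for all-zero weights
    zero_point = round(qmin - w_min / scale)
    # Quantize: scale, shift, round, and clamp into [qmin, qmax].
    q = [max(qmin, min(qmax, round(w / scale + zero_point))) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int codes."""
    return [(qi - zero_point) * scale for qi in q]
```

For example, quantizing `[-1.0, 0.0, 0.5, 2.0]` and dequantizing back reconstructs each weight to within half a quantization step (`scale / 2`), and `0.0` round-trips exactly — the kind of error bound one reasons about when deciding whether a layer tolerates post-training quantization or needs quantization-aware training.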
About You
You don't need to meet every single qualification — we value diverse backgrounds and non-linear paths.
- Educational Background:
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field from leading institutions.
- Professional Experience:
- 4+ years of experience in machine learning optimization, model compression, compiler development, or related areas.
- Technical Skills:
- Strong programming skills in Python or C/C++.
- Experience with deep learning frameworks (PyTorch or TensorFlow).
- Strong understanding of compiler architectures, including front-end and middle-end optimizations, scheduling, and code generation.
- Familiarity with compiler frameworks such as LLVM or MLIR.
- Hands-on experience with model optimization techniques, including quantization (e.g., Post-Training Quantization, Quantization-Aware Training), pruning, and distillation.
- Knowledge of hardware architectures and experience deploying ML systems in resource-constrained environments.
- Additional Qualifications (Preferred):
- Experience with advanced batching strategies and efficient inference engines for large language models.
- Familiarity with retrieval-augmented generation (RAG), graph neural networks (GNNs), and agentic frameworks.
- Experience contributing to research communities, including publications at conferences and/or journals.
What You Will Achieve in a Year
- Optimized our end-to-end ML stack for 5,000+ courtrooms running 10–12 hours daily.
- Solved some of the toughest runtime challenges in our stack — from dialect variability to model drift in noisy courtroom settings.
- Delivered state-of-the-art performance in legal speech and text understanding running on real-world hardware.