ML Engineer — Runtime

Role Overview

You'll be a key part of the team building our Legal Intelligence runtime stack — helping us serve real-time speech recognition, retrieval, and summarization in low-bandwidth, resource-constrained environments.

This is a full-time position focused on making our ML models fast, lightweight, and deployable across thousands of Indian courtrooms — from remote district courts to the Supreme Court. As an early member of the team, you will:

  • Collaborate closely with the founding team to enhance model performance, enabling seamless operation for judges and stenographers.
  • Identify and implement innovative solutions to optimize machine learning models for various hardware architectures, including CPUs and GPUs.
  • Work in close collaboration with cross-functional partners in design, backend, and frontend functions.
  • Solve complex problems related to model efficiency and scalability.
  • Build cost-effective and scalable systems that can operate efficiently in resource-constrained environments.

Key Responsibilities

  • Design and optimize speech and text pipelines — especially for Indic languages.
  • Implement compiler-aware workflows that reduce latency, memory, and energy usage.
  • Apply compression techniques (quantization, pruning, distillation) to deploy models on diverse and constrained hardware.
  • Collaborate with hardware teams to leverage new CPU/GPU/accelerator features via MLIR, LLVM, or ONNX.
  • Benchmark, debug, and stress-test inference across thousands of hours of real-world audio and documents.
  • Build infrastructure for scalable, cost-efficient inference under heavy workloads.

About You

You don't need to meet every single qualification — we value diverse backgrounds and non-linear paths.

  • Educational Background:

    • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field from leading institutions.
  • Professional Experience:

    • 4+ years of experience in machine learning optimization, model compression, compiler development, or related areas.
  • Technical Skills:

    • Strong programming skills in Python or C/C++.
    • Experience with deep learning frameworks (PyTorch or TensorFlow).
    • Strong understanding of compiler architectures, including front-end and middle-end optimizations, scheduling, and code generation.
    • Familiarity with compiler frameworks such as LLVM or MLIR.
    • Hands-on experience with model optimization techniques, including quantization (e.g., Post-Training Quantization, Quantization-Aware Training), pruning, and distillation.
    • Knowledge of hardware architectures and experience deploying ML systems in resource-constrained environments.
  • Additional Qualifications (Preferred):

    • Experience with advanced batching strategies and efficient inference engines for large language models.
    • Familiarity with retrieval-augmented generation (RAG), graph neural networks (GNNs), and agentic frameworks.
    • Experience contributing to research communities, including publications at conferences and/or journals.

What You Will Achieve in a Year

  • Optimized our end-to-end ML stack for 5,000+ courtrooms running 10–12 hours daily.
  • Solved some of the toughest runtime challenges in our stack — from dialect variability to model drift in noisy courtroom settings.
  • Delivered state-of-the-art performance in legal speech and text understanding running on real-world hardware.