Founding Engineer – Full Stack ML DevTools & Systems

Location: San Francisco, CA
Type: Full-Time
Base Compensation: $150,000 – $250,000
Equity: Competitive Series A Equity Package

Overview

This is a founding-level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post-training, evaluation, and reinforcement learning workflows.

The platform enables ML engineers and researchers to:

  • Create structured training data

  • Run reinforcement fine-tuning workflows

  • Evaluate model performance reliably and reproducibly at scale

This is a high-ownership role at the center of the product. You will operate across the Python SDK, backend systems, infrastructure, and developer experience, partnering directly with frontier labs, enterprise AI teams, and AI-native startups.

This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.

Core Responsibilities

Platform & Backend Systems

  • Design and implement backend systems supporting post-training workflows, dataset primitives, run tracking, and artifact management

  • Build reliable execution and orchestration systems with strong isolation and reproducibility

  • Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines

  • Contribute to containerized infrastructure and Kubernetes-based deployment patterns

Python SDK & Developer Experience

  • Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility

  • Design developer-friendly abstractions for reinforcement learning, evaluation loops, and training workflows

  • Develop evaluation-native workflows connecting capability measurement, data creation, training, and re-evaluation loops

  • Improve CLI tools, developer interfaces, and local-to-cloud workflows

Infrastructure & Cloud Systems

  • Work across compute, networking, storage, and IAM configurations

  • Design systems that are scalable, reproducible, and secure

  • Collaborate on distributed systems design and execution infrastructure

Customer & Research Collaboration

  • Partner directly with ML engineers and researchers to translate real-world workflows into platform improvements

  • Incorporate structured customer feedback into roadmap decisions

  • Operate at the intersection of research needs and production reliability

Requirements

  • Strong production experience in Python

  • Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration

  • Deep understanding of Docker and Linux environments

  • Cloud fundamentals: compute, networking, storage, IAM

  • Strong product instincts with a bias toward shipping

  • Demonstrated end-to-end ownership of production systems


Required Candidate Q&A

  1. LinkedIn Profile

  2. GitHub URL

  3. Publications URL (Google Scholar or similar, if applicable)

Interview Process

  1. Initial Screen

  2. Technical Evaluation

  3. Work Trial

  4. Final Discussion

  5. Offer Decision