Founding Engineer – Full Stack ML DevTools & Systems

San Francisco, California, United States

$150,000 - $250,000 (US Dollar)

Location: San Francisco, CA
Type: Full-Time
Base Compensation: $150,000 – $250,000
Equity: Competitive Series A Equity Package

Overview

This is a founding-level engineering role within a Series A AI infrastructure company building core developer tools and platform primitives for post-training, evaluation, and reinforcement learning workflows.

The platform enables ML engineers and researchers to:

Create structured training data
Run reinforcement fine-tuning workflows
Evaluate model performance reliably and reproducibly at scale

This is a high-ownership role at the center of the product. You will operate across the Python SDK, backend systems, infrastructure, and developer experience, partnering directly with frontier labs, enterprise AI teams, and AI-native startups.

This is not a narrow feature role. You will shape foundational platform architecture and developer workflows that power advanced model training systems.

Core Responsibilities

Platform & Backend Systems

Design and implement backend systems supporting post-training workflows, dataset primitives, run tracking, and artifact management
Build reliable execution and orchestration systems with strong isolation and reproducibility
Improve observability, debugging capabilities, and performance across job execution and distributed data pipelines
Contribute to containerized infrastructure and Kubernetes-based deployment patterns

Python SDK & Developer Experience

Own and evolve the Python SDK with clean APIs, strong documentation, intuitive defaults, and extensibility
Design developer-friendly abstractions for reinforcement learning, evaluation loops, and training workflows
Develop evaluation-native workflows connecting capability measurement, data creation, training, and re-evaluation loops
Improve CLI tools, developer interfaces, and local-to-cloud workflows

Infrastructure & Cloud Systems

Work across compute, networking, storage, and IAM configurations
Design systems that are scalable, reproducible, and secure
Collaborate on distributed systems design and execution infrastructure

Customer & Research Collaboration

Partner directly with ML engineers and researchers to translate real-world workflows into platform improvements
Incorporate structured customer feedback into roadmap decisions
Operate at the intersection of research needs and production reliability

Requirements

Strong production experience in Python
Comfort operating across the stack, including APIs, backend systems, data systems, and frontend integration
Deep understanding of Docker and Linux environments
Cloud fundamentals: compute, networking, storage, IAM
Strong product instincts with a bias toward shipping
Demonstrated end-to-end ownership of production systems

Required Candidate Q&A

LinkedIn Profile
GitHub URL
Publications URL (Google Scholar or similar, if applicable)

Interview Process

Initial Screen
Technical Evaluation
Work Trial
Final Discussion
Offer Decision
