Software Engineer (Full‑Stack / Infrastructure): Frontier AI Evaluation

About the Team

We build the data, evaluation, and experimentation infrastructure powering next‑generation agentic AI systems. Our work directly supports all five leading AI labs and focuses on the hardest problems in LLM reasoning, RL environments, and human‑in‑the‑loop workflows.

We're a fast‑moving, talent‑dense team with backgrounds in quant finance, top‑tier startups, and elite engineering orgs. Revenue is already in the 8‑figure range with a steep growth curve and a major Series A on the way.

The Role

This is a broad, high‑ownership engineering role — not a narrow feature lane.

You'll work across research, infra, product, and data, owning systems end‑to‑end. Expect to touch everything from RL environments to distributed infra to full‑stack dashboards.

A typical month might include:

  • Prototyping a new RL environment from a research paper
  • Deploying distributed experiments on Kubernetes
  • Improving the reliability of Next.js dashboards
  • Building a Kafka pipeline for annotator analytics

You'll shape core systems used by frontier AI labs from day one.

What You'll Do

  • Build scalable systems: RL environments, APIs, human‑in‑the‑loop platforms
  • Collaborate with research, product, and design to ship quickly
  • Write clean, maintainable code with strong documentation
  • Participate in architecture discussions and code reviews
  • Solve real‑world scalability and reliability challenges
  • Contribute to the infrastructure powering frontier AI evaluation

Who We're Looking For

We're looking for early‑career engineers who have already shown they can thrive in fast‑moving, high‑ownership environments and want to work on some of the most challenging problems in AI.

Experience

  • 1–3 years as a full‑stack software engineer
  • Background at a high‑growth startup or top quantitative trading firm, or experience as a founding engineer at a company with meaningful early traction
  • If your experience is primarily in big tech, we look for a strong CS foundation (e.g., a degree from a top‑tier CS program such as Berkeley, CMU, MIT, or Stanford)

Bonus Experience

  • Time spent at companies focused on human‑in‑the‑loop AI, data labeling, or AI evaluation (e.g., Surge AI, Snorkel, Scale, Labelbox, Micro1, Mercor)
  • Exposure to fast‑paced environments where you shipped features end‑to‑end and owned outcomes

What Matters Most

  • You've built real systems — not just maintained them
  • You take ownership, move quickly, and enjoy solving hard technical problems
  • You're comfortable working directly with researchers, product teams, and customers
  • You thrive in environments where the roadmap changes based on what you learn

Technical Skills

  • Full‑stack: Next.js / React, Node.js / Python
  • Infra: Kubernetes, Kafka, Redis, Elasticsearch
  • Ability to build end‑to‑end systems with high ownership

Soft Skills

  • Strong ownership and bias toward shipping
  • Comfortable being client‑facing with AI lab researchers
  • Thrives in fast‑paced, high‑iteration environments

Work Environment

  • 5 days/week onsite in the Financial District
  • Flexible hours
  • Optional half‑day or remote work on Sundays
  • Tight‑knit, high‑trust, high‑velocity team

Why Join

  • Work directly with frontier AI labs
  • Solve the hardest problems in AI evaluation
  • Massive ownership and impact from day one
  • Build at a scale most AI startups never reach
  • Join a team of elite engineers and operators
