Research Engineer — Judgment Labs
Location: Chinatown, San Francisco, CA (Onsite, 5.5 days/week)
Compensation: $225,000 – $400,000 base + competitive equity
Visa Sponsorship: H-1B supported
Experience Level: 1–4 years
Employment Type: Full-Time
About Judgment Labs
Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM), helping organizations evaluate and monitor AI agent performance in production environments. Its platform identifies behavioral anomalies such as instruction drift, retrieval degradation, and reliability failures across complex workflows.
The company has raised more than $30M from investors including Lightspeed, SV Angel, Valor Equity Partners, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, and Kevin Hartz.
About the Role
Judgment Labs is seeking Research Engineers to build AI systems focused on analyzing agent interaction data, evaluating long-running agent behaviors, and improving autonomous systems through feedback and optimization workflows. This is a highly hands-on applied AI engineering role focused on production systems rather than pure academic research. Engineers will work directly with real-world agent data and deploy systems into production environments supporting finance, legal, operations, and other high-stakes domains.
What You'll Own
- Build systems to aggregate, index, and analyze large-scale agent interaction data
- Develop agent-based systems for evaluating long-running agent behaviors
- Design post-training and optimization workflows for AI agents
- Build tooling and infrastructure for experimentation, analysis, and training
- Work with retrieval systems, evaluation harnesses, and production AI infrastructure
- Own projects end to end with significant autonomy
- Collaborate closely with engineering and research teams
Requirements
- 1–4 years of industry experience in applied AI or generative AI
- Experience building and evaluating AI agents in production
- Strong problem-solving ability with high agency and intellectual curiosity
- Comfortable handling large-scale, messy, real-world datasets
- Experience with retrieval systems, search algorithms, or evaluation harnesses
- Ability to work onsite in San Francisco 5.5 days per week
Nice to Have
- Experience with sandboxed or autonomous evaluation environments
- Agent trajectory analysis or long-running behavior evaluation
- Self-improving or continual learning systems
- Experience at fast-moving AI startups or applied AI organizations
- Reinforcement learning or machine learning systems expertise
This Role Is NOT For
- Candidates who require heavily structured task management
- Candidates with limited production AI experience
- Candidates with a pure research background and no shipped systems
- Candidates who cannot relocate or work onsite in San Francisco
Interview Process
- Initial approval review
- Founder vibe check and technical discussion
- Technical interview and problem-solving round
- Work trial
- Offer stage
Logistics
- Role is onsite in San Francisco, 5.5 days/week — please only apply if you can commit to this
- H-1B visa sponsorship available
Shortlisted candidates will be contacted by David Joseph & Co., the recruiting partner managing this search on behalf of Judgment Labs.