About the Job: Remote PhD Rater | Up to $100/hr
This is a specialised remote opportunity for experienced researchers and technical experts to support a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows.
The project centres on designing and validating complex benchmark tasks across domains such as data science, machine learning, finance, and coding. The role involves building real-world evaluation tasks, implementing them in Python-based environments, and analysing how advanced AI systems perform on complex technical problems.
Key Responsibilities
Design challenging real-world STEM problems for model evaluation
Implement benchmark tasks inside agentic development environments using Python
Create reproducible tasks with executable tests and clearly defined specifications
Analyse model and agent outputs to identify reasoning gaps and failure modes
Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
Document benchmark tasks, environments, and evaluation outcomes
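To make the "reproducible tasks with executable tests" responsibility concrete, a benchmark task of this kind might pair a clearly stated specification with a reference solution and a deterministic checker. This is a minimal illustrative sketch; the task format, function names, and example problem are hypothetical, not the project's actual format.

```python
# Illustrative sketch of a reproducible benchmark task: a written
# specification, a reference solution, and an executable checker.
# All names and the example problem are hypothetical.

TASK_SPEC = (
    "Given a list of daily prices, return the maximum profit from "
    "one buy followed by one later sell; return 0 if no profit is possible."
)

def reference_solution(prices: list[float]) -> float:
    """Reference implementation used to validate candidate outputs."""
    best, lowest = 0.0, float("inf")
    for p in prices:
        lowest = min(lowest, p)          # cheapest buy seen so far
        best = max(best, p - lowest)     # best sell at today's price
    return best

def check(candidate) -> bool:
    """Executable test: a candidate passes only if it matches the
    expected output on fixed, reproducible cases."""
    cases = [
        ([7, 1, 5, 3, 6, 4], 5.0),  # buy at 1, sell at 6
        ([7, 6, 4, 3, 1], 0.0),     # prices only fall
        ([], 0.0),                  # empty input
    ]
    return all(candidate(list(c)) == expected for c, expected in cases)
```

Because the spec, solution, and test cases live together and the checker is deterministic, any evaluator can rerun the task and get the same pass/fail verdict for a given model output.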
Ideal Profile
Strong candidates typically have:
Active or recently completed PhD from a top-tier U.S.-based university
Deep expertise in data science, machine learning, finance, and/or Python-based programming
Strong research background in advanced STEM domains
Experience designing complex technical problems or research benchmarks
Ability to analyse model reasoning traces and diagnose deeper system behaviour issues
Strong analytical and research documentation skills
Educational Background:
PhD in Computer Science, Data Science, Machine Learning, Finance, or related STEM fields
Nice to Have
Experience working with agentic frameworks or LLM tooling ecosystems
Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
Contributions to open-source software or research projects
Experience analysing complex model behaviour or agent workflows
Why This Opportunity
Contribute directly to frontier AI model evaluation and benchmarking efforts
Work on advanced research challenges in agentic AI systems and STEM reasoning tasks
Collaborate with leading AI labs and technical researchers
Help identify and improve limitations in next-generation AI systems
Contract Details
Independent contractor role
Fully remote with flexible scheduling
Part-time research engagement with expected availability of 30+ hours per week
Competitive rates of $50–$100/hour depending on expertise
Weekly payments via Stripe or Wise
Projects may extend or adjust depending on scope and performance
Work must not involve confidential or proprietary information from employers or institutions
About the Platform
This opportunity is available through a leading AI-driven work platform.