Job Openings
Senior Software Engineer – LLM Evaluation - RibbitZ
About the job
Our client RibbitZ is looking for a Senior Software Engineer – LLM Evaluation to work remotely.
As a Software Engineering evaluator, you will create cutting-edge datasets for training, benchmarking, and advancing large language models, collaborating closely with researchers. This includes curating code examples, providing precise solutions, and making corrections in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go; evaluating and refining AI-generated code for efficiency, scalability, and reliability; and working with cross-functional teams to enhance enterprise-level AI-driven coding solutions.
What Does a Typical Day Look Like?
- Work on AI model training initiatives by curating code examples, building solutions, and correcting code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go.
- Evaluate and refine AI-generated code to ensure that it is efficient, scalable, and reliable.
- Collaborate with cross-functional teams to enhance AI-driven coding solutions and measure them against industry performance benchmarks.
- Build agents that can verify the quality of the code and identify error patterns.
- Form hypotheses about steps in the software engineering cycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on each of them.
- Design verification mechanisms that can automatically verify a solution to a software engineering task.
Required Skills
- 5+ years of software engineering experience, including 2+ years of continuous full-time experience at a top-tier product company (e.g., Google, Stripe, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research).
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
Eligibility (Strictly Enforced):
- Software Engineering profiles only
- Candidates must be based in the US
- 5+ years of relevant experience
- Immediate assessment availability
Top companies:
- Google (Alphabet)
- Apple
- Amazon
- Meta (Facebook)
- Netflix
- Microsoft
- Tesla
- NVIDIA
- Adobe
- Salesforce
- GitHub
- Atlassian
- HashiCorp
- Databricks
- Snowflake
- Cloudflare, DigitalOcean, MongoDB
- Elastic, Confluent, Airbnb, Dropbox
- Stripe, Palantir, Uber, Lyft
- Square (Block), Twilio, Snap Inc.
- Pinterest, Figma, Oracle, Cisco
- PayPal, DoorDash, Rivian, Reddit, Coinbase, Splunk
- Spotify, Goldman Sachs, Morgan Stanley
- JPMorgan Chase, Capital One
- Plaid, Shopify, Intuit, Workday, ServiceNow
- Hugging Face, VMware, Brex, Wise
- Epic Games, Unity Technologies
- Activision Blizzard, Riot Games, Valve
- Huawei, Bloomberg, ByteDance
- Alibaba, Baidu, Notion, Klarna
- Instacart, Zillow