Job Openings
G02 - Data Scientist
Role Overview:
You own the data & model lifecycle for evaluation, guardrail, and testing features: turning exploratory research into production services, designing evaluation harnesses, curating / generating datasets, and instrumenting continuous risk & quality monitoring across Litmus and Sentinel. You are the bridge between rapid Responsible AI experimentation and reliable platform delivery.
Job Responsibilities:
- Productise research prototypes (guardrails, detectors, evaluators) into performant, observable services & APIs.
- Design/maintain evaluation pipelines (batch & on-demand) for safety, robustness, fairness, leakage, and regression drift.
- Implement prompt / model optimisation strategies (quantisation, caching, dynamic routing, selective execution) to hit latency & cost budgets.
- Develop automated benchmarking harnesses integrating internal & external suites (jailbreak, prompt injection, harassment, PII, off-topic, leakage).
- Define graduation criteria and a sign-off checklist for moving a prototype to GA (coverage, bias metrics, drift tolerance, alert thresholds).
- Build monitoring & alerting (precision / recall, calibration, drift, FP/FN balance, cost, latency) and drive remediation playbooks.
- Contribute reusable internal / open-source evaluation & guardrail components; document patterns & anti-patterns.
- Support CI/CD with golden sets, seeded adversarial test packs, and safety regression gates that block non-compliant releases.
Qualifications:
- 3+ years (or demonstrably equivalent experience) delivering end-to-end ML / data science solutions (scoping, data, modelling, deployment, monitoring).
- Strong Python (data tooling, modern packaging, async patterns) plus PyTorch / TensorFlow or equivalent.
- Hands-on with LLM integration patterns (prompt engineering, evaluation, fine-tuning / adapters, or RAG pipelines).
- Applied understanding of Responsible AI (safety, robustness, fairness, privacy) and how to operationalise metrics (e.g. drift, guardrail precision/recall).
- Cloud and container experience (AWS / GCP / Azure, Docker).
- Familiarity with vector stores, embedding generation, and prompt/output tracing or observability frameworks.
- Sound experimental design (statistical validity, variance reduction, confidence thresholds).
- Ability to write production-quality, testable code; effective code review participation.
- Clear communication of technical risk & metric trade-offs to product stakeholders.
Preferred / Bonus Qualifications:
- Experience implementing guardrail or policy frameworks (e.g. Llama Guard, NeMo Guardrails, LLM-Guard, custom classifiers).
- Prior work on adversarial / red-teaming datasets, jailbreak detection, or toxicity / leakage mitigation.
- Knowledge of model compression (quantisation, distillation) and GPU / accelerator optimisation.
- Hands-on with RLHF / DPO / preference optimisation or synthetic data generation pipelines.