G02 - Data Scientist

About the job

Role Overview:

You own the data & model lifecycle for evaluation, guardrail, and testing features: turning exploratory research into production services, designing evaluation harnesses, curating / generating datasets, and instrumenting continuous risk & quality monitoring across Litmus and Sentinel. You are the bridge between rapid Responsible AI experimentation and reliable platform delivery.

Job Responsibilities:

  • Productise research prototypes (guardrails, detectors, evaluators) into performant, observable services & APIs.
  • Design/maintain evaluation pipelines (batch & on-demand) for safety, robustness, fairness, leakage, and regression drift.
  • Implement prompt / model optimisation strategies (quantisation, caching, dynamic routing, selective execution) to hit latency & cost budgets.
  • Develop automated benchmarking harnesses integrating internal & external suites (jailbreak, prompt injection, harassment, PII, off-topic, leakage).
  • Define graduation criteria and a sign-off checklist for moving a prototype to GA (coverage, bias metrics, drift tolerance, alert thresholds).
  • Build monitoring & alerting (precision / recall, calibration, drift, FP/FN balance, cost, latency) and drive remediation playbooks.
  • Contribute reusable internal / open-source evaluation & guardrail components; document patterns & anti-patterns.
  • Support CI/CD with golden sets, seeded adversarial test packs, and safety regression gates that block non-compliant releases (a minimal sketch of such a gate follows this list).
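
For illustration only, here is a minimal sketch of the kind of safety regression gate described in the last bullet, written in Python. Every name in it (golden_set.jsonl, guardrail_classify, the 0.90 floors) is a hypothetical placeholder for this sketch, not a Litmus or Sentinel internal:

    import json
    import sys

    PRECISION_FLOOR = 0.90  # hypothetical release thresholds
    RECALL_FLOOR = 0.90

    def guardrail_classify(prompt: str) -> bool:
        """Placeholder detector: flag a known prompt-injection phrase."""
        return "ignore previous instructions" in prompt.lower()

    def gate(golden_path: str) -> int:
        """Exit non-zero when the guardrail regresses on the golden set."""
        with open(golden_path) as f:
            # Each line: {"prompt": "...", "unsafe": true/false}
            records = [json.loads(line) for line in f]

        tp = fp = fn = 0
        for rec in records:
            predicted = guardrail_classify(rec["prompt"])
            actual = rec["unsafe"]
            tp += predicted and actual
            fp += predicted and not actual
            fn += actual and not predicted

        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        print(f"precision={precision:.3f} recall={recall:.3f}")

        # Block the release when either metric falls below its floor.
        return 0 if precision >= PRECISION_FLOOR and recall >= RECALL_FLOOR else 1

    if __name__ == "__main__":
        sys.exit(gate(sys.argv[1] if len(sys.argv) > 1 else "golden_set.jsonl"))

In CI, a non-zero exit from a script like this is what turns the gate into a blocking check.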

Qualifications:

  • 3+ years (or demonstrably equivalent) delivering end-to-end ML / data science solutions (scoping, data, modelling, deployment, monitoring).
  • Strong Python (data tooling, modern packaging, async patterns) plus PyTorch / TensorFlow or equivalent.
  • Hands-on with LLM integration patterns (prompt engineering, evaluation, fine-tuning / adapters, or RAG pipelines).
  • Applied understanding of Responsible AI (safety, robustness, fairness, privacy) and how to operationalise metrics (e.g. drift, guardrail precision/recall).
  • Cloud and container experience (AWS / GCP / Azure, Docker).
  • Familiarity with vector stores, embedding generation, and prompt/output tracing or observability frameworks.
  • Sound experimental design (statistical validity, variance reduction, confidence thresholds); a worked confidence-interval sketch follows this list.
  • Ability to write production-quality, testable code; effective code review participation.
  • Clear communication of technical risk & metric trade-offs to product stakeholders.
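
As a purely illustrative example of operationalising a guardrail metric against a confidence threshold, the sketch below puts a Wilson score interval around a measured precision. The function name and the numbers are assumptions made for the sketch, not part of the role:

    import math

    def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
        """95% Wilson score interval for a binomial proportion,
        e.g. guardrail precision measured over n flagged items."""
        if n == 0:
            return (0.0, 1.0)
        p = successes / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        return (centre - margin, centre + margin)

    # Example: 93 correct flags out of 100 flagged items.
    lo, hi = wilson_interval(93, 100)
    print(f"precision ~= 0.930, 95% CI [{lo:.3f}, {hi:.3f}]")

An alerting rule that fires only when the whole interval sits below the floor is one way to trade false alarms against detection delay.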

Preferred / Bonus Qualifications:

  • Experience implementing guardrail or policy frameworks (e.g. Llama Guard, NeMo Guardrails, LLM-Guard, custom classifiers).
  • Prior work on adversarial / red-teaming datasets, jailbreak detection, or toxicity / leakage mitigation.
  • Knowledge of model compression (quantisation, distillation) and GPU / accelerator optimisation.
  • Hands-on with RLHF / DPO / preference optimisation or synthetic data generation pipelines.