Job Openings GenAI SME - UOB

About the job

Core Responsibilities of a Gen AI Testing SME

1. Test Strategy & Planning

  • Define comprehensive testing strategies tailored to Gen AI models (LLMs, diffusion models, etc.).
  • Identify key testing dimensions: accuracy, relevance, coherence, bias, toxicity, hallucination, and safety.
  • Develop test plans for different stages: pre-training, fine-tuning, prompt engineering, and deployment.

2. Test Case Design & Automation

  • Design test cases for both deterministic and non-deterministic outputs.
  • Create benchmark datasets and golden sets for evaluation.
  • Develop automated testing pipelines using tools like LangChain, PromptLayer, or custom frameworks.
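
As a rough illustration of a test case for non-deterministic outputs, the sketch below samples a model several times and checks an invariant (required terms appear in the answer) instead of comparing against one exact string. `fake_model`, `pass_rate`, and the sample prompt are hypothetical names invented for this example; a real pipeline would call the actual model or a framework wrapper in place of the stub.

```python
import random


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a Gen AI model call; the wording varies per call."""
    openers = ["Sure.", "Certainly.", "Of course."]
    return f"{rng.choice(openers)} The capital of France is Paris."


def pass_rate(prompt: str, required_terms, n_samples: int = 20, seed: int = 0) -> float:
    """Sample the model n_samples times and return the fraction of outputs
    that contain every required term (case-insensitive)."""
    rng = random.Random(seed)  # fixed seed keeps the test run reproducible
    passed = 0
    for _ in range(n_samples):
        output = fake_model(prompt, rng).lower()
        if all(term.lower() in output for term in required_terms):
            passed += 1
    return passed / n_samples


# One golden-set entry (assumed for illustration): prompt plus terms the answer must contain.
rate = pass_rate("What is the capital of France?", ["Paris"])
```

A test would then assert that `rate` meets an agreed threshold, which tolerates harmless variation in wording while still catching wrong answers.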

3. Evaluation Metrics & Analysis

  • Define and apply appropriate evaluation metrics (e.g., BLEU, ROUGE, perplexity, factual consistency).
  • Analyze model outputs for hallucinations, bias, and harmful content.
  • Conduct A/B testing and human-in-the-loop evaluations.
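
To make the idea of an overlap metric concrete, here is a minimal sketch of a unigram-overlap F1 score, in the spirit of ROUGE-1. It is a simplified illustration with whitespace tokenization, not the official ROUGE implementation; `unigram_f1` is a name chosen for this example.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 over overlapping lowercase, whitespace-separated tokens.
    Each token counts at most as often as it appears in both texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# A short candidate scored against a longer reference answer.
score = unigram_f1("the cat", "the cat sat on the mat")
```

Scores like this measure surface overlap only, so they are usually paired with factual-consistency checks and human review.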

4. Prompt & Scenario Testing

  • Test prompt robustness across variations, edge cases, and adversarial inputs.
  • Validate prompt templates and chaining logic in RAG or agent-based systems.
  • Ensure consistency and reliability across different user intents and contexts.
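
Prompt robustness testing can be sketched as generating surface variations of one prompt and checking that the system responds consistently. In the example below, `classify_intent` is a hypothetical keyword-based stand-in for the system under test, and `make_variants` covers only a few simple perturbations (case, padding, punctuation); adversarial inputs would need a richer generator.

```python
from typing import List


def make_variants(prompt: str) -> List[str]:
    """Return the prompt plus simple surface-level perturbations of it."""
    return [
        prompt,
        prompt.upper(),
        prompt.lower(),
        f"  {prompt}  ",        # extra surrounding whitespace
        prompt.rstrip("?.!"),   # trailing punctuation removed
        prompt + "!!",          # extra trailing punctuation
    ]


def classify_intent(prompt: str) -> str:
    """Hypothetical system under test: a trivial keyword-based intent classifier."""
    text = prompt.strip().lower()
    return "balance_inquiry" if "balance" in text else "other"


def is_consistent(prompt: str) -> bool:
    """True if every variant of the prompt yields the same intent."""
    answers = {classify_intent(variant) for variant in make_variants(prompt)}
    return len(answers) == 1


consistent = is_consistent("What is my account balance?")
```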

5. Risk & Compliance Testing

  • Validate adherence to responsible AI principles (fairness, transparency, accountability).
  • Test for compliance with data privacy laws (e.g., GDPR, PDPA) and industry regulations.
  • Identify and mitigate risks related to model misuse or unintended behavior.

6. Tooling & Infrastructure

  • Set up and maintain testing environments for Gen AI models (cloud-based or on-premises).
  • Integrate testing into CI/CD pipelines for continuous validation.
  • Leverage synthetic data generation and simulation tools for scalable testing.

7. Collaboration & Reporting

  • Work closely with the Testing Domain Lead, data scientists, ML engineers, and product teams.
  • Document test results, issues, and recommendations clearly.
  • Feed test findings back to the relevant teams to improve model training, fine-tuning, and deployment.