GenAI SME - UOB
About the job
Core Responsibilities of a GenAI Testing Subject Matter Expert (SME)
1. Test Strategy & Planning
- Define comprehensive testing strategies tailored to GenAI models (e.g., LLMs and diffusion models).
- Identify key testing dimensions: accuracy, relevance, coherence, bias, toxicity, hallucination, and safety.
- Develop test plans for each stage of the model lifecycle: pre-training, fine-tuning, prompt engineering, and deployment.
2. Test Case Design & Automation
- Design test cases for both deterministic and non-deterministic outputs.
- Create benchmark datasets and golden sets for evaluation.
- Develop automated testing pipelines using tools such as LangChain and PromptLayer, or custom frameworks.
3. Evaluation Metrics & Analysis
- Define and apply appropriate evaluation metrics (e.g., BLEU, ROUGE, perplexity, factual consistency).
- Analyze model outputs for hallucinations, bias, and harmful content.
- Conduct A/B testing and human-in-the-loop evaluations.
4. Prompt & Scenario Testing
- Test prompt robustness across variations, edge cases, and adversarial inputs.
- Validate prompt templates and chaining logic in retrieval-augmented generation (RAG) and agent-based systems.
- Ensure consistency and reliability across different user intents and contexts.
5. Risk & Compliance Testing
- Validate adherence to responsible AI principles (fairness, transparency, accountability).
- Test for compliance with data privacy laws (e.g., GDPR, PDPA) and industry regulations.
- Identify and mitigate risks related to model misuse or unintended behavior.
6. Tooling & Infrastructure
- Set up and maintain testing environments for GenAI models (cloud-based or on-premises).
- Integrate testing into CI/CD pipelines for continuous validation.
- Leverage synthetic data generation and simulation tools for scalable testing.
7. Collaboration & Reporting
- Work closely with the Testing Domain Lead, data scientists, ML engineers, and product teams.
- Document test results, issues, and recommendations clearly.
- Establish feedback loops that inform model training, fine-tuning, and deployment.