AI QA Specialist (Conversational & LLM Systems)
About Welvaart
At Welvaart, we create technology solutions that put people at the center.
Our close leadership style and flexible culture of growth empower our teams and elevate the quality of our delivery. We combine rigor, innovation, and empathy to drive projects that transform businesses and build lasting relationships of trust.
We complement this vision with a performance‑driven Digital Marketing offering, helping companies strengthen their visibility, enhance their online presence, and accelerate growth through smart, measurable strategies.
Project
You will define and maintain evaluation strategies for AI and LLM systems, creating and managing versioned datasets that cover core scenarios, edge cases, negative paths, and safety conditions. You will validate conversational behavior end to end: from intent recognition and slot extraction to state transitions, business rules, and the correctness of tool and function calls.
You will play a key role in detecting regressions and evaluation drift as models or prompts evolve, defining meaningful metrics and thresholds (accuracy, precision, recall, F1), and providing clear quality signals to support release decisions. Working closely with QA and ML teams, you will integrate evaluation practices into CI/CD and help bring structure and determinism to inherently non-deterministic systems.
Role
- Define and maintain evaluation strategies for AI / LLM systems.
- Create and manage versioned datasets (core, edge, negative, safety cases).
- Validate conversational behavior:
  - intent matching and slot extraction.
  - state transitions and business rules.
  - tool / function calling correctness.
- Detect regressions and evaluation drift across model or prompt changes.
- Define metrics and thresholds (accuracy, precision, recall, F1).
- Support release decisions with quality signals and reports.
- Collaborate with QA and ML teams to integrate evaluation into CI/CD.
We are looking for
- 4+ years of experience in QA, data, ML, or AI-adjacent roles.
- Hands-on experience testing AI / LLM or NLU-based systems.
- Strong understanding of non-determinism and evaluation challenges.
- Experience with structured outputs (JSON schemas, tool/function calling).
- Strong analytical mindset and test data design skills.
- Ability to define deterministic validation where possible.
- Experience testing voice or conversational systems.
- Background in data quality, analytics, or ML pipelines.
- Experience with observability or monitoring.
- Automotive / embedded / AAOS experience.
- Scripting skills (Python preferred).
What can you discover with us?
- Be part of a tech start-up
- Projects of varying scope across different sectors
- A fair and equitable salary structure (Consultant Profile)
- Training & Certification
- Career path management
- More than 30 partnerships
UNLEASH THE POWER OF YOUR CAREER