AI Researcher: Speech
Key Responsibilities
- Design new ASR and TTS model architectures or extend existing ones (Conformer variants, RNN-T improvements, VITS, diffusion TTS, multilingual encoders).
- Build novel methods for low-resource languages, including self-supervised pretraining, data-efficient learning, and cross-linguistic transfer.
- Develop custom phoneme inventories, grapheme-to-phoneme systems, lexicons, and alignment strategies for underrepresented languages.
- Prototype new approaches for tone modeling, prosody control, accent preservation, and style transfer.
- Conduct controlled research experiments, ablations, and internal benchmarking studies.
- Evaluate models with MOS, WER/CER, prosody diagnostics, acoustic metrics, and perceptual tests.
- Investigate failure modes such as alignment collapse, instability, hallucinations, and language drift.
- Collaborate with ASR/TTS engineers to transition research prototypes into production pipelines.
- Publish internal technical reports, maintain reproducible research pipelines, and track emerging literature.
- Explore advanced generative models such as diffusion, flow matching, and large multimodal speech-language models.
- Build and test custom training recipes for multilingual, multi-speaker, and multi-style models.
- Work with data engineering to design labeling strategies, data augmentation, and quality control for research datasets.
Person Profile
- MSc or PhD in Speech Processing, Machine Learning, Computational Linguistics, Signal Processing, or a related field.
- Equivalent research experience is acceptable if the candidate has built ASR or TTS systems from scratch.
- Deep understanding of acoustic modeling, phonetics, prosody, and generative speech architectures.
- Demonstrated experience designing or extending models such as Conformer, RNN-T, wav2vec 2.0, HuBERT, VITS, Tacotron, FastSpeech, or diffusion-based TTS systems.
- Skilled in PyTorch, with experience running and profiling large-scale distributed training.
- Strong background in text normalization, phoneme modeling, grapheme-to-phoneme (G2P) conversion, forced alignment with Montreal Forced Aligner (MFA), and multilingual speech processing.
- Experience working with low-resource languages, tonal systems, dialect variation, and orthographic inconsistencies.
- Able to create custom loss functions, training strategies, conditioning mechanisms, and model components.
- Strong experimentation discipline, including baselines, ablations, reproducibility, and rigorous documentation.
- Able to move between deep research and fast prototyping, with clear communication of results.
- Bonus: experience with African languages, accent modeling, speech enhancement, or perceptual evaluation design.