AI Researcher: Speech

Key Responsibilities

  • Design new ASR and TTS model architectures or extend existing ones (Conformer variants, RNN-T improvements, VITS, diffusion TTS, multilingual encoders).
  • Build novel methods for low-resource languages, including self-supervised pretraining, data-efficient learning, and cross-linguistic transfer.
  • Develop custom phoneme inventories, grapheme-to-phoneme systems, lexicons, and alignment strategies for underrepresented languages.
  • Prototype new approaches for tone modeling, prosody control, accent preservation, and style transfer.
  • Conduct controlled research experiments, ablations, and internal benchmarking studies.
  • Evaluate models with MOS, WER/CER, prosody diagnostics, acoustic metrics, and perceptual tests.
  • Investigate failure modes such as alignment collapse, instability, hallucinations, and language drift.
  • Collaborate with ASR/TTS engineers to transition research prototypes into production pipelines.
  • Publish internal technical reports, maintain reproducible research pipelines, and track emerging literature.
  • Explore advanced generative models such as diffusion, flow matching, and large multimodal speech-language models.
  • Build and test custom training recipes for multilingual, multispeaker, and multi-style models.
  • Work with data engineering to design labeling strategies, data augmentation, and quality control for research datasets.

Person Profile

  • MSc or PhD in Speech Processing, Machine Learning, Computational Linguistics, Signal Processing, or a related field.
  • Equivalent research experience is acceptable if the candidate has built ASR or TTS systems from scratch.
  • Deep understanding of acoustic modeling, phonetics, prosody, and generative speech architectures.
  • Demonstrated experience designing or extending models such as Conformer, RNN-T, wav2vec 2.0, HuBERT, VITS, Tacotron, FastSpeech, or diffusion-based TTS systems.
  • Skilled in PyTorch, with experience running and profiling large-scale distributed training.
  • Strong background in text normalization, phoneme modeling, grapheme-to-phoneme (G2P) conversion, forced alignment with the Montreal Forced Aligner (MFA), and multilingual speech processing.
  • Experience working with low-resource languages, tonal systems, dialect variation, and orthographic inconsistencies.
  • Able to create custom loss functions, training strategies, conditioning mechanisms, and model components.
  • Strong experimentation discipline, including baselines, ablations, reproducibility, and rigorous documentation.
  • Able to move between deep research and fast prototyping, with clear communication of results.
  • Bonus: experience in African languages, accent modeling, speech enhancement, or perceptual evaluation design.