AI Researcher: Speech

Ikorodu, Lagos, Nigeria

Key Responsibilities

Design new ASR and TTS model architectures or extend existing ones (Conformer variants, RNN-T improvements, VITS, diffusion TTS, multilingual encoders).
Build novel methods for low-resource languages, including self-supervised pretraining, data-efficient learning, and cross-lingual transfer.
Develop custom phoneme inventories, grapheme-to-phoneme (G2P) systems, lexicons, and alignment strategies for underrepresented languages.
Prototype new approaches for tone modeling, prosody control, accent preservation, and style transfer.
Conduct controlled research experiments, ablations, and internal benchmarking studies.
Evaluate models with MOS, WER/CER, prosody diagnostics, acoustic metrics, and perceptual tests.
Investigate failure modes such as alignment collapse, instability, hallucinations, and language drift.
Collaborate with ASR/TTS engineers to transition research prototypes into production pipelines.
Publish internal technical reports, maintain reproducible research pipelines, and track emerging literature.
Explore advanced generative models such as diffusion, flow matching, and large multimodal speech language models.
Build and test custom training recipes for multilingual, multi-speaker, and multi-style models.
Work with data engineering to design labeling strategies, data augmentation, and quality control for research datasets.

Person Profile

MSc or PhD in Speech Processing, Machine Learning, Computational Linguistics, Signal Processing, or a related field.
Equivalent research experience is acceptable if the candidate has built ASR or TTS systems from scratch.
Deep understanding of acoustic modeling, phonetics, prosody, and generative speech architectures.
Demonstrated experience designing or extending models such as Conformer, RNN-T, wav2vec 2.0, HuBERT, VITS, Tacotron, FastSpeech, or diffusion-based TTS systems.
Skilled in PyTorch, with experience running and profiling large-scale distributed training.
Strong background in text normalization, phoneme modeling, G2P, forced alignment with the Montreal Forced Aligner (MFA), and multilingual speech processing.
Experience working with low resource languages, tonal systems, dialect variation, and orthographic inconsistencies.
Able to create custom loss functions, training strategies, conditioning mechanisms, and model components.
Strong experimentation discipline, including baselines, ablations, reproducibility, and rigorous documentation.
Able to move between deep research and fast prototyping, with clear communication of results.
Bonus: experience in African languages, accent modeling, speech enhancement, or perceptual evaluation design.