Senior Machine Learning / AI Engineer
About the Role
We at Tekhqs are seeking an elite ML/AI Engineer with deep theoretical and engineering mastery of deep learning, LLMs, and generative architectures. You must have trained, fine-tuned, benchmarked, and deployed transformer-based models at scale; understand the research behind every paper you cite; and have shipped production-grade ML systems across distributed GPU environments.
This is a zero-to-one role: no prebuilt dataset, no precleaned labels, no off-the-shelf pipelines. Just research, code, and compute.
Key Responsibilities
- Designing and training foundation models (e.g., GPT, T5, Mistral, LLaMA) from scratch on multi-node GPU clusters.
- Leading full-stack GenAI architecture: tokenizer design, attention variants, pretraining schemes, alignment (RLHF), quantization, serving.
- Conducting frontier research in model optimization, mixture-of-experts (MoE), context length extrapolation, prompt tuning, and memory efficiency.
- Implementing and extending SOTA methods from arXiv (e.g., Phi-3, Gemini, FlashAttention-2, GQA, LLaVA, Orca, DPO, ZeRO-Infinity); see the attention sketch after this list.
- Managing the model lifecycle: data curation → pretraining → fine-tuning → evaluation → quantization → deployment → postmortems.
- Working with multi-modal inputs (text, image, video, audio, embeddings) for cross-domain GenAI systems.
- Driving production-grade optimization: low-latency inference, batching strategies, CUDA kernel debugging, memory offloading, model sharding.
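As a point of reference for the attention-related items above, here is a minimal sketch of PyTorch's fused scaled_dot_product_attention call, which dispatches to FlashAttention-style kernels when the backend supports them. All shapes are illustrative placeholders, not tied to any particular model.

```python
# Minimal sketch of a fused attention call in PyTorch (2.0+).
# All shapes here are illustrative placeholders.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Dispatches to a fused (FlashAttention-style) kernel when available;
# is_causal applies the autoregressive mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```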
Requirements
Core ML/DL:
- Transformers, self-attention, residual connections, GELUs, normalization strategies (RMSNorm, LayerNorm, etc.)
- LLM scaling laws, curriculum learning, and token sampling strategies: top-k, top-p (nucleus) filtering, temperature, Mirostat (see the sampling sketch after this list)
- Contrastive learning, masked modeling, autoregressive generation, denoising diffusion
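To make the sampling strategies concrete, below is a minimal sketch of temperature scaling combined with top-k and top-p (nucleus) filtering in plain PyTorch; the function name, defaults, and cutoffs are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch: temperature + top-k + top-p (nucleus) sampling over raw
# logits. Function name and default cutoffs are illustrative.
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    """logits: (vocab_size,) unnormalized next-token scores."""
    logits = logits / max(temperature, 1e-5)  # sharpen or flatten the distribution

    # Top-k: mask out everything below the k-th highest logit.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    # Top-p (nucleus): keep the smallest prefix of tokens, sorted by
    # probability, whose cumulative mass reaches top_p.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside_nucleus = cumulative - sorted_probs > top_p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize

    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]

# Usage: next_id = sample_next_token(torch.randn(32_000))
```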
Training Infrastructure:
- PyTorch Lightning, DeepSpeed, HuggingFace Accelerate, Fully Sharded Data Parallel (FSDP), ZeRO-3
- GPU/TPU cluster management, distributed checkpointing, mixed precision (fp16, bfloat16; see the sketch after this list), quantization-aware training
- Data streaming at scale with WebDataset, TFRecord, Parquet
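As a sketch of the mixed-precision item above, the loop below shows an fp16 training step using torch.autocast and torch.cuda.amp.GradScaler; the model, data, and hyperparameters are stand-ins, and a CUDA device is assumed.

```python
# Minimal fp16 mixed-precision training step. Model, data, and
# hyperparameters are placeholders; requires a CUDA device.
import torch

model = torch.nn.Linear(1024, 1024).cuda()           # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()                 # rescales losses to avoid fp16 underflow
loss_fn = torch.nn.MSELoss()

for step in range(100):                              # stand-in data loop
    x = torch.randn(32, 1024, device="cuda")
    y = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)                  # forward pass in reduced precision
    scaler.scale(loss).backward()                    # backward on the scaled loss
    scaler.step(optimizer)                           # unscales gradients, then steps
    scaler.update()
```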
Model Optimization & Serving:
- Quantization: GPTQ, AWQ, SmoothQuant, LLM.int8(), QLoRA
- Compilers and runtimes: TensorRT, Torch-TensorRT, TVM, ONNX Runtime, XLA, GGUF
- Model serving: Triton Inference Server, vLLM, HuggingFace Text Generation Inference (TGI) (see the serving sketch after this list)
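For the serving item, here is a minimal sketch of offline batched inference with vLLM's Python API; the checkpoint name, prompt, and sampling parameters are illustrative.

```python
# Minimal vLLM offline-inference sketch. The checkpoint and sampling
# parameters are illustrative choices, not requirements.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # illustrative checkpoint
params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=128)

outputs = llm.generate(["Summarize paged attention in two sentences."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```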
GenAI Systems:
- RLHF pipelines: reward modeling, PPO, DPO, ORPO, RLAIF
- Retrieval-Augmented Generation (RAG): hybrid semantic search, vector DB integration, prompt composition
- Tokenizer development: SentencePiece, Tiktoken, Byte Pair Encoding (BPE), Unigram LM (see the tokenizer sketch after this list)
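As one concrete instance of the tokenizer item above, a minimal sketch of training a byte-level BPE tokenizer with the HuggingFace tokenizers library; the corpus path, vocabulary size, and special tokens are hypothetical.

```python
# Minimal byte-level BPE training sketch with the HuggingFace `tokenizers`
# library. corpus.txt, the vocab size, and the special tokens are hypothetical.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                # illustrative target size
    special_tokens=["[UNK]", "[PAD]", "<s>", "</s>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("bpe-tokenizer.json")

print(tokenizer.encode("Tokenizer development with BPE").tokens)
```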
Software Engineering:
- Clean, modular, testable Python code with CI/CD pipelines
- Profiling tools: PyTorch Profiler, Nsight, nvtop, nvprof, memory profiler
- Containerization: Docker, NVIDIA Container Toolkit, Kubernetes for distributed training
Mathematical Rigor:
- Optimization: AdamW, Lion, RMSProp, gradient clipping, learning rate schedulers (see the sketch after this list)
- Loss functions: Cross-entropy, KL divergence, cosine similarity, contrastive losses
- Strong command of linear algebra, probability, statistics, and numerical methods
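Tying the optimization items above together, a minimal sketch of AdamW with decoupled weight decay, a cosine learning-rate schedule, and gradient-norm clipping; the model and every hyperparameter here are placeholders.

```python
# Minimal optimization-recipe sketch: AdamW + cosine LR decay + gradient
# clipping. Model and hyperparameters are placeholders.
import torch

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)

for step in range(1_000):                             # stand-in training loop
    x = torch.randn(16, 512)
    loss = torch.nn.functional.mse_loss(model(x), torch.zeros(16, 512))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip global grad norm
    optimizer.step()
    scheduler.step()                                  # cosine decay per step
```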
Experience: 4+ years of hands-on ML experience, including at least 3 years with GenAI and Transformers
Job Type: Hybrid
Job Time: 3 pm to 9 pm in the office and 11 pm to 1 am from home
Location: DHA Phase 6, Lahore
About Us:
TEKHQS is a global technology solutions provider headquartered in Lake Forest, California, with an offshore team of 300+ experts based in Pakistan. We specialize in Web 2.0 (Web & Mobile App Development), Web 3.0 (Blockchain & Crypto Platform Development), AI/ML Solutions, and ERP services as a certified partner of SAP S/4HANA, Oracle NetSuite, and Microsoft Dynamics 365 Business Central. Our expertise includes implementation, training, customization, integration, support, IT staff augmentation, and certified ERP consultancy.