Senior Machine Learning / AI Engineer
About the Role
We at Tekhqs are seeking an elite ML/AI Engineer with deep theoretical and engineering mastery of deep learning, LLMs, and generative architectures. You must have trained, fine-tuned, benchmarked, and deployed transformer-based models at scale; understand the research behind every paper you cite; and have shipped production-grade ML systems across distributed GPU environments.
This is a zero-to-one role: no prebuilt dataset, no precleaned labels, no off-the-shelf pipelines. Just research, code, and compute.
Key Responsibilities
- Designing and training foundation models (e.g., GPT, T5, Mistral, LLaMA) from scratch on multi-node GPU clusters.
- Leading full-stack GenAI architecture: tokenizer design, attention variants, pretraining schemes, alignment (RLHF), quantization, serving.
- Conducting frontier research in model optimization, mixture-of-experts (MoE), context length extrapolation, prompt tuning, and memory efficiency.
- Implementing and extending SOTA methods from arXiv (e.g., Phi-3, Gemini, FlashAttention-2, GQA, LLaVA, Orca, DPO, ZeRO-Infinity); see the attention sketch after this list.
- Managing the model lifecycle: data curation → pretraining → fine-tuning → evaluation → quantization → deployment → postmortems.
- Working with multi-modal inputs (text, image, video, audio, embeddings) for cross-domain GenAI systems.
- Driving production-grade optimization: low-latency inference, batching strategies, CUDA kernel debugging, memory offloading, model sharding.
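As a point of reference for the attention-related items above, here is a minimal sketch of PyTorch's fused scaled_dot_product_attention call, which dispatches to FlashAttention-style kernels when the backend supports them. All shapes are illustrative placeholders, not tied to any particular model.

```python
# Minimal sketch of a fused attention call in PyTorch (2.0+).
# All shapes here are illustrative placeholders.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Dispatches to a fused (FlashAttention-style) kernel when available;
# is_causal applies the autoregressive mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```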
Requirements
Core ML/DL:
- Transformers, self-attention, residual connections, GELUs, normalization strategies (RMSNorm, LayerNorm, etc.)
- LLM scaling laws, curriculum learning, and token sampling strategies: top-k, top-p (nucleus) filtering, temperature, Mirostat (see the sampling sketch after this list)
- Contrastive learning, masked modeling, autoregressive generation, denoising diffusion
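To make the sampling strategies concrete, below is a minimal sketch of temperature scaling combined with top-k and top-p (nucleus) filtering in plain PyTorch; the function name, defaults, and cutoffs are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch: temperature + top-k + top-p (nucleus) sampling over raw
# logits. Function name and default cutoffs are illustrative.
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    """logits: (vocab_size,) unnormalized next-token scores."""
    logits = logits / max(temperature, 1e-5)  # sharpen or flatten the distribution

    # Top-k: mask out everything below the k-th highest logit.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    # Top-p (nucleus): keep the smallest prefix of tokens, sorted by
    # probability, whose cumulative mass reaches top_p.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside_nucleus = cumulative - sorted_probs > top_p
    sorted_probs[outside_nucleus] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize

    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]

# Usage: next_id = sample_next_token(torch.randn(32_000))
```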
Training Infrastructure:
- PyTorch Lightning, DeepSpeed, HuggingFace Accelerate, Fully Sharded Data Parallel (FSDP), ZeRO-3
- GPU/TPU cluster management, distributed checkpointing, mixed precision (fp16, bfloat16; see the sketch after this list), quantization-aware training
- Data streaming at scale with WebDataset, TFRecord, Parquet
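As a sketch of the mixed-precision item above, the loop below shows an fp16 training step using torch.autocast and torch.cuda.amp.GradScaler; the model, data, and hyperparameters are stand-ins, and a CUDA device is assumed.

```python
# Minimal fp16 mixed-precision training step. Model, data, and
# hyperparameters are placeholders; requires a CUDA device.
import torch

model = torch.nn.Linear(1024, 1024).cuda()           # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()                 # rescales losses to avoid fp16 underflow
loss_fn = torch.nn.MSELoss()

for step in range(100):                              # stand-in data loop
    x = torch.randn(32, 1024, device="cuda")
    y = torch.randn(32, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)                  # forward pass in reduced precision
    scaler.scale(loss).backward()                    # backward on the scaled loss
    scaler.step(optimizer)                           # unscales gradients, then steps
    scaler.update()
```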
Model Optimization & Serving:
- Quantization: GPTQ, AWQ, SmoothQuant, LLM.int8(), QLoRA
- Compilers and runtimes: TensorRT, Torch-TensorRT, TVM, ONNX Runtime, XLA, GGUF
- Model serving: Triton Inference Server, vLLM, HuggingFace Text Generation Inference (TGI) (see the serving sketch after this list)
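For the serving item, here is a minimal sketch of offline batched inference with vLLM's Python API; the checkpoint name, prompt, and sampling parameters are illustrative.

```python
# Minimal vLLM offline-inference sketch. The checkpoint and sampling
# parameters are illustrative choices, not requirements.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # illustrative checkpoint
params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=128)

outputs = llm.generate(["Summarize paged attention in two sentences."], params)
for request_output in outputs:
    print(request_output.outputs[0].text)
```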
GenAI Systems:
- RLHF pipelines: reward modeling, PPO, DPO, ORPO, RLAIF
- Retrieval-Augmented Generation (RAG): hybrid semantic search, vector DB integration, prompt composition
- Tokenizer development: SentencePiece, Tiktoken, Byte Pair Encoding (BPE), Unigram LM (see the tokenizer sketch after this list)
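As one concrete instance of the tokenizer item above, a minimal sketch of training a byte-level BPE tokenizer with the HuggingFace tokenizers library; the corpus path, vocabulary size, and special tokens are hypothetical.

```python
# Minimal byte-level BPE training sketch with the HuggingFace `tokenizers`
# library. corpus.txt, the vocab size, and the special tokens are hypothetical.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                                # illustrative target size
    special_tokens=["[UNK]", "[PAD]", "<s>", "</s>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("bpe-tokenizer.json")

print(tokenizer.encode("Tokenizer development with BPE").tokens)
```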
Software Engineering:
- Clean, modular, testable Python code with CI/CD pipelines
- Profiling tools: PyTorch Profiler, Nsight, nvtop, nvprof, memory profiler
- Containerization: Docker, NVIDIA Container Toolkit, Kubernetes for distributed training
Mathematical Rigor:
- Optimization: AdamW, Lion, RMSProp, gradient clipping, learning rate schedulers (see the sketch after this list)
- Loss functions: Cross-entropy, KL divergence, cosine similarity, contrastive losses
- Strong command of linear algebra, probability, statistics, and numerical methods
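Tying the optimization items above together, a minimal sketch of AdamW with decoupled weight decay, a cosine learning-rate schedule, and gradient-norm clipping; the model and every hyperparameter here are placeholders.

```python
# Minimal optimization-recipe sketch: AdamW + cosine LR decay + gradient
# clipping. Model and hyperparameters are placeholders.
import torch

model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)

for step in range(1_000):                             # stand-in training loop
    x = torch.randn(16, 512)
    loss = torch.nn.functional.mse_loss(model(x), torch.zeros(16, 512))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip global grad norm
    optimizer.step()
    scheduler.step()                                  # cosine decay per step
```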
Experience: 4+ years of hands-on ML experience, including at least 3 years with GenAI and Transformers
Job Type: Hybrid
Job Time: 3 pm to 9 pm in the office and 11 pm to 1 am from home
Location: DHA Phase 6, Lahore
About Us:
TEKHQS is a global technology solutions provider headquartered in Lake Forest, California, with an offshore team of 300+ experts based in Pakistan. We specialize in Web 2.0 (Web & Mobile App Development), Web 3.0 (Blockchain & Crypto Platform Development), AI/ML Solutions, and ERP services as a certified partner of SAP S/4HANA, Oracle NetSuite, and Microsoft Dynamics 365 Business Central. Our expertise includes implementation, training, customization, integration, support, IT staff augmentation, and certified ERP consultancy.