Job Opening: Senior Machine Learning / AI Engineer


About the Role

We at Tekhqs are seeking an elite ML/AI Engineer with deep theoretical and engineering mastery of deep learning, LLMs, and generative architectures. You must have trained, fine-tuned, benchmarked, and deployed transformer-based models at scale, understand the research behind every paper you cite, and have shipped production-grade ML systems across distributed GPU environments.
This is a zero-to-one role: no prebuilt dataset, no precleaned labels, no off-the-shelf pipelines. Just research, code, and compute.

Key Responsibilities

  • Designing and training foundation models (e.g., GPT, T5, Mistral, LLaMA) from scratch on multi-node GPU clusters.
  • Leading full-stack GenAI architecture: tokenizer design, attention variants, pretraining schemes, alignment (RLHF), quantization, serving.
  • Conducting frontier research in model optimization, mixture-of-experts (MoE), context length extrapolation, prompt tuning, and memory efficiency.
  • Implementing and extending SOTA methods from arXiv (e.g., Phi-3, Gemini, FlashAttention-2, GQA, LLaVA, Orca, DPO, ZeRO-Infinity).
  • Managing the full model lifecycle: data curation → pretraining → fine-tuning → evaluation → quantization → deployment → postmortems.
  • Working with multi-modal inputs (text, image, video, audio, embeddings) for cross-domain GenAI systems.
  • Driving production-grade optimization: low-latency inference, batching strategies, CUDA kernel debugging, memory offloading, model sharding.

Requirements

Core ML/DL:

  • Transformers, self-attention, residual connections, GELUs, normalization strategies (RMSNorm, LayerNorm, etc.)
  • LLM scaling laws, curriculum learning, token sampling strategies (top-k, top-p/nucleus sampling, temperature, Mirostat)
  • Contrastive learning, masked modeling, autoregressive generation, denoising diffusion

Training Infrastructure:

  • PyTorch Lightning, DeepSpeed, HuggingFace Accelerate, Fully Sharded Data Parallel (FSDP), ZeRO-3
  • GPU/TPU cluster management, distributed checkpointing, mixed precision (fp16, bfloat16), quantization-aware training
  • Data streaming at scale with WebDataset, TFRecord, Parquet

Model Optimization & Serving:

  • Quantization: GPTQ, AWQ, SmoothQuant, LLM.int8(), QLoRA
  • Compilers: TensorRT, Torch-TensorRT, TVM, ONNX Runtime, XLA, GGUF
  • Model serving: Triton Inference Server, vLLM, HuggingFace Text Generation Inference (TGI)

GenAI Systems:

  • RLHF pipelines: reward modeling, PPO, DPO, ORPO, RLAIF
  • Retrieval-Augmented Generation (RAG): hybrid semantic search, vector DB integration, prompt composition
  • Tokenizer development: SentencePiece, Tiktoken, Byte Pair Encoding (BPE), Unigram LM

Software Engineering:

  • Clean, modular, testable Python code with CI/CD pipelines
  • Profiling tools: PyTorch Profiler, Nsight, nvtop, nvprof, memory profiler
  • Containerization: Docker, NVIDIA Container Toolkit, Kubernetes for distributed training

Mathematical Rigor:

  • Optimization: AdamW, Lion, RMSProp, gradient clipping, learning rate schedulers
  • Loss functions: Cross-entropy, KL divergence, cosine similarity, contrastive losses
  • Strong command of linear algebra, probability, statistics, and numerical methods

Experience: 4+ years of hands-on work (minimum 3 years in GenAI & Transformers)

Job Type: Hybrid 

Job Time: 3 PM to 9 PM from the office and 11 PM to 1 AM from home

Location: DHA Phase 6 Lahore

About Us:

TEKHQS is a global technology solutions provider headquartered in Lake Forest, California, with an offshore team of 300+ experts based in Pakistan. We specialize in Web 2.0 (Web & Mobile App Development), Web 3.0 (Blockchain & Crypto Platform Development), AI/ML Solutions, and ERP services as a certified partner of SAP S/4HANA, Oracle NetSuite, and Microsoft Dynamics 365 Business Central. Our expertise includes implementation, training, customization, integration, support, IT staff augmentation, and certified ERP consultancy.