About the job: AI Engineer
About Studio: Studio is an AI-powered creative platform where users generate, edit, and manage digital avatars and media. The platform serves a large user base of creators across subscription tiers (Free/Pro/Max) with real-time job processing and advanced AI workflows. It is self-hosted on DigitalOcean across 14+ production servers.
Focus: Classification systems, content tagging/organization, model training, and R&D into AI voice/audio and image intelligence
Why this role: Studio generates and manages massive volumes of AI-created media — avatars, images, audio, and video. We need an engineer who can build the intelligent layer on top of this content: classifying it, tagging it, organizing it, and making it searchable and useful at scale. We're also investing in R&D for AI voice and audio generation, and image understanding capabilities that will shape the next generation of the product.
What you'll do:
- Design, train, and deploy classification models for Studio's content pipeline — style detection, quality scoring, content moderation, filtering, and semantic categorization of generated media
- Build and maintain automated tagging and organization systems that structure our media library: extracting attributes, detecting visual features, clustering similar content, and enabling intelligent search and discovery
- Develop training data pipelines — annotation tooling, dataset curation, active learning loops, and quality assurance for labeled data
- Lead R&D into AI voice and audio generation — evaluate frontier models (voice cloning, text-to-speech, audio synthesis), prototype integrations, and develop a path from research to production-ready features
- Research and prototype image intelligence capabilities — face/body analysis, pose estimation, style transfer, image-to-image consistency, and visual similarity for avatar systems
- Build evaluation frameworks to measure classifier accuracy, generation quality, and model drift over time
- Optimize inference pipelines for cost and latency — batching, quantization, caching, and model serving strategies
- Integrate with our GPU compute infrastructure and ship models behind production APIs
What we're looking for:
- 3+ years building and deploying ML models in production, with emphasis on classification, tagging, or content understanding systems
- Hands-on experience training models — not just using APIs. You've curated datasets, experimented with architectures, tuned hyperparameters, and debugged training runs
- Strong experience with image classification and/or computer vision (CNNs, vision transformers, CLIP, or similar)
- Demonstrable interest or experience in voice/audio AI — text-to-speech, voice cloning, audio classification, or speech synthesis. This can be research, side projects, or production work
- Proficiency in Python with PyTorch or TensorFlow; comfort reading and adapting research code
- Experience building data labeling pipelines, annotation workflows, or active learning systems
- Understanding of model serving in production: REST APIs, batching, latency requirements, monitoring for drift
- Familiarity with embedding-based retrieval, vector search, or semantic similarity systems
Bonus: experience with diffusion models, GANs, or generative audio; published research; familiarity with ONNX/TensorRT for inference optimization
Tech you'll work with: Python, PyTorch, cloud GPU clusters (EOK), REST/webhook APIs, TypeScript (for integration with the NestJS backend), PostgreSQL (for metadata/labels), Redis, and Temporal (for pipeline orchestration)