About the job: AI Engineer
About Studio: Studio is an AI-powered creative platform where users generate, edit, and manage digital avatars and media. The platform serves a large user base of creators across subscription tiers (Free/Pro/Max) with real-time job processing and advanced AI workflows. It is self-hosted on DigitalOcean across 14+ production servers.
Focus: Classification systems, content tagging/organization, model training, and R&D into AI voice/audio and image intelligence
Why this role: Studio generates and manages massive volumes of AI-created media — avatars, images, audio, and video. We need an engineer who can build the intelligent layer on top of this content: classifying it, tagging it, organizing it, and making it searchable and useful at scale. We're also investing in R&D for AI voice and audio generation, and image understanding capabilities that will shape the next generation of the product.
What you'll do:
- Design, train, and deploy classification models for Studio's content pipeline — style detection, quality scoring, content moderation, filtering, and semantic categorization of generated media
- Build and maintain automated tagging and organization systems that structure our media library: extracting attributes, detecting visual features, clustering similar content, and enabling intelligent search and discovery
- Develop training data pipelines — annotation tooling, dataset curation, active learning loops, and quality assurance for labeled data
- Lead R&D into AI voice and audio generation — evaluate frontier models (voice cloning, text-to-speech, audio synthesis), prototype integrations, and develop a path from research to production-ready features
- Research and prototype image intelligence capabilities — face/body analysis, pose estimation, style transfer, image-to-image consistency, and visual similarity for avatar systems
- Build evaluation frameworks to measure classifier accuracy, generation quality, and model drift over time
- Optimize inference pipelines for cost and latency — batching, quantization, caching, and model serving strategies
- Integrate with our GPU compute infrastructure and ship models behind production APIs
What we're looking for:
- 3+ years building and deploying ML models in production, with emphasis on classification, tagging, or content understanding systems
- Hands-on experience training models — not just using APIs. You've curated datasets, experimented with architectures, tuned hyperparameters, and debugged training runs
- Strong experience with image classification and/or computer vision (CNNs, vision transformers, CLIP, or similar)
- Demonstrable interest or experience in voice/audio AI — text-to-speech, voice cloning, audio classification, or speech synthesis. This can be research, side projects, or production work
- Proficiency in Python with PyTorch or TensorFlow; comfort reading and adapting research code
- Experience building data labeling pipelines, annotation workflows, or active learning systems
- Understanding of model serving in production: REST APIs, batching, latency requirements, monitoring for drift
- Familiarity with embedding-based retrieval, vector search, or semantic similarity systems
Bonus: experience with diffusion models, GANs, or generative audio; published research; familiarity with ONNX/TensorRT for inference optimization
Tech you'll work with: Python, PyTorch, cloud GPU clusters (EOK), REST/webhook APIs, TypeScript (for integration with the NestJS backend), PostgreSQL (for metadata/labels), Redis, and Temporal (for pipeline orchestration)