About the job: Principal Engineer
Principal Engineer / Technical Executive Type
About Studio: Studio is an AI-powered creative platform where users generate, edit, and manage digital avatars and media. The platform serves a large user base of creators across subscription tiers (Free/Pro/Max) with real-time job processing and advanced AI workflows. Self-hosted on DigitalOcean with 14+ production servers.
Focus: Everything. Systems architecture, AI integration, product delivery, infrastructure, and team leadership, all at once.
Why this role: We're not looking for someone who leads from the whiteboard. We need a builder — someone who can mass-produce features from the UI layer down to the database, stand up infrastructure on a Friday night, prototype an AI pipeline on Monday, and refactor the billing system by Wednesday.
You have the architectural judgment of a CTO and the output of a senior IC who still loves writing code. You don't wait for specialists — you become the specialist for whatever the highest-priority problem is today.
What you'll do:
Ship features across the entire stack — React/Next.js frontend, NestJS backend, PostgreSQL data layer, Temporal workflows, GPU-backed AI pipelines — without waiting for handoffs or specialists
Make high-stakes architecture decisions in real time (service boundaries, data modeling, caching strategy, infrastructure topology, and AI integration patterns), then implement them yourself
Own production reliability for a 14+ server deployment: diagnose and fix incidents across web servers, API servers, job workers, Temporal clusters, Redis, and PostgreSQL — you're comfortable SSHing into a box at 2 AM and figuring out why the queue is backed up
Drive AI integration from research to production: evaluate frontier models, prototype capabilities (image generation, voice synthesis, classification), build the serving infrastructure, and ship it behind production APIs with proper monitoring and fallbacks
Build and optimize data pipelines, training workflows, and inference infrastructure — you understand model serving, GPU utilization, and the tradeoffs between latency, throughput, and cost
Design and implement payment flows, subscription logic, webhook processing, and financial reconciliation — billing is one of our highest-risk systems and you won't shy away from it
Write the migration, the API endpoint, the React component, the Storybook story, the Playwright test, and the deploy script — in the same PR if needed
Establish engineering standards by example: your PRs set the bar for code quality, test coverage, and documentation
Mentor and unblock other engineers across every layer of the stack — you're the person everyone goes to when they're stuck
What we're looking for:
8+ years in software engineering with a track record of building and shipping complex systems — not managing from a distance, but personally writing the code that runs in production
True full-stack mastery: you've built production UIs (React/Next.js), production APIs (Node.js/NestJS or equivalent), managed production databases (PostgreSQL), and operated production infrastructure (Linux servers, systemd, networking)
Serious AI/ML chops: you've integrated AI models into production systems, understand inference pipelines, and can evaluate whether a frontier model is ready for your use case or needs fine-tuning. You follow the space obsessively and have strong opinions about what's real vs. hype
You've personally built systems that handle real money — payment processing, subscription management, or financial reconciliation. You understand the paranoia required when code moves dollars
Deep infrastructure instincts: you can diagnose a production outage by reading logs, traces, and metrics — connection pool exhaustion, queue backpressure, memory leaks, disk I/O stalls. You've operated systems at scale, not just deployed them
You've designed and built systems that scaled through significant growth — not theoretically, but you watched the dashboards while real traffic hit your code
Strong opinions on system design, loosely held. You can articulate trade-offs, you know when to take on tech debt vs. pay it down, and you change your mind when the evidence changes
You move fast. You're not afraid of large surface area — you see a problem across three layers and fix all three in one pass. You don't create tickets for work you can do today
Bonus: experience with Temporal or workflow orchestration, self-hosted infrastructure (not just cloud-managed), creator/media platforms, crypto payments
Tech you'll work with (all of it): React 18, Next.js 14, TypeScript, Tailwind CSS, NestJS 11, Prisma, PostgreSQL 18, Redis 7, Temporal, Docker, Linux/systemd, DigitalOcean, Cloudflare R2, Python/PyTorch (AI pipelines), Authorize.net, Sentry, OpenTelemetry, GitHub Actions, Playwright, Storybook
General Information:
Stack: TypeScript monorepo (React/Next.js frontend, NestJS backend, Prisma/PostgreSQL, Redis, Temporal, Cloudflare R2)
Work style:
- Small, high-ownership team — you'll own entire systems, not just tickets
- Pre-commit validation runs 19,600+ tests on every commit
- Self-hosted infrastructure (DigitalOcean), not serverless
- Staging -> production deployment flow with manual production gate
Ship working software over perfect architecture
Own your domain end-to-end (design -> implement -> deploy -> monitor)
Write tests that catch real bugs, not tests for coverage numbers
Communicate clearly when things break: incidents happen; silence is the problem