Principal Engineer

Principal Engineer / Technical Executive Type

About Studio: Studio is an AI-powered creative platform where users generate, edit, and manage digital avatars and media. The platform serves a large user base of creators across subscription tiers (Free/Pro/Max) with real-time job processing and advanced AI workflows. It is self-hosted on DigitalOcean with 14+ production servers.

Focus: Everything. Systems architecture, AI integration, product delivery, infrastructure, and team leadership, all at once.

Why this role: We're not looking for someone who leads from the whiteboard. We need a builder — someone who can mass-produce features from the UI layer down to the database, stand up infrastructure on a Friday night, prototype an AI pipeline on Monday, and refactor the billing system by Wednesday.

You have the architectural judgment of a CTO and the output of a senior IC who still loves writing code. You don't wait for specialists — you become the specialist for whatever the highest-priority problem is today.

What you'll do:

Ship features across the entire stack — React/Next.js frontend, NestJS backend, PostgreSQL data layer, Temporal workflows, GPU-backed AI pipelines — without waiting for handoffs or specialists

Make high-stakes architecture decisions in real-time: service boundaries, data modeling, caching strategy, infrastructure topology, and AI integration patterns — then implement them yourself

Own production reliability for a 14+ server deployment: diagnose and fix incidents across web servers, API servers, job workers, Temporal clusters, Redis, and PostgreSQL — you're comfortable SSHing into a box at 2 AM and figuring out why the queue is backed up

Drive AI integration from research to production: evaluate frontier models, prototype capabilities (image generation, voice synthesis, classification), build the serving infrastructure, and ship it behind production APIs with proper monitoring and fallbacks

Build and optimize data pipelines, training workflows, and inference infrastructure — you understand model serving, GPU utilization, and the tradeoffs between latency, throughput, and cost

Design and implement payment flows, subscription logic, webhook processing, and financial reconciliation — billing is one of our highest-risk systems and you won't shy away from it

Write the migration, the API endpoint, the React component, the Storybook story, the Playwright test, and the deploy script — in the same PR if needed

Establish engineering standards by example: your PRs set the bar for code quality, test coverage, and documentation

Mentor and unblock other engineers across every layer of the stack — you're the person everyone goes to when they're stuck

What we're looking for:

8+ years in software engineering with a track record of building and shipping complex systems — not managing from a distance, but personally writing the code that runs in production

True full-stack mastery: you've built production UIs (React/Next.js), production APIs (Node.js/NestJS or equivalent), managed production databases (PostgreSQL), and operated production infrastructure (Linux servers, systemd, networking)

Serious AI/ML chops: you've integrated AI models into production systems, understand inference pipelines, and can evaluate whether a frontier model is ready for your use case or needs fine-tuning. You follow the space obsessively and have strong opinions about what's real vs. hype

You've personally built systems that handle real money — payment processing, subscription management, or financial reconciliation. You understand the paranoia required when code moves dollars

Deep infrastructure instincts: you can diagnose a production outage by reading logs, traces, and metrics — connection pool exhaustion, queue backpressure, memory leaks, disk I/O stalls. You've operated systems at scale, not just deployed them

You've designed and built systems that scaled through significant growth — not theoretically, but you watched the dashboards while real traffic hit your code

Strong opinions on system design, loosely held. You can articulate trade-offs, you know when to take on tech debt vs. pay it down, and you change your mind when the evidence changes

You move fast. You're not afraid of large surface area — you see a problem across three layers and fix all three in one pass. You don't create tickets for work you can do today

Bonus: experience with Temporal or workflow orchestration, self-hosted infrastructure (not just cloud-managed), creator/media platforms, crypto payments

Tech you'll work with (all of it): React 18, Next.js 14, TypeScript, Tailwind CSS, NestJS 11, Prisma, PostgreSQL 18, Redis 7, Temporal, Docker, Linux/systemd, DigitalOcean, Cloudflare R2, Python/PyTorch (AI pipelines), Authorize.net, Sentry, OpenTelemetry, GitHub Actions, Playwright, Storybook

General Information:
Stack: TypeScript monorepo (React/Next.js frontend, NestJS backend, Prisma/PostgreSQL, Redis, Temporal, Cloudflare R2)

Work style:

Small, high-ownership team — you'll own entire systems, not just tickets
Pre-commit validation runs 19,600+ tests on every commit
Self-hosted infrastructure (DigitalOcean), not serverless
Staging -> production deployment flow with a manual production gate

What we value:
Ship working software over perfect architecture
Own your domain end-to-end (design -> implement -> deploy -> monitor)
Write tests that catch real bugs, not tests for coverage numbers
Communicate clearly when things break: incidents happen, silence is the problem