Staff DevOps Engineer

San Francisco, California, United States


About the job

At Cube, we're redefining how organizations deliver, consume, and automate data and analytics across teams, tools, and AI agents. Our mission is to enable Agentic Analytics, where AI agents work alongside humans on a shared semantic foundation.

If you're fascinated by building core data and AI infrastructure, the kind that powers analytics at the world's most advanced technology companies, but want the agility and ownership of a startup, Cube is where you'll thrive.

With 19,000+ GitHub stars and 13,000+ community members, Cube is trusted by 400+ companies, including Maersk, Kimberly-Clark, Freshworks, Patagonia, Webflow, Brex, Deel, Tubi, Walmart, and Drata. Our platform empowers AI agents with a universal semantic foundation, enabling autonomous analytics at scale while maintaining consistency, security, and performance across BI tools, spreadsheets, and embedded applications.

As a Staff DevOps Engineer at Cube, you will set the technical direction for the infrastructure that runs Cube Cloud and the agentic analytics platform behind it. You'll own complex, high-impact initiatives end-to-end, from architecture to rollout, partnering with engineering leaders across the company and mentoring other engineers along the way.

You will collaborate closely with engineering, security, and core platform teams to evolve our multi-cloud architecture, harden our agentic analytics runtime for production, and turn the toughest operational challenges into elegant, automated systems.

Some of the problems you'll be working on:

Cube Cloud multi-cloud PaaS infrastructure. Cube Cloud delivers our agentic analytics platform as a managed service at production scale across AWS, GCP, and Azure. It's a sophisticated cloud-in-cloud architecture serving real-time analytics and AI workloads for enterprise customers, and it comes with a wide range of infrastructure challenges around isolation, performance, cost, and global availability that you'll help shape at a staff level.

Infrastructure for agentic analytics and AI workloads. Cube's platform powers AI agents that query the semantic layer through Semantic SQL, with strict guarantees around governance, traceability, and latency. You'll design the infrastructure that makes these AI-driven workloads fast, reliable, and safe, including LLM integrations, caching layers, pre-aggregations, and the observability story around them.

Hybrid and self-hosted deployments for enterprise. While Cube Cloud is our primary delivery model, larger enterprise customers can run Cube partially or fully inside their own cloud environments. Supporting PaaS, hybrid, and self-hosted simultaneously brings unique challenges around packaging, upgrades, security, and observability, and you'll be driving the design decisions that make all three options first-class.

Developer productivity and platform engineering. As Cube grows, developer velocity becomes one of our most critical success factors. You'll lead improvements to our CI/CD, build and release systems, ephemeral and staging environments, and internal developer platform, so that nothing slows the delivery of a high-quality product to Cube's customers.

Security, compliance, and reliability at scale. Cube is trusted with mission-critical data by customers with SOC 2 and HIPAA requirements. You'll lead initiatives across IAM, network security, secrets management, audit, incident response, and SLO practice, building these into the platform rather than bolting them on, and raising the bar for the rest of the engineering team.

Requirements

Deep understanding of major cloud environments (AWS, GCP, Azure), including networking, IAM, and managed services at scale.
Strong Kubernetes expertise, including operating, upgrading, and tuning production clusters in multi-tenant environments.
Strong experience with IaC tools such as Terraform, Pulumi, or similar, and modern GitOps workflows.
Solid background in designing and operating CI/CD systems and internal developer platforms.
Ability to write production-quality code in TypeScript/JavaScript, Python, Go, or similar.
Track record of leading large infrastructure initiatives end-to-end, influencing technical strategy, and mentoring other engineers.
Strong grasp of observability, incident response, and reliability engineering for distributed systems.
Good communication skills.
Previous startup experience or a genuine interest in working in a fast-moving company with a high level of ownership.

Bonus points

Strong knowledge of TypeScript and experience integrating with Node.js-based services.
Hands-on experience with Pulumi.
Experience writing code in Rust (Cube's core query engine is written in Rust).
Experience operating multi-tenant SaaS platforms and supporting self-hosted / BYOC deployments.
Experience running infrastructure for AI/LLM workloads or building MLOps tooling.
Background in data engineering, analytics applications, or OLAP systems.
Compliance experience (SOC 2, HIPAA, ISO 27001).

We're a fully remote company based in San Francisco, hiring for this role in the United States. You can work from anywhere in the US and be part of a fast-moving, product-driven team.
