Senior Backend/Data Engineer (Python · GCP · Vertex AI) (Remote, PH-based only)
We are looking for a highly skilled Senior Backend / Data Engineer (Python · GCP · Vertex AI) to work on pi-sentiment — an existing Python-based sentiment analysis and social data pipeline on Google Cloud for a remote, Europe-based client. For this role, we only consider candidates who are based in the Philippines and have legal authorization to work there.
About the OTA Client
We build analytics tools for creators, influencers, and marketers. We pull data from Instagram, TikTok, Facebook, LinkedIn, X/Twitter, and YouTube, run AI-powered sentiment and keyword analysis on it, and serve it to users through dashboards. Small team, real users, real revenue.
The Role
This is not a greenfield role. The codebase, patterns, components, and infrastructure are already in place. Your work will be extending existing features, fixing bugs, and filling gaps — not designing systems from scratch. We need someone who can drop into an unfamiliar codebase, figure out how it works by reading the code, and start shipping within the first two weeks.
You'll work closely with our Senior Frontend Engineer — shipping schema changes, API contracts, and Supabase tables they consume — so you need to be comfortable reading a Next.js/TypeScript codebase and reasoning about how your data surfaces in the product.
How We Work
- Autonomy is the default. We point you at an issue and expect you to own it end to end. We don't assign tasks step by step.
- Proactive communication is non-negotiable. If you're stuck, say so immediately — don't go quiet. A PO and another engineer are available for questions, and we expect you to use them.
What You'll Do
Core Responsibilities
- Extend the Sentiment Pipeline: Work within the existing end-to-end flow — Supabase RPC → data scraping/ingestion → BigQuery → Vertex AI Batch → sentiment_predictions — adding features and fixing bugs without breaking what works
- Add & Maintain Platform Integrations: Extend existing Apify-based adapters across Instagram, TikTok, Facebook, YouTube, LinkedIn, X/Twitter — handling auth, rate limits, schema drift, and backfills
- Ship Cloud Run Jobs: Modify and add containerized Python jobs following existing patterns — SIGTERM handling, structured logging, idempotent retries
- Evolve Data Contracts: Change BigQuery schemas and Supabase tables/RPCs without breaking the frontend; coordinate migrations with the frontend engineer
- Tune Models & Prompts: Iterate on Gemini structured outputs (Pydantic schemas, enums) to keep sentiment and keyword extraction accurate across languages and platforms
- Benchmark & Evaluate: Use the existing benchmarking/ suite to compare model configs on cost, latency, and quality
- Write Tests: Add pytest coverage for your changes — unit, integration, E2E where warranted
What We're Looking For
Required
- Strong Python in production: type hints, Pydantic, pytest, clean module boundaries. Years matter less than evidence — show us code you've shipped.
- GCP under load: Cloud Run, BigQuery, Cloud Storage. You've operated it, not just prototyped.
- SQL that survives review: complex BigQuery or Postgres — window functions, partitioning, query optimization.
- LLM integration in production: you've shipped a feature backed by Vertex AI, OpenAI, or Anthropic, and you know what structured outputs and prompt regressions feel like.
- Cross-stack literacy: you can read a Next.js / TypeScript PR, understand what data it needs, and co-design the contract with our frontend engineer. Writing React is not required.
- Proactive operator: you drive your own work, flag blockers fast, and don't wait to be assigned. See the "Not a fit if..." section below — we mean it.
Preferred
- Vertex AI Gemini specifically (Batch Prediction, structured JSON output with enums)
- Supabase / PostgreSQL with RLS, RPCs, migrations, multi-tenant patterns
- Apify or similar ingestion platforms for social data
- Data pipeline depth: idempotent backfills, schema evolution, cost engineering (BigQuery slots, batch vs. online)
- Docker (multi-stage, slim) with Cloud Run parity
- Observability that isn't print() — structured logging, Cloud Logging, Sentry
- GitHub Actions for CI/CD
- Multilingual NLP experience (our comments span many languages)
- Terraform / IaC for GCP
This Role Is Not a Fit If...
Read this section carefully. If any of these describe you, please don't apply — you'll be unhappy and so will we.
- You need detailed specs for every task. We hand you an issue and a codebase. Figuring out the "how" is the job. If you need a ticket broken into sub-steps before you can start, this isn't the role.
- You wait to be checked in on. Nobody is going to DM you every morning to ask how it's going. You drive your own status updates, flag slippage early, and ask for review when you're ready.
- You go silent when blocked. If you're stuck for more than a few hours and haven't said anything, that's a problem. Stuck is fine. Quiet is not. A PO and another engineer are one message away — use them.
- You expect a long onboarding ramp. You should be opening small PRs in week one and shipping something meaningful by the end of week two. We'll help, but we won't hand-hold.
Technical Environment
Core Technologies
- Language: Python 3.11+ (strict typing, Pydantic v2)
- ML / LLM: Vertex AI Gemini (2.5-flash) with structured JSON output
- Cloud: Google Cloud — Cloud Run Jobs, Cloud Scheduler, Cloud Storage, BigQuery, Vertex AI
- Region: europe-west3 (EU-focused)
Data & Storage
- Analytics warehouse: BigQuery (partitioned, clustered)
- Operational DB: Supabase (PostgreSQL with RLS) — shared with the frontend
- Ingestion: Apify (15+ social platform adapters), Data365 API
- Batch ML: Vertex AI Batch Prediction (JSONL in/out via GCS)
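The "JSONL in/out via GCS" shape above boils down to one JSON object per line. A rough sketch of building such an input file — the field names (`id`, `prompt`) are hypothetical, not the pipeline's real request format:

```python
import json


def to_batch_jsonl(comments: list[dict]) -> str:
    """Serialize comments as newline-delimited JSON, one request per line."""
    lines = [
        json.dumps({"id": c["id"], "prompt": c["text"]}, ensure_ascii=False)
        for c in comments
    ]
    return "\n".join(lines) + "\n"


payload = to_batch_jsonl([{"id": 1, "text": "love this feature"}])
# The resulting string would be uploaded to a GCS bucket as the batch job's input.
```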
Developer Experience
- Package Manager: uv / pip
- Formatting: black
- Linting: flake8
- Testing: pytest (unit, integration, E2E)
- Secrets: dotenvx (encrypted environment files)
- Containers: Docker → Cloud Run Jobs
- Version Control: GitHub with trunk-based development
- Monitoring: Cloud Logging, Sentry
What the Frontend Looks Like (so you can collaborate)
You won't own this, but you'll read it and design data for it:
- Framework: Next.js 15 (App Router) · React 19 · TypeScript (strict)
- Data layer: Supabase client + TanStack Query
- Auth: Supabase Auth (JWT, OAuth, RLS)
- Charts/Tables: Visx, TanStack Table
What You Get
- Real ML in production: Gemini with real cost, latency, and quality trade-offs — not prototypes
- End-to-end ownership: from ingestion to the Supabase row the frontend reads, the whole path is yours
- A small team, no silos: one PO, one frontend engineer, you. Decisions are fast because the room is small.
- Remote and async. We don't care where you work or when, as long as you communicate and ship.
- Learning budget for conferences and courses.
Our Engineering Principles
- Type Safety First: Pydantic and type hints catch bugs at the boundary, not in production
- Cost-Aware: Batch over online when it fits; measure before scaling
- Observable: Structured logs, error tracking, and metrics ship with every job
- Trunk-Based Development: Small, frequent PRs with feature flags over long-lived branches
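As a concrete (and simplified) illustration of the "observable, SIGTERM-aware job" pattern these principles describe — the function names and log fields are made up for the sketch, assuming JSON on stdout is what Cloud Logging parses:

```python
import json
import signal

shutdown_requested = False


def _handle_sigterm(signum, frame):
    # Cloud Run sends SIGTERM before stopping a job; finish the current
    # item, record progress, and exit cleanly instead of dying mid-write.
    global shutdown_requested
    shutdown_requested = True


def log_json(severity: str, message: str, **fields) -> None:
    """Emit one structured log line as JSON on stdout."""
    print(json.dumps({"severity": severity, "message": message, **fields}))


def run(items) -> int:
    signal.signal(signal.SIGTERM, _handle_sigterm)
    done = 0
    for item in items:
        if shutdown_requested:
            log_json("WARNING", "sigterm received, stopping early", processed=done)
            break
        # ... process item idempotently so a retried job can safely re-run it ...
        done += 1
    log_json("INFO", "job finished", processed=done)
    return done
```

Because each item is processed idempotently, a job killed partway through can simply be retried: already-processed items are no-ops on the second pass.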
Interview Process
- Screen (15 min): Your background, what you've shipped, why this role.
- Take-home (5-6 hours): Small ingestion → BigQuery → LLM-enrichment task on GCP. AI-assisted development is fine — we care about the decisions, not the keystrokes.
- Code walk-through (60 min): Walk us through your solution. Expect pushback on trade-offs.
- Pairing session (60 min): Open a real pi-sentiment issue together. We want to see how you read unfamiliar code and where you ask questions.
- Offer: We move quickly for strong candidates.