Senior Backend/Data Engineer (Python · GCP · Vertex AI) (Remote, PH-based only)
We are looking for a highly skilled Senior Backend / Data Engineer (Python · GCP · Vertex AI) to work on pi-sentiment — an existing Python-based sentiment analysis and social data pipeline on Google Cloud for a remote, Europe-based client. For this role, we only consider candidates who are based in the Philippines and have legal authorization to work there.
About the OTA Client
We build analytics tools for creators, influencers, and marketers. We pull data from Instagram, TikTok, Facebook, LinkedIn, X/Twitter, and YouTube, run AI-powered sentiment and keyword analysis on it, and serve it to users through dashboards. Small team, real users, real revenue.
The Role
This is not a greenfield role. The codebase, patterns, components, and infrastructure are already in place. Your work will be extending existing features, fixing bugs, and filling gaps — not designing systems from scratch. We need someone who can drop into an unfamiliar codebase, figure out how it works by reading the code, and start shipping within the first two weeks.
You'll work closely with our Senior Frontend Engineer — shipping schema changes, API contracts, and Supabase tables they consume — so you need to be comfortable reading a Next.js/TypeScript codebase and reasoning about how your data surfaces in the product.
How We Work
- Autonomy is the default. We point you at an issue and expect you to own it end to end. We don't assign tasks step by step.
- Proactive communication is non-negotiable. If you're stuck, say so immediately — don't go quiet. A PO and another engineer are available for questions, and we expect you to use them.
What You'll Do
Core Responsibilities
- Extend the Sentiment Pipeline: Work within the existing end-to-end flow — Supabase RPC → data scraping/ingestion → BigQuery → Vertex AI Batch → sentiment_predictions — adding features and fixing bugs without breaking what works
- Add & Maintain Platform Integrations: Extend existing Apify-based adapters across Instagram, TikTok, Facebook, YouTube, LinkedIn, X/Twitter — handling auth, rate limits, schema drift, and backfills
- Ship Cloud Run Jobs: Modify and add containerized Python jobs following existing patterns — SIGTERM handling, structured logging, idempotent retries
- Evolve Data Contracts: Change BigQuery schemas and Supabase tables/RPCs without breaking the frontend; coordinate migrations with the frontend engineer
- Tune Models & Prompts: Iterate on Gemini structured outputs (Pydantic schemas, enums) to keep sentiment and keyword extraction accurate across languages and platforms
- Benchmark & Evaluate: Use the existing benchmarking/ suite to compare model configs on cost, latency, and quality
- Write Tests: Add pytest coverage for your changes — unit, integration, E2E where warranted
What We're Looking For
Required
- Strong Python in production: type hints, Pydantic, pytest, clean module boundaries. Years matter less than evidence — show us code you've shipped.
- GCP under load: Cloud Run, BigQuery, Cloud Storage. You've operated it, not just prototyped.
- SQL that survives review: complex BigQuery or Postgres — window functions, partitioning, query optimization.
- LLM integration in production: you've shipped a feature backed by Vertex AI, OpenAI, or Anthropic, and you know what structured outputs and prompt regressions feel like.
- Cross-stack literacy: you can read a Next.js / TypeScript PR, understand what data it needs, and co-design the contract with our frontend engineer. Writing React is not required.
- Proactive operator: you drive your own work, flag blockers fast, and don't wait to be assigned. See the "Not a fit if..." section below — we mean it.
Preferred
- Vertex AI Gemini specifically (Batch Prediction, structured JSON output with enums)
- Supabase / PostgreSQL with RLS, RPCs, migrations, multi-tenant patterns
- Apify or similar ingestion platforms for social data
- Data pipeline depth: idempotent backfills, schema evolution, cost engineering (BigQuery slots, batch vs. online)
- Docker (multi-stage, slim) with Cloud Run parity
- Observability that isn't print() — structured logging, Cloud Logging, Sentry
- GitHub Actions for CI/CD
- Multilingual NLP experience (our comments span many languages)
- Terraform / IaC for GCP
This Role Is Not a Fit If...
Read this section carefully. If any of these describe you, please don't apply — you'll be unhappy and so will we.
- You need detailed specs for every task. We hand you an issue and a codebase. Figuring out the "how" is the job. If you need a ticket broken into sub-steps before you can start, this isn't the role.
- You wait to be checked in on. Nobody is going to DM you every morning to ask how it's going. You drive your own status updates, flag slippage early, and ask for review when you're ready.
- You go silent when blocked. If you're stuck for more than a few hours and haven't said anything, that's a problem. Stuck is fine. Quiet is not. A PO and another engineer are one message away — use them.
- You expect a long onboarding ramp. You should be opening small PRs in week one and shipping something meaningful by the end of week two. We'll help, but we won't hand-hold.
Technical Environment
Core Technologies
- Language: Python 3.11+ (strict typing, Pydantic v2)
- ML / LLM: Vertex AI Gemini (2.5-flash) with structured JSON output
- Cloud: Google Cloud — Cloud Run Jobs, Cloud Scheduler, Cloud Storage, BigQuery, Vertex AI
- Region: europe-west3 (EU-focused)
Data & Storage
- Analytics warehouse: BigQuery (partitioned, clustered)
- Operational DB: Supabase (PostgreSQL with RLS) — shared with the frontend
- Ingestion: Apify (15+ social platform adapters), Data365 API
- Batch ML: Vertex AI Batch Prediction (JSONL in/out via GCS)
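The "JSONL in/out via GCS" shape above boils down to one JSON object per line. A rough sketch of building such an input file — the field names (`id`, `prompt`) are hypothetical, not the pipeline's real request format:

```python
import json


def to_batch_jsonl(comments: list[dict]) -> str:
    """Serialize comments as newline-delimited JSON, one request per line."""
    lines = [
        json.dumps({"id": c["id"], "prompt": c["text"]}, ensure_ascii=False)
        for c in comments
    ]
    return "\n".join(lines) + "\n"


payload = to_batch_jsonl([{"id": 1, "text": "love this feature"}])
# The resulting string would be uploaded to a GCS bucket as the batch job's input.
```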
Developer Experience
- Package Manager: uv / pip
- Formatting: black
- Linting: flake8
- Testing: pytest (unit, integration, E2E)
- Secrets: dotenvx (encrypted environment files)
- Containers: Docker → Cloud Run Jobs
- Version Control: GitHub with trunk-based development
- Monitoring: Cloud Logging, Sentry
What the Frontend Looks Like (so you can collaborate)
You won't own this, but you'll read it and design data for it:
- Framework: Next.js 15 (App Router) · React 19 · TypeScript (strict)
- Data layer: Supabase client + TanStack Query
- Auth: Supabase Auth (JWT, OAuth, RLS)
- Charts/Tables: Visx, TanStack Table
What You Get
- Real ML in production: Gemini with real cost, latency, and quality trade-offs — not prototypes
- End-to-end ownership: from ingestion to the Supabase row the frontend reads, the whole path is yours
- A small team, no silos: one PO, one frontend engineer, you. Decisions are fast because the room is small.
- Remote and async. We don't care where you work or when, as long as you communicate and ship.
- Learning budget for conferences and courses.
Our Engineering Principles
- Type Safety First: Pydantic and type hints catch bugs at the boundary, not in production
- Cost-Aware: Batch over online when it fits; measure before scaling
- Observable: Structured logs, error tracking, and metrics ship with every job
- Trunk-Based Development: Small, frequent PRs with feature flags over long-lived branches
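As a concrete (and simplified) illustration of the "observable, SIGTERM-aware job" pattern these principles describe — the function names and log fields are made up for the sketch, assuming JSON on stdout is what Cloud Logging parses:

```python
import json
import signal

shutdown_requested = False


def _handle_sigterm(signum, frame):
    # Cloud Run sends SIGTERM before stopping a job; finish the current
    # item, record progress, and exit cleanly instead of dying mid-write.
    global shutdown_requested
    shutdown_requested = True


def log_json(severity: str, message: str, **fields) -> None:
    """Emit one structured log line as JSON on stdout."""
    print(json.dumps({"severity": severity, "message": message, **fields}))


def run(items) -> int:
    signal.signal(signal.SIGTERM, _handle_sigterm)
    done = 0
    for item in items:
        if shutdown_requested:
            log_json("WARNING", "sigterm received, stopping early", processed=done)
            break
        # ... process item idempotently so a retried job can safely re-run it ...
        done += 1
    log_json("INFO", "job finished", processed=done)
    return done
```

Because each item is processed idempotently, a job killed partway through can simply be retried: already-processed items are no-ops on the second pass.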
Interview Process
- Screen (15 min): Your background, what you've shipped, why this role.
- Take-home (5-6 hours): Small ingestion → BigQuery → LLM-enrichment task on GCP. AI-assisted development is fine — we care about the decisions, not the keystrokes.
- Code walk-through (60 min): Walk us through your solution. Expect pushback on trade-offs.
- Pairing session (60 min): Open a real pi-sentiment issue together. We want to see how you read unfamiliar code and where you ask questions.
- Offer: We move quickly for strong candidates.