Senior Data Pipeline & Backend Engineer
About the job
We're looking for a Senior Data Pipeline & Backend Engineer who is equally strong in Airflow DAGs, modern data ingestion tools (Airbyte, Fivetran, custom connectors), ETL/ELT pipelines, and backend engineering. This person will own and scale the full data pipeline (ingestion → augmentation → AI processing → storage) that powers Autoplay's session analysis system.
You'll work across ingestion, transformation, augmentation, and backend systems to keep the pipeline robust, scalable, observable, and cost-efficient.
What You'll Own
1. Data Pipeline Architecture & Management
- Design, orchestrate, and maintain our multi-source data pipeline (RRWeb events, analytics events, video frames, metadata).
- Manage and optimize Airflow DAGs (scheduling, retries, dependency management, error handling, backfilling); a sketch follows this list.
- Integrate and scale Airbyte connectors to pull data from tools like PostHog, Mixpanel, Pendo, and custom APIs.
- Build high-reliability pipelines that can handle large, bursty session replay data.
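To give candidates a flavor of this work, here is a minimal sketch of the DAG patterns involved, assuming Airflow's TaskFlow API; the task names, schedule, and retry settings are illustrative placeholders, not our actual pipeline:

```python
# Minimal sketch: scheduling, retries, and dependency management in one DAG.
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,  # enable deliberately when backfilling
    default_args={
        "retries": 3,  # automatic retries before a task is marked failed
        "retry_delay": timedelta(minutes=5),
        "retry_exponential_backoff": True,
    },
)
def session_pipeline():
    @task
    def ingest_sessions() -> list[str]:
        # Pull raw RRWeb / analytics events and return batch IDs.
        return ["batch-001", "batch-002"]

    @task
    def augment(batch_id: str) -> str:
        # Run augmentation / AI processing on one batch.
        return f"{batch_id}-augmented"

    @task
    def store(augmented_id: str) -> None:
        # Persist outputs to storage.
        print(f"stored {augmented_id}")

    # Dependencies: ingest once, fan out augmentation per batch, then store.
    batches = ingest_sessions()
    store.expand(augmented_id=augment.expand(batch_id=batches))


session_pipeline()
```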
2. Pipeline Reliability & Observability
- Implement end-to-end monitoring: logs, metrics, alerts, data quality checks, schema validations.
- Reduce pipeline failures and rate-limit issues (e.g., PostHog ingestion constraints).
- Introduce automatic retries, dead-letter queues, and backpressure strategies.
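Concretely, the retry + dead-letter pattern might look like this library-agnostic sketch; in production it would sit on Pub/Sub or a task queue, and `process` and `dead_letter` are hypothetical stand-ins for the real integrations:

```python
# Sketch: retry a handler with exponential backoff, then dead-letter the
# event instead of dropping it or blocking the pipeline.
import json
import time
from typing import Any, Callable


def process_with_dlq(
    event: dict[str, Any],
    process: Callable[[dict[str, Any]], None],
    dead_letter: Callable[[str], None],
    max_attempts: int = 3,
) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            process(event)
            return
        except Exception as exc:
            if attempt == max_attempts:
                # Exhausted retries: park the event for later inspection.
                dead_letter(json.dumps({"event": event, "error": str(exc)}))
                return
            # Exponential backoff doubles as crude backpressure.
            time.sleep(2 ** attempt)
```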
3. Backend Engineering
- Build and optimize backend services (Python/FastAPI, Node, etc.) that consume and expose pipeline outputs.
- Improve the performance of data storage (Postgres/Neon, vector DBs, GCS).
- Implement caching layers for metadata, summaries, and user-level insights.
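For example, a caching layer in front of stored summaries might look roughly like this sketch; the in-process dict stands in for Redis or similar, and `fetch_summary_from_db` is a hypothetical placeholder for the real Postgres/GCS lookup:

```python
# Sketch: a FastAPI endpoint serving pipeline outputs with a cache in front
# of storage, so hot session summaries skip the database entirely.
from fastapi import FastAPI, HTTPException

app = FastAPI()
_cache: dict[str, dict] = {}  # swap for Redis/memcached in production


def fetch_summary_from_db(session_id: str) -> dict | None:
    # Placeholder for the real Postgres / GCS read.
    return {"session_id": session_id, "summary": "example"}


@app.get("/sessions/{session_id}/summary")
def get_summary(session_id: str) -> dict:
    if session_id in _cache:
        return _cache[session_id]  # cache hit
    summary = fetch_summary_from_db(session_id)
    if summary is None:
        raise HTTPException(status_code=404, detail="unknown session")
    _cache[session_id] = summary
    return summary
```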
4. Scalability & Performance
- Architect systems that can scale across:
  - High-volume session replays
  - Large embeddings
  - JSON augmentation workloads
  - Batch and real-time computation
- Identify bottlenecks and implement optimizations across the pipeline (I/O, compute, caching, parallelization).
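As one concrete example of a parallelization lever, a bounded worker pool over a JSON augmentation batch; `augment_record` is a hypothetical stand-in for the real augmentation step:

```python
# Sketch: bounded parallelism over a JSON augmentation workload. The pool is
# capped so memory stays predictable on large, bursty batches.
import json
from concurrent.futures import ThreadPoolExecutor


def augment_record(record: dict) -> dict:
    record["augmented"] = True  # placeholder for the real work
    return record


def augment_batch(raw_lines: list[str], max_workers: int = 8) -> list[dict]:
    records = [json.loads(line) for line in raw_lines]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(augment_record, records))
```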
5. Ownership of the Full Augmentation Flow
Directly manage all backend systems that produce:
- Augmented interactions
- Markdown summaries
- Session highlights
- User intent & frictions
- Session tags
- One-liner summaries
- Product sections
- User flow
- GCS output storage
You'll own this pipeline end to end: ingestion → augmentation → storage.
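As a sketch of that final hop, validating an output before it lands in GCS; the bucket name and required keys here are assumptions for illustration, not our actual schema:

```python
# Sketch: schema-check an augmentation output, then write it to GCS.
import json

from google.cloud import storage

REQUIRED_KEYS = {"session_id", "summary", "tags"}  # assumed output schema


def store_augmentation(output: dict, bucket_name: str = "example-augmented") -> None:
    missing = REQUIRED_KEYS - output.keys()
    if missing:
        # Fail loudly before bad records reach downstream consumers.
        raise ValueError(f"augmentation output missing keys: {missing}")
    blob = storage.Client().bucket(bucket_name).blob(
        f"sessions/{output['session_id']}.json"
    )
    blob.upload_from_string(json.dumps(output), content_type="application/json")
```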
Ideal Profile
Required Experience
- 5+ years in data engineering or backend engineering
- Deep experience with Apache Airflow, DAG design, and orchestration at scale
- Strong familiarity with Airbyte, ETL/ELT patterns, and connector configuration
- Strong Python engineering background (FastAPI / Django / async patterns)
- Experience processing large JSON datasets or high-volume event streams
- Proven track record of building scalable, cost-efficient, well-monitored data systems
- Familiarity with GCP (GCS, Cloud Run, Pub/Sub), AWS, or similar cloud environments
Nice to Have
- Experience with RRWeb or session replay data
- Background in AI/ML data pipelines
- Experience with vector databases, embeddings, or semantic search
- Understanding of clickstream analytics
- DevOps exposure (Docker, Terraform, CI/CD)
What Success Looks Like (First 90 Days)
- Pipeline reliability reaches a 99%+ success rate
- Airflow DAGs are fully structured, well-documented, modular, and observable
- Rate-limit issues with PostHog and others are solved via batching + queuing (sketched after this list)
- Airbyte pipelines are stable, monitored, and error-recoverable
- Backend bottlenecks (payload size, memory usage, API latency) are reduced
- Augmentation pipeline outputs are consistent, validated, and cached intelligently
- You've shipped several major improvements to data throughput and cost efficiency
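For reference, the batching + queuing approach to rate limits might look roughly like this sketch; the per-minute budget and the `send_batch` call are illustrative, not PostHog's actual limits or API:

```python
# Sketch: queue events, send them in batches, and space out sends so bursty
# session-replay traffic stays under a provider's rate limit.
import time
from collections import deque


def send_batch(batch: list[dict]) -> None:
    # Placeholder for the real HTTP call to the ingestion API.
    print(f"sent {len(batch)} events")


class RateLimitedBatcher:
    def __init__(self, batch_size: int = 100, requests_per_minute: int = 240):
        self.batch_size = batch_size
        self.min_interval = 60.0 / requests_per_minute
        self.queue: deque[dict] = deque()
        self._last_send = 0.0

    def enqueue(self, event: dict) -> None:
        self.queue.append(event)
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.queue:
            return
        # Wait just long enough to respect the per-request budget.
        wait = self.min_interval - (time.monotonic() - self._last_send)
        if wait > 0:
            time.sleep(wait)
        batch = [
            self.queue.popleft()
            for _ in range(min(self.batch_size, len(self.queue)))
        ]
        send_batch(batch)
        self._last_send = time.monotonic()
```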