Title: Founding Data Engineer
Location: Hybrid in San Francisco, New York City or Vancouver
Work type: 3 days in office, 2 days remote
About Our Client
Our client is a fast-growing startup building an AI-powered revenue platform designed to transform how sales teams operate. Founded in 2023 after extensive research, the company's mission is to remove the friction that slows sellers down and give them more time to focus on real customer conversations. Backed by recent seed funding and already delivering measurable results for early customers, they are shaping the future of high-performing sales organizations.
As a Founding Data Engineer, you'll take ownership of the company's data platform and play a pivotal role in shaping its foundation. You'll be the steward of the data layer, ensuring AI and ML models are fueled with clean, structured, and high-quality data.
This is a unique opportunity to join an AI-driven sales intelligence startup at an early stage and make a lasting impact. You'll design and build scalable data ingestion, transformation, and storage solutions, working directly with both structured and unstructured data from sources like CRMs, public datasets, and web scraping.
In this role, you'll also collaborate closely with ML and AI teams to ensure the right data is delivered at the right time, enabling models to perform at their best.
Why This Role?
- High ownership: You'll be responsible for designing, maintaining, and evolving our data platform.
- Be the expert: You'll shape how data is structured, transformed, and optimized for ML models.
- Direct impact: Your work will power AI-driven sales recommendations for enterprise users.
Responsibilities
- Own and maintain scalable data pipelines using Python, SQL, Airflow, and Spark (Databricks).
- Develop data ingestion strategies using APIs, Airbyte, and web scraping.
- Transform and clean data for ML models using Databricks (or Spark-based systems).
- Optimize storage layers using a Medallion (Bronze/Silver/Gold) architecture (see the sketch after this list).
- Ensure data quality, governance, and observability across all pipelines.
- Collaborate with ML, AI, and backend teams to integrate data into AI models.
- Continuously refine and improve how data is structured, stored, and served.
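For context, here is a minimal sketch of the kind of Medallion-style Airflow pipeline this role involves, using Airflow 2.x's TaskFlow API. The DAG name, storage paths, and CRM source below are illustrative assumptions, not details from this posting.

```python
# Illustrative only: a toy Medallion (Bronze/Silver/Gold) DAG. All names and
# paths are hypothetical placeholders, not the client's actual pipeline.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def crm_medallion_pipeline():
    @task
    def ingest_bronze() -> str:
        # Land raw CRM records as-is (e.g. JSON pulled from an API or an Airbyte sync).
        raw_path = "s3://example-bucket/bronze/crm/"  # hypothetical location
        # ... fetch and write raw payloads here ...
        return raw_path

    @task
    def refine_silver(raw_path: str) -> str:
        # Clean and conform the raw data: deduplicate, cast types, standardize keys,
        # typically via a Spark/Databricks job writing Delta tables.
        silver_path = "s3://example-bucket/silver/crm_accounts/"  # hypothetical
        return silver_path

    @task
    def publish_gold(silver_path: str) -> None:
        # Aggregate into curated, ML-ready tables consumed by downstream models.
        print(f"published gold tables derived from {silver_path}")

    # Chain the three layers: Bronze -> Silver -> Gold.
    publish_gold(refine_silver(ingest_bronze()))


crm_medallion_pipeline()
```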
What We're Looking For
- 5+ years of experience in data engineering with strong Python & SQL expertise.
- Hands-on experience with Airflow, ETL pipelines, and Spark (Databricks preferred).
- Experience integrating structured & unstructured data from APIs, CRMs, and web sources.
- Ability to own and scale data infrastructure in a fast-growing AI-driven company.
- Strong problem-solving skills and a desire to improve how data is structured for ML.
Bonus Points
- Exposure to Golang for API development (not required, but helpful).
- Experience with MLOps (feature stores, model data versioning, SageMaker, ClearML).
- Familiarity with Terraform, Kubernetes, or data pipeline automation.
- Experience in database design to support customer-facing access patterns.