About the job Staff Data Engineer
The client is the first chat-based shopping agent built exclusively for fashion. Designed to redefine how people search and discover fashion, the product offers a personalised, conversational experience powered by advanced AI and natural language understanding.
Backed by top-tier investors including Forerunner Ventures, Index Ventures, Google Ventures, and True Ventures, our team is committed to shaping the future of shopping.
About the role
Are you passionate about the intersection of high fashion and cutting-edge artificial intelligence? Do you want to build the data foundations that power truly intelligent systems? As a Staff Data Engineer, you will be a foundational member of the team, responsible for designing and building the entire data ecosystem that fuels our AI Personal Stylist. This is a unique opportunity to solve complex technical challenges while directly shaping a product that will revolutionize how people shop online.
What you'll do:
- Design, build, and optimize scalable, parallel data processing pipelines on Google Cloud to handle massive volumes of offline data.
- Implement and manage large-scale LLM batch inference jobs, processing millions of data points to enrich our product catalog with sophisticated, AI-generated attributes.
- Architect and own the data infrastructure for our Fashion Knowledge Graph, leveraging BigQuery and parallel data processing frameworks.
- Develop and maintain robust feature generation pipelines to craft high-quality signals for both the training and inference of our machine learning models.
- Orchestrate complex workflows of data processing jobs, implementing robust monitoring, alerting, and data quality validation systems to ensure reliability and trust in our data.
- Collaborate closely with data science and machine learning teams to understand data requirements and deliver production-grade data solutions.
- Champion engineering best practices, including writing clean, maintainable Python and SQL, and drive a culture of high-quality data and operational excellence.
Who you are:
- You have extensive experience building and deploying data solutions on a major cloud platform (preferably Google Cloud Platform).
- You are highly proficient with distributed data processing frameworks such as Apache Spark, Flink, or Polars.
- You possess exceptional Python coding skills, with a deep understanding of writing efficient, testable, and maintainable code for data applications.
- You have expert-level SQL skills and deep experience with modern cloud data warehouses like BigQuery, Snowflake, or Redshift.
- You have hands-on experience with workflow orchestration tools like Airflow, Argo, or Kubeflow.
- You are a pragmatic and proactive builder who thrives in a fast-paced, autonomous startup environment, capable of driving projects from concept to production.
- You are an empathetic and collaborative teammate, skilled at communicating complex technical ideas and passionate about building the reliable infrastructure that empowers your colleagues.
- You are a natural leader who enjoys mentoring and developing teammates, creating growth opportunities for them while keeping priorities aligned with broader company goals.
Additional information:
- Work with some of the most dynamic US tech companies, building and iterating on new features and platforms.
- Long-term projects with real technical challenges.
- Fully remote work with flexible hours.
- Option to work from our office in Cluj-Napoca, Romania, if desired.
- Collaboration flexibility: We work with CIM/PFA/SRL contracts.
- 30 paid days off per year.
- We provide equipment as needed (laptop, desktop, etc.).
- Continuous learning: We sponsor career-improving courses, seminars, and certifications.
- Opportunity for annual business visits to the US, depending on project needs.
Get picky and choose a career that matches your mindset and lifestyle. Team up with a company that encourages you to do more and gives you the flexibility you need!