Bucharest, Romania
Data Developer (Python & PySpark)
Job Description:
We are looking for a Python & PySpark Data Developer to join our partner team in a hybrid setup (2-3 days per week on-site). The role requires strong data engineering expertise and hands-on experience building and maintaining ETL pipelines.
What You Will Do:
- Identify and select relevant data sources to be ingested into the data lake based on business and analytical requirements.
- Design and manage the structure and organization of data within the data lake to ensure accessibility, scalability, and optimal performance.
- Develop and maintain ETL pipelines using PySpark for data cleaning, transformation, and integration from multiple sources.
- Configure ETL components such as data formatting, deduplication, volumetric analysis, and enrichment, ensuring all processes are thoroughly documented.
- Contribute to defining and designing new use cases by identifying relevant data, transforming it, and preparing it for analysis.
- Develop and maintain interactive dashboards that visualize key metrics and insights derived from processed data.
- Monitor and troubleshoot data workflows to maintain reliability, scalability, and accuracy in production environments.
- Document pipeline logic, data sources, transformation rules, and operational flows for transparency and maintainability.
Technical Skills:
- Proficiency in Python and PySpark for data processing and pipeline development.
- Strong SQL skills and experience working with relational databases.
- Familiarity with data warehousing platforms (e.g., Cloudera).
- Understanding of data modeling, data lakes, and data governance principles.
- Experience with dashboarding tools (Power BI, Tableau, etc.) is a plus.
Preferred Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field.
- Fluent English is required; French is a plus but not mandatory.
- Knowledge of business intelligence and data storytelling principles.
Required Skills:
Python