Data Engineer (Databricks) - Porto (2 days/month on-site)
ABOUT THE OPPORTUNITY
Join a global custom software solutions company with 40 years of experience delivering innovative data engineering solutions to clients worldwide. We're looking for talented Data Engineers to work on modern data architectures supporting both traditional analytics and emerging AI/ML workloads. Within a collaborative, flat management structure where you're valued as an integral team member, you'll design and build scalable data pipelines using cutting-edge tools across the AWS, Azure, and Databricks platforms. With only 2 days per month required in the Porto office, you'll enjoy outstanding work-life balance alongside knowledge sharing, social events, catered lunches, and a culture of continuous learning and recognition through awards and performance bonuses across 7 international hubs. The role offers exposure to modern data lakehouse architectures, real-time streaming, and MLOps practices, working with passionate professionals on challenging projects that turn raw data into actionable business insights.
PROJECT & CONTEXT
You'll play a pivotal role in enabling data-driven decision-making by designing, building, and maintaining efficient ETL/ELT pipelines with Python, SQL, and Apache Spark. Your work will focus on implementing modern data architectures, including the Data Lakehouse and Medallion Architecture (Bronze/Silver/Gold layers), supporting both business reporting and advanced analytics use cases. You'll manage and optimize cloud-based infrastructure on AWS and Azure, ensuring cost-effectiveness, performance, and scalability while processing massive data volumes. Responsibilities include implementing data governance and quality standards with frameworks such as Great Expectations and Unity Catalog, ensuring data integrity and compliance across the data lifecycle. You'll collaborate closely with Data Scientists, AI Engineers, and Business Analysts to understand requirements and deliver high-quality datasets, and you'll support MLOps practices for model deployment and monitoring. The role also involves orchestrating complex workflows with Apache Airflow, handling real-time data streams with Kafka, managing Delta Lake features, and driving automation through CI/CD practices to continuously improve pipeline performance and reliability.
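To make the Medallion pattern concrete, here is a minimal PySpark sketch of a Bronze-to-Silver step on Delta Lake; the bucket paths, column names, and cleaning rules are hypothetical placeholders, not the actual project's pipeline.

```python
# Minimal Bronze -> Silver step in the Medallion pattern.
# Paths and columns are hypothetical; cleaning rules are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("bronze-to-silver")
    # Delta Lake support for open-source Spark (requires the delta-spark
    # package on the classpath); on Databricks this is preconfigured.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
    .getOrCreate()
)

# Bronze: raw records landed as-is from the source system.
bronze = spark.read.format("delta").load("s3://example-lake/bronze/orders")

# Silver: deduplicated, typed, and filtered to valid records.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)

(silver.write
    .format("delta")
    .mode("overwrite")
    .save("s3://example-lake/silver/orders"))
```

The same pattern repeats from Silver to Gold, with each layer adding structure, validation, and business-level aggregation.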
WHAT WE'RE LOOKING FOR (Required)
- Data Engineering Experience: Minimum of 4-5 years' proven experience as a Data Engineer building production data pipelines
- Python Proficiency: Strong programming skills in Python for data manipulation, scripting, and pipeline development
- Apache Spark: Extensive hands-on experience with Apache Spark (PySpark) for batch and streaming data processing
- Workflow Orchestration: Proficiency with Apache Airflow for scheduling and managing complex data workflows (see the DAG sketch after this list)
- SQL Expertise: Expert-level SQL skills for data analysis, transformation, and optimization
- Big Data Formats: Deep experience with the Parquet and Avro file formats and the Delta Lake table format, including their optimization
- Data Lake Design: Proven experience designing Data Lakes and implementing Medallion Architecture patterns
- Real-Time Streaming: Hands-on experience with Apache Kafka or similar platforms for real-time data processing
- Version Control: Proficiency with Git for collaborative development and code management
- Data Quality: Experience implementing data quality frameworks like Great Expectations or Soda
- AWS Storage: Deep knowledge of Amazon S3 for data lake storage, lifecycle policies, and security configurations
- Cloud Platforms: Practical experience with AWS and/or Azure cloud-based data infrastructure
- Data Modeling: Strong understanding of data modeling principles and dimensional design
- DevOps Practices: Familiarity with CI/CD pipelines for data infrastructure deployment
- Language: B2 English (Upper Intermediate) minimum; the entire interview process is conducted in English
- Location: Based in Porto/Northern Portugal region with availability for 2 on-site days per month
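As a reference for the orchestration requirement above, here is a minimal sketch of an Airflow DAG using the TaskFlow API, assuming Airflow 2.4+; the DAG id, schedule, and task bodies are hypothetical stand-ins for real extract/transform/load steps.

```python
# Minimal Airflow DAG wiring an extract -> transform -> load chain.
# DAG id, schedule, and task bodies are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract() -> list[dict]:
        # Stand-in for pulling raw records from a source system.
        return [{"order_id": 1, "amount": 42.0}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Stand-in for cleaning/enrichment logic.
        return [r for r in rows if r["amount"] > 0]

    @task
    def load(rows: list[dict]) -> None:
        # Stand-in for writing to the Silver layer.
        print(f"loaded {len(rows)} rows")

    # TaskFlow infers the dependency chain from the data flow.
    load(transform(extract()))


orders_pipeline()
```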
NICE TO HAVE (Preferred)
- AWS Services: AWS Glue (Crawlers, Jobs, Data Catalog), Lake Formation, Kinesis (Data Streams, Firehose), Lambda, IAM, CloudWatch
- Azure Services: Azure Data Lake Storage Gen2, Azure Data Factory, Azure Synapse Analytics, Microsoft Purview, Event Hubs, Azure Stream Analytics
- Databricks Platform: Workspace management, Unity Catalog, Databricks Jobs, Delta Live Tables (DLT), cluster optimization
- Delta Lake Advanced: Time travel, schema enforcement, optimization techniques, ACID transactions (see the sketch after this list)
- MLOps Tools: MLflow for experiment tracking and model registry, supporting ML model deployment
- AI/ML Context: Exposure to Generative AI concepts (LLMs, RAG, Vector Search) and data requirements for AI workloads
- Mosaic AI: Experience with Model Serving, Vector Search, AI Gateway for LLM workloads
- Alternative Languages: Scala or Java programming experience
- Alternative Orchestration: Prefect, Dagster, or Azure Data Factory experience
- BI Integration: Experience serving data to Power BI, Tableau, or Looker
- Data Governance: Understanding of data lineage, security principles, and compliance requirements
- Streaming Analytics: Azure Stream Analytics or advanced Kafka Streams processing
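To illustrate the advanced Delta Lake features listed above, here is a minimal sketch of time travel, schema enforcement, and file compaction, assuming delta-spark 2.x and a Spark session with Delta already enabled; the table path is hypothetical.

```python
# Minimal Delta Lake time travel and maintenance sketch.
# Table path is hypothetical; assumes a Delta-enabled Spark session.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "s3://example-lake/silver/orders"

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

# Schema enforcement: an append with a mismatched schema fails unless
# schema evolution is explicitly enabled, e.g.:
#   df.write.format("delta").mode("append").save(path)              # enforced
#   df.write.format("delta").option("mergeSchema", "true") \
#       .mode("append").save(path)                                  # evolved

# Maintenance: compact small files and inspect the transaction history.
DeltaTable.forPath(spark, path).optimize().executeCompaction()
spark.sql(f"DESCRIBE HISTORY delta.`{path}`").show(truncate=False)
```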
Certifications (Advantageous):
- Databricks Certified Data Engineer Professional or Associate
- AWS Certified Data Engineer Associate (DEA-C01) or Solutions Architect Associate
- Microsoft Certified: Azure Data Engineer Associate (DP-203) or Azure Solutions Architect Expert
Location: Porto, Portugal (2 days/month on-site)