Big Data & Cloud Data Engineer
Position Overview
We are seeking a Big Data & Cloud Data Engineer to design, implement, and manage large-scale data processing systems using big data technologies (Hadoop, Spark, Kafka) and cloud-based data ecosystems (Azure, GCP, AWS), enabling advanced analytics and real-time data processing capabilities across our enterprise.
Key Responsibilities
Big Data Platform Development
- Design and implement Hadoop ecosystem components, including HDFS, YARN, and related distributed computing frameworks
- Develop real-time and batch processing applications using Apache Spark (Scala, Python, Java) 
- Configure Apache Kafka for event streaming, data ingestion, and real-time data pipelines 
- Implement data processing workflows using workflow orchestration tools such as Apache Airflow and Oozie
- Build NoSQL database solutions using HBase, Cassandra, and MongoDB for high-volume data storage 
Cloud Data Architecture
- Design multi-cloud data architectures using Azure Data Factory, AWS Glue, and Google Cloud Dataflow 
- Implement data lakes and lakehouses using Azure Data Lake, AWS S3, and Google Cloud Storage 
- Configure cloud-native data warehouses including Snowflake, BigQuery, and Azure Synapse Analytics 
- Build serverless data processing solutions using AWS Lambda, Azure Functions, and Google Cloud Functions 
- Implement containerized data applications using Docker, Kubernetes, and cloud container services 
Data Pipeline Engineering
- Develop ETL/ELT pipelines for structured and unstructured data processing 
- Create real-time streaming analytics using Kafka Streams, Apache Storm, and cloud streaming services 
- Implement data quality frameworks, monitoring, and alerting for production data pipelines 
- Build automated data ingestion from various sources including APIs, databases, and file systems 
- Design data partitioning, compression, and optimization strategies for performance 
Platform Administration & Optimization
- Manage cluster provisioning, scaling, and resource optimization across big data platforms 
- Monitor system performance, troubleshoot issues, and implement capacity planning strategies 
- Configure security frameworks including Kerberos, Apache Ranger, and cloud IAM services
- Implement backup, disaster recovery, and high availability solutions 
- Optimize query performance and implement data governance policies 
Required Qualifications
Technical Skills
- 5+ years of experience with big data technologies (Hadoop, Spark, Kafka, Hive, HBase)
- Strong programming skills in Python, Scala, Java, and SQL for data processing 
- Expert knowledge of at least one major cloud platform (Azure, AWS, GCP) and its data services
- Experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform, CloudFormation) 
- Proficiency in stream processing frameworks and real-time analytics architectures 
- Knowledge of data modeling, schema design, and database optimization techniques 
Data Engineering Skills
- Experience with data pipeline orchestration and workflow management tools 
- Strong understanding of distributed systems, parallel processing, and scalability patterns 
- Knowledge of data formats (Parquet, Avro, ORC) and serialization frameworks 
- Experience with version control, CI/CD pipelines, and DevOps practices for data platforms 
Preferred Qualifications
- Bachelor's degree in Computer Science, Data Engineering, or related field 
- Cloud certifications (Azure Data Engineer, AWS Data Analytics, Google Cloud Data Engineer) 
- Experience with machine learning platforms and MLOps frameworks 
- Background in data governance, data cataloging, and metadata management 
- Knowledge of emerging technologies (Delta Lake, Apache Iceberg, dbt)