Associate Architect - Data Engineering
Key Responsibilities
- Design and implement scalable, high-performance data architectures using Hadoop and Cloudera to support big data processing and analytics
- Build and maintain robust data pipelines for ingesting, processing, and transforming large-scale data sets using tools such as Apache Spark, Hive, and Impala within the Cloudera ecosystem
- Leverage cloud platforms (GCP, AWS, Azure) to deploy and manage data solutions, ensuring seamless integration with on-premises Hadoop/Cloudera environments
- Optimize Hadoop and Cloudera clusters for performance, scalability, and cost efficiency, including resource management with YARN and data storage optimization
- Implement data security and governance best practices, including access controls, encryption, and compliance with regulatory standards, across Hadoop and cloud platforms
- Monitor data pipelines and systems, troubleshoot issues, and ensure high availability and reliability of the data infrastructure
- Support migration of on-premises data workloads to cloud environments (GCP, AWS, Azure) and architect hybrid solutions as needed
- Work closely with Data Science Architects, Project Managers, and other stakeholders to understand requirements and deliver data solutions that meet business needs
- Document architectural designs, processes, and configurations, and provide training to team members on best practices
Person Specifications
- 5+ years of experience in data engineering, with at least 2-3 years focused on Hadoop and Cloudera platforms
- Strong expertise in Hadoop components (HDFS, MapReduce, YARN) and the Cloudera stack (CDH, CDP, Impala, Hive, HBase, Sqoop, Oozie)
- Hands-on experience with data services in GCP (BigQuery, Dataflow, Dataproc), AWS (EMR, Redshift, Glue), and Azure (Synapse Analytics, Data Lake Storage, Databricks)
- Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development
- Experience with Apache Spark, Kafka, and Flume for real-time and batch data processing
- Solid hands-on experience in data modeling techniques for relational and NoSQL databases, including schema design for large-scale data lakes
- Understanding of cloud architecture principles, including serverless computing, containerization (e.g., Kubernetes, Docker), and Infrastructure as Code (IaC) tools like Terraform
- Familiarity with data security practices, including Kerberos, Ranger, and cloud-native security tools (e.g., AWS IAM, Azure AD, GCP IAM)
- Strong analytical and problem-solving skills to address complex data engineering challenges
- Excellent communication and collaboration skills to work effectively with cross-functional teams