Associate Architect - Data Engineering

Overview

We are seeking a skilled Associate Architect in Data Engineering to join our team. The role focuses on designing, implementing, and optimizing data engineering solutions, with a strong emphasis on the Hadoop and Cloudera ecosystems. The ideal candidate will also bring expertise in cloud platforms, including Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure, to support our data infrastructure and analytics initiatives.

Key Responsibilities

  • Architecture Design: Design and implement scalable, high-performance data architectures using Hadoop and Cloudera to support big data processing and analytics.
  • Data Pipeline Development: Build and maintain robust data pipelines for ingesting, processing, and transforming large-scale data sets using tools such as Apache Spark, Hive, and Impala within the Cloudera ecosystem (an illustrative pipeline sketch follows this list).
  • Cloud Integration: Leverage cloud platforms (GCP, AWS, Azure) to deploy and manage data solutions, ensuring seamless integration with on-premises Hadoop/Cloudera environments.
  • Optimization and Performance Tuning: Optimize Hadoop and Cloudera clusters for performance, scalability, and cost efficiency, including resource management with YARN and data storage optimization.
  • Data Security and Governance: Implement data security and governance best practices, including access controls, encryption, and compliance with regulatory standards, across Hadoop and cloud platforms.
  • Monitoring and Maintenance: Monitor data pipelines and systems, troubleshoot issues, and ensure high availability and reliability of the data infrastructure.
  • Cloud Migration and Hybrid Solutions: Support migration of on-premises data workloads to cloud environments (GCP, AWS, Azure), and architect hybrid solutions as needed.
  • Collaboration: Work closely with Data Science Architects, Project Managers, and other stakeholders to understand requirements and deliver data solutions that meet business needs.
  • Documentation and Training: Document architectural designs, processes, and configurations, and provide training to team members on best practices.
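
To give candidates a concrete sense of the pipeline work described above, here is a minimal batch-pipeline sketch in PySpark covering the ingest-transform-load cycle. The landing-zone path /data/raw/events, the table analytics.daily_events, and all column names are hypothetical placeholders for illustration, not a description of our actual environment.

    # Minimal batch pipeline sketch (PySpark). Paths, table, and column
    # names below are illustrative assumptions, not real project values.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("daily-events-pipeline")
        .enableHiveSupport()  # allows writing to Hive-managed tables
        .getOrCreate()
    )

    # Ingest: read raw CSV files from the landing zone (HDFS or cloud storage).
    raw = spark.read.option("header", "true").csv("/data/raw/events")

    # Transform: parse timestamps, drop malformed rows, aggregate per day.
    daily = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .dropna(subset=["event_ts", "user_id"])
           .groupBy(F.to_date("event_ts").alias("event_date"))
           .agg(F.countDistinct("user_id").alias("daily_users"))
    )

    # Load: replace the target Hive table with the fresh aggregate.
    daily.write.mode("overwrite").saveAsTable("analytics.daily_events")

    spark.stop()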

Required Skills and Qualifications

  • Experience: 5+ years of experience in data engineering, including 2-3 years focused on Hadoop and Cloudera platforms.
  • Hadoop Ecosystem: Strong expertise in Hadoop components (HDFS, MapReduce, YARN) and Cloudera tools (CDH, CDP, Impala, Hive, HBase, Sqoop, Oozie).
  • Cloud Platforms: Hands-on experience with data services in GCP (BigQuery, Dataflow, Dataproc), AWS (EMR, Redshift, Glue), and Azure (Synapse Analytics, Data Lake, Databricks).
  • Programming: Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development.
  • Big Data Tools: Experience with Apache Spark, Kafka, and Flume for real-time and batch data processing (see the streaming sketch at the end of this posting).
  • Data Modeling: Solid hands-on experience in data modeling techniques for relational and NoSQL databases, including schema design for large-scale data lakes.
  • Cloud Architecture: Understanding of cloud architecture principles, including serverless computing, containerization (e.g., Kubernetes, Docker), and Infrastructure as Code (IaC) tools like Terraform.
  • Security and Compliance: Familiarity with data security practices, including Kerberos, Ranger, and cloud-native security tools (e.g., AWS IAM, Azure AD, GCP IAM).
  • Problem-Solving: Strong analytical and problem-solving skills to address complex data engineering challenges.
  • Communication: Excellent communication and collaboration skills to work effectively with cross-functional teams.
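
As an illustration of the real-time processing experience listed under Big Data Tools above, the sketch below consumes a Kafka topic with Spark Structured Streaming and maintains per-minute event counts. The broker address, topic name, and console sink are assumptions made for the example; a production job would write to a durable sink with a checkpoint location.

    # Streaming sketch (PySpark Structured Streaming + Kafka). Broker,
    # topic, and sink choices are illustrative assumptions only.
    # Requires the spark-sql-kafka connector on the classpath, e.g.
    #   --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark version>
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-stream").getOrCreate()

    # Source: subscribe to the (hypothetical) "events" topic.
    stream = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
    )

    # Kafka delivers raw bytes; cast the payload and bucket events
    # into one-minute windows keyed on the Kafka record timestamp.
    counts = (
        stream.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
              .groupBy(F.window("timestamp", "1 minute"))
              .count()
    )

    # Sink: print running counts to the console for demonstration.
    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination()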