Associate Architect - Data Engineering

About the job

Key Responsibilities

  • Design and implement scalable, high-performance data architectures using Hadoop and Cloudera to support big data processing and analytics
  • Build and maintain robust data pipelines for ingesting, processing, and transforming large-scale data sets using tools such as Apache Spark, Hive, and Impala within the Cloudera ecosystem
  • Leverage cloud platforms (GCP, AWS, Azure) to deploy and manage data solutions, ensuring seamless integration with on-premises Hadoop/Cloudera environments
  • Optimize Hadoop and Cloudera clusters for performance, scalability, and cost efficiency, including resource management with YARN and data storage optimization
  • Implement data security and governance best practices, including access controls, encryption, and compliance with regulatory standards across Hadoop and cloud platforms
  • Monitor data pipelines and systems, troubleshoot issues, and ensure high availability and reliability of the data infrastructure
  • Support migration of on-premises data workloads to cloud environments (GCP, AWS, Azure), and architect hybrid solutions as needed
  • Work closely with Data Science Architects, Project Managers, and other stakeholders to understand requirements and deliver data solutions that meet business needs
  • Document architectural designs, processes, and configurations, and provide training to team members on best practices

Person Specifications

  • 5+ years of experience in data engineering, with at least 2-3 years of focus on Hadoop and Cloudera platforms
  • Strong expertise in Hadoop components (HDFS, MapReduce, YARN) and Cloudera tools (CDH, CDP, Impala, Hive, HBase, Sqoop, Oozie)
  • Hands-on experience with data services in GCP (BigQuery, Dataflow, Dataproc), AWS (EMR, Redshift, Glue), and Azure (Synapse Analytics, Data Lake, Databricks)
  • Proficiency in programming languages such as Python, Java, or Scala for data processing and pipeline development
  • Experience with Apache Spark, Kafka, and Flume for real-time and batch data processing
  • Solid hands-on experience in data modeling techniques for relational and NoSQL databases, including schema design for large-scale data lakes
  • Understanding of cloud architecture principles, including serverless computing, containerization (e.g., Kubernetes, Docker), and Infrastructure as Code (IaC) tools like Terraform
  • Familiarity with data security practices, including Kerberos, Ranger, and cloud-native security tools (e.g., AWS IAM, Azure AD, GCP IAM)
  • Strong analytical and problem-solving skills to address complex data engineering challenges
  • Excellent communication and collaboration skills to work effectively with cross-functional teams