Data Engineer

About the Job

Key Responsibilities

Data Operations & Pipeline Support

  • Assist in collecting, ingesting, validating, and storing structured and unstructured batch data arriving via edge nodes or direct database connections
  • Support ETL/ELT jobs running on Hadoop, Hive, Impala, and Spark
  • Monitor daily data loads, troubleshoot failures, and ensure data availability for analytics use cases
  • Maintain HDFS directory structure, Hive tables, and data partitions
  • Perform file-level data quality checks, checksum validations, and table-level validations for data consistency

Platform & Infrastructure Operations

  • Support the operation of on-prem Hadoop clusters (Cloudera)
  • Assist in OS-level tasks: log checks, service restarts, disk usage monitoring, user/permission handling
  • Assist in regular Big Data cluster health checks
  • Support platform upgrades, patches, configuration changes, and security hardening efforts managed by the senior engineer
  • Work with network and system teams during installation, troubleshooting, or hardware issues

Tools & Technologies

  • Assist in running and maintaining data flows involving Hive, Impala, HDFS, Spark, Kafka (basic), HBase (basic), and Linux environments
  • Use tools such as NiFi and SFTP for data movement, including NiFi flow development and NiFi cluster management
  • Support API-based data push/pull where required for integrations

Data Governance & Documentation

  • Maintain metadata, data dictionary updates, and platform documentation
  • Ensure compliance with Kerberos/LDAP authentication and Cloudera Navigator governance processes
  • Maintain operational runbooks and incident logs

Collaboration & Support

  • Work under the senior engineer to ensure continuous operations of the client environment
  • Participate in joint troubleshooting with the client team during data-source onboarding
  • Provide L1/L2 support for data ingestion, cluster operations, and daily job executions

Work Complexity and Role Expectation

  • Work on assigned operational tasks within the Big Data platform under guidance
  • Support development, testing, and automation of simple data flows
  • Assist with routine batch workloads and testbed validations
  • Participate as a team member in platform enhancements, monitoring improvements, and data integration activities

Person Specifications

Education

  • Bachelor's degree in Computer Science, IT, Electronics/Telecommunications Engineering, or a related field

Technical Skills

  • Basic knowledge of the Hadoop ecosystem: HDFS, Hive, Spark, YARN (hands-on exposure is an added benefit)
  • Familiarity with Linux shell commands; ability to navigate logs and manage services
  • Good understanding of SQL, with the ability to write and troubleshoot complex queries
  • Exposure to Python/Scala/Java is an added advantage
  • Basic understanding of data pipelines, ETL processes, and batch data workflows
  • Exposure to Cloudera platform is a plus

Experience

  • 1-2 years of experience in data engineering, database operations, or Big Data platform support
  • Experience in the telecom domain or enterprise data environments is an added advantage

Soft Skills

  • Strong analytical and troubleshooting mindset
  • Ability to collaborate with senior engineers and follow structured operational practices
  • Effective communication and willingness to learn complex distributed systems