Data Engineer (AI)

Role Purpose

The Data Engineer (AI) is responsible for building and maintaining robust data pipelines and infrastructure that power AI and advanced analytics across the organization. This role ensures that data is collected, cleaned, transformed, and made available in an AI-ready format for Data Scientists, Generative AI/LLM specialists, and business stakeholders.

Key Responsibilities

Data Pipeline Development

  • Develop and maintain scalable ETL/ELT pipelines for structured, semi-structured, and unstructured data.

  • Integrate data from multiple sources including ERP, IoT/sensors, APIs, external datasets, and files.

  • Support both batch and streaming ingestion (real-time pipelines).

Data Management & Transformation

  • Implement data cleansing, transformation, and normalization processes.

  • Ensure data consistency, accuracy, and integrity.

  • Build curated datasets and feature stores for AI/ML models.

Collaboration & Support

  • Work closely with Data Scientists to prepare and deliver training and inference datasets.

  • Support AI Product Managers with data requirements for new use cases.

  • Collaborate with the AI Data Governance team to implement metadata, lineage, and access controls.

Operations & Monitoring

  • Monitor pipeline performance, latency, and costs.

  • Troubleshoot data ingestion or quality issues.

  • Automate workflows using orchestration tools (Airflow, dbt, Azure Data Factory, etc.).

Documentation & Best Practices

  • Maintain documentation of pipelines, schemas, and data dictionaries.

  • Follow coding standards, version control (Git), and CI/CD practices.

  • Ensure compliance with Responsible AI and data security guidelines.

Required Qualifications

  • Bachelor's degree in Computer Science, Data Engineering, IT, or a related field.

  • 2-5 years of experience in data engineering, data integration, or ETL development.

  • Proficiency in Python and SQL (Scala or Java is a plus).

  • Hands-on experience with big data and pipeline tools such as Spark, Kafka/Azure Event Hubs, Hadoop, dbt, and Airflow.

  • Experience with relational and non-relational databases (e.g., PostgreSQL, SQL Server, MongoDB, Cassandra).

  • Familiarity with cloud data platforms such as Azure Data Factory, AWS Glue/Redshift, or GCP BigQuery.