Data Engineer - Data Structuring
Job Description:
Role Overview
We are seeking a Senior Data / Streaming Engineer with 5–7 years of experience to design, build, and operate scalable real-time and batch data processing platforms. The role focuses on stream processing, cloud-native data pipelines, and analytics systems running on AWS.
You will work closely with data, platform, and product teams to deliver reliable, high-performance data solutions that support real-time analytics, monitoring, and downstream consumption.
Key Responsibilities
- Design, develop, and maintain real-time stream processing applications using Apache Flink / PyFlink and Spark, including state management and event-time processing with watermarks (a minimal PyFlink sketch follows this list).
- Build and optimize Python-based data pipelines using libraries such as Pandas, Polars, boto3, and PyArrow for data transformation and integration (see the S3-to-Polars sketch after this list).
- Implement and manage Kafka-based streaming architectures (Apache Kafka / AWS MSK), including topic design, partitioning, and consumer/producer optimization.
- Develop and operate cloud-native data platforms on AWS, leveraging services such as S3, Managed Flink, CloudWatch, MSK, and IAM.
- Write and optimize SQL-based transformations using Flink SQL, ensuring efficient query execution and scalable data processing (a Kafka-backed Flink SQL sketch follows this list).
- Store, query, and analyze large datasets using ClickHouse, and build Grafana dashboards for observability, analytics, and system monitoring.
- Orchestrate batch and streaming workflows using Apache Airflow, including DAG design, scheduling, and operational monitoring (an example DAG skeleton follows this list).
- Containerize applications using Docker and support deployments on Kubernetes, following best practices for scalability and resilience.
- Collaborate with DevOps, platform, and analytics teams to improve system reliability, performance, and cost efficiency.
- Participate in code reviews, technical design discussions, and production support activities.
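To make the event-time and watermark expectation concrete, here is a minimal PyFlink sketch. It is illustrative only, assuming PyFlink 1.16+; the sensor records, field layout, window size, and lateness bound are hypothetical placeholders, not part of any actual codebase for this role.

```python
from pyflink.common import Duration, Time, Types, WatermarkStrategy
from pyflink.common.watermark_strategy import TimestampAssigner
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows


class EventTimestampAssigner(TimestampAssigner):
    """Pull the event timestamp (epoch millis) out of each record."""
    def extract_timestamp(self, value, record_timestamp):
        return value[2]


env = StreamExecutionEnvironment.get_execution_environment()

# Hypothetical records: (sensor_id, reading, event_time_millis).
events = env.from_collection(
    [("sensor-1", 10.0, 1_700_000_000_000),
     ("sensor-1", 12.0, 1_700_000_030_000)],
    type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE(), Types.LONG()]))

# Bounded out-of-orderness: accept events arriving up to 5 seconds late.
strategy = (WatermarkStrategy
            .for_bounded_out_of_orderness(Duration.of_seconds(5))
            .with_timestamp_assigner(EventTimestampAssigner()))

# Keyed tumbling event-time window keeping the max reading per sensor.
(events
    .assign_timestamps_and_watermarks(strategy)
    .key_by(lambda e: e[0])
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce(lambda a, b: a if a[1] >= b[1] else b)
    .print())

env.execute("event_time_watermark_demo")
```

The 5-second bound is what tells Flink how long to hold a window open for out-of-order events before it is allowed to fire.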
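The Python pipeline stack can be sketched end to end as well: pull a Parquet object from S3 with boto3, decode it with PyArrow, and transform it with Polars. The bucket, key, and column names below are invented for illustration.

```python
import io

import boto3
import polars as pl
import pyarrow.parquet as pq

# Fetch a Parquet object from S3 (hypothetical bucket and key).
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-raw-data", Key="events/2024-01-01.parquet")

# Decode the bytes with PyArrow, then hand the Arrow table to Polars
# (pl.from_arrow is typically zero-copy).
table = pq.read_table(io.BytesIO(obj["Body"].read()))
df = (
    pl.from_arrow(table)
    .filter(pl.col("status") == "ok")
    .group_by("user_id")
    .agg(pl.col("latency_ms").mean().alias("avg_latency_ms"))
)
print(df)
```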
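For the Kafka and Flink SQL responsibilities, the two concerns typically meet in a Kafka-backed source table declared in DDL, with the transformation expressed as a windowed SQL query. This sketch assumes the Kafka SQL connector jar is on the Flink classpath; the broker address, topic, and schema are hypothetical.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Kafka-backed source table with an event-time watermark declared in DDL.
t_env.execute_sql("""
    CREATE TABLE clicks (
        user_id STRING,
        url STRING,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'broker:9092',
        'properties.group.id' = 'clicks-agg',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Windowed aggregation using the TUMBLE table-valued function (Flink 1.13+).
t_env.execute_sql("""
    SELECT window_start, user_id, COUNT(*) AS clicks
    FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end, user_id
""").print()
```

Note that the windowing TVF requires grouping by both window_start and window_end, even when only one of them appears in the SELECT list.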
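Finally, for the orchestration responsibility, a skeleton Airflow 2.x DAG showing the pieces named in the bullet: DAG definition, scheduling, retries, and task dependencies. The DAG id, schedule, and task bodies are placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw files from object storage")


def transform():
    print("run the batch transformation")


with DAG(
    dag_id="daily_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # extract must succeed before transform runs
```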
Required Skills & Experience
- 5–7 years of professional experience in data engineering, streaming platforms, or distributed systems.
- Strong hands-on experience with Apache Flink / PyFlink and/or Apache Spark for stream and batch processing.
- Proficient in Python for data engineering and automation (Pandas, Polars, boto3, PyArrow).
- Solid experience with Apache Kafka or AWS MSK, including streaming concepts such as partitions, offsets, and consumer groups.
- Strong understanding of AWS cloud services, particularly S3, MSK, Managed Flink, CloudWatch, and IAM.
- Advanced SQL skills, including data transformation and query optimization (Flink SQL preferred).
- Experience with ClickHouse or similar OLAP databases, and Grafana for dashboards and monitoring.
- Working knowledge of Docker and Kubernetes fundamentals.
- Experience with Apache Airflow for pipeline orchestration and scheduling.
- Good understanding of distributed systems, fault tolerance, and performance tuning.