Job Openings
ETL Developer
About the job
Roles and Responsibilities of the ETL Developer:
- Must be able to understand and develop ANSI SQL queries.
- Must be familiar with handling semi-structured data formats such as JSON, XML, and HTML.
- Must be comfortable with query tools for distributed systems, such as Hive and Impala, and with other SQL clients (RDBMS, Python, Scala, etc.).
- Must be comfortable transforming data to suit business needs.
- Must be able to use data transformation and/or ETL tools through both tool-based and code-based development, e.g. Informatica or Talend for tool-based development, or the Spark or MapReduce frameworks in Python, Scala, or Java for code-based development.
- Must have a background in data warehousing and reporting technologies.
Additional Bonus Qualifications:
- Experience in end-to-end DevOps techniques is a plus, including the ability to create Jenkins pipelines and unit test cases for automated testing. Experience with Git version control is a plus.
- Experience with Docker and other container technologies is a plus.
- Experience with the Flink streaming framework.
- Experience using cloud technologies (AWS Redshift, S3, etc.).
- Must be comfortable designing sustainable data architecture for big data systems, that is, balancing partitioning against the target HDFS block size, compaction techniques, bucketing, etc.
- Must understand the small-files concept in HDFS, columnar storage formats (Parquet and ORC), and transmit formats (Avro).
- Must understand resource allocation and proper resource sizing for distributed jobs, e.g. Spark executor and driver sizes.
Required Technological Skills:
- Strong SQL skills
- Data warehouse concepts
- Experience in any of the following languages: Python, Java, Scala
- Experience in Talend, Informatica, Ab Initio, and other GUI-based ETL tools
- Comfortable working with Hive partitioning and bucketing