Data Quality Engineer

About the job


As a Data Quality Engineer, you will be responsible for building automated, repeatable, efficient processes to perform data quality checks across various data assets. It is your duty to ensure that high-quality data is available to internal stakeholders and customers. To do this, you will design, develop, and document automated (and sometimes manual) test plans and test cases, writing high-quality, well-structured code. The output of these quality assurance checks should be easy to disseminate to the appropriate audiences so that proper action can be taken to ensure consistent, high-quality data across the organization. You will be involved in requirements gathering, working closely with other teams such as Data Engineering, and effectively communicating technical concepts to developers, product managers, and business partners. You will provide ongoing support of this data quality framework, along with its continuous improvement and refinement.


Contribute to the design, development, documentation, maintenance, and monitoring of a data quality framework
Build repeatable, automated, efficient data quality checks
Continuously validate data quality across data pipelines and repositories against data from source systems
Work across teams on requirements gathering
Document test plans and test cases
Execute test cases, perform bug tracking, and document and share results
Troubleshoot, tune performance, and resolve issues where necessary
Assist with data quality support tickets and inquiries
Design data quality reports and dashboards for various audiences to analyze and communicate the output of the data quality tests


B.S. in computer science or information systems required, or 5+ years of related work experience

Strong analytical and critical-thinking skills for solving complex problems
Strong technical background with a mix of development and automation skills
Outstanding attention to detail and a consistent record of meeting deadlines
Exceptional communication and interpersonal skills
Ability to work within a highly collaborative team, but also a self-starter able to work independently with little guidance
Experience in troubleshooting, performance tuning, and optimization
Proficient in shell scripting, Python, Scala, or other programming languages
Knowledge of Spark/PySpark
Excellent SQL knowledge and the ability to read and write SQL queries
Skilled in Hive (HQL) and HDFS
Experience working with both unstructured and structured data sets, including flat files, JSON, XML, ORC, Parquet, and Avro
Comfortable working in big data environments and with large, diverse data sets
Proficient in Linux environments
Familiarity with source code management/versioning tools such as GitHub
Understanding of CI/CD principles and best practices in data processing
Experience building data visualization dashboards to capture data quality metrics, using tools like Tableau
Understanding of public cloud technologies such as AWS, GCP, and Azure is a plus