Job Title
Data Quality Engineer (PySpark) – Data Migration Project
Role Overview
We are hiring a Data Quality Engineer to support a large-scale Data Migration Project.
Your mission is to ensure that data migrated through ingestion and transformation processes remains accurate, consistent, and fully validated before client acceptance.
This role focuses on data validation, reconciliation, and migration testing within a big data environment.
Key Responsibilities
- Validate data accuracy after migration from the source to the target platform
- Perform data reconciliation between legacy and new data environments
- Develop and execute PySpark scripts to compare datasets across systems
- Verify schema consistency and table structure alignment
- Validate data type conversions (e.g., Int / Double / Float / Decimal)
- Perform row count, min/max, null count, distinct count, and string validation checks
- Generate checksum / hash validations (e.g., SHA-256) to ensure data integrity
- Identify mismatched records and conduct root cause analysis
- Produce validation reports to support UAT and project sign-off
- Collaborate with data engineering and platform teams to resolve discrepancies
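To illustrate the kind of reconciliation work described above, here is a minimal sketch of hash-based dataset comparison. It is written in plain Python (using `hashlib` for SHA-256) so it runs standalone; in the actual role this logic would be expressed in PySpark over distributed DataFrames. The function names and column list are illustrative, not part of any specific framework.

```python
import hashlib

def row_hash(row, columns):
    """SHA-256 checksum over a row's values in a fixed column order."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, columns):
    """Compare two datasets by per-row hash; report counts and mismatches."""
    src_hashes = {row_hash(r, columns) for r in source_rows}
    tgt_hashes = {row_hash(r, columns) for r in target_rows}
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "missing_in_target": len(src_hashes - tgt_hashes),
        "unexpected_in_target": len(tgt_hashes - src_hashes),
        "match": src_hashes == tgt_hashes,
    }

# Hypothetical legacy vs. migrated data; the second row's value drifted.
source = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.0}]
target = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.25}]
report = reconcile(source, target, ["id", "amount"])
print(report)
```

A PySpark version of the same idea would typically use `sha2(concat_ws(...))` to compute row hashes and `exceptAll` to find records present in one DataFrame but not the other.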
Required Qualifications
- 2–5+ years of experience in Data Engineering, Data QA, or Data Validation
- Strong hands-on experience with PySpark / Spark SQL
- Solid understanding of data migration and schema validation
- Experience performing cross-system data reconciliation
- Knowledge of checksum / hashing concepts for data integrity validation
- Strong analytical and debugging skills
- Ability to work independently in a project delivery environment
Nice to Have
- Experience in large-scale data migration projects
- Familiarity with big data platforms or distributed data environments
- Understanding of data quality frameworks or governance concepts
- Experience working in enterprise or system integration environments