About the job 7190-A - Data Catalog Developer, on-site
Job Title: Data Catalog Developer
Company: Direct Recruit Agency
Contract Details: Full-time, on-site
We are seeking a highly skilled and experienced Data Catalog Developer to join our team at Direct Recruit Agency. As a Data Catalog Developer, you will be responsible for designing, developing, and maintaining data catalogs for our clients. This is a full-time, on-site position that offers a competitive salary and benefits package.
Pay rate: 100-105/hr, W2
- Expertise in Collibra is a must.
- Will be building the Collibra Data Catalog.
- Experience with the new Collibra software, Edge.
- Top must-have hard skills:
- Expertise in Collibra Data Management, Data Asset, Data Governance, and BAU support of Collibra Data Catalog.
- Edge Server experience is a must. Collibra Ranger certification preferred.
- Top nice-to-have hard skills: Databricks, AWS (S3, Glue, Aurora Postgres, Athena), SQL
- Top soft skills: Communication, Problem-Solving, Collaboration, Attention to Detail.
- Team size: 10
- Key aspects of the role:
- Develop the Data Catalog, build Collibra workflows, and integrate the Edge server with various data sources, authentication, and access controls.
- Day-to-day expectations: Data Catalog build-out, Metadata Synchronization, Lineage Harvester
- Interview plan/process: two virtual interviews and one final on-site interview
- Citizenship: U.S. citizens (USC) only
Your role as a Senior Data Engineer
- Work on migrating applications from on-premises environments to cloud service providers.
- Develop products and services on the latest technologies through contributions in development, enhancements, testing, and implementation.
- Develop, modify, and extend code for building cloud infrastructure, and automate using CI/CD pipeline.
- Partner with the business and peers in pursuit of solutions that achieve business goals through an agile software development methodology.
- Perform problem analysis, data analysis, reporting, and communication.
- Work with peers across the system to define and implement best practices and standards.
- Assess applications and help determine the appropriate application infrastructure patterns.
- Use the best practices and knowledge of internal or external drivers to improve products or services.
Qualifications:
What we are looking for:
- Bachelor's degree in Computer Science, Information Systems, or a related field
- Minimum of 3 years of experience as a Data Catalog Developer or in a similar role
- Hands-on experience in building ETL using Databricks SaaS infrastructure.
- Experience in developing data pipeline solutions to ingest and exploit new and existing data sources.
- Expertise in leveraging SQL, programming languages like Python, and ETL tools like Databricks
- Perform code reviews to ensure that requirements are met, execution patterns are optimal, and established standards are followed.
- Degree in Computer Science or equivalent.
- Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue).
- Advanced understanding of Container Orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services.
- Good understanding of AWS Identity and Access Management, AWS Networking, and AWS Monitoring tools.
- Proficiency in CI/CD and deployment automation using GitLab pipelines.
- Proficiency in Cloud infrastructure provisioning tools, e.g., Terraform.
- Proficiency in one or more programming languages, e.g., Python, Scala.
- Experience in Starburst, Trino, and building SQL queries in a federated architecture.
- Good knowledge of Lakehouse architecture.
- Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
- Build data ingestion workflows from various sources (structured, semi-structured, and unstructured).
- Develop reusable components and frameworks for efficient data processing.
- Implement best practices for data quality, validation, and governance.
- Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
- Tune Spark jobs for performance and scalability in a cloud-based environment.
- Maintain a robust data lake or Lakehouse architecture.
- Ensure high availability, security, and integrity of data pipelines and platforms.
- Support troubleshooting, debugging, and performance optimization in production workloads.
If you are a highly motivated and skilled Data Catalog Developer looking for a challenging and rewarding opportunity, we encourage you to apply for this position. Join our dynamic team at Direct Recruit Agency and be a part of our mission to provide top-notch data solutions to our clients.
Package Details
Requires 100% on-site work 5x/week and a second-round on-site interview. Pay rate: 100-105/hr, W2.