7190-A - Data Catalog Developer

NYC, New York, United States | Engineering Dept

$100.00 - $105.00 per hour (US Dollar)

Job Title: Data Catalog Developer

Company: Direct Recruit Agency

Contract Details: Full-time, on-site

We are seeking a highly skilled and experienced Data Catalog Developer to join our team at Direct Recruit Agency. As a Data Catalog Developer, you will be responsible for designing, developing, and maintaining data catalogs for our clients. This is a full-time, on-site position that offers a competitive salary and benefits package.

Pay rate: $100-105/hr (W-2)

- Expertise in Collibra is a must.

- You will be building the Collibra Data Catalog.

- Experience with Collibra's new software, Edge.

- Top must-have hard skills:

- Expertise in Collibra Data Management, Data Asset, and Data Governance, and in BAU (business-as-usual) support of the Collibra Data Catalog.

- Edge Server experience is a must. Collibra Ranger certification preferred.

- Top nice-to-have hard skills: Databricks, AWS (S3, Glue, Aurora Postgres, Athena), SQL

- Top soft skills: Communication, Problem-Solving, Collaboration, Attention to Detail.

- Team size: 10

- Key aspects of the role:

- Develop the data catalog, build Collibra workflows, and integrate the Edge server with various data sources, authentication, and access controls.

- Day-to-day expectations: data catalog build-out, metadata synchronization, and lineage harvesting with the Lineage Harvester.

- Interview plan/process: two virtual interviews and one final on-site interview.

- Citizenship: U.S. citizens (USC) only.

Your role as a Senior Data Engineer
- Migrate applications from on-premises infrastructure to cloud service providers.
- Develop products and services on current technologies, contributing to development, enhancements, testing, and implementation.
- Develop, modify, and extend code for building cloud infrastructure, and automate it using CI/CD pipelines.
- Partner with business stakeholders and peers to pursue solutions that achieve business goals, following an agile software development methodology.
- Perform problem analysis, data analysis, reporting, and communication.
- Work with peers across the system to define and implement best practices and standards.
- Assess applications and help determine the appropriate application infrastructure patterns.
- Apply best practices and knowledge of internal and external drivers to improve products and services.

Qualifications:
What we are looking for:

- Bachelor's degree in Computer Science, Information Systems, or a related field

- Minimum of 3 years of experience as a Data Catalog Developer or in a similar role

- Hands-on experience building ETL pipelines on Databricks SaaS infrastructure.
- Experience in developing data pipeline solutions to ingest and exploit new and existing data sources.
- Expertise in SQL, programming languages such as Python, and ETL tools such as Databricks.
- Perform code reviews to ensure that requirements are met, execution patterns are optimal, and established standards are followed.
- Degree in Computer Science or equivalent.
- Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue).
- Advanced understanding of containerization and container orchestration, including Docker and Kubernetes, and a variety of AWS tools and services.
- Good understanding of AWS Identity and Access Management, AWS Networking, and AWS Monitoring tools.
- Proficiency in CI/CD and deployment automation using GitLab pipelines.
- Proficiency in Cloud infrastructure provisioning tools, e.g., Terraform.
- Proficiency in one or more programming languages, e.g., Python, Scala.
- Experience in Starburst, Trino, and building SQL queries in a federated architecture.
- Good knowledge of lakehouse architecture.
- Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
- Build data ingestion workflows from various sources (structured, semi-structured, and unstructured).
- Develop reusable components and frameworks for efficient data processing.
- Implement best practices for data quality, validation, and governance.
- Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
- Tune Spark jobs for performance and scalability in a cloud-based environment.
- Maintain a robust data lake or lakehouse architecture.
- Ensure high availability, security, and integrity of data pipelines and platforms.
- Support troubleshooting, debugging, and performance optimization in production workloads.

If you are a highly motivated and skilled Data Catalog Developer looking for a challenging and rewarding opportunity, we encourage you to apply for this position. Join our dynamic team at Direct Recruit Agency and be a part of our mission to provide top-notch data solutions to our clients.

Package Details

Requires 100% on-site work, 5 days per week, and an on-site second-round interview. Pay rate: $100-105/hr (W-2).