Job Opening: Cloud Data Engineer

About the Job: Cloud Data Engineer

Position: 7190-B - Cloud Data Engineer (ON-SITE ONLY)

Direct Recruit Agency is currently seeking a highly motivated and skilled Cloud Data Engineer to join our team on a full-time basis. As a Cloud Data Engineer, you will play a crucial role in designing and implementing data solutions on cloud platforms for our clients. You will be responsible for building and maintaining data pipelines, optimizing data storage and retrieval, and ensuring data quality and security.

Company: Direct Recruit Agency

Contract Details: Full-time, on-site

7190-B - Cloud Data Engineer: Candidates will be tested on PySpark, so hands-on experience as a PySpark developer is required. A strong SQL background is also necessary and will be covered in the interview; we are looking for someone who is very versatile.

7190-B Location: On-site only, working from the NY office. No remote work.

- Top nice-to-have hard skills: Starburst, AWS CFS, Terraform, CI/CD, GitLab 

- Top must-have hard skills: Databricks, AWS (S3, Glue, Aurora Postgres, Athena)

- Top soft skills: Communication, Problem-Solving, Collaboration, Attention to Detail. 

- Team size: 10

- Key aspects of the role: development of data pipelines; building a Python/PySpark framework; building solutions for a Databricks Medallion architecture; Databricks SaaS experience

- Day-to-day expectations: develop ETL pipelines; test, validate, and deploy using CI/CD

- Interview plan/process: Coderpad exercise on PySpark and SQL (see the representative PySpark sketch below)

- Citizenship: USC only
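
For context on the level of hands-on PySpark expected in the Coderpad round, the sketch below shows a minimal bronze-to-silver step in a Databricks Medallion layout. It is illustrative only: the table and column names (bronze.orders, silver.orders_clean, order_id, order_ts, amount) are hypothetical placeholders, not part of the assignment.

    # Illustrative only: a minimal bronze-to-silver step; table/column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("medallion-example").getOrCreate()

    # Read raw (bronze) records ingested as-is from the source system.
    bronze = spark.table("bronze.orders")

    # Clean and conform: de-duplicate, cast types, and drop invalid rows.
    silver = (
        bronze.dropDuplicates(["order_id"])
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .filter(F.col("amount") > 0)
    )

    # Persist the curated (silver) layer as a managed table.
    silver.write.mode("overwrite").saveAsTable("silver.orders_clean")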

Your role as a Cloud Data Engineer
- Work on migrating applications from on-premises environments to cloud service providers.
- Develop products and services on the latest technologies through contributions in development, enhancements, testing, and implementation.
- Develop, modify, and extend code for building cloud infrastructure, and automate using CI/CD pipeline.
- Partner with business and peers in the pursuit of solutions that achieve business goals through an agile software development methodology.
- Perform problem analysis, data analysis, reporting, and communication.
- Work with peers across the system to define and implement best practices and standards.
- Assess applications and help determine the appropriate application infrastructure patterns.
- Use the best practices and knowledge of internal or external drivers to improve products or services.

Qualifications:

What we are looking for:

- Bachelor's degree in Computer Science, Information Systems, or a related field

- Minimum of 3 years of experience as a Data Engineer or in a similar role

- Hands-on experience in building ETL using Databricks SaaS infrastructure.
- Experience in developing data pipeline solutions to ingest and exploit new and existing data sources.
- Expertise in leveraging SQL, programming languages like Python, and ETL tools like Databricks.
- Ability to perform code reviews to ensure requirements are met, execution patterns are optimal, and established standards are followed.
- Degree in Computer Science or equivalent experience.
- Expertise in AWS Compute (EC2, EMR), AWS Storage (S3, EBS), AWS Databases (RDS, DynamoDB), AWS Data Integration (Glue).
- Advanced understanding of Container Orchestration services, including Docker and Kubernetes, and a variety of AWS tools and services.
- Good understanding of AWS Identity and Access Management, AWS Networking, and AWS Monitoring tools.
- Proficiency in CI/CD and deployment automation using GitLab pipelines.
- Proficiency in Cloud infrastructure provisioning tools, e.g., Terraform.
- Proficiency in one or more programming languages, e.g., Python, Scala.
- Experience in Starburst, Trino, and building SQL queries in a federated architecture.
- Good knowledge of Lakehouse architecture.
- Design, develop, and optimize scalable ETL/ELT pipelines using Databricks and Apache Spark (PySpark and Scala).
- Build data ingestion workflows from various sources (structured, semi-structured, and unstructured); a representative sketch follows this list.
- Develop reusable components and frameworks for efficient data processing.
- Implement best practices for data quality, validation, and governance.
- Collaborate with data architects, analysts, and business stakeholders to understand data requirements.
- Tune Spark jobs for performance and scalability in a cloud-based environment.
- Maintain a robust data lake or Lakehouse architecture.
- Ensure high availability, security, and integrity of data pipelines and platforms.
- Support troubleshooting, debugging, and performance optimization in production workloads.
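
As a hedged illustration of the ingestion and data-quality work listed above (see the item on ingestion workflows), the sketch below reads semi-structured JSON landed in S3, applies a basic validation gate, and appends the result to a catalog-registered Delta table. The bucket, path, and table names are hypothetical placeholders.

    # Illustrative sketch: ingest, validate, and append to a Lakehouse table.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ingest-example").getOrCreate()

    # Ingest semi-structured JSON landed in S3 by an upstream producer.
    raw = spark.read.json("s3://example-bucket/landing/events/")

    # Basic data-quality gate: required key present, event_time parseable.
    validated = (
        raw.filter(F.col("event_id").isNotNull())
        .withColumn("event_time", F.to_timestamp("event_time"))
        .filter(F.col("event_time").isNotNull())
        .withColumn("event_date", F.to_date("event_time"))
    )

    # Append into a partitioned Delta table registered in the catalog.
    (
        validated.write.format("delta")
        .mode("append")
        .partitionBy("event_date")
        .saveAsTable("analytics.events")
    )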

If you are a highly motivated and skilled Cloud Data Engineer looking for a challenging and rewarding opportunity, we encourage you to apply for this position. Join our dynamic team at Direct Recruit Agency and be a part of our mission to provide top-notch data solutions to our clients.

Key Responsibilities:

- Design, develop and deploy data solutions on cloud platforms such as AWS, Azure, or Google Cloud

- Build and maintain data pipelines to extract, transform, and load data from various sources

- Optimize data storage and retrieval for performance and cost efficiency (a representative tuning sketch follows this list)

- Collaborate with cross-functional teams to understand data requirements and design appropriate solutions

- Ensure data quality and security by implementing best practices and monitoring data processes

- Troubleshoot and resolve data-related issues in a timely manner

- Stay updated with the latest trends and technologies in cloud data engineering and make recommendations for process improvements

- Mentor and train junior data engineers on cloud data engineering best practices
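
As a rough illustration of the storage and retrieval optimization mentioned in the responsibilities above, the sketch below applies two common Spark levers: partition pruning via a filter on the partition column, and a broadcast join so the small dimension table is shipped to executors instead of shuffling the large fact table. All table and column names are hypothetical.

    # Illustrative only: partition pruning and a broadcast join; names are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning-example").getOrCreate()

    orders = spark.table("silver.orders_clean")        # large fact table
    dim_customer = spark.table("silver.dim_customer")  # small dimension table

    # Partition pruning: filtering on the partition column (order_date) lets
    # Spark skip entire partitions instead of scanning the full table.
    recent = orders.filter(F.col("order_date") >= "2024-01-01")

    # Broadcast join: shipping the small dimension to every executor avoids a
    # costly shuffle of the large fact table.
    enriched = recent.join(broadcast(dim_customer), on="customer_id", how="left")

    enriched.write.mode("overwrite").saveAsTable("gold.orders_enriched")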

Qualifications:

- Bachelor's degree in Computer Science, Information Systems, or a related field

- Minimum of 3 years of experience in data engineering, with at least 1 year of experience in cloud data engineering

- Strong knowledge of cloud platforms such as AWS, Azure, or Google Cloud and their data services

- Proficiency in programming languages such as Python, Java, or Scala

- Experience with data warehouse technologies such as Redshift, Snowflake, or BigQuery

- Familiarity with data integration tools like Informatica, Talend, or Matillion

- Strong understanding of data modeling and database design principles

- Experience working with Agile methodologies and DevOps practices

- Excellent problem-solving and analytical skills

- Strong communication and collaboration skills

- Ability to work independently and in a team environment

If you are passionate about data engineering and have a strong understanding of cloud platforms, we encourage you to apply for this exciting opportunity. At Direct Recruit Agency, we offer a competitive salary, comprehensive benefits, and a dynamic work environment. Join our team and be a part of our growing success!


Package Details

This role requires 100% on-site work, 5 days per week, and includes a second-round on-site interview.