HPC Cloud Engineer
About the job
Responsibilities:
- Optimize performance and scalability of HPC applications running in containerized environments.
- Stay up to date with the latest advancements in HPC and cloud technologies.
- Collaborate with other DevOps engineers and developers to ensure seamless integration of HPC solutions.
- Configure Linux OS for HPC needs.
- Implement and maintain Kubernetes clusters for HPC workloads.
- Explore, qualify, and tune open-source, cloud-based technology stacks for high-performance computing demands.
- Design robust, high-performance cloud-based software architectures involving CPU/GPU workloads, scalable and robust storage, and high-bandwidth interconnects.
Qualifications/Education Desired
Attitude:
We are looking for individuals who are inquisitive, thrive on challenge, enjoy problem solving, and have excellent written and verbal communication skills.
Required Qualifications:
- Strong knowledge of HPC systems and cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.).
- Strong Linux performance tuning skills.
- Proven experience with Kubernetes and container orchestration
- Validated, in-depth, flavor-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu).
- Experience with boot and availability technologies such as systemd, network boot/PXE, and Linux HA.
- Strong understanding of TCP/IP fundamentals and of protocols such as DNS and DHCP.
- Strong fundamentals in Linux networking and storage.
- Proficiency in scripting and automation tools such as Ansible, Python, and Bash.
- Working proficiency in a low-level language such as C.
- Experience with CI/CD tools like Jenkins, GitLab or similar.
- Familiarity with HPC workload managers and schedulers (e.g., Slurm, PBS).
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork abilities.
- Experience with one or more configuration management utilities (Salt, Chef, Puppet, etc.).
Preferred Qualifications:
- CPU and GPU performance tuning.
- BS or MS degree plus 4 to 9 years of validated experience.
- Degree in Computer Engineering, Electrical Engineering, or a related field.
Skills and Abilities:
- Team Orientation & Interpersonal - Highly motivated teammate with the ability to develop and maintain collaborative relationships at all levels within and outside the organization.
- Organization & Time Management - Able to plan, schedule, organize, and follow up on job-related tasks to achieve goals within or ahead of established time frames.
- Multitasking - Ability to organize, coordinate, prioritize, and perform multiple tasks simultaneously, and to swiftly assess a situation, determine a logical course of action, and apply the appropriate response.
- Adaptability to Change - Able to be flexible and supportive, and to assimilate change positively and proactively in a rapid-growth environment.
- Outstanding teammate with excellent written and verbal communication skills.
Must-Haves
Experience: 4-9 years
HPC
Any cloud: AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, Oracle Cloud
Cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.).
Performance tuning
Kubernetes and container orchestration