
About the job: HPC Cloud Engineer

Responsibilities:

  • Optimize performance and scalability of HPC applications running in containerized environments.
  • Stay up to date with the latest advancements in HPC and cloud technologies.
  • Collaborate with other DevOps engineers and developers to ensure seamless integration of HPC solutions.
  • Configure Linux OS for HPC needs.
  • Implement and maintain Kubernetes clusters for HPC workloads.
  • Explore, qualify, and tune open-source cloud-based technology stacks for high-performance computing demands.
  • Design robust, high-performance cloud-based software architectures involving CPU/GPU workloads, scalable and robust storage, and high-bandwidth interconnects.

Qualifications/Education Desired

Attitude:

We are looking for individuals who are inquisitive, thrive on challenge, enjoy problem solving, and have excellent written and verbal communication skills.

Required Qualifications:

  • Strong knowledge of HPC systems and cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.).
  • Strong Linux performance tuning skills.
  • Proven experience with Kubernetes and container orchestration
  • Demonstrated in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu).
  • Experience with boot and high-availability technologies such as systemd, network boot (PXE), and Linux-HA.
  • Strong understanding of TCP/IP fundamentals and knowledge of protocols such as DNS and DHCP.
  • Strong fundamentals in Linux networking and storage.
  • Proficiency in scripting and automation tools such as Ansible, Python, and Bash.
  • Working proficiency in a low-level language such as C.
  • Experience with CI/CD tools like Jenkins, GitLab or similar.
  • Familiarity with HPC workload managers and schedulers (e.g., Slurm, PBS).
  • Excellent problem-solving skills and attention to detail.
  • Strong communication and teamwork abilities.
  • Experience with one or more configuration management utilities (Salt, Chef, Puppet, etc.).

Preferred Qualifications:

  • CPU and GPU performance tuning
  • BS or MS degree plus 4 to 9 years of proven experience
  • Computer Engineering, Electrical Engineering, or related fields

Skills and Abilities:

  • Team Orientation & Interpersonal - Highly motivated teammate with the ability to develop and maintain collaborative relationships at all levels within and outside the organization.
  • Organization & Time Management - Able to plan, schedule, organize, and follow up on tasks related to the job to achieve goals within or ahead of established time frames.
  • Multi-tasking - Ability to organize, coordinate, manage, prioritize, and perform multiple tasks simultaneously, and to swiftly assess a situation, determine a logical course of action, and apply the appropriate response.
  • Adaptability to Change - Able to be flexible and supportive, and to assimilate change positively and proactively in a rapid-growth environment.
  • Outstanding teammate with excellent written and verbal communication skills.

Must-Haves:

  • Experience: 4 to 9 years
  • HPC
  • Any cloud platform: AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, or Oracle Cloud
  • Cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.)
  • Performance tuning
  • Kubernetes and container orchestration