Job Description:

Purpose of the Role

Design, implement, and maintain secure, scalable, and cost-effective cloud infrastructure. This role ensures long-term cloud sustainability through FinOps, cost optimization, automation, and resilient architectures that support business growth, reliability, and operational efficiency.

Key Responsibilities

  • Design and implement scalable, secure, cost-efficient cloud infrastructure.
  • Lead cloud cost optimization using FinOps principles and long-term commitment discounts (e.g., Reserved Instances, Savings Plans).
  • Architect cloud solutions for sustainability and economies of scale.
  • Configure and manage compute, networking, storage, and monitoring tools.
  • Automate provisioning, deployment, and maintenance using IaC.
  • Work closely with DevOps and Engineering to ensure performance and high availability.
  • Monitor infrastructure health, optimize resource usage, and resolve performance issues.
  • Implement strong cloud security, encryption, and compliance standards.
  • Evaluate and recommend new cloud services and technologies.

Minimum Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 7+ years in infrastructure engineering or similar roles.
  • 3–5+ years of hands‑on experience designing and managing secure, scalable cloud environments (AWS, Azure, or GCP).
  • Strong understanding of cloud architecture, networking, security, and FinOps.
  • Experience with Infrastructure as Code (e.g., Terraform, CloudFormation, ARM/Bicep).
  • Relevant certifications are beneficial (e.g., AWS Solutions Architect, Azure Architect, FinOps Certified Practitioner).
  • Strong analytical, problem‑solving, communication, and collaboration skills.

Key Performance Measures

  • Cloud Cost Efficiency: Savings via right‑sizing, Reserved Instances/Savings Plans, and FinOps reporting.
  • Scalability & Elasticity: Ability to scale environments with minimal manual intervention.
  • Security & Compliance: Effective implementation of security controls and audit readiness.
  • System Uptime: Meeting or exceeding cloud uptime SLAs.
  • Incident Response (MTTR): Speed and effectiveness in detecting and resolving incidents.
  • Automation: Level of automation improving deployment velocity and reducing manual tasks.
  • Resource Utilization: Efficient CPU, memory, storage, and network usage.
  • Disaster Recovery Readiness: Achieving target RPO/RTO and successful DR test results.
  • 360° Internal Feedback: Collaboration and stakeholder satisfaction across teams.

Work Location:

Bryanston