Job Openings G02 - Platform Operations Engineer

Responsibilities:

  • Lead cloud platform operations for Cloud File Transfer (CFT), with a focus on monitoring, performance optimisation, reliability, release management, and continuous improvement within AWS environments.
  • Own L2 incident management, troubleshooting, and escalation handling for high-throughput file transfer workflows across multiple agencies, working closely with engineering, security, and agency stakeholders to resolve incidents within defined SLAs.
  • Manage, design, and continuously optimise AWS cloud infrastructure to ensure scalability, security, cost-efficiency, and high availability of the CFT platform.
  • Establish, refine, and enforce operational processes including runbooks, dashboards, daily health checks, incident communication practices, and operational reporting with actionable insights.
  • Drive change, release, and maintenance management by performing impact analysis, risk assessment, mitigation planning, and executing system upgrades and infrastructure improvements to ensure platform stability.
  • Review testing results to ensure all changes meet operational, performance, and security requirements before release, while defining and improving operational OKRs, SLAs, and reliability metrics.
  • Contribute to portal and backend enhancements, bug fixes, and operational tooling to continuously improve platform reliability, performance, and maintainability.
  • Share operational best practices, incident learnings, and technical knowledge within the team and across the programme to improve engineering standards and platform reliability.

Requirements:

  • Degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
  • Minimum 2 years of hands-on experience managing production workloads in public cloud environments (preferably AWS).
  • Strong problem-solving skills across cloud infrastructure, applications, and distributed systems.
  • Experience handling production incidents with ownership, urgency, and attention to detail.
  • Experience defining and enforcing operational processes, procedures, and best practices.
  • Familiarity with maintaining high-availability, secure cloud environments and implementing preventative operational controls.
  • Understanding of change management, impact assessment, and service reliability improvements.
  • Preferred: experience in operating applications on AWS, and experience working on

Key Technologies:

  • Terraform for infrastructure as code and cloud resource management.
  • GitLab for CI/CD pipelines and version control.
  • AWS services and architecture, with a strong understanding of how they support production workloads.