G02 - Platform Operations Engineer
Responsibilities:
- Lead cloud platform operations for Cloud File Transfer (CFT) within AWS environments, with a focus on monitoring, performance optimisation, reliability, release management, and continuous improvement.
- Own L2 incident management, troubleshooting, and escalation handling for high-throughput file transfer workflows across multiple agencies, working closely with engineering, security, and agency stakeholders to resolve incidents within defined SLAs.
- Manage, design, and continuously optimise AWS cloud infrastructure to ensure scalability, security, cost-efficiency, and high availability of the CFT platform.
- Establish, refine, and enforce operational processes including runbooks, dashboards, daily health checks, incident communication practices, and operational reporting with actionable insights.
- Drive change, release, and maintenance management by performing impact analysis, risk assessment, mitigation planning, and executing system upgrades and infrastructure improvements to ensure platform stability.
- Review test results to ensure all changes meet operational, performance, and security requirements before release.
- Define and improve operational OKRs, SLAs, and reliability metrics.
- Contribute to portal and backend enhancements, bug fixes, and operational tooling to continuously improve platform reliability, performance, and maintainability.
- Share operational best practices, incident learnings, and technical knowledge within the team and across the programme to improve engineering standards and platform reliability.
Requirements:
- Degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
- Minimum 2 years of hands-on experience managing production workloads in public cloud environments (preferably AWS).
- Strong problem-solving skills across cloud infrastructure, applications, and distributed systems.
- Experience handling production incidents with ownership, urgency, and attention to detail.
- Experience defining and enforcing operational processes, procedures, and best practices.
- Familiarity with maintaining high-availability, secure cloud environments and implementing preventative operational controls.
- Understanding of change management, impact assessment, and service reliability improvements.
- Preferred: experience operating applications on AWS, and experience working on
Key Technologies:
- Terraform for infrastructure as code and cloud resource management.
- GitLab for CI/CD pipelines and version control.
- Strong understanding of AWS services and architecture supporting production workloads.