G02 - Platform Operations Engineer

Singapore

About the job

Responsibilities:

Lead cloud platform operations for Cloud File Transfer (CFT), with a focus on monitoring, performance optimisation, reliability, release management, and continuous improvement within AWS environments.

Own L2 incident management, troubleshooting, and escalation handling for high-throughput file transfer workflows across multiple agencies, working closely with engineering, security, and agency stakeholders to resolve incidents within defined SLAs.

Manage, design, and continuously optimise AWS cloud infrastructure to ensure scalability, security, cost-efficiency, and high availability of the CFT platform.

Establish, refine, and enforce operational processes including runbooks, dashboards, daily health checks, incident communication practices, and operational reporting with actionable insights.

Drive change, release, and maintenance management by performing impact analysis, risk assessment, mitigation planning, and executing system upgrades and infrastructure improvements to ensure platform stability.

Review testing results to ensure all changes meet operational, performance, and security requirements before release, while defining and improving operational OKRs, SLAs, and reliability metrics.

Contribute to portal and backend enhancements, bug fixes, and operational tooling to continuously improve platform reliability, performance, and maintainability.

Share operational best practices, incident learnings, and technical knowledge within the team and across the programme to improve engineering standards and platform reliability.

Requirements

Degree in Computer Science, Information Technology, or related field, or equivalent practical experience.

Minimum 2 years of hands-on experience managing production workloads in public cloud environments (preferably AWS).

Strong problem-solving skills across cloud infrastructure, applications, and distributed systems.

Experience handling production incidents with ownership, urgency, and attention to detail.

Experience defining and enforcing operational processes, procedures, and best practices.

Familiarity with maintaining high-availability, secure cloud environments and implementing preventative operational controls.

Understanding of change management, impact assessment, and service reliability improvements.

Preferred: experience in operating applications on AWS, and experience working on

Key Technologies:

Terraform for infrastructure as code and cloud resource management.

GitLab for CI/CD pipelines and version control.

Strong understanding of AWS services and architecture supporting production workloads.
