Hong Kong, Hong Kong SAR, Hong Kong

Lead Support Analyst (Observability/SRE)

Job Description:

We are seeking a senior Lead Support Analyst to join a Shared Services team responsible for monitoring, observability, and site reliability engineering (SRE) operations across critical systems.

This is a hands-on role focused on ensuring platform reliability, performance, and availability while also providing guidance to junior team members.

Key Responsibilities

  • Support monitoring, observability, and SRE operations for critical production systems
  • Build and maintain dashboards, alerts, and monitoring solutions using Grafana, Prometheus, Elasticsearch, and related tools
  • Troubleshoot Linux environments and investigate system performance issues
  • Collaborate with engineering, infrastructure, and application teams to improve system stability and resilience
  • Participate in incident management, on-call support, and continuous improvement initiatives
  • Mentor and support junior team members

Requirements

  • 8+ years of experience in SRE, Production Support, Monitoring Engineering, or related areas
  • Strong hands-on experience with Grafana and observability/monitoring platforms
  • Experience with Prometheus, Elasticsearch/Kibana, or similar technologies
  • Solid Linux administration and troubleshooting skills
  • Proficiency in Python or Golang, and shell scripting
  • Strong understanding of system reliability, incident management, and monitoring best practices
  • Excellent communication skills in English

Preferred

  • Experience in banking, financial services, or large enterprise environments
  • Exposure to ITRS Geneos, VictoriaMetrics, Ansible, or CI/CD tools
  • Experience mentoring or guiding junior engineers

Required Skills:

Prometheus, Kibana, Support, Incident Management, Grafana, Shell Scripting, Financial Services, Operations, Elasticsearch, Ansible, CI/CD, Mentoring, Metrics, Reliability, Continuous Improvement, Availability, Banking, Infrastructure, Communication Skills, Linux, Troubleshooting, Administration, Engineering, English, Python, Communication, Management