I20 - Software Engineer - SRE (057)

About the job

Responsibilities

  • Provide operations and project support as a Reliability Engineer, working closely with IT teams and stakeholders.
  • Monitor system, application, and infrastructure health through observability and proactive monitoring practices.
  • Install, configure, and manage monitoring tools and solutions.
  • Implement, enhance, and integrate monitoring solutions to improve operational visibility and business processes.
  • Analyse operational data and build dashboards and reports that surface operational insights.
  • Automate day-to-day operational activities using tools such as Ansible, Jenkins, Shell Scripting, PowerShell, and Python.
  • Support reliability engineering service requests and operational activities.
  • Provide support for production incidents and project cutovers, including outside office hours when required.
  • Participate in 24x7 operational support when necessary to ensure system availability and stability.

Requirements

  • Minimum 3 years of experience with IT operations automation and monitoring solutions.
  • At least 2 years of scripting experience, preferably with Ansible, Shell Scripting, Python, or PowerShell.
  • Familiarity with platforms and technologies such as Windows, Linux, Unix, AWS Cloud, databases, and middleware.
  • Experience with monitoring, observability, and automation tools.
  • Strong communication and stakeholder management skills to work effectively with internal and external partners.
  • Site Reliability Engineering (SRE) certification or an equivalent certification is preferred.