Incident Manager

About the job

As a Remote Incident Manager, you will be responsible for leading the response to major incidents that disrupt business operations, ensuring timely resolution, root cause identification, and continuous improvement. This role serves as the central point of coordination during service disruptions, providing clear communication to stakeholders, enforcing escalation procedures, and minimizing downtime across IT systems and infrastructure.

You will work closely with IT operations, security, DevOps, engineering, and service desk teams to triage, investigate, and resolve high-priority incidents. Your ability to stay calm under pressure, maintain a structured approach to incident response, and effectively communicate with both technical and non-technical stakeholders is essential for success.

Key Responsibilities:

Serve as the primary coordinator for major incidents affecting production systems, applications, or infrastructure

Lead incident bridges and war rooms, ensuring proper escalation, timely collaboration, and task ownership

Ensure clear, real-time communication with stakeholders, including business leadership, support teams, and customers

Follow incident management frameworks (e.g., ITIL, ISO/IEC 20000) to track incident progress and document timelines

Work with technical teams to ensure root cause analysis (RCA), resolution, and post-incident review (PIR) are completed

Maintain and improve incident management procedures and escalation protocols

Monitor system alerts, SLAs, and dashboards to identify and respond to anomalies proactively

Contribute to change and problem management workflows to reduce future incidents

Document incident metrics, trends, and lessons learned; report performance against incident SLAs

Assist in developing playbooks and training materials to strengthen organizational readiness

Required Qualifications:

Bachelor's degree in Information Technology, Computer Science, Cybersecurity, or a related field (or equivalent experience)

2+ years of experience in incident management, IT operations, site reliability, or technical support

Proven experience managing high-severity technical incidents in cloud or enterprise environments

Strong understanding of infrastructure components (e.g., servers, databases, networks, APIs, cloud platforms)

Familiarity with incident management tools (e.g., ServiceNow, PagerDuty, Opsgenie, Jira, Splunk)

Excellent communication and conflict resolution skills; ability to drive consensus and urgency during crises

Ability to work a rotating schedule, including nights and weekends, as needed for 24/7 incident support