About the job: Incident Manager
As a Remote Incident Manager, you will be responsible for leading the response to major incidents that disrupt business operations, ensuring timely resolution, root cause identification, and continuous improvement. This role serves as the central point of coordination during service disruptions, providing clear communication to stakeholders, enforcing escalation procedures, and minimizing downtime across IT systems and infrastructure.
You will work closely with IT operations, security, DevOps, engineering, and service desk teams to triage, investigate, and resolve high-priority incidents. Your ability to stay calm under pressure, maintain a structured approach to incident response, and effectively communicate with both technical and non-technical stakeholders is essential for success.
Key Responsibilities:
Serve as the primary coordinator for major incidents affecting production systems, applications, or infrastructure
Lead incident bridges and war rooms, ensuring proper escalation, timely collaboration, and task ownership
Ensure clear, real-time communication with stakeholders, including business leadership, support teams, and customers
Follow incident management frameworks (e.g., ITIL, ISO 20000) to track incident progress and document timelines
Work with technical teams to ensure root cause analysis (RCA), resolution, and post-incident review (PIR) are completed
Maintain and improve incident management procedures and escalation protocols
Monitor system alerts, SLAs, and dashboards to identify and respond to anomalies proactively
Contribute to change and problem management workflows to reduce future incidents
Document incident metrics, trends, and lessons learned; report performance against incident SLAs
Assist in developing playbooks and training materials to strengthen organizational readiness
Required Qualifications:
Bachelor's degree in Information Technology, Computer Science, Cybersecurity, or a related field (or equivalent experience)
2+ years of experience in incident management, IT operations, site reliability, or technical support
Proven experience managing high-severity technical incidents in cloud or enterprise environments
Strong understanding of infrastructure components (e.g., servers, databases, networks, APIs, cloud platforms)
Familiarity with incident management tools (e.g., ServiceNow, PagerDuty, Opsgenie, Jira, Splunk)
Excellent communication and conflict resolution skills; ability to drive consensus and urgency during crises
Ability to work on a rotating schedule, including nights and weekends, if needed for 24/7 incident support