Tech - Junior Site Reliability Engineer

About the job

We are seeking a detail-oriented and responsible Junior Site Reliability Engineer to join our team. The ideal candidate will thrive in a dynamic environment and help ensure the seamless operation of IT and business systems by participating in 24/7 monitoring, resolving tickets, and identifying the root causes of issues. You will collaborate with cross-functional teams to maintain system reliability and address operational challenges.

Key Responsibilities:

System Monitoring and Maintenance:

  • Participate in 24/7 monitoring of IT and business systems to ensure availability, performance, and stability.
  • Use tools like Grafana and Kibana to monitor metrics, logs, and alerts.
  • Identify and escalate potential issues proactively to minimize downtime.

Incident and Ticket Resolution:

  • Respond promptly to system incidents and user-reported issues by resolving tickets efficiently.
  • Troubleshoot and diagnose problems by analyzing metrics and logs and by querying data with SQL.
  • Work with teams to identify and implement solutions to recurring issues.

Root Cause Analysis:

  • Use strong analytical skills to investigate and determine the root cause of incidents.
  • Document findings and preventive measures to avoid future occurrences.

Collaboration and Teamwork:

  • Collaborate with other IT team members and business units to address and resolve technical issues.
  • Provide clear communication and updates on incidents and resolution progress.

Documentation and Reporting:

  • Maintain accurate documentation of operational processes, troubleshooting steps, and system changes.
  • Prepare reports on system performance, incidents, and resolutions as required.

Key Requirements:

Technical Skills:

  • Basic understanding of SQL (ability to query and analyze data).
  • Basic programming skills (Bash, Python).
  • Familiarity with APIs, IT metrics, services, and monitoring processes.
  • Basic knowledge of networking (DNS, HTTP, load balancing).
  • Experience using tools like Grafana and Kibana for system monitoring.
  • Ability to interpret system logs and metrics to diagnose issues.

Analytical and Problem-Solving Skills:

  • Strong ability to analyze issues and find root causes effectively.
  • Proactive and resourceful in addressing operational challenges.

Soft Skills:

  • High level of attentiveness and responsibility in monitoring and issue resolution.
  • Excellent teamwork and communication skills to collaborate across teams.

Work Environment:

  • Comfortable working in a 24/7 rotational shift environment, including weekends and holidays.