Tech - Junior Site Reliability Engineer
About the job
We are seeking a detail-oriented and responsible Junior Site Reliability Engineer to join our team. In this role, you will help keep IT and business systems running smoothly by taking part in 24/7 monitoring, resolving tickets, and identifying the root causes of issues. You will collaborate with cross-functional teams to maintain system reliability and address operational challenges.
System Monitoring and Maintenance:
- Participate in 24/7 monitoring of IT and business systems to ensure availability, performance, and stability.
- Use tools like Grafana and Kibana to monitor metrics, logs, and alerts.
- Identify and escalate potential issues proactively to minimize downtime.
Incident and Ticket Resolution:
- Respond promptly to system incidents and user-reported issues by resolving tickets efficiently.
- Troubleshoot and diagnose problems by analyzing metrics and logs and by running SQL queries.
- Work with teams to identify and implement solutions to recurring issues.
Root Cause Analysis:
- Use strong analytical skills to investigate and determine the root cause of incidents.
- Document findings and preventive measures to avoid future occurrences.
Collaboration and Teamwork:
- Collaborate with other IT team members and business units to address and resolve technical issues.
- Provide clear communication and updates on incidents and resolution progress.
Documentation and Reporting:
- Maintain accurate documentation of operational processes, troubleshooting steps, and system changes.
- Prepare reports on system performance, incidents, and resolutions as required.
Key Requirements:
Technical Skills:
- Basic understanding of SQL (ability to query and analyze data).
- Basic programming skills (Bash, Python).
- Familiarity with APIs, IT metrics, services, and monitoring processes.
- Basic knowledge of networking (DNS, HTTP, load balancing).
- Experience using tools like Grafana and Kibana for system monitoring.
- Ability to interpret system logs and metrics to diagnose issues.
Analytical and Problem-Solving Skills:
- Strong ability to analyze issues and find root causes effectively.
- Proactive and resourceful in addressing operational challenges.
Soft Skills:
- High level of attentiveness and responsibility in monitoring and issue resolution.
- Excellent teamwork and communication skills to collaborate across teams.
Work Environment:
- Comfortable working in a 24/7 rotational shift environment, including weekends and holidays.