Observability Administrator
Job Description:
- Location: Fully remote, Central Europe Time Zone
- Start date: To be defined
- Languages: English is mandatory
Duties and Responsibilities:
- Assess current monitoring and observability setup and identify gaps.
- Design, implement, and upgrade Prometheus-based monitoring solutions in an on-premises setup designed for multiple tenants and several support teams.
- Configure and maintain Grafana dashboards for real-time visualization, designed for multiple tenants and several support teams.
- Integrate Prometheus with other systems and tools (e.g., Loki, Mimir, Tempo, Thanos).
- Design, implement, and upgrade Elastic (ELK Stack) deployments for on-premises setups.
- Develop and document monitoring and logging strategies and best practices.
- Set up alerts and notification mechanisms to preemptively address system issues.
- Train internal staff on the use and maintenance of Prometheus, Grafana, and Elastic.
- Provide ongoing support and improvements to the observability framework.
- Ensure high availability and performance of the monitoring and logging systems.
- Provide stand-by services on a rotation basis during weekends, holidays and outside of normal working hours.
- Perform other duties as required.
Required Qualifications & Experience:
- At least 5 years of experience in a similar role.
- Proven experience in deploying and managing Elastic, Prometheus, and Grafana in an on-premises setup designed for multiple tenants and several support teams.
- Strong understanding of observability concepts and best practices, including APM.
- Experience with related technologies (e.g., Kubernetes, Docker, Kibana, Mimir, Loki, Tempo, Thanos, on-premises infrastructure).
- Proficiency in scripting and automation (e.g., Bash, Python).
- DevOps experience and practice.
- Familiarity with infrastructure-as-code tools (e.g., Ansible, Terraform).
- Experience with log management and tracing solutions (e.g., Loki, ELK stack, Jaeger).
- Knowledge of other monitoring tools is desirable, especially SCOM and Checkmk.
- Programming skills are desirable, especially .NET C# and Python.
Education and Certifications:
- Bachelor's or master's degree in information technology is desirable.
- Monitoring certifications in SCOM, Checkmk, Elastic, Prometheus, or Grafana are desirable.
- Linux and/or Windows System Administration
- Network Administration