Site Reliability Engineer

About the job

UpSkill is a recruitment agency ready to go the extra mile in order to help candidates find the best possible job opportunity. Our team of experts is well-versed and experienced in consulting and providing long-term HR support.

We believe that being friendly is the best policy, which is why we are eager to help you through the whole recruitment lifecycle. Our team has 15 years of recruitment experience. At any given moment, we can offer multiple opportunities from different companies in need of a wide variety of talent.

If you are interested in starting a new job, we will present you with multiple opportunities, answer all your questions, help you prepare for interviews and tests, provide essential feedback, and guide and support you through the recruitment process up to the first day at your new job. We support both large international and local companies in growing their business, providing them with the best talent to help them succeed!

Our client is a global leader in cloud-based communications and collaboration software. They are fundamentally changing the nature of human interaction—giving people the freedom to connect powerfully and personally from anywhere, at any time, on any device.

On their behalf, we are looking for an experienced Site Reliability Engineer (SRE): someone who takes ownership, solves problems independently, and treats production systems with care and respect.
You'll join a team that keeps business-critical telephony and communication services running with 99.999% availability. They need someone who not only reacts to incidents but also anticipates them, and who improves systems, automates routine tasks, and helps shape how the team works.

Responsibilities:

Support and maintain Linux-based servers and telephony services in production;
Investigate and resolve incidents in a high-load, distributed environment;
Participate in on-call shifts and ensure the stability of systems under strict SLAs;
Analyze service performance, reliability, and architecture bottlenecks; propose improvements;
Work with development teams to safely deliver and validate changes before production deployment;
Contribute ideas and help evolve team processes, automation, and monitoring practices.

Requirements:

Strong experience with UNIX/Linux systems and using the CLI for troubleshooting;
Good understanding of networking protocols and SIP;
Strong hands-on experience with Kubernetes (k8s) and containerized environments;
Proven track record of working in production environments, with a careful and methodical approach to changes (testing before deployment, rollback planning, risk mitigation);
Understanding of high-availability systems, fault tolerance, and performance optimization;
Experience automating tasks with Python, Golang, or Shell scripts;
Mindset of an SRE: you treat operations as an engineering discipline and continuously look for ways to make systems more reliable and efficient;
Good command of English (B2 or higher) — ability to communicate effectively with distributed international teams (both written and spoken).

The company offers:

Well-coordinated professional team;
Cutting edge technologies, interesting and challenging tasks, dynamic projects, great opportunities for self-realization, professional and career growth;
Additional Health and Life Insurance Package;
Employee Assistance Program;
25 vacation days;
102,26 EUR/200 BGN Digital Food Vouchers;
61,36 EUR/120 BGN gross as part of the salary for Working Expenses Allowance;
Multisport card option.

We welcome the opportunity to learn more about you!
Please send your CV in English.
Please note that only short-listed candidates will be contacted.
License No. 2826. We will treat your application with full confidentiality!