Job openings: SRE - Broadcast Automation specialized

About the position: SRE - Broadcast Automation specialized

Client Description

Our client is the world's leading media-focused consultancy. Spanning the globe with 20 locations, they are connected with customers in every time zone. They strategize, advise, design and implement technology and business solutions tailored to businesses’ needs. 

Project Details:

The Site Reliability Engineer will investigate, analyze, and address issues within broadcast playout systems and their integration points, ensuring optimal performance and reliability.

The focus will be on driving investigations, reporting findings to leadership and operations, collaborating with team members and third-party vendors, and assisting in the deployment and testing of patches or fixes.

The project will also involve supporting on-air systems integration and providing 24x7 on-air systems support.

Profile Descriptions:

Responsibilities:

Investigate issues within broadcast playout systems and their integration points to find the root cause of problems or systemic issues.

As a Level 2 resource, drive and own investigations related to Broadcast issues and report back findings in a timely manner to leadership and operations.

Follow up with team members and third-party vendors when issues cannot be resolved internally, and press vendors for a root cause and a solution where possible.

Create comprehensive documentation of encountered issues, explaining the root cause and the steps required for effective resolution.

Assist in the deployment and testing of vendor patches or fixes in both the Development and Production environments, through completion and to the satisfaction of the Operations team.

Assist in the design, analysis, or evaluation of assigned projects using sound engineering principles and adhering to business standards, practices, procedures, and product/program requirements.

Support and participate in On-air systems integration and on-air rollout.

Provide 24x7 On-Air systems support and daily operations support; some on-call support may be required from time to time during on-air rollout and special broadcast events.

Attend daily maintenance and operations review calls to report to leadership and Operations on findings from new and open issues, their potential fixes, and the planned deployments of those fixes.


Requirements:

A passion for investigating issues, driving towards resolutions, and effective problem solving

3+ years of DevOps/SRE experience in the technology sector delivering production-quality software or software-defined infrastructure in a high-traffic, cloud-hosted environment (AWS preferred)

2+ years of experience in a support/analysis role

Experience with deployment automation within AWS-hosted services

Familiarity with containerization and orchestration services such as Kubernetes and Docker

Familiarity with CI/CD orchestration tools (e.g., GitHub Actions or Jenkins)

Experience with CI/CD build and deployment practices

5+ years of Linux System Administration

5+ years of experience coding in Go, Python, Ruby, Java, or shell languages

Experience in designing, analyzing, and building automation and tools for large-scale systems

Professional experience using modern log/metric aggregation software (e.g., Stackdriver, CloudWatch, Datadog, Elasticsearch + Kibana, Splunk, Grafana)

Experience and comfort with continuous delivery/frequent releases of code to production

A methodical and logical approach to reasoning about problems and system interactivity

Willingness and ability to prioritize business needs to meet short-term demands

An unwillingness to tolerate user-facing downtime

Qualifications:

  • 3+ years of experience in the Media & Entertainment industry

  • 3+ years of experience in 24x7 production environments

  • 3+ years of experience supporting IT/Broadcast Systems

  • 5+ years of customer-facing experience