Senior Site Reliability Engineer (SRE) – Automation & Observability

Montreal, QC, Canada

About the job

Tech Talent International (SI) supplies technical talent to a variety of clients, ranging from Fortune 100/500/1000 companies to small and mid-sized organizations, in Canada, the US, and Europe.

We currently have an opening for a Senior Site Reliability Engineer (SRE) – Automation & Observability with our large consulting client, working onsite at a major financial services client in downtown Montreal.

Role: Cybersecurity - Senior Site Reliability Engineer (SRE) – Automation & Observability

Type: Permanent or Contract, 40 hrs/week

Location: Hybrid - Downtown Montreal, QC (the role starts with 5 days in office for the first 3 months, then moves to a hybrid setup of 3 days onsite and 2 days from home)

Salary: $110,000 - $120,000 + 9% bonus + 3-5 weeks paid vacation + RRSP contribution + benefits + sick/personal days

Position Overview

The Automation team consists of several Subject Matter Experts (SMEs) who assist the Global Process Owner in designing, building, and maintaining the organization's IT services. While leading the company's IT services team, the IT Service Manager strives to develop reliable IT services and improve the organization's existing IT service infrastructure.

IT Service Managers are responsible for maintaining a high standard of service delivery while managing the organization's IT services and anticipating and resolving issues that may arise within company systems or client environments. These services include infrastructure monitoring, task automation, server asset management, and network inventory management.

Change, incident, problem, and request management, along with CMDB (Configuration Management Database) functions, are core services widely used throughout CIB IT. The ITSM team serves as the bridge between IT and business stakeholders, ensuring coordination and predictability for CIB IT and its business operations.

The team includes SMEs focused on key service areas as directed by management, with the objective of delivering high-quality services through various platforms that maximize efficiency and produce consistent results.

Within the Automation & Observability organization, the Production Smart Automation team provides production support services for the Analytics Consulting and Digital Assets IT clusters. This includes both functional and technical support as well as project delivery for production and non-production platforms. The team operates globally and consists of approximately 10 members located in Paris, Warsaw, Mumbai, and Montreal.

Key Responsibilities

The Site Reliability Engineer (SRE) will be part of a multidisciplinary team providing Level 1 and Level 2 technical and project support. This is a production-focused role requiring a broad range of technical expertise.

The SRE will work closely with development and infrastructure teams to:

Monitor, manage, and proactively improve the availability and performance of production environments, from presentation and application layers through infrastructure layers.
Plan and implement application deployments, load testing activities, and configuration changes.
Ensure production environments are operational and available while collaborating with teams to understand user needs.
Contribute to medium- and large-scale technical projects, including architecture reviews, solution design, application upgrades, and migrations to new platforms.
Collaborate on prioritized tasks while providing regular status updates and maintaining focus on target solutions.
Understand delivery lifecycle phases to ensure work is completed according to defined specifications and timelines.
Identify opportunities to improve operational efficiency and contribute to automation initiatives.
Provide constructive feedback and recommendations to management regarding performance, capacity, and system design.
Assist in documenting architectures and designs, as well as distributing meeting minutes and action items.

The SRE will also work with other teams to respond to incidents and resolve issues quickly, often under pressure, in order to restore normal business services. As a result, participation in on-call rotations and after-hours support may be required.

Candidates should possess both the aptitude and desire to learn new technologies and contribute innovative ideas that may benefit the department.

Requirements

Candidates should have:

5–7 years of experience in a similar role.
Experience providing multidisciplinary technical support within a team environment.
Practical knowledge of performance and capacity management across:
- Applications
- Databases
- Networks
Strong automation skills and mindset.

Skills & Competencies

Systems Administration

Strong Linux/Unix administration skills
Good knowledge of Windows environments

Containerization & Cloud

Strong knowledge of Docker and Kubernetes
Understanding of cloud-based platforms and solutions

Infrastructure & Networking

Good understanding of enterprise infrastructure, firewalls, and networking concepts
Knowledge of load-balancing technologies
Strong understanding of networking fundamentals

Security

Experience with APIs
Familiarity with CyberArk or HashiCorp Vault

Databases

Experience with SQL Server
Experience with Oracle
Exposure to NoSQL databases

Monitoring & Observability

Experience configuring application monitoring tools such as Dynatrace

DevOps & CI/CD

Experience with:

Jenkins
Bitbucket
Artifactory
Ansible
ArgoCD

Development & Automation

Knowledge of software development and scripting methodologies
Demonstrated programming ability in languages such as Python

IT Service Management

Good understanding of ITIL processes
Understanding of user and server authentication mechanisms that enable automated deployment cycles while maintaining strong security controls

Personal Attributes

Strong problem-solving abilities
Team-oriented mindset
Customer-focused approach
