Senior Site Reliability Engineer (SRE) – Automation & Observability
Tech Talent International (SI) supplies technical talent to a variety of clients, ranging from Fortune 100/500/1000 companies to small and mid-sized organizations in Canada, the US, and Europe.
We currently have an opening for a Senior Site Reliability Engineer (SRE) – Automation & Observability with our large consulting client. The role is based onsite at a major financial services client in downtown Montreal.
Role: Cybersecurity - Senior Site Reliability Engineer (SRE) – Automation & Observability
Type: Permanent or contract, 40 hrs/week
Location: Hybrid, downtown Montreal, QC (the role starts with 5 days per week in the office for the first 3 months, then moves to a hybrid schedule of 3 days onsite and 2 days from home)
Salary: $110,000 - $120,000 + 9% bonus + 3-5 weeks paid vacation + RRSP contribution + benefits + sick/personal days
Position Overview
The Automation team consists of several Subject Matter Experts (SMEs) who assist the Global Process Owner in designing, building, and maintaining the organization's IT services. While leading the company's IT services team, the IT Service Manager strives to develop reliable IT services and improve the organization's existing IT service infrastructure.
IT Service Managers are responsible for maintaining a high standard of service delivery while managing the organization's IT services and anticipating and resolving issues that may arise within company systems or client environments. These services include infrastructure monitoring, task automation, server asset management, and network inventory management.
Change, incident, problem, and request management, along with CMDB (Configuration Management Database) functions, are core services widely used throughout CIB IT. The ITSM team serves as the bridge between IT and business stakeholders, ensuring coordination and predictability for CIB IT and its business operations.
The team includes SMEs focused on key service areas as directed by management, with the objective of delivering high-quality services through various platforms that maximize efficiency and ensure consistent results.
Within the Automation & Observability organization, the Production Smart Automation team provides production support services for the Analytics Consulting and Digital Assets IT clusters. This includes both functional and technical support as well as project delivery for production and non-production platforms. The team operates globally and consists of approximately 10 members located in Paris, Warsaw, Mumbai, and Montreal.
Key Responsibilities
The Site Reliability Engineer (SRE) will be part of a multidisciplinary team providing Level 1 and Level 2 technical and project support. This is a production-focused role requiring a broad range of technical expertise.
The SRE will work closely with development and infrastructure teams to:
- Monitor, manage, and proactively improve the availability and performance of production environments, from presentation and application layers through infrastructure layers.
- Plan and implement application deployments, load testing activities, and configuration changes.
- Ensure production environments are operational and available while collaborating with teams to understand user needs.
- Contribute to medium- and large-scale technical projects, including architecture reviews, solution design, application upgrades, and migrations to new platforms.
- Collaborate on prioritized tasks while providing regular status updates and maintaining focus on target solutions.
- Understand delivery lifecycle phases to ensure work is completed according to defined specifications and timelines.
- Identify opportunities to improve operational efficiency and contribute to automation initiatives.
- Provide constructive feedback and recommendations to management regarding performance, capacity, and system design.
- Assist in documenting architectures and designs, as well as distributing meeting minutes and action items.
The SRE will also work with other teams to respond to incidents and resolve issues quickly, often under pressure, in order to restore normal business services. As a result, participation in on-call rotations and after-hours support may be required.
Candidates should possess both the aptitude and desire to learn new technologies and contribute innovative ideas that may benefit the department.
Requirements
Candidates should have:
- 5–7 years of experience in a similar role.
- Experience providing multidisciplinary technical support within a team environment.
- Practical knowledge of performance and capacity management across:
  - Applications
  - Databases
  - Networks
- Strong automation skills and mindset.
Skills & Competencies
Systems Administration
- Strong Linux/Unix administration skills
- Good knowledge of Windows environments
Containerization & Cloud
- Strong knowledge of Docker and Kubernetes
- Understanding of cloud-based platforms and solutions
Infrastructure & Networking
- Good understanding of enterprise infrastructure, firewalls, and networking concepts
- Knowledge of load-balancing technologies
- Strong understanding of networking fundamentals
Security
- Experience with APIs
- Familiarity with CyberArk or HashiCorp Vault
Databases
- Experience with SQL Server
- Experience with Oracle
- Exposure to NoSQL databases
Monitoring & Observability
- Experience configuring application monitoring tools such as Dynatrace
DevOps & CI/CD
Experience with:
- Jenkins
- Bitbucket
- Artifactory
- Ansible
- ArgoCD
Development & Automation
- Knowledge of software development and scripting methodologies
- Demonstrated programming ability in languages such as Python
IT Service Management
- Good understanding of ITIL processes
- Understanding of user and server authentication mechanisms that enable automated deployment cycles while maintaining strong security controls
Personal Attributes
- Strong problem-solving abilities
- Team-oriented mindset
- Customer-focused approach