Site Reliability Engineer
Company Description
Aqilea is an IT and engineering consulting partner that helps companies get more out of their technology and operations. With teams in Stockholm and Bangalore, we work closely with our clients to build solutions that fit their needs - from software development, AI and infrastructure engineering to industrial automation and embedded systems.
We combine strong technical expertise with a practical, business-focused approach to help organizations modernize, improve security, and scale with confidence. Above all, we focus on long-term partnerships built on trust, quality, and real results.
With us, you have real opportunities to advance your career and take on significant responsibility.
About the Role
Company: Aqilea India
Role: Site Reliability Engineer (SRE)
Experience: 5 to 10 years
Location: Bangalore (WFO)
Job description:
- Work in a cross-functional team as the reliability expert for a product or product area.
- Apply Reliability engineering practices with support from SRE governance teams.
- Ensure delivery quality and provide KPI reporting.
- Collaborate closely within product teams to ensure predictable operations and minimal disruptions to Production.
- Collaborate closely within your Capability to share best practices and to discuss and improve operational ways of working.
- Work together in a cross-functional product team to monitor, manage, and resolve issues of the supported applications.
- Perform technical analysis and troubleshooting of complex issues and incidents in production.
- Improve monitoring performance by focusing on preventive measures.
- Drive product improvements through code and log analysis.
- Continuously improve proactive monitoring and housekeeping automation to detect and avoid incidents early.
- Ensure environment stability and reliability.
- Automate processes that affect development and production by leveraging tools and building scripted solutions.
- Participate in on-call technical support to resolve business-critical incidents.
Qualifications:
- 5+ years of experience in Site Reliability Engineering, maintenance & operations and/or development.
- Strong working experience in eCommerce.
- Strong working experience in DevOps practices (automated testing, CI/CD etc.).
- Experience in solutions architecture and in quickly pinpointing the causes of issues.
- Experience working with API-based frameworks (e.g., commercetools or Fabric is ideal).
- Experience with ITIL support processes and ITSM tools (e.g., ServiceNow) in a microservices context.
- Familiarity with common tech stacks in headless eCommerce is nice to have.
- Experience maintaining, supporting, and/or developing desktop and mobile applications.
- Knowledge of design principles and fundamentals of solutions architecture is a plus.
- Understanding of performance engineering (Application Reliability).
- Knowledge of multiple front-end languages and libraries (ReactJS, React Native, NodeJS).
- Experience in building CI/CD workflows using GitHub Actions.
- Knowledge of Azure DevOps and/or other cloud environments is nice to have.
- Experience working on cloud-based infrastructure (e.g., Azure and GCP).
- Experience provisioning infrastructure resources using Infrastructure as Code (Terraform/Ansible).
- A passion for problem solving with strong analytical capabilities.
- Stay current on technical trends to suggest innovative tools and approaches to interesting problems.
- Intermediate-level proficiency in at least one of Python, Ruby, Java, C#, or Go.
- Experience with monitoring tools (Splunk, Grafana, etc.).
- Experience working with SRE metrics such as SLIs, SLOs, and error budgets.
- Experience with managed cloud Kubernetes services (e.g. AKS, GKE).
Start: Immediate to 15 Days
Location: Bangalore (Hybrid)