About the job Site Reliability Engineer

COMPANY PROFILE:

Our client is a tech e-commerce scale-up that provides a single platform for customers to shop for the best price online. They also provide customers with data and insights on the latest trends and the e-commerce sector.
They are looking for Site Reliability Engineers (SREs), who are responsible for keeping all services and production systems running smoothly. SREs ensure that services have reliability and uptime appropriate to users' needs, and a fast rate of improvement.
You'll have the opportunity to work on complex challenges of scale, using your experience in coding, algorithms, and analysis.

RESPONSIBILITIES:

  • Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
  • Collaborate with engineering teams on their infrastructure needs, and advise them throughout the development lifecycle.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless post-mortems.
  • Debug production issues across services, databases and levels of the stack.
  • Design, develop and manage monitoring tools that provide performance dashboards and alerts, and collect the data required to proactively identify issues and recommend improvements.


REQUIREMENTS:

  • 5-8 years of experience in provisioning environments, deploying applications, and maintaining infrastructure.
  • Professional experience using Python, Go, or Ruby.
  • Experience with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
  • Experience in cloud-based environments such as AWS, GCP or Azure.
  • Extensive experience building scalable platforms leveraging containers in a production environment.
  • Added bonus if you have experience operating distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
  • Solid knowledge of continuous integration, continuous delivery, automated testing and all phases of the software development lifecycle.
  • Experience working in an agile, multicultural environment across many Scrum teams at the same time.
  • A Kaizen mindset: a spirit of continuous improvement on a personal level, and staying up to date with the latest technology trends professionally.
  • Ability to identify problems before they happen and implement solutions that detect and prevent outages.
  • Expertise in designing, analysing and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code, and to automate routine tasks.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
  • Understanding of CI/CD principles, Linux fundamentals, networking concepts and IP protocols.


HOW TO APPLY:

  • If you're interested, click the Apply button provided and attach your CV. For further information, feel free to speak to Ariff at +6012-9264666 or email him at ariff.w@aislingsearch.com.