About the job: Site Reliability Engineer (SRE)
Responsibilities:
· Research and develop solutions for improving the company platform
· Closely collaborate with the software development team to deliver software
· Engage in and improve the whole lifecycle of services
· Build, operate and maintain the underlying infrastructure
· Maintain services once they are live by measuring, monitoring, and alerting
· Mitigate and prevent the risk of software and library vulnerabilities
· Practice sustainable incident response and blameless postmortems
· Provide on-call support on a scheduled rotation, or work a shift that provides operational support on Saturday and Sunday when required
Requirements:
· Ability to work collaboratively and to take a leading role in transforming how we work for the better
· Ability to design and build infrastructure for fault tolerance, reliability, scalability, and performance
· Professional-level proficiency with Linux operating systems (e.g., file systems, system calls), along with system administration or networking (e.g., routing)
· Understanding of Continuous Integration and Continuous Delivery (CI/CD) software, including measurement
· Understanding of monitoring, logging, and tracing to help our team find root causes
· Experience with cloud platforms and Kubernetes
· Experience with datastores
· Commitment to self-improvement
· A growth mindset
· Common sense
· (Bonus) Understanding of basic security and the ability to test it
· (Bonus) Understanding of Python and Node.js to work with our team