Job Description:
As part of the team that maintains our high-availability systems, you will monitor and manage system uptime, identify and resolve issues promptly, and implement solutions that improve system reliability and performance. This requires a deep understanding of system architecture and the ability to troubleshoot and debug complex issues.
You will also help design and implement disaster recovery plans so that our systems can withstand unexpected failures and continue to operate smoothly. This involves working closely with the infrastructure team to ensure that backups, failover mechanisms, and related safeguards are in place.
In addition, you will continuously monitor system performance and take proactive measures to prevent downtime and improve overall system health. This involves using monitoring tools, analyzing the data they produce, and recommending system improvements.
If you have a passion for building and maintaining robust and reliable systems, a strong understanding of software development principles, and the ability to thrive in a fast-paced and dynamic environment, we want to hear from you. Join us and be a key player in ensuring our systems are always available to meet the needs of our customers.
Tech stack: Python, Django, and AWS