Senior DevOps Engineer / Site Reliability Engineer (SRE)
About the Role
We're hiring a Senior DevOps Engineer / Site Reliability Engineer (SRE) to help architect and scale the unified global operations platform behind a fast-growing cross-border e-commerce SaaS company.
This is a newly created, high-impact role supporting the company's North America engineering operations. You'll work closely with technical and executive leadership and with platform experts to build resilient infrastructure, modern observability systems, intelligent automation, and scalable cloud-native operations.
The environment is highly collaborative, technically ambitious, and focused on long-term platform reliability and modernization.
This is a remote-first opportunity open to US-based candidates currently located in California or North Carolina.
About the Company
The company is a leading B2B SaaS platform serving the cross-border e-commerce industry.
As the business continues expanding globally, the engineering organization is investing heavily in modern DevOps, platform engineering, and reliability infrastructure to support increasing scale, automation, and operational complexity.
The company operates in a fast-moving international environment with close collaboration across North America and overseas engineering teams.
What You'll Do
- Design, build, and maintain unified operations and platform management systems
- Develop infrastructure supporting resource management, monitoring, alerting, configuration management, and automated operations
- Build and operate observability platforms and CI/CD pipelines
- Develop self-healing systems and automated incident response capabilities
- Establish DevOps standards, tooling strategies, and engineering best practices
- Support engineering and product teams with platform-level technical expertise
- Lead infrastructure modernization and architecture improvement initiatives
- Reduce technical debt and improve operational reliability
- Promote SRE principles and reliability engineering practices across teams
- Conduct technical research and evaluate emerging cloud-native technologies
- Drive continuous improvement across DevOps and platform engineering workflows
Required Qualifications
- Currently based in California or North Carolina
- US Citizen or Green Card holder (no sponsorship available)
- Fluent in Mandarin Chinese for day-to-day collaboration with overseas engineering teams
- Bachelor's degree in Computer Science or related field
- 4–6 years of experience in DevOps, SRE, or Platform Engineering
- Strong experience with AWS, Azure, or GCP
- Deep understanding of cloud infrastructure components such as VPC, EC2, Kubernetes/EKS, RDS, and IAM, or their equivalents on other cloud providers
- Strong Linux systems and networking knowledge
- Experience with Docker, Kubernetes, load balancing, and service governance
- Experience with Infrastructure as Code tools such as Terraform, Ansible, and Helm
- Experience building CI/CD pipelines using Jenkins, Argo CD, CodeBuild, or similar tools
- Experience with observability and monitoring platforms including Prometheus, Grafana, ELK, and OpenTelemetry
- Proficiency in at least one scripting or programming language such as Python, Shell, or Go
- Strong troubleshooting, systems analysis, and problem-solving skills
- Strong cross-functional communication and collaboration abilities
Preferred Qualifications
- Master's degree in Computer Science or related field
- Experience supporting global or multi-cloud platforms
- Experience leading observability, self-healing, or platform modernization initiatives
- Experience with service mesh, chaos engineering, or capacity planning
- Go development experience
- Strong track record of improving system reliability, automation, and operational efficiency
- Experience collaborating across international and cross-cultural engineering teams
- Self-driven mindset with strong technical leadership and knowledge-sharing abilities
Compensation & Benefits
Compensation
- Base Salary: $140,000 – $160,000 USD
- Exceptional candidates may receive compensation above the posted range
Benefits
- 401(k) with dollar-for-dollar match up to 4%
- Medical insurance
- 12 days PTO annually
Work Environment
- Remote-first work environment
- Home base available in Silicon Valley, CA or Raleigh, NC
- Standard Monday–Friday schedule
- No business travel required
- Immediate hiring need
Why This Role Stands Out
This is an opportunity to help modernize and scale the operational backbone of a global SaaS platform serving a rapidly growing international market. You'll work on cloud-native infrastructure, automation, observability, and reliability engineering initiatives with significant visibility and ownership across the organization.