Senior Infrastructure Engineer
Position Overview
We are seeking a Senior Infrastructure Engineer to join our team and take ownership of the company's infrastructure systems. As the inaugural infrastructure hire, you will play a pivotal role in designing, implementing, and maintaining the technical foundation that supports our business. This is a hands-on position with significant responsibility, working closely with engineering and product teams to ensure our systems are secure, scalable, and highly performant.
As the organization grows, this role will evolve to include defining and shaping the structure of our infrastructure team and establishing best practices across the company.
Key Responsibilities
Infrastructure Management
- Design, provision, and maintain AWS resources using Infrastructure-as-Code tools such as Terraform or CloudFormation.
- Manage compute, storage, databases, networking, and IAM policies across services including EC2, ECS/EKS, S3, EFS, RDS, DynamoDB, VPCs, and load balancers.
Containerization & Orchestration
- Build, optimize, and maintain Docker images and registries.
- Operate container clusters (ECS, EKS, Kubernetes) and manage deployments using Helm or similar tools.
CI/CD & Release Engineering
- Architect, maintain, and optimize end-to-end CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, Bitbucket Pipelines).
- Automate build, test, security scanning, and deployment processes, including blue/green and canary releases.
- Enforce branching and release policies, code review requirements, and pipeline governance.
Monitoring, Logging & Reliability
- Implement and tune observability and monitoring systems (CloudWatch, Prometheus, Grafana, ELK/EFK, Datadog, Sentry).
- Define and track SLIs/SLOs, error budgets, and key operational metrics.
- Participate in on-call rotations, incident response, and post-incident reviews; maintain runbooks and operational documentation.
Security & Compliance
- Manage secrets and credentials using AWS Secrets Manager, HashiCorp Vault, or Parameter Store.
- Apply security best practices to networks, hosts, and containers.
- Conduct regular security assessments and remediate vulnerabilities promptly.
Collaboration & Process Improvement
- Collaborate with development teams to resolve production issues and optimize deployment and feedback processes.
- Document architecture, operational procedures, and standard operating practices.
- Identify opportunities to improve cost-efficiency, performance, reliability, and workflow processes.
Required Qualifications
- Minimum of 4 years of experience in Infrastructure Engineering, DevOps, or Site Reliability Engineering.
- Proven hands-on experience with AWS production workloads, including EC2, RDS, S3, VPC, and IAM.
- Strong expertise in Docker and container orchestration platforms (ECS, EKS, Kubernetes).
- Proficiency with CI/CD tools and pipelines (GitHub Actions, GitLab CI, Jenkins, Bitbucket Pipelines).
- Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, Pulumi).
- Familiarity with monitoring, logging, and observability systems (CloudWatch, Grafana, Prometheus, ELK, Datadog, Sentry).
- Strong security awareness, including IAM policies, secrets management, and vulnerability mitigation.
- Excellent written and verbal communication skills with the ability to clearly articulate complex technical concepts.
- Demonstrated ownership of infrastructure projects, from design through post-incident analysis.
Preferred Competencies
- Experience scaling infrastructure in early-stage or high-growth environments.
- Familiarity with Kubernetes operators and custom resource definitions.
- Networking knowledge, including VPC peering, Transit Gateways, VPNs, and service meshes.
- Experience with AWS cost-optimization strategies and tools.
- Prior experience with on-call and incident management responsibilities.
Note: A hybrid option is available for candidates residing in remote locations.