Site Reliability Engineer
AceStack - Chicago, IL
Job Description
Job Title: Site Reliability Engineer
Location: Chicago, IL
Employment Type: FTE only

We are looking for a Senior Site Reliability Engineer (SRE) with deep experience in AWS infrastructure, automation, observability, and production support. As an SRE, you will ensure our cloud-native systems are resilient, scalable, and efficient, driving reliability through code, not just processes.

Must-Have Technical/Functional Skills
- 5+ years of experience in SRE, DevOps, or Cloud Engineering
- Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
- Hands-on experience with Terraform, Ansible, or other IaC tools
- Strong scripting/coding skills (Python, Go, Shell, etc.)
- Experience with Kubernetes, containerization, and orchestration
- Deep knowledge of Linux systems and networking
- Experience with service meshes (e.g., Istio, App Mesh)
- Familiarity with the AWS Well-Architected Framework
- Experience building self-healing systems and automated remediation
- Background in security, compliance, or multi-account/multi-region AWS architectures

Roles & Responsibilities
- Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
- Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
- Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace or Datadog
- Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
- Optimize systems for cost, performance, and reliability
- Drive chaos engineering and resilience testing
- Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
- Mentor junior SREs and promote DevOps/SRE culture across the org
Created: 2026-04-02