Sr. SRE, Compute Infrastructure
ZipRecruiter - Boston, MA
Job Description
Overview
Senior Site Reliability Engineer – Compute Infrastructure
Location: Boston, MA (Hybrid: Tuesday–Friday onsite, Monday remote)
Compensation: $134,250 – $214,800 + bonus + equity + full benefits

We are representing a cutting-edge technology company that is seeking a Senior Site Reliability Engineer (SRE) to join its global infrastructure team. In this role, you'll play a critical part in scaling and optimizing the organization's cloud-native Kubernetes platform, the backbone for internal engineering teams delivering high-impact applications and services.

This role is ideal for an SRE who thrives in complex distributed environments, is passionate about developer enablement, and enjoys building robust systems that balance performance, reliability, and scalability.

Why You Should Apply
- You'll work on global, mission-critical systems running on modern cloud infrastructure
- High autonomy in a fast-paced, high-impact engineering environment
- Opportunity to shape SRE best practices across the organization
- Hybrid work culture that values face-to-face collaboration and innovation

What You'll Do
- Architect and scale cloud-native Kubernetes infrastructure to support internal engineering workflows
- Develop tools and platforms that empower product and infrastructure teams to deploy and manage services rapidly and securely
- Write clean, efficient, and maintainable code in languages such as Python, Go, C#, or Java
- Use Infrastructure as Code (IaC) tools like Terraform or Pulumi to provision and manage cloud resources
- Enhance observability and alerting systems using APM, metrics, and log aggregation tools
- Partner with developers to optimize CI/CD pipelines and ensure smooth software delivery lifecycles
- Provide strong documentation to promote self-service and onboarding across engineering
- Continually assess and improve platform reliability, operability, and cost-efficiency
- Contribute to system design reviews and mentor junior engineers on cloud-native best practices

What You Bring
- 7+ years of experience in Platform Engineering or Site Reliability Engineering
- Proven experience managing Kubernetes platforms at scale (e.g., AKS, EKS, or GKE)
- Strong programming experience in Python, Go, C#, Java, or similar
- Deep understanding of cloud platforms like AWS or Azure
- Experience with ArgoCD, GitHub Actions, or similar CI/CD tools
- Proficiency with observability tooling (Datadog, Prometheus, Grafana, etc.)
- Expertise in networking, security protocols, and container orchestration
- Familiarity with communication protocols such as SPI, UART, and RS485, and with security standards such as TLS and X.509
- Experience building testable, scalable IaC modules and managing multi-environment deployments
- Strong collaboration and documentation habits in cross-functional teams
- Empathy for internal users and a customer-focused mindset

Benefits
- Competitive base salary: $134,250 – $214,800 (based on experience and location)
- Bonus and equity opportunities
- Discretionary time off (DTO) policy
- Paid parental leave for all caregivers
- Medical, dental, and vision coverage
- Fitness and wellness reimbursements
- Mental health and professional development support
- Hybrid workplace with in-office perks (snacks, events, and team-building activities)

Note: Compensation and benefits may vary depending on experience level and geographic market.
Created: 2025-09-21