SRE Architect
Incedo Inc. - Austin, TX
Job Description: SRE Architect
Location: Austin, TX (Hybrid)
Employment Type: Full-Time
Experience Level: Architect

Role Overview

We are seeking an experienced Site Reliability Engineer (SRE) Architect to design, build, and scale highly reliable, resilient, and observable systems. This role is ideal for a hands-on architect who can define SRE strategy, influence engineering practices, and partner closely with development, platform, and security teams.

The position requires onsite or hybrid presence in Austin, TX, with collaboration across distributed teams.

Key Responsibilities

Architecture & Reliability
- Define and own the SRE architecture strategy, including reliability, availability, scalability, and performance standards.
- Design resilient, fault-tolerant systems for cloud-native and hybrid environments.
- Establish and govern SLIs, SLOs, and error budgets across platforms and services.
- Lead capacity planning, resilience testing, and chaos engineering initiatives.

Platform & Cloud Engineering
- Architect and operate platforms on AWS/GCP/Azure (multi-cloud or hybrid setups).
- Design and manage Kubernetes-based platforms (EKS/GKE/AKS).
- Drive Infrastructure as Code (IaC) practices using Terraform, Ansible, or similar tools.
- Standardize environments, deployment patterns, and runtime configurations.

Operational Excellence
- Build and maintain observability frameworks using tools such as Prometheus, Grafana, Datadog, ELK, Splunk, or equivalent.
- Lead incident management, root cause analysis (RCA), and post-incident reviews.
- Reduce MTTR through automation, tooling, and process improvements.
- Participate in and improve on-call models, escalation policies, and runbooks.

DevOps & Automation
- Partner with engineering teams to embed CI/CD best practices.
- Drive automation across provisioning, deployments, testing, and operations.
- Improve system reliability by eliminating manual operational toil.

Security & Governance
- Architect secure platforms aligned with enterprise security standards.
- Implement best practices for secrets management, access control, compliance, and audits.
- Collaborate with Security and Compliance teams on governance models.

Leadership & Collaboration
- Act as a technical mentor and thought leader within SRE and platform teams.
- Influence engineering culture toward reliability-focused design.
- Partner with product, application, and infrastructure teams to deliver business outcomes.

Required Qualifications
- 15+ years of experience in SRE, DevOps, Platform Engineering, or Systems Architecture.
- Strong experience designing and operating large-scale distributed systems.
- Deep hands-on expertise with cloud platforms (AWS/GCP/Azure).
- Advanced experience with Kubernetes and containerized workloads.
- Strong knowledge of Linux internals, networking, storage, and system performance.
- Proven experience implementing IaC and configuration management.
- Proficiency in one or more programming/scripting languages (Python, Go, Bash, etc.).
- Strong understanding of observability, monitoring, and alerting strategies.
- Excellent communication and stakeholder management skills.

Preferred Qualifications
- Experience in multi-cloud or regulated environments.
- Background supporting high-throughput, high-availability, or data-intensive systems.
- Experience with Kafka, Spark, or large-scale data platforms.
- Exposure to fintech, healthcare, enterprise SaaS, or hyperscale platforms.
- Prior experience as Principal Engineer, Architect, or Lead SRE.

Work Model
- Hybrid / Onsite role based in Austin, TX.
- Requires regular collaboration with local and global teams.

Why Join Us
- Architect systems at enterprise scale.
- Influence platform and reliability strategy across teams.
- Work with modern cloud-native technologies.
- High-impact role with strong visibility and ownership.
Created: 2026-05-09