Senior Site Reliability Engineer
DASH2 - Chicago, IL
Job Description
Overview
We are looking for technical team members at all levels who want to push themselves to deliver best-in-market SaaS solutions. We offer a challenging environment where you will have to grow, adapt and use your skills consistently. Our customers rely on us in the moments that matter. Engineering delivers on that promise.

The Senior Site Reliability Engineer – Network is responsible for ensuring the networks in our SaaS products are fast, stable and optimized for our customers. SREs here take on availability, performance, change management, monitoring and response, and are guardians of non-functional requirements. You either have a network infrastructure background with a programmatic, automated mindset, or you come from a software engineering background with extensive network infrastructure experience. The SRE goal is to build automated systems that reduce or eliminate the manual work needed to keep our products up, running and performing optimally. We are looking for someone who thrives on collaboration within the team and across other groups and can operate independently to deliver solutions.

Responsibilities
- Champion and implement a culture of SRE to maintain a reliable and performant network infrastructure in our SaaS products
- Design and implement secure, redundant, fault-tolerant networks in our SaaS products; you understand networking protocols and network elements and how they are integrated to create resilient, fault-tolerant networks in SaaS products
- Choose and configure common network elements in SaaS product network topologies, including load balancers, routers, DNS, etc.; provision route tables and routing paths in our SaaS products so development teams do not have to
- Define, lead the implementation of, and maintain SaaS product network monitoring and alerting to prevent client-impacting issues and ensure the network availability, performance and scalability needed to maintain SLOs and SLAs
- Identify and remediate issues in SaaS product network infrastructure (high latency, timeouts, dropped connections, etc.) using diagnostic tooling and network traces; perform thorough Root Cause Analysis (RCA); drive vendor partners (Microsoft) to provide quality assurances by requiring immediate defect fixes, software updates, etc., as necessary to ensure an ideal customer experience
- Serve as a senior escalation point for SaaS product network issues and collaborate with our IT team to integrate SaaS products into broader network topologies
- Automate everything, including system operational runbooks
- Dive deep into technology and stay at the forefront of the latest network analysis tools, technologies, and strategies; help evaluate, prototype, and integrate them into work processes
- Perform with broad independence and deliver on project milestones and tasks on schedule while communicating progress regularly
- Build strong relationships with SRE team members and software engineering teams to hold each other accountable to expectations
- Learn continuously and apply lessons learned
- Evangelize best practices, eliminate bottlenecks, and improve processes
- Participate in 24/7/365 on-call duties and lead the triage and RCA of production incidents

Qualifications
- BS in Computer Science or equivalent work experience
- Thorough understanding of common networking protocols including IP, TCP/IP, ICMP, DNS, DHCP, ARP, SSL, TLS and how to diagnose network issues by isolating problems at the protocol layer within specific network elements
- 5+ years of experience with Azure network design and network element configuration, including provisioning of routing tables
- 5+ years of experience monitoring and preventing issues in SaaS network topologies in Azure
- 5+ years of experience implementing network performance, availability, and scalability monitoring and alerting using tooling such as SolarWinds
- 5+ years of experience creating automated deployments with tools such as Harness, Azure DevOps, Ansible or Jenkins to manage Infrastructure as Code and software build and deployment in a CI/CD environment
- 5+ years of experience as a global admin of Azure, including cloud cost management
- 5+ years of experience writing scripts in PowerShell or Python/Bash to automate system operations as runbooks for Windows or Linux environments
- 5+ years of experience supporting public, client-facing, revenue-generating systems
- Strong DevOps focus and experience building and deploying Infrastructure as Code with Terraform or similar technology
- Experience planning, coordinating, developing and executing all stages of post-deployment verification test scripts
- Experience securing Windows or Linux systems in a 24x7 production environment
- Experience with containerization and managing Kubernetes clusters (AKS or EKS)
Created: 2025-09-18