REMOTE Site Reliability Engineer (SRE)
Insight Global - Brookfield, WI
Job Description
We’re looking for a REMOTE SRE who has a software engineering background: someone who can drop into ongoing projects, quickly mesh with cross-functional teams, and drive reliability outcomes with strong procedural and systems thinking. This is a backfill aimed at stabilizing and improving production systems and delivery practices. You’ll focus on SaaS services, reliability engineering, observability, and pragmatic automation. The right person writes clean, tested code, reasons about distributed systems, and applies software engineering discipline to operational problems.

Key Responsibilities:
• Embed with product and platform teams to own reliability for key services; come in and “run with” active projects.
• Define and drive SLOs/SLAs/SLIs; implement actionable alerting and dashboards (primary: Datadog).
• Automate reliability work (deployment, scaling, failover, incident workflows) using code-first approaches.
• Author infrastructure as code (primarily Terraform) and collaborate on Docker/Kubernetes workflows.
• Instrument services (.NET primary stack; Python/Rust for tooling; Java is a plus) for observability and performance.
• Own incidents end-to-end: triage, root cause, postmortems, and preventative engineering.
• Apply systems thinking to reduce complexity, improve resilience, and increase change velocity safely.
• Partner with security and cloud teams on guardrails, least-privilege, and cross-cloud considerations.
• Write stories and technical docs that clarify problems, solutions, and acceptance criteria.
• Continuously improve reliability patterns, runbooks, and automation pipelines.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy:

Skills and Requirements
• Proven SRE experience (3+ years minimum at mid-staff level) owning reliability for production systems.
• Software engineering background with strong procedural thinking; you’ve shipped production code.
• Proficient in scripting languages such as Python, Bash, or similar.
• .NET expertise as the primary skillset (services, APIs, performance, instrumentation).
• Datadog hands-on experience (dashboards, monitors, logs, APM, alerting).
• AWS foundational knowledge (you don’t need a pro cert; you can reason about core services and IAM).
• Infrastructure as Code with Terraform (modules, state, environments).
• Practical knowledge of Docker and Kubernetes (how they work, how to debug and operate them).
• Familiarity with SQL/Postgres (querying, performance basics).
• Continued education and/or advanced degree(s) in Computer Science, Information Technology, or a related field
• AWS certifications (such as AWS Certified Solutions Architect, AWS Certified Database - Specialty, or AWS Certified Security - Specialty)
• Ability to understand and refactor complex legacy software
• Experience in environments subject to HIPAA and/or PCI regulations
• Professional experience with project lifecycle planning such as Agile/Scrum
• Comfortable with the Atlassian software suite (Jira, Confluence, and OpsGenie)
• Experience with Rust
• AWS Glue
• AWS Neptune or other AWS purpose-built databases
Created: 2026-01-14