Senior Network Reliability Engineer
MSCCN - San Francisco, CA
Job Description
Insight Global is seeking a Network Engineer – Reliability & Observability to support the quality, reliability, and lifecycle performance of large-scale AI network infrastructure. This role serves as a reliability engineering leader, responsible for building processes, data collection frameworks, and reliability metrics to improve network performance from initial deployment through ongoing operations.

This position focuses on developing scalable processes, systems, tooling, and data pipelines that drive network observability and reliability. You will deliver automated 24x7 metrics as well as periodic reliability reporting for both internal stakeholders and external customers, ensuring visibility into network health, performance, and risk.

This role is well-suited for experienced network operators who are passionate about reliability engineering and full-lifecycle software development, including quality assurance audits, circuit audits, periodic inspections, failure rate tracking, and root cause analysis. Ideal candidates bring a strong interest in both hardware (electronics and optics) and software development, and consistently leverage data to guide deployment decisions, operational improvements, and strategic sourcing. Experienced Site Reliability Engineers (SREs) with a strong networking background and a focus on observability and reliability are strongly encouraged to apply.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy.

Skills and Requirements
• 5–8+ years' experience in large-scale/hyperscale network operations
• Strong background as an SRE, NRE, or Network Engineer with a software focus
• Experience building software for observability (metrics, data stores, dashboards)
• Strong automation and coding skills (Python, SQL; others acceptable)
• Ability to analyze data and say “this is what we need to measure”
• Excellent communication skills; can present technical insights to executives
• Experience working with routers, switches, and interface cards
• Data science or analytics experience applied to infrastructure
• Experience with network reliability, performance analysis, or capacity planning
• Familiarity with modern observability stacks and custom dashboards
• Background in deployment, operations, or repair environments
• Prior titles such as NRE, Network SRE, or Network Architect
Created: 2026-04-04