Systems Administrator 2 - 3024429
Apex Systems, Inc. - Redmond, WA
Job Description
Job#: 3024429

Systems Administrator 2 - High-Performance Computing (HPC)
Location: Redmond, WA (Hybrid - Onsite 3x/week)
Contract Role

Overview

We are seeking a Systems Administrator 2 with strong Linux and automation experience to support the design, deployment, and ongoing operations of high-performance compute (HPC) clusters used by Microsoft's Quantum Computing research teams. This role ensures our researchers have a secure, compliant, highly available, and high-performance environment for running advanced simulations and workloads.

You will work hands-on with compute, storage, and networking infrastructure; develop automation for cluster lifecycle management; and collaborate closely with engineering, security, and research partners. This position is ideal for someone who thrives in distributed Linux environments, enjoys solving complex systems problems, and wants to contribute directly to cutting-edge quantum research.

Key Responsibilities

- Build, deploy, and maintain HPC cluster infrastructure, including compute nodes, storage systems, and networking components.
- Develop and operate automation for cluster deployment, configuration, scaling, and lifecycle management.
- Diagnose and resolve platform-level issues affecting reliability, performance, or workload execution.
- Participate in the full DevOps lifecycle, including code development, code review, testing, and production operations.
- Validate HPC platforms for readiness, security compliance, and internal customer use.
- Maintain accurate and comprehensive documentation for architecture, deployment processes, and operational procedures.
- Collaborate with researchers to troubleshoot issues related to running simulations and workloads on HPC platforms.
- Support a major migration of HPC infrastructure from a general Microsoft corporate tenant to a custom Quantum tenant to enhance security and isolation.

Typical Day in the Role

A typical day involves a blend of hands-on systems work and cross-team collaboration. You may be deploying new compute nodes, writing automation to streamline cluster configuration, debugging performance issues affecting research workloads, or validating new platform capabilities for compliance and readiness. You will regularly interact with researchers to ensure their simulations run smoothly and with engineering partners to maintain a secure, stable, and scalable HPC environment.

Required Qualifications

- Bachelor's degree in Computer Science, Computer Engineering, or a related technical field.
- 2+ years of Linux systems administration experience in production, lab, or research computing environments.
- 2+ years of experience with automation tools such as Python, Ansible, or Terraform.
- 2+ years of experience supporting distributed, multi-user systems.
- Strong proficiency with the Linux terminal and command-line tooling.
- Experience troubleshooting performance, reliability, or configuration issues in production or pre-production systems.
- Experience writing scripts or tools for automation, diagnostics, or operational workflows.
- Ability to learn and operate within existing platforms and processes while contributing to long-term improvements.

Disqualifier: Candidates without hands-on Linux administration experience will not be considered.

Preferred / Beneficial Skills

- Experience with high-performance computing (HPC) as a user or developer of parallel/accelerated applications.
- Familiarity with HPC schedulers such as Slurm.
- Exposure to HPC offerings in Azure.
Created: 2026-03-07