Senior HPC Linux System Administrator
Leidos - Atlanta, GA
Job Description
The Public Health and Human Services Operation of Leidos is seeking a Senior HPC Linux System Administrator to lead a team of system administrators in managing a high-performance computing (HPC) infrastructure used by public health researchers and scientists. This senior-level position requires extensive Linux expertise combined with a deep understanding of the specialized hardware, software, and networking required for scientific research and large-scale data analysis.

Candidate MUST:
Be located in the Atlanta, GA area for partial onsite work
Be a US Citizen with the ability to obtain a Public Trust Clearance

The candidate provides secure, always-on infrastructure services that give researchers access to customer-sponsored data hosted on-premises and in the cloud, along with secure access to high-performance computing resources for scientific research.

High-performance computing infrastructure management: Deploy, administer, and monitor HPC clusters. Manage multiple petabytes of data using Pure Storage flash storage and AWS S3 Glacier.
Software and resource management: Install, maintain, and upgrade scientific software, libraries, and batch schedulers such as GridEngine and Slurm. The role also involves developing effective processes and solutions for sharing resources across multiple research teams.
VMware: Manage the VMware vSphere Foundation for virtual server provisioning, deployment, and configuration, as well as hardware and software implementation and maintenance.
System Operations: System monitoring, routine and ad hoc security patch management, troubleshooting, and performance tuning.
Project planning and coordination: Advise the customer and Project Manager on designing and documenting technical solutions. Support infrastructure projects, from planning and coordinating team activities to executing planned activities and providing status updates.
Communicate and work collaboratively with internal and client team members across the program, and provide technical counsel and/or alternative designs, solutions, and processes to leadership.
Automation and scripting: Lead automation efforts to streamline system management tasks using scripting languages (Bash, Python) and configuration management tools (Puppet, Ansible).
Research collaboration: Work closely with scientists, bioinformatics developers, and principal investigators to understand their computational needs and translate scientific goals into technical configurations. This includes providing technical support to help optimize workflows.
System architecture and deployment: Lead the technical design, integration, and optimization of on-site HPC and cloud resources.
Mentorship and team coordination: Guide and mentor other system administrators on best practices for system administration and troubleshooting. Some roles involve managing a team of system administrators.
Security and compliance: Implement robust security measures, manage access controls, and design architectures that meet compliance standards such as HIPAA or NIST. Support the SA&A process.
Disaster recovery and monitoring: Design and implement backup and disaster recovery plans. Integrate monitoring and alerting systems to ensure system availability and reliability.

REQUIRED EDUCATION AND EXPERIENCE
A Bachelor's degree in computer science or a related field, plus 10 years of System Administration experience.
Requires extensive experience (7+ years) in designing and operating high-performance computing (HPC) infrastructure.
Linux expertise: Mastery of Linux systems and administration, including troubleshooting, security, performance monitoring, and various distributions (e.g., Red Hat, Ubuntu) to support scientific computing.
Soft skills: Strong problem-solving and communication skills are critical for collaborating with customers, bioinformatics developers, and researchers, and for leading a team.
Experience working with a team to introduce and integrate new technologies and processes into existing production environments.
Network: Proficiency in working with applicable network devices, including routers, switches, gateways, and hubs.
Security: Develop the infrastructure deliverables, continuous diagnostics and mitigation, threat mitigation and incident response, security architecture support, critical infrastructure protection, patch management, vulnerability management, risk management, information assurance, and Security Assessment and Authorization (SA&A) documentation.
VMware: Experienced in managing VM infrastructure.
Leadership: Proven leadership in planning and coordinating infrastructure support activities, and in leading and mentoring system administrators.
HPC and cluster management: Proven experience with HPC clusters, job schedulers (Slurm), and high-speed networking (10/40/100Gb).
Other technical skills: Proficiency in Bash and Python scripting for automation is essential. Experience with cloud technologies (hybrid-cloud integration) and container environments (e.g., Docker, Singularity, Kubernetes).

DESIRED QUALIFICATIONS:
A Master’s Degree in IT, engineering, or other relevant fields.
Experience working at a federal government agency or a research organization.
Large-scale infrastructure design and implementation project experience.
Red Hat Certified Engineer (RHCE), Red Hat Certified Architect (RHCA), or equivalent certifications.
Experience with computer networking protocols including, but not limited to, TCP, IP, UDP, HTTP, DHCP, and DNS. Understanding of network design and management (LAN, WAN, and VPN).
Experience optimizing cloud utilization patterns and supporting development, validation, operations, and security, with migration experience from an on-premises model to a hybrid model.
AWS or Azure Cloud engineer certification.

If you're looking for comfort, keep scrolling. At Leidos, we outthink, outbuild, and outpace the status quo, because the mission demands it.
We're not hiring followers. We're recruiting the ones who disrupt, provoke, and refuse to fail. Step 10 is ancient history. We're already at step 30, and moving faster than anyone else dares.

Original Posting: September 29, 2025

For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.

Pay Range: $89,700.00 - $162,150.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.
Created: 2025-09-29