Sr. Platform DevOps HPC Engineer
Srimatrix Inc. - Mountain View, CA
Job Description

Overview
Our client is seeking a Sr. Platform DevOps HPC Engineer.
When submitting candidates, please provide at minimum the following information: ____

Location: Hybrid – 3 days onsite in Mountain View, CA, with flexibility for the remainder of the days
Length: 3 6-month CTH
Industry: Aerospace
Employment Type: only Permanent Residents

We are looking for a Senior Staff Platform Engineer-DevOps/HPC to join our client team. The goal of a Senior Staff Platform Engineer-DevOps/HPC at our client is to design, build, and maintain secure, high-performance infrastructure that powers aerospace engineering, simulation, testing, and mission operations. You will also support a scalable CI/CD environment that spans both on-premises and cloud environments. Your work will directly enable rapid development, advanced modeling, and the secure operation of systems critical to our aerospace programs.

Responsibilities
(Note: The original description provides responsibilities in narrative form; this section is preserved as part of the overview content to reflect role expectations that were stated.)

Requirements
12+ years in Platform Engineering, SRE, or DevOps, ideally in mission-critical environments.
Aerospace/defence experience.
Familiarity with managing distributed systems, including HPC clusters (Slurm, PBS, Grid Engine).
Cloud infrastructure experience (AWS, Azure, Google Cloud Platform), preferably Google Cloud Platform/AWS.
Proficiency with Terraform and Ansible.
Observability tools experience (Prometheus, Grafana, ELK).
Strong networking, security, and system performance knowledge.
CI/CD pipeline and automation experience, preferably GitLab.
Linux administration and troubleshooting.
Scripting experience (Python, Bash, Go).

Preferred:
Cloud-based or hybrid HPC solutions (AWS ParallelCluster).
Familiarity with NIST 800-53, FedRAMP, and aerospace security.
Familiarity with storage systems and parallel filesystems (e.g., Lustre, GPFS, PanFS, NFS) in HPC setups.
Experience deploying and operating Kubernetes in production environments.
GPU computing exposure (CUDA, AI/ML).
Relevant certifications (AWS, HashiCorp, CNCF).

Created: 2025-09-17