DevOps Engineer (Azure HPC)
微创软件 WICRESOFT - Washington, DC
Job Description
Overview

The HPC Image team works on HPC Linux VM images for Azure customers. We build, test, and release GPU VM images of Linux distributions optimized for High Performance Computing and AI workloads. Our goal is to streamline image building and validation at recurring intervals and to automate testing across multiple VM types using Azure DevOps CI/CD pipelines. We are seeking someone with experience in Linux administration, HPC, and DevOps.

Responsibilities

Maintain and upgrade HPC Images - Create and maintain HPC Images (Linux distributions) and support extensions for Trusted Launch scenarios using CI/CD pipelines that may be triggered manually or automatically. Enhance the existing image building and testing infrastructure as modules and add-ons are upgraded.

Automation and Test Execution - Create, maintain, and trigger automated validation pipelines at recurring intervals or on demand, for early bug detection and during each release phase (multiple times per month across multiple VM types). Log, detect, and maintain validation results, including a Power BI performance dashboard. After successful validation, and for each feature addition or bug fix, publish images to Azure Marketplace for customer consumption.

Release/Publish - Manage and release various HPC offerings (Linux distributions) to Azure Marketplace on a monthly cycle, and support HPC-related image building and publishing on a quarterly basis. Collaborate with the engineering team to troubleshoot and resolve HPC image-related issues, and contribute to the open-source image repository where applicable. Manage public GitHub requests such as image creation pipelines, bug fixes, code reviews, and other image-specific ad hoc requests.
Required / Minimum Qualifications

2+ years of experience in software design and development
1+ years of experience in HPC or Machine Learning
1+ years of experience with Deep Learning, AI infrastructure, and accelerators
2+ years in Linux administration (Ubuntu, AlmaLinux/RHEL, Azure Linux) and/or DevOps on high-performance computation/communication platforms for HPC/AI workloads
Familiarity with HPC networking and GPU computing is highly desirable
Good spoken English and basic spoken Chinese

Position details

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Software Development
Created: 2025-09-17