AI & HPC Infrastructure Solutions Engineer

Accenture - Beaverton, OR

Job Description

We Are:

The Global Infrastructure Engineering AI & HPC team plays a pivotal role in reshaping infrastructure for the innovative era of digital solutions driven by AI and High-Performance Computing. Our team excels in integrating expertise across cloud, on-premises, and hybrid environments to craft and manage cutting-edge infrastructure tailored for high-performance workloads at scale. We empower our most crucial clients to achieve unprecedented levels of performance, efficiency, and innovation. Our responsibilities span the complete lifecycle, from strategic planning and architectural design to execution and operational management, encompassing modernization efforts across the entire infrastructure stack. We collaborate with industry partners to unlock new technologies, drive growth, and transform markets. Join our proactive team as we spearhead the way enterprises utilize AI and HPC to foster groundbreaking innovation and redefine infrastructure capabilities.

Key Responsibilities:

- Design and implement advanced HPC and AI infrastructure solutions, ensuring alignment with specific industry performance and scalability requirements.
- Deploy, configure, and manage XPU-based clusters using schedulers, VM/K8s orchestration platforms, Slurm, and containerized environments to facilitate Metal as a Service (MaaS), GPUaaS, AIaaS, and more.
- Optimize cluster performance, scalability, energy consumption, and cost-effectiveness across on-premises, cloud, and hybrid infrastructures.
- Integrate AI and HPC platforms with existing IT systems, data workflows, and security protocols.
- Monitor, troubleshoot, and fine-tune infrastructure to guarantee high availability, low-latency networking, and resilient workload performance.
- Create and maintain comprehensive documentation, including architecture diagrams, configuration standards, and operational guides.
- Provide users with technical guidance and support to enhance the execution of HPC/AI workloads, including large-scale models and simulations.

Travel may be necessary for this position, varying from 25% to 100% based on business and client needs.

Required Skills and Qualifications:

- A minimum of 4 years of practical experience in designing, deploying, and managing HPC and AI infrastructure across multiple environments (on-premises, cloud, and hybrid), supporting key sectors such as Financial Services, Life Sciences, Manufacturing, and Retail.
- At least 4 years of experience with accelerated computing architectures (GPUs, XPUs, DPUs), high-performance networking fabrics (InfiniBand, Ethernet), SONiC, and modern storage/data platforms (e.g., NVMe-oF, Lustre, GPFS).
- A minimum of 4 years of expertise in cluster management and orchestration tools (e.g., Slurm, Run:ai, Kubernetes, Docker) and real-time performance monitoring and observability frameworks.
- At least 4 years of experience with cloud platforms (e.g., AWS, Azure, GCP) and virtualization technologies, including expertise in automation and optimization through scripting (Python, AI tools) alongside foundational Infrastructure-as-Code tools like Terraform and Ansible.
- A minimum of 4 years of experience implementing MLOps and DevSecOps practices to establish secure, automated, and reproducible workflows.
- Bachelor's degree or equivalent work experience (minimum 12 years). An Associate's Degree requires at least 6 years of relevant experience.

Preferred Skills and Qualifications:

- Experience managing deployments of clusters with 1,000+ GPUs for HPC and AI workloads, integrating various infrastructure services.
- Familiarity with GPU computing libraries and accelerators (e.g., NVIDIA CUDA, Dynamo, AMD ROCm).
- Experience in HPC & AI networking technologies (e.g., RoCE, InfiniBand) and multi-rail designs.
- Knowledge of Machine Learning frameworks (e.g., TensorFlow, PyTorch) and cloud-based data science environments.
- Proficiency in HPC & AI workload management and optimization strategies.
- Acquaintance with DevOps practices and tools (e.g., Ansible, Terraform) for automating infrastructure.
- Industry certifications related to NVIDIA infrastructure or public cloud providers are advantageous.

Compensation:

Accenture's compensation varies based on multiple factors including office location, role, skill level, and experience. The annual salary range is as follows:

- California: $73,800 to $218,800
- Cleveland: $68,300 to $175,000
- Colorado: $73,800 to $189,000
- District of Columbia: $78,500 to $201,300
- Illinois: $68,300 to $189,000
- Maryland: $73,800 to $189,000
- Massachusetts: $73,800 to $201,300
- Minnesota: $73,800 to $189,000
- New York: $68,300 to $218,800
- New Jersey: $78,500 to $218,800
- Washington: $80,200 to $201,300

Accenture is an equal opportunity employer committed to diversity and inclusion. We welcome applicants from all backgrounds and ensure that individuals are not discriminated against based on various factors. Our rich diversity fuels innovation and creativity, allowing us to serve our clients and communities more effectively. For more details about our commitment to equal opportunities, please refer to our Equal Opportunity Statement.

Created: 2026-03-04

