Member of Technical Staff - GPU Infrastructure
Prime Intellect - San Francisco, CA
Job Description
Building the Future of Decentralized AI Development

At Prime Intellect, we're enabling the next generation of AI breakthroughs by helping our customers deploy and optimize massive GPU clusters. As our Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms customer requirements into production-ready systems capable of training the world's most advanced AI models.

We recently raised $15mm in funding (total of $20mm raised) led by Founders Fund, with participation from Menlo Ventures and prominent angels including Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao (Chief Scientific Officer of Together AI), Dylan Patel (SemiAnalysis), Clem Delangue (Hugging Face), Emad Mostaque (Stability AI) and many others.

Core Technical Responsibilities

This customer-facing role combines deep technical expertise with hands-on implementation.
You'll be instrumental in:

Customer Architecture & Design
- Partner with clients to understand workload requirements and design optimal GPU cluster architectures
- Create technical proposals and capacity planning for clusters ranging from 100 to 10,000+ GPUs
- Develop deployment strategies for LLM training, inference, and HPC workloads
- Present architectural recommendations to technical and executive stakeholders

Infrastructure Deployment & Optimization
- Deploy and configure orchestration systems including SLURM and Kubernetes for distributed workloads
- Implement high-performance networking with InfiniBand, RoCE, and NVLink interconnects
- Optimize GPU utilization, memory management, and inter-node communication
- Configure parallel filesystems (Lustre, BeeGFS, GPFS) for optimal I/O performance
- Tune system performance from kernel parameters to CUDA configurations

Production Operations & Support
- Serve as primary technical escalation point for customer infrastructure issues
- Diagnose and resolve complex problems across the full stack - hardware, drivers, networking, and software
- Implement monitoring, alerting, and automated remediation systems
- Provide 24/7 on-call support for critical customer deployments
- Create runbooks and documentation for customer operations teams

Technical Requirements

Required Experience
- 3+ years hands-on experience with GPU clusters and HPC environments
- Deep expertise with SLURM and Kubernetes in production GPU settings
- Proven experience with InfiniBand configuration and troubleshooting
- Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
- Experience with infrastructure automation tools (Ansible, Terraform)
- Proficiency in Python, Bash, and systems programming
- Track record of customer-facing technical leadership

Infrastructure Skills
- NVIDIA driver installation and troubleshooting (CUDA, Fabric Manager, DCGM)
- Container runtime configuration for GPUs (Docker, containerd, Enroot)
- Linux kernel tuning and performance optimization
- Network topology design for AI workloads
- Power and cooling requirements for high-density GPU deployments

Nice to Have
- Experience with 1000+ GPU deployments
- NVIDIA DGX, HGX, or SuperPOD certification
- Distributed training frameworks (PyTorch FSDP, DeepSpeed, Megatron-LM)
- ML framework optimization and profiling
- Experience with AMD MI300 or Intel Gaudi accelerators
- Contributions to open-source HPC/AI infrastructure projects

Growth Opportunity

You'll work directly with customers pushing the boundaries of AI, from startups training foundation models to enterprises deploying massive inference infrastructure. You'll collaborate with our world-class engineering team while having direct impact on systems powering the next generation of AI breakthroughs.

We value expertise and customer obsession - if you're passionate about building reliable, high-performance GPU infrastructure and have a track record of successful large-scale deployments, we want to talk to you.

Apply now and join us in our mission to democratize access to planetary-scale computing.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development
Created: 2025-09-17