Senior Engineer, Performance - Cloud Software
NVIDIA - Santa Clara, CA
Job Description
Overview

NVIDIA is widely considered to be one of the technology world’s most desirable employers. NVIDIA leads the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing (HPC) and Visualization. DGX Cloud provides a serverless generative AI infrastructure to the world, enabling NVIDIA’s AI supercomputer technologies to be used by anyone. DGX Cloud engineering has a mission to ensure our customers receive timely and quality-assured releases.

We are seeking a Performance Engineer proficient in performance and scalability testing who can identify limitations across the Kubernetes (K8s) and application stack using industry-standard tools and telemetry. If you excel in problem-solving, can think creatively on your feet, and enjoy working in a distributed team setting, we would love to have you join us!

Responsibilities

- Analyze and optimize performance across application, middleware, runtime, and infrastructure layers, including networking, storage, GPU utilization, and beyond
- Develop tooling and metrics that provide deep observability into system performance
- Collaborate closely with infra, platform, runtime, and product teams to identify key performance goals and drive systemic improvements
- Lead investigations into high-impact performance regressions or scalability issues in production
- Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale
- Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems

Qualifications

- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field (or equivalent experience)
- 5+ years in software engineering with a strong track record in performance or scalability of high-scale distributed systems
- Deep comfort with performance profiling tools and tracing systems
- Ability to identify performance issues, root-cause problems, and propose potential solutions
- Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization)
- Contributions to observability, benchmarking, or performance-focused infrastructure at scale
- Strong understanding of OS internals, scheduling, memory management, and IO patterns
- Demonstrated success navigating ambiguity and aligning stakeholders around performance goals
- Proficiency in container-based infrastructure (Docker, Kubernetes, Helm)

Ways to stand out from the crowd

- Demonstrated ability to handle sophisticated technical environments while meeting or exceeding all security, reliability, scalability, and availability metrics
- Strong and confirmed knowledge of modern architectures at scale

Compensation and benefits

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 144,000 USD - 230,000 USD for Level 3, and 168,000 USD - 270,250 USD for Level 4. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 14, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Created: 2025-09-17