Senior Product Manager - AI/ML Runtime Infrastructure
Amazon - Cupertino, CA
Job Description
AWS Trainium is revolutionizing the AI/ML landscape, with millions of chips actively utilized to empower the training and inference of state-of-the-art models. The AWS Neuron software stack is pivotal for Trainium, facilitating customers in achieving excellence in deep learning and generative AI tasks while optimizing for both performance and cost.

We are seeking an innovative Technical Product Manager dedicated to enhancing the developer experience for high-performance machine learning workloads on AWS Trainium. You will assist customers from the initial setup with Neuron Deep Learning Containers and AMIs to a large-scale operational framework encompassing orchestration, resiliency, and observability. Your role will involve shaping developer interactions with Trainium through container ecosystems and resource management platforms, ensuring seamless integration with orchestration tools like SLURM and Kubernetes, as well as AWS services such as EKS and SageMaker. You will also formulate strategies for resiliency and observability tools that support system diagnostics, performance monitoring, health management, automated recovery, and telemetry, ultimately maximizing uptime and efficiency for AI training and inference workloads.

Success in this position hinges on collaboration with engineering teams, product managers in training and inference, marketing, business development, and solution architects. A deep understanding of Trainium Architecture and the Neuron Runtime System and its components will be crucial for crafting product strategies and making informed technical decisions.

Key Responsibilities:
- Product Strategy & Vision: Spearhead product strategy and roadmap development, balancing performance, scalability, and developer experience, while producing PRFAQs and PRDs.
- Customer Discovery: Identify deployment challenges and infrastructure hurdles, prioritizing customer insights in executive decision-making.
- Technical Leadership: Align Neuron components with AWS services, creating user stories and defining success metrics.
- Impact: Enable customers like Anthropic and Databricks to deploy, monitor, and manage machine learning workloads efficiently through advanced container orchestration and resource management tools.

About AWS Neuron:
AWS Neuron is the essential software stack for executing deep learning and generative AI workloads on AWS Trainium and AWS Inferentia. It encompasses compilers, runtime, training and inference libraries, alongside developer tools for monitoring, profiling, and debugging. Built on an open-source foundation, Neuron supports popular ML frameworks, enabling rapid experimentation, distributed training, and cost-effective inference.

BASIC QUALIFICATIONS:
- Bachelor's degree in computer science, engineering, or a related field.
- 10+ years of industry experience, including 5+ years in technical product management and 3+ years in software development.
- Robust understanding of container orchestration and Kubernetes.
- Proficient in computer architecture fundamentals and operating system concepts.
- Exceptional communication skills, both written and verbal.

PREFERRED QUALIFICATIONS:
- Experience with Linux systems and kernel development.
- Proven track record of developing developer libraries.
- Familiarity with machine learning accelerators.
- Knowledge of performance optimization, profiling, and related tooling.
- Experience with deep learning model training or inference.
- Background in distributed computing and parallel processing.
- Hands-on experience with major ML frameworks like JAX or PyTorch.
- Adeptness with AWS services and cloud infrastructure.
- Demonstrated success in driving open standards and ecosystem integrations.

Amazon is an equal opportunity employer. We highly encourage applications from individuals with diverse backgrounds and experiences. Our inclusive culture empowers our teams to deliver excellent results for our customers. If you require accommodation during the application process, please let us know.

The salary for this position ranges from $136,100 to $235,200 annually, depending on geographical location, market factors, and individual experience. In addition to competitive salaries, Amazon offers a comprehensive range of benefits that promote employee well-being.

This position is posted until filled. Please apply through our career site.
Created: 2026-03-17