ML Runtime Optimization Engineer - Lead
Apam 91 - Mountain View, CA
Job Description
About the role

We are looking for a software engineer with expertise in optimizing ML models and deploying them on production-grade runtime environments and chips. You'll work across the entire ML framework/compiler stack (e.g., PyTorch, JAX, ONNX, TensorRT, CUDA, XLA, Triton).

At Applied Intuition, you will:
- Build the optimization pipeline for deploying ML models to real-world hardware.
- Build foundational libraries for analyzing and optimizing model performance, correctness, numerical stability, and cross-platform reproducibility.
- Collaborate closely with ML developers on model architecture details to reduce compiled latency and resource usage.
- Learn about the variety of production-grade boards our customers use and develop computational resource strategies for different customer needs.

We're looking for someone who has:
- A B.Sc. in Computer Science, Mathematics, or a related field
- Knowledge of and experience with ML accelerators and with GPU, CPU, and SoC architecture and micro-architecture
- Proficiency in C++ and strong software development skills, with a focus on high-performance computing
- Working experience with Python
- Experience developing on or using deep learning frameworks (e.g., PyTorch, JAX, ONNX)

Nice to have:
- An M.Sc. or Ph.D. in an ML-related area
- Experience building an ML compiler or optimization framework from scratch
- Experience deploying ML solutions to embedded chips for real-time robotics applications

The salary range for this position is $125,000 - $222,000 USD annually. This range is an estimate, and the actual salary may vary based on the Company's compensation practices.
Created: 2025-09-17