StaffAttract

Machine Learning Engineer - Model Performance

inference.net - San Francisco, CA


Job Description

inference.net is seeking a Machine Learning Engineer to join our team, focusing on optimizing the performance of our cutting-edge AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. You will be responsible for deploying state-of-the-art models at scale and performing optimizations to increase throughput and enable new features. This position offers the chance to collaborate closely with our engineering team and make significant contributions to open source projects such as SGLang and vLLM.

About inference.net

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in person from our office in downtown San Francisco. Our investors include A16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard and deeply enjoy the work that we do.

Responsibilities

  • Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
  • Deploy and maintain large language models at scale in production environments
  • Deploy new models as they are released by frontier labs
  • Implement techniques like quantization, speculative decoding, and KV cache reuse (a brief illustrative sketch follows the posting)
  • Contribute regularly to open source projects such as SGLang and vLLM
  • Deep dive into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
  • Collaborate with the engineering team to bring new features and capabilities to our inference platform
  • Develop robust and scalable infrastructure for AI model serving
  • Create and maintain technical documentation for inference systems

Requirements

  • 3+ years of experience writing high-performance, production-quality code
  • Strong proficiency with Python and deep learning frameworks, particularly PyTorch
  • Demonstrated experience with LLM inference optimization techniques
  • Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
  • Familiarity with Docker and Kubernetes for containerized deployments
  • Experience with CUDA programming and GPU optimization
  • Strong understanding of distributed systems and scalability challenges
  • Proven track record of optimizing AI models for production environments

Nice to Have

  • Familiarity with TensorRT and TensorRT-LLM
  • Knowledge of vision models and multimodal AI systems
  • Experience implementing techniques like quantization and speculative decoding
  • Contributions to open source machine learning projects
  • Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus competitive equity and benefits including:

  • Full healthcare coverage
  • Quarterly offsites
  • Flexible PTO
inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!

Seniority level: Not Applicable
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development
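
For illustration only, and not part of the original posting: the responsibilities above mention deploying models with vLLM, quantizing them, and reusing KV cache. The sketch below shows one way that might look with vLLM's offline Python API, using an AWQ-quantized checkpoint and vLLM's prefix caching as a simple form of KV cache reuse. The model name and prompts are placeholders, not inference.net's actual setup.

    # Minimal sketch (assumes vLLM is installed and a CUDA-capable GPU is available).
    # Loads an AWQ-quantized model and enables prefix caching so requests that share
    # a prompt prefix can reuse previously computed KV cache blocks.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="TheBloke/Llama-2-7B-Chat-AWQ",  # placeholder quantized checkpoint
        quantization="awq",                    # use the AWQ-quantized weights
        enable_prefix_caching=True,            # reuse KV cache across shared prefixes
    )

    sampling = SamplingParams(temperature=0.7, max_tokens=128)

    prompts = [
        "Explain the benefit of KV cache reuse in LLM serving.",
        "Explain the benefit of speculative decoding in LLM serving.",
    ]

    # Batched offline generation; vLLM schedules the requests with continuous batching.
    for request_output in llm.generate(prompts, sampling):
        print(request_output.outputs[0].text.strip())

In a production deployment, the same engine arguments can typically be passed to vLLM's OpenAI-compatible server instead of the offline API; throughput techniques such as speculative decoding would be layered on top of a setup like this.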

Created: 2025-09-17
