Senior ML Inference Platform Engineer

AION - Seattle, WA

Job Description

Overview

AION is building the next generation of AI cloud platforms, transforming the future of high-performance computing (HPC) through its decentralized AI cloud. Purpose-built for bare-metal performance, AION democratizes access to compute power for AI training, fine-tuning, inference, data labeling, and the full-stack AI/ML lifecycle.

Who You Are

You're an ML systems engineer who's passionate about building high-performance inference infrastructure. You don't need to be an expert in everything, since this field is evolving rapidly, but you have strong fundamentals and the curiosity to dive deep into optimization challenges. You thrive in early-stage environments where you'll learn cutting-edge techniques while building production systems. You think systematically about performance bottlenecks and are excited to push the boundaries of what's possible in AI infrastructure.

Requirements

Key Responsibilities

Build and optimize LLM inference systems, working towards 2-4x performance improvements over standard frameworks like vLLM and TensorRT-LLM
Implement modern inference optimizations including KV-cache management, dynamic batching, speculative decoding, and compression and quantization strategies
Develop GPU optimization solutions using CUDA, with opportunities to learn advanced techniques like Triton kernel development and CUDA graphs
Design model evaluation and benchmarking systems to assess performance across reasoning, coding, and safety metrics
Research and integrate trending open-source models (DeepSeek R1, Qwen 3, Llama 4, Mistral variants) with optimized configurations
Build performance monitoring and profiling tools for GPU cluster analysis, bottleneck identification, and cost optimization
Create cost-performance optimization strategies that balance throughput, latency, and infrastructure costs
Explore agent orchestration capabilities for multi-step reasoning and tool integration workflows
Collaborate with tech and product teams to identify optimization opportunities and translate them into production improvements

Skills & Experience

High-agency individual looking to own and influence product architecture and company direction
3+ years of software engineering experience with a focus on performance-critical systems and production deployments
Strong Python expertise and working knowledge of C++ for performance optimization
Working understanding of deep learning fundamentals including transformer architectures, attention mechanisms, and neural network training/inference
Hands-on experience with model serving and deployment techniques
Experience with at least one modern inference framework (vLLM, TensorRT-LLM, SGLang, or similar) in a production setting
Hands-on experience with PyTorch including model development, training loops, and basic distributed computing concepts
Understanding of distributed systems concepts including load balancing, auto-scaling, and fault tolerance
Basic GPU programming experience with CUDA or willingness to quickly learn GPU optimization techniques
Strong debugging and performance profiling skills for identifying and resolving system bottlenecks

Benefits

Join the ground floor of a mission-driven AI startup revolutionizing compute infrastructure
Work with a high-caliber, globally distributed team backed by major VCs
Competitive compensation and benefits
Fast-paced, flexible work environment with room for ownership and impact
Hybrid model: 3 days in-office, 2 days remote, with flexibility to work remotely for part of the ...

In case you have any questions about the role, please reach out to the hiring manager on LinkedIn or X.

Created: 2025-09-17
