Member of Technical Staff -- Inference
RadixArk - Palo Alto, CA
Job Description
About the Role
RadixArk is seeking a Member of Technical Staff - Inference to push the limits of large-scale AI inference. You will work on the core systems that serve frontier models at scale, optimizing performance, latency, throughput, and cost across thousands of GPUs. This role sits at the intersection of systems engineering, ML infrastructure, and performance optimization. Your work will directly shape how state-of-the-art models are deployed and experienced by users worldwide.

This is a deeply technical, high-impact role for engineers who enjoy working close to the hardware-software boundary and solving performance-critical problems at scale.

Requirements
- 5+ years of experience in systems engineering, ML infrastructure, or performance-critical backend systems
- Strong expertise in large-scale inference systems for LLMs or generative models
- Deep understanding of GPU architecture and performance characteristics
- Experience optimizing latency- and throughput-critical production systems
- Strong knowledge of distributed systems and networking fundamentals
- Proficiency in C++, Rust, Go, or Python for production systems
- Experience profiling and optimizing compute-intensive workloads
- Strong debugging skills across system layers (model, runtime, kernel, network)

Strong Plus
- Experience with LLM serving stacks (vLLM, TensorRT-LLM, SGLang, etc.)
- Familiarity with CUDA, Triton, or custom kernel optimization
- Experience with batching, KV-cache management, and scheduling strategies
- Experience running inference at scale (1,000+ GPUs)
- Background in HPC or high-performance systems
- Open-source contributions in ML or systems infrastructure

Responsibilities
- Design and build large-scale inference systems for frontier AI models
- Optimize latency, throughput, and GPU utilization in production inference
- Develop and improve model serving architectures and runtimes
- Work on batching, scheduling, and memory management strategies
- Collaborate with kernel, compiler, and systems teams on performance optimization
- Debug performance bottlenecks across the stack
- Drive reliability and scalability of inference infrastructure
- Build tooling for observability, profiling, and performance analysis
- Contribute to long-term inference architecture and strategy

About RadixArk
RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework). We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training.

Our team has optimized kernels serving billions of tokens daily and designed distributed systems coordinating 10,000+ GPUs across training and serving. We're backed by leading infrastructure investors and collaborate with frontier AI labs and cloud providers. Join us in building the infrastructure layer that powers the next generation of AI.

Compensation
We offer competitive compensation with meaningful equity, comprehensive benefits, and flexible work arrangements. Compensation depends on location, experience, and level.

Equal Opportunity
RadixArk is an Equal Opportunity Employer and welcomes candidates from all backgrounds.
Created: 2026-04-02