Apple Ray Inference Engineer

Apple Inc. - Cupertino, CA

Job Description

Cupertino, California, United States
Software and Services

The Apple Data Platform (ADP) group builds the data platform that enables the next generation of intelligent experiences on all Apple products and services. ADP empowers Apple engineers to deliver ML-driven products and innovations rapidly and at scale. We are looking for an experienced engineer who can bring their passion for machine learning, infrastructure, big data, and distributed systems to build and serve world-class data+ML platforms/products at scale. You will work with many cross-functional teams and lead the planning, execution, and success of technical projects with the ultimate purpose of improving the ML experience for Apple customers, with a focus on designing, deploying, and optimizing model inference. Are you passionate about building scalable, reliable, maintainable infrastructure and solving data problems at scale? Come join us and be part of the Data Infrastructure journey.

Description

Apple Ray leverages open-source Ray to offer a unified framework for processing and deployment of complex data+ML pipelines. It enables the next generation of intelligent experiences for Apple products and services by combining data and processing layers, as well as a model inference platform, into one unified end-to-end workflow that eliminates the complexity of running multiple independent jobs while significantly improving hardware resource efficiency and development speed. Tight integration of Apple Ray with Apple Data services makes it the go-to solution when serving complex and large-scale data and ML pipelines. The team enables future Apple intelligent products by building a cutting-edge ecosystem of data+ML technologies for large-scale, efficient systems, available to all data and ML engineers within Apple.
As a member of the Apple Ray team, your responsibilities will include:

- Design, implement, and maintain distributed systems to build world-class ML platforms/products at scale
- Experiment with, deploy, and manage LLMs in a production context
- Benchmark and optimize inference deployments for different workloads, e.g. online vs. batch vs. streaming workloads
- Diagnose, fix, improve, and automate complex issues across the entire stack to ensure maximum uptime and performance
- Design and extend services to improve functionality and reliability of the platform
- Monitor system performance, optimize for cost and efficiency, and resolve any issues that arise
- Build relationships with stakeholders across the organization to better understand internal customer needs and enhance our product for end users

Minimum Qualifications

- 5+ years of experience in distributed systems with deep knowledge of computer science fundamentals
- Experience managing deployments of LLMs at scale
- Experience with inference runtimes/engines, e.g. ONNXRT, TensorRT, vLLM, SGLang
- Experience with ML training/inference profiling and optimization for different workloads and tasks, e.g. online inference, batch inference, streaming inference
- Experience with profiling ML models for different end use cases, e.g. RAG vs. code completion
- Experience with containerization and orchestration technologies, such as Docker and Kubernetes
- Experience in delivering data and machine learning infrastructure in production environments
- Experience configuring, deploying, and troubleshooting large-scale production environments
- Experience in designing, building, and maintaining scalable, highly available systems that prioritize ease of use
- Experience with alerting, monitoring, and remediation automation in a large-scale distributed environment
- Extensive programming experience in Java, Python, or Go
- Strong collaboration and communication (verbal and written) skills
- B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience

Preferred Qualifications

- Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
- Familiarity with CUDA + kernel implementation
- Experience with inference optimization and fine-tuning techniques (e.g. pruning, distilling, quantization)
- Experience with deploying + optimizing ML models on heterogeneous hardware, e.g. GPUs, TPUs, Inferentia
- Experience with GPU and other types of HPC infrastructure
- Experience with training frameworks like PyTorch, TensorFlow, JAX
- Deep understanding of Ray and KubeRay

At Apple, base pay is one part of our total compensation package and is determined within a range. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Created: 2025-09-17
