AI Compute Architect
Oracle - Nashville, TN
Job Description

Join OCI in transforming the future of technology with our pioneering AI clusters and cutting-edge infrastructure. Our AI Infrastructure team is devoted to building a high-performance GPU platform that caters to AI, ML, and HPC workloads. This is a unique opportunity for you to play a pivotal role in a revolutionary project that allows customers to effortlessly scale from tens to thousands of GPUs without compromising performance.

As a vital member of our team, you will take charge of designing and implementing innovative architectural improvements for GPU delivery, health monitoring, triage automation, and diagnostics. These enhancements are essential for effectively managing distributed AI, ML, and HPC workloads across extensive GPU networks, using state-of-the-art technologies like RoCE and InfiniBand. Your collaboration with cutting-edge technologies will greatly influence our organization's future direction.

Responsibilities

Lead initiatives to ensure our AI infrastructure meets the evolving demands of enterprise and AI/ML clients.
Engage with enterprise and AI/ML customers to understand their workload requirements and customize OCI Kubernetes and Slurm solutions accordingly.
Guide the technical roadmap to enhance AI infrastructure efficiency.
Work closely with organizational leaders to improve team and overall performance.
Participate in or facilitate design reviews to identify the best technological solutions.
Drive technical innovation to boost performance and reliability, ensuring seamless operations for customers during demanding workloads.
Evaluate and improve OCI's architectural practices to establish it as a leader in high-demand AI workloads.
Encourage a culture of resilience in engineering teams, emphasizing scalability, performance, and quick GPU delivery in all software systems.
Mentor teams in adopting robust architectural practices to maintain stability in challenging environments.
Stay current with industry trends to help assess and develop new technologies.

Requirements

BS or MS in Computer Science, Engineering, or a related field.
12+ years of software development experience.
Expertise in Control Plane, Data Plane, or both.
Excellent organizational, verbal, and written communication skills.
Strong abilities in public speaking and executive presentations.
Familiarity with networking protocols (TCP/IP, UDP, HTTP) and standard network architectures.
Deep technical knowledge of distributed systems, high-performance computing, and GPU systems.
Experience in designing, developing, troubleshooting, and debugging software on various platforms.
Proficiency in coding in Java and working with REST APIs.
Experience in creating architectures that ensure high availability, scalability, and adaptability to evolving business needs.
Demonstrated history of product delivery and familiarity with the complete software development lifecycle.
Experience managing large-scale, highly distributed service infrastructures.
Knowledge of cloud platforms (AWS, OCI, GCP, Azure, etc.).
Proven experience in mentoring for career advancement.

Preferred Qualifications

Experience with NVIDIA training technologies (CUDA, NCCL).
Background in AI model training infrastructure.

Disclaimer: Some US customer-facing roles may require compliance with immunization and occupational health mandates.

Salary Range: The hiring range in USD is from $136,600 to $338,500 per annum, with eligibility for bonus, equity, and compensation deferral.

Oracle offers a comprehensive benefits package that includes:

Medical, dental, and vision insurance.
Short-term and long-term disability coverage.
Life insurance and AD&D.
Supplemental life insurance for family members.
Flexible Spending Accounts for healthcare and dependent care.
Pre-tax commuter benefits.
401(k) plan with company matching.
Flexible vacation and paid time off policies.
11 paid holidays.
Paid sick leave with carryover provisions.
Paid parental leave.
Adoption assistance and employee stock purchase plans.
Financial planning and group legal options.
Voluntary benefits including home, auto, and pet insurance.

This position is open for application for at least three calendar days from the posting date or as long as it remains active.

Career Level - IC6

About Us

Oracle merges data, infrastructure, applications, and expertise to drive industry innovations that positively impact billions of lives. With AI integrated into our products, we empower customers to turn potential into a brighter future for all. Join a workforce that values innovation and inclusivity, offering competitive benefits and opportunities to give back to the community through volunteer activities.

We are committed to employing individuals with disabilities and ensuring accessibility throughout the employment process. If you require assistance or accommodations, please contact us.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will be considered for employment without regard to protected characteristics.
Created: 2026-03-11