Lead Software Engineer - AI GPU Development
Oracle - Dover, DE
Job Description

Join Oracle Cloud Infrastructure's (OCI) architecture development engineering team as a Lead Software Engineer specializing in GPU platform software and system development. We are pioneering innovations in AI, focusing on the next generation of AI accelerators and advanced hardware solutions. In this role, you will evaluate, prototype, and optimize state-of-the-art AI hardware and accelerators, including custom-designed AI chips. Your efforts will drive the evolution of next-generation Cloud AI Infrastructure platforms.

You will play a key role in defining the platform, overseeing development processes, conducting design reviews, and performing system integration, performance testing, and characterization. You will work closely with third-party GPU IC suppliers and partners, as well as internal hardware and software teams, to enhance Oracle's AI Cloud platform solutions. Your contributions will directly influence the future of AI hardware in machine learning and deep learning applications.

Responsibilities

- Conduct system architecture evaluations and analyze proposed implementation paths.
- Collaborate directly with hardware design teams on architecture, implementation, deployment, and troubleshooting of AI hardware platforms.
- Perform comprehensive benchmarking and performance assessments of AI accelerators from emerging hardware vendors.
- Compare new AI accelerators against industry-standard hardware for training and inference workloads.
- Develop tools and processes for real-world performance evaluation of AI hardware.
- Assist in designing and refining performance optimization algorithms for AI models on hardware.

Basic Qualifications

- BS or MS degree in Computer Science or a related technical field.
- 10+ years of experience in software development.
- Proficiency in Java, Go, C#, or other object-oriented languages.
- In-depth knowledge of AI/GPU platform architectures and their capabilities.
- Experience in managing large-scale, distributed service infrastructures.
- Hands-on experience with GPU supplier test code and open-source AI testing tools.
- Familiarity with designing and implementing modern server platforms with various architectures, including x86 and ARM.
- Proven ability in debugging complex hardware-software issues.
- Strong problem-solving skills, excellent communication, and a proactive mindset.

Preferred Qualifications

- Technical lead experience on a large-scale cloud service.
- Experience in developing and maintaining services on a public cloud platform.
- Familiarity with AI accelerator chips and performance evaluation tools.
- Knowledge of AI model optimization techniques for hardware.
- Experience running firmware and system diagnostics tools; adept in scripting for test customization.

Disclaimer: Certain customer-facing roles may be subject to immunization and occupational health mandates. Candidates are typically placed within the hiring range based on several factors including skills, experience, and internal peer equity.

About Us

Oracle brings together data, infrastructure, applications, and expertise, driving innovations that make a significant impact. Explore your potential with a company leading in AI and cloud solutions that benefit billions. We are dedicated to a diverse and inclusive workforce, providing competitive benefits and supporting community contributions.

We are committed to accommodating individuals with disabilities throughout the employment process. For assistance, please reach out regarding accessibility accommodations. Oracle is an equal opportunity employer, ensuring fair consideration for all qualified applicants without discrimination.
Created: 2026-03-04