Senior Principal Software Engineer, Networking for AI ...
Oracle - Trenton, NJ
Job Description

The OCI (Oracle Cloud) AI Infrastructure Innovation team is at the forefront of developing next-generation AI and HPC networking for large-scale GPU superclusters. Our mission is to design and deliver cutting-edge RDMA-based networking solutions that enable our customers to achieve exceptional performance in AI training and inference. If you're passionate about large-scale distributed systems, high-speed networking, and AI workloads, this is your chance to shape the future of technology.

Key Responsibilities

Lead the architecture, system design, and implementation of high-performance RDMA solutions across OCI's AI and HPC platforms.
Drive innovation in network and TCP performance while identifying necessary enhancements across kernel, NIC, switch, transport, protocol, storage, and GPU communications.
Develop top-tier, high-performance software features that emphasize reliability, observability, and security.
Set performance goals and success metrics; design benchmarks and conduct large-scale experiments to evaluate throughput, latency, and tail behavior.
Collaborate with GPU platform, storage, database, and control-plane teams to deliver comprehensive solutions, influencing OCI-wide network architecture and standards.
Mentor engineers, provide technical leadership and reviews, and contribute to a long-term roadmap and technical strategy.

Required Qualifications

Strong software engineering foundation with a deep understanding of data structures and algorithms, proven ability to optimize large-scale systems for high scale, low latency, and high throughput.
Experience developing, deploying, and maintaining high-performance production software.
Demonstrated leadership in complex problem-solving and mentoring others.
BS/MS in Computer Science, Electrical/Computer Engineering, or equivalent practical experience.

Preferred Qualifications

Experience with RDMA networking (RoCE and/or InfiniBand), including congestion control, reliability, and performance tuning at scale.
Familiarity with AI/HPC stacks and workloads such as NCCL, RCCL, MPI, Slurm, GPU communication patterns, collective operations, and large-scale training jobs.
Experience integrating GPU Direct and NVMe-oF access in a production environment.
Hands-on experience with observability and performance tooling (e.g., eBPF, perf, flame graphs, switch/NIC telemetry) and SLO-driven operations at scale.

Disclaimer: Certain customer-facing roles may be subject to specific requirements, including immunization and occupational health mandates.

Compensation Information

US: Hiring range in USD from: $96,800 to $251,600 per annum. Eligibility for bonuses, equity, and compensation deferral may apply. Oracle provides extensive salary ranges to reflect variations in skills, experience, market conditions, and roles.

Benefits Package

Comprehensive medical, dental, and vision insurance
Short-term and long-term disability coverage
Life insurance and AD&D
Flexible Spending Accounts for healthcare and dependent care
Pre-tax commuter and parking benefits
401(k) Savings Plan with company match
Flexible Vacation for full-time employees
11 paid holidays
Paid sick leave: 72 hours upon hire, carries over to a max of 112 hours
Paid parental leave and adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal services
Voluntary benefits including auto and pet insurance

This role is open for applications for at least three calendar days from the posting date or until the position is filled.

About Us

At Oracle, we integrate data, infrastructure, applications, and expertise to drive advancements across various sectors, from innovation to essential healthcare solutions. Our commitment to AI enables customers to realize a better future.
Join Oracle, where innovation thrives and each team member contributes to impactful solutions. We value diversity and inclusion and are dedicated to providing opportunities for all. Our competitive benefits support our employees' well-being, while we encourage community engagement through volunteer initiatives. We are committed to accessibility throughout our hiring process. If you need assistance or require accommodations, please reach out to us. Oracle is an Equal Employment Opportunity Employer, and we consider qualified applicants without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, and protected veteran status.
Created: 2026-03-10