Software Development Snr Manager
Oracle - Austin, TX
Job Description
Oracle Cloud Infrastructure (OCI) is Oracle's next-generation cloud platform, engineered to handle the most demanding enterprise workloads. Within OCI, the AI Platform organization is building a comprehensive cloud service to support the full lifecycle of AI and machine learning, from GPU infrastructure and training pipelines to model serving and deployment tools, enabling Oracle teams and customers to build and deploy AI at scale.

We are seeking a Senior Manager of Software Development to lead the team responsible for the foundational AI infrastructure powering Oracle's GenAI and ML initiatives. This role will focus on critical components of OCI's AI platform, including large-scale GPU cluster management, self-service ML infrastructure, and end-to-end model lifecycle capabilities such as training and serving.

Help shape the core infrastructure powering Oracle's generative AI and machine learning solutions. Tackle some of the most challenging problems in AI infrastructure at enterprise scale. Collaborate with world-class teams and leaders driving innovation in cloud and AI. Be part of a high-visibility initiative central to Oracle's future.

This role requires strong technical and leadership skills, with a deep understanding of cloud-native infrastructure, distributed systems, and modern AI/ML workloads. You will collaborate across OCI and Oracle's product teams to power internal and customer-facing AI solutions at scale. You will help define the vision and technical strategy for a key OCI service. You will recruit, inspire, and lead a high-performing engineering team building foundational services to support data scientists and AI experts. You'll manage resources, set priorities, and drive execution to meet the ambitious demands of this rapidly growing product offering.

Responsibilities
+ Build and lead a high-performing engineering team (7+ engineers).
+ Lead the development and operations of the AI Platform supporting high-performance model training and inference.
+ Build self-service capabilities for engineers and data scientists to manage GPU workloads, monitor usage, and streamline ML workflows.
+ Design and deliver model lifecycle services, including training, fine-tuning, evaluation, and scalable model serving.
+ Collaborate with internal science, product, research, and infrastructure teams to ensure AI workloads are optimized for performance, cost, and reliability.
+ Ensure strong security, observability, and operational best practices across the platform.
+ Inspire a culture of
Created: 2025-09-15