Lead Software Engineer - IAG Platform & DevOps ...
Oracle - Washington, DC
Job Description

We are looking for a Lead Software Engineer to take on a critical role in shaping our Identity & Access Governance (IAG) services. You will initially focus on IAG and later expand into broader technical leadership across the organization. This position emphasizes software engineering with strong DevOps/SRE expertise. Your responsibilities will include developing robust software systems, deploying significant features into production, and ensuring their reliability, operability, and security by design.

The ideal candidate has extensive experience building and maintaining distributed cloud services and a deep understanding of control plane architecture and service-to-service communication. You will lead the design of major service components, collaborate closely with Engineering Managers, Architects, and Technical Program Managers, and provide technical guidance to engineers of all levels. You should be just as comfortable creating architecture documentation and conducting peer reviews as you are with prototyping, coding, reviewing pull requests, enhancing build/deploy pipelines, and leading incident response when necessary. You will drive improvements in speed, quality, automation, and engineering standards, leaving both systems and teams better equipped.

Responsibilities

- Architect and implement major features for IAG services and related platform dependencies, ensuring the software is scalable, secure, and highly operable.
- Establish technical standards for reliability, service maturity, and delivery metrics, encompassing SLIs/SLOs, error budgets, safe rollout strategies, and clearly delineated ownership.
- Enhance the developer-to-production lifecycle by developing:
  - CI/CD pipelines
  - Automated testing and validation
  - Infrastructure-as-code patterns
  - Deployment strategies such as canary and progressive delivery
- Drive observability by integrating metrics, logs, and traces, and refine alerting practices to improve on-call effectiveness and operational tooling.
- Act as a technical escalation resource and first responder for urgent operational tasks, leading triage and coordination during complex production incidents.
- Conduct root cause analyses to turn incidents into engineering improvements, reducing recurrence through fixes and automation.
- Mentor and empower development teams to design operable systems, assist in service incubation, and raise code quality through thorough reviews and coaching.
- Address security and compliance requirements, including threat modeling, security evaluations, and establishing operational controls for regulated contexts.

Qualifications

- A Bachelor’s degree in Computer Science or a related field (Master’s preferred), or equivalent experience.
- Over 10 years in software development, particularly in building and maintaining distributed production services.
- Strong proficiency in a modern programming language such as Java, Go, C++, or Python, and a proven track record of delivering production code.
- A demonstrable ability to lead the full cycle of design and delivery for significant service capabilities.
- Thorough knowledge of distributed systems principles (data structures/algorithms, networking, concurrency, and failure management).
- Deep familiarity with cloud architecture patterns and operational designs, particularly related to control planes.
- Experience establishing DevOps capabilities such as CI/CD pipelines, automated testing, deployment automation, and infrastructure-as-code.
- Excellent debugging skills across networking and persistence layers, with a solid understanding of databases and distributed data consistency.
- Demonstrated leadership as a technical lead during high-severity incidents, including fast diagnosis and mitigation.
- Solid Linux knowledge, with an ability to adapt rapidly to Linux-based environments.
- Experience collaborating effectively with Architects, Engineering Managers, Product teams, and Program Managers to achieve timely, high-quality outcomes.

Preferred Qualifications

- Experience developing and managing services on public cloud platforms (OCI preferred; AWS and Azure are also valuable).
- Hands-on experience with container orchestration (e.g., Kubernetes), service mesh/API gateways, and modern security practices.
- Familiarity with operating services across multiple availability domains/zones and/or multi-region strategies for regional resilience.
- A proven history of leading reliability initiatives such as SLO adoption and production readiness assessments.
- Experience building sophisticated CI/CD pipelines with strong testing and safe deployment practices.
- Background in regulated environments (e.g., FedRAMP, PCI DSS) with a focus on audit readiness and operational controls.
- Expertise in threat modeling or similar risk identification methodologies, and in translating findings into actionable engineering changes.
- Ability to obtain and maintain a U.S. Government security clearance, preferred for work in regulated environments where applicable.

You will play a key part in defining and advancing standard procedures, writing specifications for major new projects, and developing software accordingly.

Disclaimer: Certain positions may require compliance with specific health mandates.
Created: 2026-03-11