DevOps Automation Eng Infrastructure
Palo Alto Networks - Santa Clara, CA
Job Description
PALO ALTO NETWORKS is the fastest-growing security company in history. We offer the chance to be part of an important mission: ending breaches and protecting our way of digital life. If you are a motivated, intelligent, creative, and hardworking individual, then this job is for you!

THE ROLE:
We’re looking for Site Reliability Engineers (SREs) with creative and innovative problem-solving skills. As a member of Infra SRE, you will work with other SREs and help us design, build, and maintain mission-critical infrastructure and tools as a platform. You will own development efforts in each sprint from planning to delivery and will partner with other engineering teams to provide technical vision in making their services more observable, scalable, and reliable. In this role as a Cloud Platform SRE you'll take ownership of the reliability, scalability, automation, uptime, and availability of our Cloud App and microservices platform. You will have the opportunity to gain technical breadth while sharing your cloud platform expertise with other team members. You will not only identify problems but also develop and implement automation solutions in AWS that operate at scale.
The best person for this role is someone who has a collaborative spirit and can seamlessly pair with other engineering teams to build and manage a reliable, secure, and scalable platform for microservices.

THE RESPONSIBILITIES:
Design, build, and maintain infrastructure in AWS to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations.
Set up critical infrastructure and develop tools and frameworks to automate operational tasks and the deployment of machines, services, and apps.
Work closely with engineering teams to ensure microservices are designed for scale, operability, and performance.
Create meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively.
Define Service Level Objectives for product(s) to constantly measure their reliability in production.
Maximize service uptime and availability, ensuring functional and performance SLAs are met.
Develop custom code or scripts to automate infrastructure and monitoring services.
Cross-functional work with engineering teams: contribute to architecture diagrams and other documentation for security reviews.
Initiate and lead scripting and automation to streamline system updates and upgrades.

QUALIFICATIONS
BS or MS degree in Computer Science or Engineering, with 7-10 years of coding experience in a DevOps or SRE role.
Deep understanding of at least one modern programming language: Java, C, C++, Python, C#.
Fluency in Linux, AWS services, and systems management tools (Ansible, Puppet, Chef, etc.).
Fundamental understanding of distributed systems, including the CAP Theorem, microservices, and the Twelve-Factor App.

PREFERRED QUALIFICATIONS:
Demonstrated ability to write programs using a high-level programming language such as C, Java, Python, or Ruby.
Hands-on operational experience in creating and managing microservices.
Excellent communication skills and the ability to work well in a team.
Strong automation skills to automate routine tasks using Python or Bash scripting.
Systematic problem-solving approach, strong customer focus, ownership, urgency, and drive to complete a task.
Demonstrated capability to provide technical leadership, in both depth and breadth, to agile teams.

SKILLS AND EXPERIENCE
Expertise in configuration management with a framework such as Ansible, Chef, or Puppet.
5+ years of experience in site reliability or infrastructure engineering for a commercial SaaS solution.
5+ years of expertise in AWS cloud infrastructure and its related services.
Serious troubleshooting skills across different levels of the stack.
Deep experience in monitoring distributed application architecture.
Experience monitoring cloud services with Datadog.
Strong experience with Linux and MySQL.
Proficiency with a programming language such as Python, Ruby, or Java, and with shell scripting to automate tasks.
Experience in CI/CD automation and GitHub.
Experience in custom code or scripts for 'destructive testing' to ensure adequate resiliency in production.
Excellent problem-solving, critical-thinking, communication, and teamwork skills.
Excellent written and verbal communication; able to collaborate and rally support.
BS or MS in Computer Science, related field, or equivalent professional experience.

Learn more about Palo Alto Networks HERE and check out our FAST FACTS.
Created: 2025-09-17