Cloud Reliability Engineer - Core Technology ...
Bank of America - Charlotte, NC
Job Description: Cloud Reliability Engineer (SRE)

Responsible for reliability and support of Internal Cloud, Public Cloud (Azure/IBM), and OpenShift container (Docker) services.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Troubleshoot issues across the entire stack: hardware, software, application, and network.
Perform deep dives into both systemic and latent reliability issues; perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization.
Identify and drive opportunities to improve automation for the cloud services.
Provide on-call coverage as per rotation.
Be a key stakeholder in the design of cloud services so that they are resilient from day 0, and identify and fix resiliency problems by collaborating with product teams.

Required Skills:
BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
6+ years of hands-on experience maintaining infrastructure services.
Excellent understanding of Linux/Windows operating systems administration.
Experience with VMware, Azure cloud, OpenShift, Docker, and Kubernetes.
Experience with automation in one or more of the following: Python, Java, Ansible, and shell scripting, plus source control (git).
Experience with SQL/NoSQL databases such as MySQL and MongoDB, and CI/CD tools such as git and Jenkins.
Systematic problem-solving approach, sense of ownership, and drive.
Excellent interpersonal, organizational, and communication (written, verbal, and presentation) skills are a must.
Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.

Desired Skills:
Experience with Ansible Tower, Red Hat Satellite, Foreman, and knowledge of capsule architecture is a plus.
Experience with HashiCorp Vault, Terraform, Consul, or Nomad is a plus.

Job Band: H5
Shift: 1st shift (United States of America)
Hours Per Week: 40
Weekly Schedule:
Referral Bonus Amount: 0
Created: 2021-11-29