Site Reliability Engineer
Wells Fargo - Irving, TX
Job Description
Title: Senior Site Reliability Engineer
Location: Charlotte, NC
Alternate Location: Irving, TX
Duration: 18 months
Work Engagement: W2
Work Schedule: 3 days in office/2 days remote
Benefits on offer for this contract position: Health Insurance, Life Insurance, 401K and Voluntary Benefits

In this contingent resource assignment, you may:
* Consult on complex initiatives with broad impact and large-scale planning for Systems Operations Engineering.
* Review and analyze complex, multi-faceted, larger-scale or longer-term Systems Operations Engineering challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented factors.
* Contribute to the resolution of complex and multi-faceted situations requiring a solid understanding of the function, policies, procedures, and compliance requirements that meet deliverables.
* Strategically collaborate and consult with client personnel.

Required Qualifications:
* 5+ years of Systems Engineering or Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work or consulting experience, training, military experience, education.

We are seeking a Senior Site Reliability Engineer (SRE) with a strong background in software engineering and a passion for solving complex problems at scale. This role blends software engineering with operational expertise to deliver stable, scalable, and resilient services, while reducing toil and shifting operations left.

Key Responsibilities
* Design and implement automated tooling to eliminate manual toil and optimize operations.
* Build and enhance monitoring, alerting and overall observability.
* Champion the SRE practice within COO Technology by modeling best practices, mentoring peers, and collaborating with embedded platform SRE teams.
* Enhance system availability in a multi-cloud environment by evolving resiliency patterns.
* Introduce and scale AIOps, including self-healing and autonomic systems using AI/ML, RPA, and unified communications.
* Automate key SRE metrics and IT service operations processes, including customer impact analysis, availability tracking, SLO/SLI adherence, error budgeting, and incident response.
* Support critical applications and customer journeys, lead Agile-based remediation efforts, conduct blameless postmortems, and drive root cause analysis to eliminate recurring issues.
* Implement Non-Functional Requirements (NFRs) and guide teams through them during modernization and uplift initiatives.
* Help define, govern and enforce Permit to Operate.

Qualifications
* Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
* Proven experience as an SRE
* Hands-on experience leading, operating and performing within an SRE team
* Strong hands-on experience with Kubernetes and OpenShift
* Understanding of AutoSys
* Excellent communication skills (clear, concise, professional)
* Experience with data platforms: Oracle, DB2, SQL, MongoDB, Hadoop, Cloudera, Spark, Teradata
* Solid experience with observability tools: Grafana, Prometheus, ELK/Splunk, AppDynamics, Google Cloud Logging, Elastic, ThousandEyes, Aternity
* Financial services background preferred
Created: 2026-04-02