Eng Sr - SW

BAE Systems USA - Nashua, NH

Job Description

Job Summary

Thank you for your interest in BAE Systems! We're looking for a seasoned Senior Site Reliability Engineer (SRE) to ensure the reliable deployment, operation, and continuous improvement of our digital engineering software tools across BAE Systems factories in North America. The role blends deep technical expertise with strong leadership, guiding cross-functional teams to keep our mission-critical microservices, monitoring stacks, and data stores healthy and performant.

Key Responsibilities

  • Monitor, troubleshoot, and resolve production incidents, ensuring rapid root-cause analysis and long-term fixes.
  • Design, build, and maintain automated deployment pipelines for the digital engineering software suite using asset/inventory management tools.
  • Deploy, configure, and operate the observability stack (Prometheus, Grafana, Fluent Bit, Loki) to provide real-time metrics, logs, and tracing for all services.
  • Monitor and troubleshoot PostgreSQL database health, performance, and replication issues; implement automated alerts and remediation.
  • Use Consul for service discovery and health checking of gRPC microservices; ensure service mesh reliability and failover handling.
  • Define and track SLIs/SLOs, error budgets, and reliability targets for each factory site; drive root-cause analysis and post-mortems for incidents.
  • Lead incident response, on-call rotations, and runbooks; mentor junior engineers in debugging distributed systems.
  • Collaborate with software developers, factory operations, and external vendors to embed reliability into the software development lifecycle.
  • Evaluate emerging tools and technologies that can improve observability, automation, or performance while staying aligned with our on-premise strategy (no public cloud platforms).
  • Automate operational tasks and create self-service tooling to reduce manual overhead.

Hybrid: Because reliable operation often requires on-site collaboration with factory teams and access to physical infrastructure, the role will be primarily on-site at key manufacturing locations, with the flexibility to work remotely for tasks that do not require direct interaction with factory hardware.

About BAE Systems Electronic Systems

BAE Systems, Inc. is the U.S. subsidiary of BAE Systems plc, an international defense, aerospace and security company which delivers a full range of products and services for air, land and naval forces, as well as advanced electronics, security, information technology solutions and customer support services. Improving the future and protecting lives is an ambitious mission, but it's what we do at BAE Systems. Working here means using your passion and ingenuity where it counts: defending national security with breakthrough technology, superior products, and intelligence solutions. As you develop the latest technology and defend national security, you will continually hone your skills on a team making a big impact on a global scale. At BAE Systems, you'll find a rewarding career that truly makes a difference.

Electronic Systems (ES) is the global innovator behind BAE Systems' game-changing defense and commercial electronics. Exploiting every electron, we push the limits of what is possible, giving our customers the edge and our employees opportunities to change the world. Our products and capabilities can be found everywhere, from the depths of the ocean to the far reaches of space. At our core are more than 14,000 highly talented Electronic Systems employees. With the brightest minds in the industry, we make an impact for our customers and the communities we serve.

This position will be posted for at least 5 calendar days. The posting will remain active until the position is filled, or a qualified pool of candidates is identified.

Required Education, Experience, and Skills

  • Bachelor's degree in Computer Science, Electrical Engineering, or a related field.
  • Minimum 4 years of experience in site reliability, DevOps, or systems engineering within a high-volume, multi-site manufacturing or industrial environment.
  • Deep expertise in Windows systems, networking, and version-control workflows.
  • Experience with observability tools: Prometheus, Grafana, Fluent Bit, Loki.
  • Proficiency in automation/orchestration tools such as Ansible (or equivalent inventory-management solutions).
  • Strong scripting/programming skills (Python or similar) for building custom monitoring and remediation logic.
  • Excellent communication, problem-solving, and documentation abilities; comfortable working in a fast-paced, deadline-driven environment.

Preferred Education, Experience, and Skills

  • Experience with Industry 4.0 and digital transformation initiatives in manufacturing.
  • Prior work integrating on-premise monitoring stacks with microservice architectures.
  • Experience monitoring and maintaining PostgreSQL databases in production.
  • Familiarity with service discovery and health checking using Consul, especially for gRPC services.
  • Strong grasp of data collection, management, and analysis, including:
    • Data collection and integration from various sources
    • Data management and storage solutions
    • Data analysis and visualization techniques
    • Data-driven decision-making and problem-solving

Created: 2026-03-04
