Senior Site Reliability Engineer (Cloud Infra)
Hippocratic AI - Palo Alto, CA
Job Description
About Us
Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes in the world by bringing deep healthcare expertise to every human. No other technology has the potential to have this level of global impact on health.

Why Join Our Team
- Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.
- Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.
- Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.
- World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.

For more information, visit

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA unless explicitly noted otherwise in the job description.

About the Role
We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, your responsibilities will include designing and implementing infrastructure automation, building continuous integration and delivery pipelines, and monitoring and scaling the infrastructure that powers our healthcare AI platform.
You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable and scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities
- Design and implement infrastructure automation and deployment pipelines using tools such as Terraform, Ansible, and Jenkins
- Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
- Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
- Develop and maintain security and compliance policies and procedures for our healthcare AI platform
- Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
- Implement and maintain disaster recovery and business continuity plans
- Develop and maintain documentation related to infrastructure, deployment, and operations
- Mentor and provide technical guidance to junior engineers

Qualifications
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
- At least 5 years of professional experience in DevOps engineering or a related field
- Expertise in infrastructure automation and deployment tools such as Terraform, Ansible, Jenkins, or GitLab CI/CD
- Experience with cloud platforms such as AWS, GCP, or Azure
- Strong knowledge of containerization technologies such as Docker and Kubernetes
- Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
- Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
- Strong problem-solving skills and ability to work independently and collaboratively in a team environment
- Excellent communication and interpersonal skills
- Experience implementing HIPAA and SOC 2 compliance is a plus
- Experience working in an HPC environment is a plus
Created: 2025-09-17