Site Reliability Engineer, Recommendation ...
TikTok - Seattle, WA
Job Description
About the team

The USDS TikTok Recommendations Infra SRE team works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. SREs on this team take on production ownership and are responsible for observability and automation across complex, large-scale service mesh architectures.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
* Engage in and improve the whole lifecycle of Recommendation systems, from system design consulting through to launch reviews, deployment, operation and refinement
* Deliver tools/software to improve the reliability and scalability of services, automate operations and improve R&D efficiency
* Build availability of large-scale services deployed across global data centers
* Plan, manage and optimize cloud resource utilization, ensuring the SLA of large-scale clusters
* Measure and monitor availability, latency and overall service health
* Practice sustainable incident response and postmortems
Created: 2026-04-02