Lead Data SRE (Hybrid - Chennai, India)

Insight Global - Irvine, CA

Job Description

Role Overview
The Data SRE Lead is responsible for ensuring the reliability, scalability, performance, and operational excellence of the organization's data platforms and pipelines. This role bridges Data Engineering and Site Reliability Engineering practices, applying SRE principles to modern data ecosystems (batch, streaming, warehousing, and ML data infrastructure). This is a hybrid role based at the client's Chennai, India location three days per week.

Key Responsibilities

Reliability & Operations
  • Define and own SLIs, SLOs, and SLAs for data platforms and pipelines
  • Design and implement monitoring, alerting, and observability solutions
  • Lead incident response, root cause analysis (RCA), and postmortems
  • Reduce toil through automation and self-healing infrastructure

Data Platform Stability
  • Ensure high availability of:
    • Data warehouses and lakehouses
    • Streaming systems
    • ETL/ELT pipelines
    • Orchestration frameworks
  • Implement capacity planning and performance tuning strategies
  • Improve data pipeline reliability, freshness, and latency metrics

Infrastructure & Automation
  • Manage infrastructure-as-code (IaC) frameworks
  • Improve CI/CD pipelines for data workflows
  • Implement automated testing and validation for data infrastructure
  • Drive resilience patterns such as retries, circuit breakers, and graceful degradation

Leadership & Strategy
  • Lead and mentor a team of Data SREs
  • Define operational standards and reliability roadmaps
  • Collaborate cross-functionally with Data, Engineering, and Product leadership
  • Drive a culture of reliability and operational excellence

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to […]. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy.

Skills and Requirements
  • 8+ years in Site Reliability Engineering, Platform Engineering, or Data Engineering
  • 3+ years in a technical leadership role
  • Strong experience with:
    • Cloud platforms (AWS, GCP, or Azure)
    • Infrastructure as Code (Terraform, CloudFormation)
    • Monitoring tools (Prometheus, Datadog, Grafana)
    • Containerization & orchestration (Docker, Kubernetes)
  • Deep understanding of distributed systems and failure modes
  • Experience supporting large-scale data systems (batch & streaming)
  • Experience with modern data platforms (Snowflake, BigQuery, Databricks)
  • Experience with streaming systems (Kafka, Pub/Sub, Kinesis)
  • Knowledge of data quality frameworks and data observability
  • Familiarity with ML platform reliability

Created: 2026-03-09