APIGEE Cloud Admin Management
Diamondpick - Buffalo Grove, IL
Job Description
Max Vendor Rate: $63-$65
Location: Remote or hybrid in Buffalo Grove, IL. Please note the candidate's preference on the resume.
Schedule: 5 days / 40 hours per week
Subcontracting permitted

Job Title: Google Site Reliability Engineer

Technical Skillset:
Knowledge of or experience with GCP (BigQuery, Cloud Storage, Dataproc, GKE, Airflow/Composer, Pub/Sub, Cloud Functions, Cloud SQL, etc.).
Knowledge of or experience with GitHub and Visual Studio Code.
Knowledge of or experience with MS Copilot.
Knowledge of or experience with Prometheus, Grafana, and Splunk.
Knowledge of Python, PySpark, or machine learning is an added advantage.

Job Description:
Site Reliability Engineers combine software engineering with systems and infrastructure operations to build and run large, reliable, scalable services.

The role focuses on:
Detect and log incidents, and meet the agreed SLA for incident tickets.
Activate incident bridges and handle communication for P1 and P2 incidents.
Prepare postmortems within 24-72 hours and perform root cause analysis.
Own critical monitoring activities, problem management, and Grafana integration.
Participate in on-call rotations, handle incidents, and drive timely mitigation and recovery.
Automate operational work so services can scale without manual toil.
Operate highly available, low-latency, and secure systems.
Define and measure reliability through SLIs/SLOs and error budgets.
Build and maintain observability: metrics, logs, traces, dashboards, and alerts for critical services.
Tune alerting to reduce noise while ensuring rapid detection of user-impacting issues.
Lead or contribute to post-incident reviews and root cause analysis, and ensure follow-up actions are implemented to prevent recurrence.

Added advantage: familiarity with Tidal, ServiceNow, xMatters, Ab Initio, Tableau, Opsgenie, and Zeke.

Soft skills:
Clear written and verbal communication, particularly under pressure (e.g., during incidents).
Ability to collaborate across multiple teams and influence engineering practices through expertise rather than authority.
Strong communication, analytical, and problem-solving skills.
Knowledge of the entire incident management life cycle.
Experience working in an Agile model.
Created: 2026-03-04