Site Reliability Engineer with ML platform - Only W2
Saransh Inc - Sunnyvale, CA
Job Description
Overview
Title: Site Reliability Engineer (SRE) – ML Platform
Location: Austin, TX or Sunnyvale, CA
Employment type: Full-time
Seniority: Mid-Senior level
W2 only

Responsibilities
- Continuous deployment using GitHub Actions, Flux, and Kustomize
- Design and implement cloud solutions and build MLOps on AWS
- Containerize and deploy data science models using Docker, vLLM, and Kubernetes
- Communicate with a team of data scientists, data engineers, and architects, and document processes
- Develop and deploy scalable tools and services for our clients to handle machine learning training and inference
- Knowledge of ML models and LLMs

Qualifications
- 6+ years of experience in MLOps with strong knowledge of Kubernetes, Python, MongoDB, and AWS
- Good understanding of Apache Solr
- Proficient with Linux administration
- Knowledge of ML models and LLMs
- Ability to understand tools used by data scientists, and experience with software development and test automation
- Ability to design and implement cloud solutions and to build MLOps pipelines on AWS
- Experience working with cloud computing and database systems
- Experience building custom integrations between cloud-based systems using APIs
- Experience developing and maintaining ML systems built with open-source tools
- Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, and with Docker and Kubernetes
- Experience developing containers and working with Kubernetes in cloud computing environments
- Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
- Ability to translate business needs into technical requirements
- Strong understanding of software testing, benchmarking, and continuous integration
- Exposure to machine learning methodology and best practices
- Good communication skills and ability to work in a team
Created: 2025-09-17