AI/ML Ops & Infrastructure Engineer - Q126
R2 Technologies - Alpharetta, GA
Job Description
Job Title: AI/ML Ops & Infrastructure Engineer
Company: R2 Technologies
Location: Alpharetta, GA (Hybrid / Remote Options Available)
Employment Type: Full-Time / Contract

About R2 Technologies:
R2 Technologies is a Certified Minority Business Enterprise (MBE) headquartered in Alpharetta, GA. With over two decades of experience across global markets, we have built a reputation as a trusted partner for IT staffing excellence and digital product innovation. We operate on a simple philosophy: "We deliver what we promise, and we promise only what we can deliver." Beyond providing top-tier IT talent, R2 builds cutting-edge proprietary solutions like SmartEnt, an Enterprise AI & IoT Intelligence Platform built on advanced NLP and AI technologies. By partnering closely with our clients, we deliver technology-driven outcomes that are realistic, measurable, and impactful.

Job Summary:
The shift from classical machine learning to generative AI requires a new breed of infrastructure engineering. R2 Technologies is looking for an AI/ML Ops & Infrastructure Engineer to build and manage the operational backbone for our advanced LLM and agentic systems. You will go beyond basic CI/CD to implement full-lifecycle LLMOps: managing foundation models, fine-tuned adapters, routing logic, and guardrails. Your work will ensure that our AI solutions, including SmartEnt, run with high performance, optimal GPU utilization, and rigorous compliance.

Key Responsibilities:
- Design and maintain highly scalable LLMOps pipelines for continuous integration, evaluation, and deployment of machine learning models and AI agents.
- Deploy and manage containerized AI applications and model inference servers (e.g., vLLM, Ray Serve, NVIDIA Triton) on Kubernetes across multi-cloud environments (AWS, GCP, Azure).
- Implement comprehensive observability and trace-level logging for multi-step agentic workflows using platforms like LangSmith, W&B Weave, or MLflow.
- Automate infrastructure provisioning and monitoring using tools like Terraform and agent-driven workflows (e.g., n8n, GitHub Actions).
- Optimize GPU computing costs, latency, and token usage for high-traffic AI inference endpoints.
- Enforce security guardrails, toxic-output filtering, and robust access policies within the AI deployment infrastructure.
- Actively use AI-assisted coding tools (Copilot, Cursor) to automate infrastructure-as-code (IaC) and streamline Kubernetes management.

Qualifications:
- Up to 3 years of hands-on experience in MLOps, DevOps, Site Reliability Engineering (SRE), or cloud infrastructure.
- Strong proficiency in containerization and orchestration (Docker, Kubernetes).
- Experience with ML/LLM operational platforms (MLflow, Weights & Biases, Databricks Mosaic AI, or SageMaker).
- Familiarity with serving open-source or fine-tuned LLMs and optimizing inference performance.
- Proven experience, or strong familiarity, working alongside AI coding assistants to enhance productivity.
- Scripting/programming skills in Python and Bash, along with experience in CI/CD automation.
- Passion for the evolving landscape of AI infrastructure, cost optimization (FinOps), and system reliability.

Skills: NVIDIA, Infrastructure
Created: 2026-03-12