StaffAttract

Member of Technical Staff - ML Infrastructure & ...

Embedding VC - San Mateo, CA

Job Description

Introducing Moonlake: AI for creating real-time interactive content.

Mission: Improve throughput, latency, and cost by deploying our models 2-10× faster and cheaper without quality regressions.

Scope of Work:
- GPU performance: CUDA/Triton kernels, the FlashAttention family, paged attention, CUDA Graphs.
- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV-cache reuse; speculative decoding/Medusa; mixture-of-agents routing.
- Parallelism: FSDP/ZeRO; TP/PP/expert parallelism; NCCL tuning.
- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.
- Systems: Ray/Kubernetes/Argo; observability (Prometheus/Grafana/OpenTelemetry); autoscaling; A/B testing infrastructure; canary deployments and rollback.

Tech signals: Previous experience at infra-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.

Created: 2026-03-04