Voice Agent Engineer
Known, Inc - San Francisco, CA
Job Description
Known - Conversational Systems Engineer, AI Voice
San Francisco, CA (In-Person)
225k-330k Cash + Equity

Known is a matchmaker that talks to users and supports them like a friend. Our mission is to empower humanity by applying general intelligence to human connection. Users join Known by telling us their life story: on average, new users talk to our AI voice agent for 27 minutes, giving us a uniquely intimate multi-modal data set. We are a team of engineers who've created some of the most widely used AI-driven consumer products, including Uber Eats, Uber, Faire, and Afterpay. We love to work hard, with a high degree of autonomy and ownership. We work together in Cow Hollow, San Francisco.

Learn more about Known:
- Our Launch: Known's 10M Seed | TechCrunch
- "You Don't Need to Swipe Right" - Known | NYT
- Known | FastCompany
- Website

About the Role
We're looking for founding voice AI systems engineers to build and scale Known's core voice systems architecture, powering our voice-led onboarding and user experiences. This is a unique opportunity to work with a hyper-personalized data set, combining voice transcripts, images, and structured user data to power real-time, personalized, AI voice-led conversations at scale. You'll work directly with Chen Peng, former head of ML at Uber Eats and Faire.

What You'll Do
You will be responsible for the first impression of a user's journey on Known. You'll have the autonomy to own:
- Low-Latency Orchestration: Architecting the real-time pipeline between STT (Speech-to-Text), LLM reasoning, and TTS (Text-to-Speech) to ensure conversational fluidity.
- Voice Personalization & Memory: Building systems that allow our AI to remember not just what a user said, but how they said it, incorporating tone and sentiment into long-term user profiles.
- Audio Intelligence: Implementing and fine-tuning Voice Activity Detection (VAD) and interrupt-handling logic so the AI feels responsive, empathetic, and polite during the onboarding interview.
- Streaming Infrastructure: Maintaining robust WebRTC or WebSocket-based systems to handle high-concurrency voice streams while preserving audio fidelity.
- Evals for Voice: Developing custom evaluation frameworks to measure "conversational success," going beyond word error rate (WER) to assess personality, warmth, and engagement.

Requirements
We're looking for someone who obsesses over the "uncanny valley":
- 3-5 Years in ML/Systems: Proven experience deploying high-scale models in production, specifically focusing on audio processing or real-time streaming.
- The Voice Stack: Deep familiarity with modern STT/TTS frameworks (e.g., ElevenLabs, LiveKit, VITS, and Sesame) and audio libraries like Librosa or FFmpeg.
- Agentic Conversational AI: Experience building "brain" logic for LLMs using tools like LangGraph or Haystack to manage complex, non-linear dialogue.
- Production Hardened: You've optimized model inference for speed using TensorRT, ONNX, or Triton, and you're comfortable in a Docker/Kubernetes/cloud environment.

Our Investors
We're backed by Eurie Kim and Kirsten Green at Forerunner Ventures (the investors behind Decagon, Faire, and Oura), NFX, and PearVC.
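For candidates unfamiliar with the shape of this work, the orchestration and barge-in responsibilities above can be sketched at a toy level as an async turn loop. The `stt_stream`, `llm_reply`, and `tts_speak` stubs below are hypothetical stand-ins (not Known's actual stack or any vendor's API); the point is only the flow: stream transcripts in, reason once, stream speech out, and stop speaking the moment VAD signals an interruption.

```python
import asyncio

# Hypothetical stand-ins for real STT / LLM / TTS services. Production
# stacks (e.g., ElevenLabs, LiveKit) expose streaming APIs of similar shape.

async def stt_stream(audio_chunks):
    """Pretend STT: yields one transcript fragment per audio chunk."""
    for chunk in audio_chunks:
        await asyncio.sleep(0)          # yield control, as a real stream would
        yield f"[heard:{chunk}]"

async def llm_reply(transcript: str) -> str:
    """Pretend LLM: returns a canned response to the full transcript."""
    await asyncio.sleep(0)
    return f"response-to({transcript})"

async def tts_speak(text: str, interrupted: asyncio.Event) -> str:
    """Pretend TTS: 'plays' word by word, stopping if the user barges in."""
    spoken = []
    for word in text.split():
        if interrupted.is_set():        # VAD fired: stop speaking immediately
            break
        await asyncio.sleep(0)
        spoken.append(word)
    return " ".join(spoken)

async def run_turn(audio_chunks, interrupted: asyncio.Event) -> str:
    """One conversational turn: STT -> LLM -> TTS, with barge-in support."""
    transcript = ""
    async for fragment in stt_stream(audio_chunks):
        transcript += fragment
    reply = await llm_reply(transcript)
    return await tts_speak(reply, interrupted)
```

In a real system the three stages would overlap (TTS starts before the LLM finishes), and the interrupt event would be driven by a VAD model watching the inbound audio; this sketch keeps the stages sequential for clarity.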
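A "beyond WER" evaluation like the one described might start from plain WER plus simple conversational proxies. The warmth markers below are invented purely for illustration; a real framework would replace the keyword proxy with an LLM judge or a trained classifier.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Toy "warmth" proxy: fraction of agent turns containing an empathetic
# marker. These phrases are illustrative assumptions, not a real rubric.
EMPATHY_MARKERS = {"thanks for sharing", "that makes sense", "i hear you"}

def warmth_score(agent_turns: list[str]) -> float:
    hits = sum(
        any(marker in turn.lower() for marker in EMPATHY_MARKERS)
        for turn in agent_turns
    )
    return hits / max(len(agent_turns), 1)
```

Combining transcription accuracy (WER) with conversational metrics like this is one way a custom eval framework can score "conversational success" per session rather than per utterance.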
Created: 2026-03-06