StaffAttract

Machine Learning Data Engineer - Systems & Retrieval

Zyphra Technologies Inc. - Palo Alto, CA

Job Description

Zyphra is an artificial intelligence company based in Palo Alto, California.

The Role:

As a Machine Learning Data Engineer - Systems & Retrieval, you will build and optimize the data infrastructure that fuels our machine learning systems. This includes designing high-performance pipelines for collecting, transforming, indexing, and serving massive, heterogeneous datasets, from raw web-scale data to enterprise document corpora. You’ll play a central role in architecting retrieval systems for LLMs and enabling scalable training and inference with clean, accessible, and secure data. You’ll have an impact across both research and product teams by shaping the foundation upon which intelligent systems are trained, retrieved, and reasoned over.

You’ll work across:
  • Design and implementation of distributed data ingestion and transformation pipelines
  • Building retrieval and indexing systems that support RAG and other LLM-based methods
  • Mining and organizing large unstructured datasets, both in research and production environments
  • Collaborating with ML engineers, systems engineers, and DevOps to scale pipelines and observability
  • Ensuring compliance and access control in data handling, with security and auditability in mind

Requirements:
  • Strong software engineering background with fluency in Python
  • Experience designing, building, and maintaining data pipelines in production environments
  • Deep understanding of data structures, storage formats, and distributed data systems
  • Familiarity with indexing and retrieval techniques for large-scale document corpora
  • Understanding of database systems (SQL and NoSQL), their internals, and performance characteristics
  • Strong attention to security, access controls, and compliance best practices (e.g., GDPR, SOC2)
  • Excellent debugging, observability, and logging practices to support reliability at scale
  • Strong communication skills and experience collaborating across ML, infra, and product teams

Bonus Skill Set:
  • Experience building or maintaining LLM-integrated retrieval systems (e.g., RAG pipelines)
  • Academic or industry background in data mining, search, recommendation systems, or IR literature
  • Experience with large-scale ETL systems and tools like Apache Beam, Spark, or similar
  • Familiarity with vector databases (e.g., FAISS, Weaviate, Pinecone) and embedding-based retrieval
  • Understanding of data validation and quality assurance in machine learning workflows
  • Experience working on cross-functional infra and MLOps teams
  • Knowledge of how data infrastructure supports training pipelines, inference serving, and feedback loops
  • Comfort working across raw, unstructured data, structured databases, and model-ready formats

Why Work at Zyphra:
  • Our research methodology is to make grounded, methodical steps toward ambitious goals. Deep research and engineering excellence are equally valued
  • We strongly value new and crazy ideas and are very willing to bet big on them
  • We move as quickly as we can; we aim to keep the bar to impact as low as possible
  • We all enjoy what we do and love discussing AI

Benefits and Perks:
  • Comprehensive medical, dental, vision, and FSA plans
  • Competitive compensation and 401(k)
  • Relocation and immigration support on a case-by-case basis
  • On-site meals prepared by a dedicated culinary team; Thursday Happy Hours
  • In-person team in Palo Alto, CA, with a collaborative, high-energy environment

If you're excited by the challenge of high-scale, high-performance data engineering in the context of cutting-edge AI, you’ll thrive in this role. Apply today!

Created: 2025-09-29
