Senior Engineer - Data, Schema & Knowledge Systems
IBM - San Jose, CA
Job Description
Introduction

At IBM Software, we transform client challenges into solutions, building the world's leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You'll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM's product and technology landscape. Here, you'll have the tools and opportunities to advance your career while creating software that changes the world.

Your role and responsibilities

We are seeking a Senior Software Engineer to own and evolve core platform systems spanning knowledge ingestion, memory architecture, evaluation infrastructure, and gateway data management. This role carries deep architectural ownership across Rust- and Go-based services and directly impacts search quality, model evaluation, compliance, and platform extensibility.

What You'll Own

Knowledge Base & Memory (Rust)
· Design and evolve the data schema and ingestion pipeline supporting large-scale documentation corpora, including document extraction, segmentation, and hybrid search (BM25 + vector).
· Improve corpus quality through deduplication, relevance tuning, quality scoring, source-of-truth tracking, and versioned corpus management.
· Own the memory architecture across working, semantic, and observational memory tiers, designing retrieval that is context-aware and budget-conscious.
· Evolve federated search capabilities, including multi-KB querying, relevance tuning, embedding model selection, and quality metrics.
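The hybrid search mentioned above merges a lexical (BM25) result list with a vector-search result list. One common way to combine the two rankings is Reciprocal Rank Fusion; a minimal sketch in Go, with function and document names that are purely illustrative and not part of the actual service:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked result lists (e.g. BM25 and vector search)
// with Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank(d)).
// k = 60 is the constant suggested in the original RRF paper.
func rrfFuse(bm25, vector []string, k float64) []string {
	scores := map[string]float64{}
	for rank, doc := range bm25 {
		scores[doc] += 1.0 / (k + float64(rank+1))
	}
	for rank, doc := range vector {
		scores[doc] += 1.0 / (k + float64(rank+1))
	}
	docs := make([]string, 0, len(scores))
	for doc := range scores {
		docs = append(docs, doc)
	}
	sort.Slice(docs, func(i, j int) bool {
		if scores[docs[i]] != scores[docs[j]] {
			return scores[docs[i]] > scores[docs[j]]
		}
		return docs[i] < docs[j] // deterministic tie-break
	})
	return docs
}

func main() {
	bm25 := []string{"doc-a", "doc-b", "doc-c"}
	vector := []string{"doc-c", "doc-a", "doc-d"}
	// doc-a and doc-c appear high in both lists, so they surface first.
	fmt.Println(rrfFuse(bm25, vector, 60))
}
```

Rank fusion like this rewards documents that both retrievers agree on, which is one reason hybrid search tends to beat either BM25 or vectors alone on documentation corpora.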
· Build and scale an evaluation curation system for an LLM-as-judge framework, including versioned eval datasets, regression baselines, and authoring tooling.

Gateway Data Management (Go / Rust)
· Design and implement a schema-driven entity registry with YAML-defined schemas, enabling new infrastructure connectors without code changes.
· Own declarative state machine configuration decoupled from hardcoded logic.
· Design a domain-agnostic evidence model to support audit and compliance requirements (e.g., PCI-DSS, SOX).
· Formalize metadata and provenance tracking across entities, including import/export and multi-connector support.

Evaluation Infrastructure (Go)
· Extend evaluation frameworks for end-to-end coverage across composable pipelines.
· Design eval schemas, dataset management tooling, and regression thresholds.
· Partner with other teams on shared benchmarks, test corpora, and multi-model evaluation strategy.
· Track and report model quality metrics to support production deployment decisions.

What the First 90 Days Look Like

Month 1: Onboard onto the Rust and Go services. Understand the knowledge base ingestion pipeline end-to-end: document download, extraction, chapter splitting, indexing, federated search. Run the eval framework, review existing eval cases, and understand the LLM-as-judge scoring rubric. Identify quality issues in the current documentation corpus.

Month 2: Ship the entity registry refactor: dynamic entity registration with YAML schema definitions. Design the eval curation system: dataset versioning, case authoring tooling, regression baseline management. Begin expanding eval corpus coverage.

Month 3: Ship the evidence model schema. Implement eval curation tooling. Begin knowledge base quality improvements: deduplication, source-of-truth tracking, relevance tuning. Establish an eval quality dashboard with cross-model comparison.

Required technical and professional expertise

· Data modeling instincts.
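The schema-driven entity registry described above implies that a connector's entity types, and even their state machines, are declared in configuration rather than coded. A hypothetical YAML definition might look like the following; every field name here is an illustrative assumption, not the actual registry format:

```yaml
# Illustrative entity schema: registers a new entity type declaratively,
# so a connector can be added without code changes.
entity: storage_volume
version: 1
fields:
  - name: volume_id
    type: string
    required: true
  - name: capacity_gb
    type: integer
  - name: state
    type: enum
    values: [provisioning, active, degraded, retired]
provenance:
  track_source: true      # record which connector imported each record
  track_timestamps: true  # support audit/evidence requirements
state_machine:
  initial: provisioning
  transitions:
    - { from: provisioning, to: active }
    - { from: active, to: degraded }
    - { from: degraded, to: active }
    - { from: [active, degraded], to: retired }
```

Keeping the state machine in the same declarative file is one way to realize the "decoupled from hardcoded logic" goal: the gateway validates transitions against this configuration instead of a switch statement per entity type.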
You think naturally about schemas, entity relationships, state machines, and how data evolves over time. You've designed data models that other engineers build against.
· Information retrieval or search experience. You've worked with search indexing, document processing, corpus management, or similar; you understand how to make unstructured data findable and useful.
· You can ship across languages. This role works in both Rust and Go. You don't need to be an expert in both, but you need to be productive in at least one and willing to learn the other.
· Quality measurement mindset. You've built or worked with evaluation systems, quality metrics, regression detection, or A/B testing infrastructure. You understand how to measure whether something is getting better or worse.

Preferred technical and professional experience

You don't need all of these coming in. The team will bring you up to speed:
· IBM Z domain knowledge: the documentation sets, infrastructure concepts, and operational patterns that the knowledge base serves
· LLM evaluation methodology: rubric-based scoring, LLM-as-judge patterns, baseline regression, multi-model comparison
· Our knowledge base ingestion pipeline (document extraction, chunking, vector + full-text indexing)

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Created: 2026-04-16