AI & NLP Fellowship: Data Engineering for Social Impact
Institute for Development Impact - I4DI - Washington, DC
Job Description
Institute for Development Impact (I4DI) | DECipher Project

About the Project

DECipher is an AI-powered platform developed by the Institute for Development Impact (I4DI) to help global development professionals access and interpret decades of USAID-funded learning. It draws from one of the largest public document archives in international development, transforming raw PDFs into structured insights using modern machine learning techniques.

At its core, DECipher is a public infrastructure project. It connects natural language processing with real-world policy and program decisions. The work is technical, but the impact is human. It supports smarter, more accountable development efforts worldwide.

The Opportunity

We are offering a volunteer summer fellowship for individuals who want to gain real experience working with applied AI systems. Fellows will help us prepare a large, high-value dataset for fine-tuning domain-specific language models.

This is not a theoretical exercise. You will be working directly with tens of thousands of documents, contributing to the quality and integrity of training data that powers an open-access AI tool for public benefit. While unpaid, this role offers serious technical learning and the chance to be part of something that is both ambitious and grounded.

What You Will Work On

- Process and clean large volumes of unstructured PDF documents
- Develop and manage text extraction workflows using Python and NLP tools
- Review document structure and metadata for consistency and quality
- Label and classify documents to support supervised and semi-supervised learning
- Support QA and data validation steps critical for model fine-tuning
- Work with experienced engineers and researchers on a functioning AI pipeline

What You Will Learn

- How to build structured datasets for training large language models
- Techniques in OCR, document parsing, tokenization, and quality assurance
- How NLP systems are adapted to real-world, domain-specific use cases
- What it takes to make AI systems both reliable and accountable

Who You Are

- Current student, recent graduate, or early-career professional with experience in Python and interest in NLP, machine learning, or data engineering
- Comfortable working with complex documents, legacy formats, and detailed guidelines
- Motivated by mission-driven tech and open-access knowledge
- Looking for more than just a credential; you want meaningful work and real learning

What You Will Gain

- Applied experience with large-scale data preparation
- A practical, portfolio-worthy contribution to an operational AI system
- Mentorship from a team experienced in responsible AI and development practice
- Flexible hours and remote collaboration
- Possibility for extended work or future opportunities based on performance

- Fully remote
- Summer 2025
- 8 to 12 week commitment
- 30 to 40 hours per week, flexible scheduling

How to Apply

Send a brief message describing your interest and experience, along with a resume and a link to relevant work.

Level: Internship
Employment type: Internship
Job function: Research, Analyst, and Information Technology
Industries: International Trade and Development
Created: 2026-04-20