Senior Data Engineer
Midjourney - San Francisco, CA
Job Description
Overview
We’re the data team behind Midjourney's image generation models. We handle the dataset side: processing, filtering, scoring, captioning, and all the distributed compute that makes high-quality training data possible.

Responsibilities
Large-scale dataset processing and filtering pipelines
Training classifiers for content moderation and quality assessment
Models for data quality and aesthetic evaluation
Data visualization tools for experimenting on dataset samples
Testing/simulating distributed inference pipelines
Monitoring dashboards for data quality and pipeline health
Performance optimization and infrastructure scaling
Occasionally jumping into inference optimization and other cross-team projects

Our current stack
PySpark, Slurm, distributed batch processing across a hybrid cloud setup. We're pragmatic about tools - if there's something better, we'll switch.

Who we're looking for
Experience with data engineering/ML pipelines at scale, or
Cloud/infrastructure experience with distributed systems

You don't need exact tech matches - comfort with adjacent technologies and willingness to learn matter more. We work with our own hardware plus GCP and other providers, so adaptability across different environments is valuable.

Location
SF office a few times per week (we may make exceptions on location for truly exceptional candidates)

About the role
The role offers variety: our team members often get pulled into different projects across the company, from dataset work to inference optimization. If you're interested in the intersection of large-scale data processing and cutting-edge generative AI, we'd love to hear from you.

Seniority level
Mid-Senior level

Employment type
Full-time

Job function
Information Technology

Industries
Research Services
Created: 2025-09-17