Senior Data Scientist – Machine Learning Data ...
TurbineOne - San Francisco, CA
Job Description
About the Job

Senior Data Scientist – Machine Learning Operations

Company Intro: TurbineOne is the frontline perception company. We deliver decision advantage, better situational awareness, and stronger force protection. Our customers love how we automate the right portions of the military intelligence cycle while keeping them in the loop. The company is a small, fast-moving, and high-performance startup that is backed by the best DefenseTech venture capitalists.

Job Title: Data Scientist
Reporting to the Machine Learning team lead
Geographically flexible for home-office

Responsibilities
- Ingesting, organizing, and maintaining large-scale training datasets from open-source resources and contract-specific artifacts
- Creating and managing data cataloging systems to ensure datasets are findable, accessible, and ready for ML training pipelines
- Designing and implementing data labeling workflows, including managing external labeling vendors and quality assurance processes
- Building and maintaining YOLO-style manifests and annotation formats for custom computer vision datasets
- Performing data cleaning, validation, and augmentation to ensure high-quality training data
- Conducting exploratory data analysis and generating insights about dataset characteristics, biases, and coverage gaps
- Supporting the ML research team with statistical analysis, experiment design, and model evaluation
- Developing data pipelines and automation tools for continuous data ingestion and processing
- Collaborating with ML engineers to optimize data loading and preprocessing for training efficiency

On a Typical Day You Would
- Process incoming datasets from various sources, performing quality checks and organizing them into our data management system
- Create or review annotation schemas and coordinate with labeling teams to ensure consistent, high-quality labels
- Write Python scripts to clean, transform, and validate datasets for specific ML training requirements
- Analyze dataset statistics and create visualizations to identify potential issues or opportunities for improvement
- Collaborate with the ML research lead to design experiments and evaluate model performance across different data splits
- Document dataset characteristics, versioning, and lineage to maintain reproducibility and compliance

Qualifications
- 5+ years of experience in data science, analytics, or a related field with a focus on ML data preparation
- Strong foundation in probability, statistics, and experimental design
- Bachelor’s degree in Statistics, Mathematics, Computer Science, or a related quantitative field (Master’s preferred)
- Proficiency with the Python data stack: Pandas, NumPy, Jupyter Notebooks, and data visualization libraries
- Experience with ML frameworks (PyTorch, Scikit-learn) and familiarity with training workflows
- Hands-on experience with computer vision datasets and annotation formats (COCO, YOLO, Pascal VOC)
- Experience managing data labeling projects and working with annotation tools (Label Studio, CVAT, or similar)
- Familiarity with open-source ML models and experience applying them to real-world problems
- Strong SQL skills and experience with data warehousing concepts
- Experience with version control (Git) and collaborative development practices
- Excellent communication skills for coordinating with technical and non-technical stakeholders
- Meticulous attention to detail and strong organizational skills for managing complex datasets
- Willingness to embrace the Startup Culture of moving fast, being insatiably curious, celebrating often, embracing uncertainty, and having a personal desire to improve other people’s lives

Nice to Have
- Experience with defense or security-related datasets
- Knowledge of edge computing constraints and data optimization techniques
- Experience with distributed data processing frameworks (Spark, Dask)
- Familiarity with MLOps practices and tools
- Background in specific domains relevant to perception systems (satellite imagery, sensor fusion, etc.)

Startup Culture and Eligibility
- We’re a small, fully remote team and everything is our responsibility
- Our team thrives on autonomy, trust, and solid communication
- Everyone on the Team needs to be very comfortable with constant change, moving fast, sharing failures, embracing grit, and building things themselves
- Must be eligible to obtain a clearance with the U.S. government
Created: 2025-10-01