Senior/Staff Scientist, Data Science - Berkeley, CA
Glyphic - Berkeley, CA
Job Description
At Glyphic Biotechnologies, we plan to create the protein revolution that scientists and researchers have been waiting for. We are developing a massively parallel, single-molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised more than $50M from venture partners and non-dilutive grant funding to achieve our vision of next-generation proteome sequencing.

What we are looking for in you

Glyphic is seeking a highly motivated and experienced Senior/Staff Data Scientist to help advance our cutting-edge single-molecule proteome sequencing platform, which has the potential to transform how we understand biology and develop new medicines.

We're looking for a Senior Data Scientist who's excited about solving complex, real-world problems with cutting-edge technology. You'll work directly with our CTO and a collaborative team of scientists, engineers, and bioinformaticians who are passionate about pushing the boundaries of what's possible.

This is a California-based hybrid role: on average, you'll spend about 20% of your time on-site with the team in Berkeley, CA, with flexibility for additional collaboration as projects require.

What you'll do

Data Analysis and Insight Generation:
- Design and implement novel algorithms to analyze proteomics data that no one has ever seen before.
- Develop machine learning models that extract meaningful insights from complex, noisy biological signals.
- Develop and optimize algorithms for analyzing high-dimensional chemistry and NGS data, including single-cell, spatial, and LCMS data outputs.
- Build models that reveal how parameters and molecular interfaces drive outcomes, including surface interactions and molecule-target binding.
- Design and execute biostatistical analyses using Python and/or R to uncover significant trends, model experimental outcomes, and inform data-driven decision-making.
- Apply machine learning to guide experiment design, identify key parameters, and optimize workflows for efficiency and reproducibility.
- Develop clear, insightful visualizations that make complex, high-dimensional results understandable and actionable for scientists and stakeholders.
- Help define metrics and visualizations that clarify high-dimensional relationships for scientists and stakeholders.
- Partner with wet lab, hardware, and software teams to translate experimental goals into computational strategies.

Pipelines and Automation:
- Create ETL pipelines that clean, normalize, and integrate diverse datasets (sequencing reads, LCMS spectra, metadata) into analysis-ready formats.
- Combine off-the-shelf pipelines (basecalling, variant calling, deconvolution) with custom scripts to deliver end-to-end solutions.
- Continuously improve throughput and data quality by automating QC steps and integrating feedback from experiments.
- Establish best practices for code quality, testing, and deployment that will scale with our growing team.

What you need

Required:
- PhD in Computer Science, Bioinformatics, Computational Biology, Biostatistics, or a related field, with 4+ (Senior) or 6+ (Staff) years of hands-on experience.
- Proven ability to model and interpret high-dimensional datasets with numerous interacting variables, uncovering statistically robust patterns and causal relationships.
- Competency in chemistry data science (e.g., interpreting LCMS data, using deconvolution tools, understanding surface chemistry and molecule-target interactions).
- Competency in next-generation sequencing, including familiarity with multi-omics, error modeling, and basecalling.
- Expertise in Python and/or R for biostatistical analysis, including data wrangling, statistical modeling, and visualization of high-dimensional experimental results.
- Experience designing ML models for experimental data and deploying pipelines (Snakemake, Nextflow).
- Familiarity with ML frameworks (PyTorch, TensorFlow) and data science libraries (pandas, NumPy, SciPy).
- Experience building automated data pipelines and infrastructure for scalable analysis (cloud, Docker/Kubernetes).
- Experience with cloud platforms (AWS, GCP, or Azure) and containerization tools (Docker, Kubernetes).
- Proficiency with data visualization tools (matplotlib, seaborn, plotly) and Jupyter notebooks.
- Familiarity with version control (git) and pipeline workflow systems (Snakemake, Nextflow, etc.).

Nice to have:
- Ability to work in performant languages (C++, Rust, Julia, or CUDA).
- Ability to develop solutions that optimize the use of large-scale data storage, cloud processing infrastructure, and distributed computing.
- Deep learning experience with time-series data, signal processing, or sequence modeling.
- Ability to build and deploy scalable ML pipelines using PyTorch/TensorFlow for real-time protein sequence analysis.
- Experience with MLOps tools and practices for model deployment and monitoring.
- Experience building commercially successful life science tools that other scientists actually use and love.
- Previous startup or fast-paced industry (e.g., skunkworks) experience.

We're looking for teammates with:
- Excellent interpersonal skills: capable of building strong relationships and communicating effectively with stakeholders at all levels.
- High emotional and analytical intelligence: able to navigate complex team dynamics, partnerships, and challenges with creativity and logic.
- Resourceful adaptability: operates with urgency, remains flexible in evolving environments, and thrives in ambiguity.
- Collaborative spirit: enjoys working across disciplines and explaining complex concepts to diverse audiences.

What you can expect

Work environment:
- Flexible hybrid schedule with quarterly team gatherings
- Access to cutting-edge technology and computational resources
- Collaborative culture where your ideas and expertise are valued
- Direct impact on product development and company direction

Professional growth:
- Work on problems that don't have solutions in textbooks
- Your algorithms will directly influence experimental design and product development
- Debug and optimize real experimental results, not just theoretical datasets
- Bridge the gap between cutting-edge research and practical applications
- Learn from a diverse team of world-class scientists and engineers
- Contribute to first-of-their-kind technologies, high-impact publications, and patents

Compensation

Estimated Base Salary: $168,000 - $238,000

This is the pay range that we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skill set, and geographic location. This range applies to the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.

Benefits and Perks:
- Employee Stock Option Plan
- Employer Retirement Contributions to 401(k)
- Generous Paid Time Off
- Paid Maternity and Paternity Leave
- Office Snacks and Beverages
- Regular Team Bonding Activities

We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Individuals seeking employment at Glyphic Biotechnologies are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.
Created: 2025-09-30