Reproducibility and Generalization of State-of-the-art scRNAseq Cell Type Annotation
Event Details
- Type
- Other Lectures and Symposia
- Speaker(s)
- Simona Cristea, Ph.D., head of genomics data science and AI at the Hale Center for Pancreatic Cancer Research, Dana-Farber Cancer Institute; group leader in data science and AI and senior research scientist, Harvard School of Public Health
- Speaker bio(s)
- Simona Cristea is the Head of Genomics Data Science and AI at the Hale Center for Pancreatic Cancer Research at the Dana-Farber Cancer Institute, Group Leader in Data Science and AI, and Senior Research Scientist at Harvard School of Public Health. Simona’s interdisciplinary data science group is working on connecting AI-driven data analysis with mechanistic evolutionary modeling of cancer data. Simona is particularly interested in Large Language Models applied to single cell data with the goal of understanding and simulating tumor progression and identifying novel ways to prevent tumors from initiating.
Content: Single-cell RNA sequencing (scRNAseq) data is a main pillar of many biological investigations across academia and industry, so annotating it reliably is crucial. The specific problem of labeling single cells with relevant cell types remains of high interest to the entire biological community and is a core step of any scRNAseq bioinformatics pipeline. Even though an enormous number of scRNAseq cell type annotation algorithms have been developed, most medical and biology groups still annotate their data manually based on marker genes, a time- and labor-intensive process. The existing algorithms have concrete problems that hinder their adoption: their predictions are either too broad or too specific to be relevant, and they fail to generalize reliably to novel datasets despite being trained on huge amounts of open-source data. In my talk, I will discuss specific generalization challenges of state-of-the-art scRNAseq cell type annotation models and offer an alternative approach that uses state-of-the-art foundation models such as DeepSeek-R1 to annotate single cells.
- Open to
- Tri-Institutional