Dataset Description
|
Description
This dataset is the June 2025 Data Release of Cell Maps for Artificial Intelligence (CM4AI; CM4AI.org), the Functional Genomics Grand Challenge in the NIH Bridge2AI program. This Beta release includes perturb-seq data in undifferentiated KOLF2.1J iPSCs; SEC-MS data in undifferentiated KOLF2.1J iPSCs, iPSC-derived NPCs, neurons, cardiomyocytes, and treated and untreated MDA-MB-468 breast cancer cells; and IF images of MDA-MB-468 breast cancer cells in the presence and absence of chemotherapy (vorinostat and paclitaxel).
Data Governance & Ethics
- Human Subjects: No
- De-identified Samples: Yes
- FDA Regulated: No
- Data Governance Committee: Jillian Parker (jillianparker@health.ucsd.edu)
- Ethical Review: Vardit Ravitsky (ravitskyv@thehastingscenter.org) and Jean-Christophe Belisle-Pipon (jean-christophe_belisle-pipon@sfu.ca)
Completeness
These data are not yet in final form:
- Some datasets are under temporary pre-publication embargo
- The protein-protein interaction (SEC-MS), protein localization (IF imaging), and CRISPRi perturb-seq data interrogate sets of proteins that overlap only partially
- Computed cell maps are not included in this release
Maintenance Plan
- Dataset will be regularly updated and augmented through the end of the project in November 2026
- Updates on a quarterly basis
- Long term preservation in the University of Virginia Dataverse, supported by committed institutional funds
Intended Use
This dataset is intended for:
- Research in functional genomics that requires AI-ready datasets
- AI model training
- Cellular process analysis
- Analysis of cell architectural changes and interactions in the presence of specific disease processes, treatment conditions, or genetic perturbations
Limitations
Researchers should be aware of inherent limitations:
- This is an interim release
- Does not contain predicted cell maps, which will be added in future releases
- The current release is most suitable for bioinformatics analysis of the individual datasets
- Requires domain expertise for meaningful analysis
Prohibited Uses
- These laboratory data are not to be used in clinical decision-making or in any context involving patient care without appropriate regulatory oversight and approval
Potential Sources of Bias
Users should be aware of potential biases:
- Data in this release were derived from commercially available, de-identified human cell lines
- These cell lines do not represent all biological variants that may be seen in the population at large
(2025-06-30)
|
Subject
| Medicine, Health and Life Sciences |
Keyword
|
AI, affinity purification, AP-MS, artificial intelligence, breast cancer, Bridge2AI, cardiomyocyte, CM4AI, CRISPR/Cas9, induced pluripotent stem cell, iPSC, KOLF2.1J, machine learning, mass spectroscopy, MDA-MB-468, neural progenitor cell, NPC, neuron, paclitaxel, perturb-seq, perturbation sequencing, protein-protein interaction, protein localization, single-cell RNA sequencing, scRNAseq, SEC-MS, size exclusion chromatography, subcellular imaging, vorinostat |
Related Publication
| References: Clark T, Parker J, Schaffer L, Obernier K, Al Manir S, Churas CP, Dailamy A, Doctor Y, Forget A, Hansen JN, Hu M, Lenkiewicz J, Levinson MA, Marquez C, Nourreddine S, Niestroy J, Pratt D, Qian G, Thaker S, Bélisle-Pipon JC, Brandt C, Chen J, Ding Y, Fodeh S, Krogan N, Lundberg E, Mali P, Payne-Foster P, Ratcliffe S, Ravitsky V, Sali A, Schulz W, Ideker T. Cell Maps for Artificial Intelligence: AI-Ready Maps of Human Cell Architecture from Disease-Relevant Cell Lines. 2024. doi: http://doi.org/10.1101/2024.05.21.589311 |
License/Data Use Agreement
|
CC BY-NC-SA 4.0
|