LatchBio

Announcing 4000+ Single-cell Datasets

blog.latch.bio

Our partnership with Elucidata: Datasets from 10+ sources, 91 cell lines, 508 tissues, 712 diseases, and 106 drugs. Pre-harmonized, in consistent schemas, and annotated with ontology mappings.

Alfredo Andere and Hannah Le
Nov 16, 2022

More than 80TB of semi-structured, unstandardized datasets sit in the public domain, scattered across several databases, which makes them hard to search and interpret. The wide array of pre-processing steps involved makes harmonizing multiple datasets almost impossible without advanced data science skills.

Elucidata has spent thousands of human hours tackling these problems. Using Polly, the company’s in-house data harmonization engine, Elucidata transforms terabytes of public datasets into a standardized tabular schema, giving biologists a single source of truth for biomolecular data.

Today we are excited to announce our partnership with them, which will allow any user of LatchBio to explore and use that data seamlessly.

We hope that together we can make it easy for scientists to take a data-focused approach to formulating and validating hypotheses and to integrating public data into their research.

Alfredo Andere @AlfredoAndere
Accessing curated single-cell data just got way easier. Today we are announcing our partnership with @elucidata. Instant access to 4000+ single-cell sequencing datasets from GEO, Human cell Atlas & Zenodo on @LatchBio. 91 cell lines, 508 tissues, 712 diseases, 106 drugs! (1/5)
5:13 PM ∙ Nov 16, 2022

From terabytes of data scattered across different platforms, we now have one source of data:

Harmonized

Elucidata's curation models provide rich and harmonized metadata annotations with scientific context. Point-and-click filters allow scientists to find datasets most relevant to their research.

Consistent Schemas

All single-cell datasets are processed through a standard pipeline (identifier mapping, normalization, quality checks) and made available in a consistent h5ad format. The data is readily usable for visualization and downstream analysis.

Ontology Mapping

Every dataset is mapped to standard ontologies: Organism (NCBI Taxonomy), Cell type (Cell Ontology), Disease (MeSH: Medical Subject Headings), Cell Line (Cellosaurus), Tissue (BRENDA Tissue Ontology), and Drug (ChEBI: Chemical Entities of Biological Interest).
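
To make the idea concrete, here is a minimal Python sketch of what ontology-mapped metadata makes possible. It is not Elucidata's schema or the Latch API: the `DatasetRecord` class, its field names, the `filter_datasets` helper, and the three catalog entries are all invented for illustration. The point is only that once every dataset carries the same controlled-vocabulary fields, filtering becomes a simple exact match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical harmonized metadata for one dataset (illustrative fields only)."""
    dataset_id: str
    organism: str             # label from NCBI Taxonomy
    cell_type: str            # label from Cell Ontology
    disease: str              # label from MeSH
    cell_line: Optional[str]  # label from Cellosaurus, if any
    tissue: str               # label from BRENDA Tissue Ontology
    drug: Optional[str]       # label from ChEBI, if any


def filter_datasets(records: List[DatasetRecord], **criteria: str) -> List[DatasetRecord]:
    """Keep records whose fields match every criterion, ignoring case."""
    def matches(record: DatasetRecord) -> bool:
        for field_name, wanted in criteria.items():
            value = getattr(record, field_name)
            if value is None or value.lower() != wanted.lower():
                return False
        return True

    return [record for record in records if matches(record)]


# Made-up placeholder entries, not real datasets from the catalog.
CATALOG = [
    DatasetRecord("EXAMPLE_001", "Homo sapiens", "T cell",
                  "Lung Neoplasms", None, "lung", None),
    DatasetRecord("EXAMPLE_002", "Mus musculus", "hepatocyte",
                  "Liver Cirrhosis", None, "liver", None),
    DatasetRecord("EXAMPLE_003", "Homo sapiens", "epithelial cell",
                  "Lung Neoplasms", "A549", "lung", "cisplatin"),
]

human_lung = filter_datasets(CATALOG, organism="homo sapiens", tissue="lung")
print([record.dataset_id for record in human_lung])
```

Because the labels come from shared ontologies rather than free text, a query for "lung" does not have to guess at spellings such as "Lung tissue" or "pulmonary", which is what makes point-and-click filtering across thousands of datasets feasible.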

Explore Now!

You can explore it all now at console.latch.bio/datasets.

This marks the beginning of our collaboration with Elucidata. We will continue adding many more datasets across a range of modalities. Do you have a dataset that you'd like to see on Latch? Let us know!
