Over the past four years in the Gerstein Lab, I’ve poured a lot of time into two main projects under the NIH Extracellular RNA Communication Consortium. One of the main groups involved in this effort, the International Society for Extracellular Vesicles, held its annual meeting in May. In the months before, we wrote up short abstracts for both projects, and my mentor traveled to Barcelona to present them. Just before the meeting, I was excited to see that the two abstracts were published in the Journal of Extracellular Vesicles. Both are currently being drafted as full-length manuscripts; fingers crossed that they go through smoothly!
PT03.10 The extracellular RNA-Seq processing pipeline of the Extracellular RNA Communication Consortium
Joel Rozowsky1; Robert R. Kitchen2; Jonathan Park1; Timur Galeev1; James A. Diao1; Jonathan Warrell1; William Thistlethwaite3; Sai L. Subramanian3; Aleksandar Milosavljevic3; Mark B. Gerstein1
1Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, USA
2Exosome Diagnostics, Boston, USA
3Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, USA
Background: We will present the tools of the Extracellular RNA Communication Consortium that have been developed for the analysis of extracellular RNA-Seq data and have been used in the construction of a comprehensive atlas of exRNAs in human body fluids. Due to the process of extracting, purifying and sequencing of RNA from extracellular biofluids, exRNAs are more vulnerable to contamination than cellular RNA samples. To address this, we have developed the extracellular RNA processing tool (exceRpt), optimized for the analysis of exRNA-seq data.
Methods: This involves three steps: (1) pre-processing: support for random-barcoded libraries, spike-in sequences for calibration or titration, and explicit removal of common laboratory contaminants and ribosomal RNAs. (2) Endogenous processing: alignment and quantitation of the full set of annotated, potentially spliced, endogenous RNA transcripts including all known miRNAs, tRNAs, piRNAs, snoRNAs, lincRNAs, mRNAs, retrotransposons and circular RNAs. (3) Exogenous processing: alignment to annotated exogenous miRNAs in miRBase and exogenous rRNA sequences in the RDP and alignments to the full genomes of all sequenced bacteria, viruses, plants, fungi, protists, metazoa and selected vertebrates.
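To make the ordering of the three steps concrete, here is a toy sketch of a staged read-assignment cascade. It is not the exceRpt implementation: exact string lookup stands in for real alignment, the stage names and sequences are made up, and the idea that each read is claimed by the first stage that matches it is my assumption about how the steps chain together.

```python
from typing import Dict, List, Set, Tuple


def cascade(reads: List[str], stages: List[Tuple[str, Set[str]]]) -> Dict[str, int]:
    """Count reads by the first stage whose reference set contains them.

    Exact membership is a stand-in for alignment (an assumption made
    purely for illustration).
    """
    counts = {name: 0 for name, _ in stages}
    counts["unassigned"] = 0
    for read in reads:
        for name, reference in stages:
            if read in reference:
                counts[name] += 1
                break  # a read is claimed by the earliest matching stage
        else:
            counts["unassigned"] += 1
    return counts


# Hypothetical reference sets, in the order the abstract lists the steps.
stages = [
    ("contaminant_or_rRNA", {"AAAA"}),   # (1) pre-processing filters
    ("endogenous", {"CCCC", "GGGG"}),    # (2) annotated human transcripts
    ("exogenous", {"TTTT"}),             # (3) non-human sequences
]
reads = ["AAAA", "CCCC", "CCCC", "TTTT", "ACGT"]
counts = cascade(reads, stages)
print(counts)
```

The point of ordering the stages this way is that contaminant and endogenous reads are accounted for before anything is attributed to an exogenous source.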
Results: We have developed a novel algorithm for characterizing alignments to all available exogenous genomes in terms of the NCBI taxonomy tree. Existing approaches that remove degenerate sequences (i.e. those that co-occur across multiple species) result in a loss of potentially valuable data as the occurrence of reads aligning to multiple species/strains is very frequent. This is done independently for exogenous reads aligning to exogenous rRNA as well as exogenous genome sequences.
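The abstract does not spell out the algorithm, so the following is only a sketch of one common way to keep reads that align to several species rather than discard them: assign each read to the lowest common ancestor (LCA) of its hits in a taxonomy tree. The tiny tree below is a simplified, hand-written stand-in for the NCBI taxonomy, and the read names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Toy child -> parent map (simplified; not the real NCBI taxonomy).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "E. coli": "Escherichia",
    "E. fergusonii": "Escherichia",
    "Salmonella": "Proteobacteria",
    "S. enterica": "Salmonella",
    "Viruses": "root",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Deepest node shared by the lineages of all `taxa`."""
    lca = "root"
    for level in zip(*(lineage(t) for t in taxa)):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca


# Hypothetical reads with the species each one aligns to.
hits = {
    "read1": ["E. coli"],
    "read2": ["E. coli", "E. fergusonii"],
    "read3": ["E. coli", "S. enterica"],
}
node_counts = Counter(lowest_common_ancestor(taxa) for taxa in hits.values())
print(node_counts)
```

A read that hits two Escherichia species is counted at the genus level instead of being thrown away, which is the kind of data loss the abstract says existing approaches suffer from.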
Summary/Conclusion: The exceRpt pipeline (available at genboree.org and github.gersteinlab.org/exceRpt) generates a variety of sample-level quality control metrics and produces abundance estimates for various RNA biotypes, along with detailed reports of this processing. The exceRpt pipeline (including endogenous and exogenous processing steps) has been used to uniformly process all ~2500 exRNA-Seq data sets that are in the ERCC exRNA Atlas (exrna-atlas.org). We will also present the quality control metrics as applied to the currently available extracellular RNA-Seq data sets of the ERCC in the exRNA Atlas.
OS27.06 ExRNA Atlas analysis provides an exRNA census and reveals six types of vesicular and non-vesicular exRNA carrier profiles detectable across human body fluids
Oscar D. Murillo1; William Thistlethwaite1; Rocco Lucero1; Sai Lakshmi Subramanian1; Neethu Shah1; Andrew R. Jackson1; Joel Rozowsky2; Robert R. Kitchen3; James Diao2; Timur Galeev2; Jonathan Warrell2; Kristina Hanspers4; Anders Riutta4; Alexander Pico5; Roger P. Alexander5; David Galas5; Andrew I. Su6; Louise C. Laurent7; Kendall Jensen8; Matthew Roth1; Mark B. Gerstein2; Aleksandar Milosavljevic1
1Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, USA
2Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, USA
3Exosome Diagnostics, Boston, USA
4Gladstone Institutes, San Francisco, USA
5Pacific Northwest Research Institute, Seattle, USA
6Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
7University of California, San Diego, San Diego, USA
8Neurogenomics, Translational Genomics Research Institute, Phoenix, USA
Background: To gain insights into exRNA communication, the NIH Extracellular RNA Communication Consortium created the Extracellular RNA Atlas including 5309 exRNA-seq and qPCR profiles, most obtained from five body fluids (cerebrospinal fluid, saliva, serum, plasma, urine).
Methods: Extensive metadata, uniform processing and standardized data quality assessments facilitated integrative analysis of miRNA, tRNA, Y RNA, piRNA, snRNA, snoRNA and lincRNA abundance across 21 data sets represented in the Atlas. A computational deconvolution method was applied to infer ncRNA profiles of specific exRNA carriers (vesicular or not) and to estimate relative amounts of exRNA contributed to each Atlas sample by the carriers.
Results: We obtain a census of ncRNAs that includes, among others, 96 miRNAs abundantly detected (>10 RPM) in CSF, saliva, serum, and plasma; of those, 46 are detected in all five fluids, including urine. Deconvolution of ncRNA profiles reveals six major carrier types and a striking amount of their sample-to-sample abundance variability. In contrast, highly concordant exRNA profiles of all six carrier types can be detected across different studies and biofluids. Three (LD and HD exosomes and HDL particles) of the six were previously purified and profiled. We define three new carrier profiles, ABF, CP and XSA, that are yet to be profiled in isolation and carry miRNAs in higher abundance than the LD, HD and HDL. All six carrier profiles are detected across body fluids, with ABF and HD exosome profiles detected in all five body fluids; XSA and LD exosome profiles in all except saliva; CP in CSF and plasma; and HDL particle profiles in plasma and saliva. We demonstrate the potential of this knowledge and methodology to improve interpretation of individual case–control studies by reducing variance due to sample-to-sample variation in carrier abundance and by assigning differential (cases vs. controls) abundance of specific small ncRNAs to specific carrier types.
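For readers unfamiliar with the ">10 RPM" cutoff in the census, here is a minimal sketch of the arithmetic: scale each raw count to reads per million, keep the miRNAs above the threshold in each fluid, and intersect across fluids. The miRNA names, counts, and library sizes are invented, and using one pooled count table per fluid with total mapped reads as the denominator is an assumption; the abstract does not say how RPM was computed.

```python
from typing import Dict, List, Set


def rpm(counts: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Scale raw read counts to reads per million."""
    return {name: n * 1_000_000 / total_reads for name, n in counts.items()}


def abundant(counts: Dict[str, int], total_reads: int, threshold: float = 10.0) -> Set[str]:
    """Names whose RPM is strictly above `threshold`."""
    return {name for name, value in rpm(counts, total_reads).items() if value > threshold}


def detected_in_all(per_fluid: Dict[str, Set[str]], fluids: List[str]) -> Set[str]:
    """Names that pass the threshold in every listed fluid."""
    return set.intersection(*(per_fluid[f] for f in fluids))


# Made-up counts; each toy library has exactly one million mapped reads.
raw = {
    "CSF": {"miR-A": 50, "miR-B": 20},
    "plasma": {"miR-A": 40, "miR-B": 5},
    "urine": {"miR-A": 15, "miR-B": 30},
}
per_fluid = {fluid: abundant(counts, 1_000_000) for fluid, counts in raw.items()}
shared = detected_in_all(per_fluid, ["CSF", "plasma", "urine"])
print(shared)
```

In this toy data, miR-B falls below the cutoff in plasma, so only miR-A counts as detected in all three fluids.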
Summary/Conclusion: ExRNA Atlas analysis yields global insights into vesicular and non-vesicular exRNA communication by combining and deconvoluting data across multiple studies.
For context, last year’s analysis can be found in ISEV’s 2017 Abstract Book. The exRNA Atlas went from 12 studies to 21, and from 1567 exRNA profiles up to 5309! As of June (one month later), the number has increased again to 5750. Progress :)