Chapter title | Traversing the k-mer Landscape of NGS Read Datasets for Quality Score Sparsification |
---|---|
Chapter number | 31 |
Book title | Research in Computational Molecular Biology |
Published in | Lecture Notes in Computer Science, April 2014 |
DOI | 10.1007/978-3-319-05269-4_31 |
Pubmed ID | |
Book ISBNs | 978-3-31-905268-7, 978-3-31-905269-4 |
Authors | Y. William Yu, Deniz Yorukoglu, Bonnie Berger |
Abstract | It is becoming increasingly impractical to indefinitely store raw sequencing data for later processing in an uncompressed state. In this paper, we describe a scalable compressive framework, Read-Quality-Sparsifier (RQS), which substantially outperforms the compression ratio and speed of other de novo quality score compression methods while maintaining SNP-calling accuracy. Surprisingly, RQS also improves the SNP-calling accuracy on a gold-standard, real-life sequencing dataset (NA12878) using a k-mer density profile constructed from 77 other individuals from the 1000 Genomes Project. This improvement in downstream accuracy emerges from the observation that quality score values within NGS datasets are inherently encoded in the k-mer landscape of the genomic sequences. To our knowledge, RQS is the first scalable sequence-based quality compression method that can efficiently compress quality scores of terabyte-sized and larger sequencing datasets. An implementation of our method, RQS, is available for download at: http://rqs.csail.mit.edu/. |
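
The general idea in the abstract, that k-mer frequency can stand in for per-base quality scores, can be illustrated with a toy sketch. This is not the RQS implementation: the function names, the count threshold of 2, the value of `k`, and the replacement score `default_q` are all assumptions chosen for illustration. The sketch counts k-mers across a set of reads, treats k-mers seen at least twice as "frequent", and replaces the quality score at every position covered by a frequent k-mer with a single default value, so that only the remaining scores need to be stored.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across a collection of read sequences."""
    counts = Counter()
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def sparsify_qualities(seq, quals, frequent, k, default_q):
    """Replace quality scores at positions covered by a frequent k-mer.

    Positions not covered by any frequent k-mer keep their original score.
    (Hypothetical helper; not the method described in the paper.)
    """
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in frequent:
            for j in range(i, i + k):
                covered[j] = True
    return [default_q if c else q for q, c in zip(quals, covered)]


# Toy data: three short reads and made-up Phred scores for the second one.
reads = ["ACGTACGT", "ACGTACGA", "TTGTACGT"]
counts = kmer_counts(reads, k=4)

# Assumed threshold: a k-mer is "frequent" if it occurs at least twice.
frequent = {kmer for kmer, n in counts.items() if n >= 2}

sparse = sparsify_qualities(
    "ACGTACGA", [30, 31, 32, 33, 34, 35, 36, 12], frequent, k=4, default_q=40
)
print(sparse)
```

In this toy input only the last base of the second read lies outside every frequent k-mer, so it is the only position that keeps its original score. A long run of identical default values compresses far better than the original scores, which is the intuition behind sparsification.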
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Canada | 1 | 25% |
United Kingdom | 1 | 25% |
United States | 1 | 25% |
Unknown | 1 | 25% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 3 | 75% |
Members of the public | 1 | 25% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 20% |
Unknown | 8 | 80% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 3 | 30% |
Student > Doctoral Student | 2 | 20% |
Researcher | 2 | 20% |
Professor | 1 | 10% |
Student > Master | 1 | 10% |
Other | 1 | 10% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 6 | 60% |
Agricultural and Biological Sciences | 2 | 20% |
Medicine and Dentistry | 1 | 10% |
Unknown | 1 | 10% |