Chapter title | MetaVW: Large-Scale Machine Learning for Metagenomics Sequence Classification |
---|---|
Chapter number | 2 |
Book title | Data Mining for Systems Biology |
Published in | Methods in Molecular Biology, January 2018 |
DOI | 10.1007/978-1-4939-8561-6_2 |
Pubmed ID | |
Book ISBNs | 978-1-4939-8560-9, 978-1-4939-8561-6 |
Authors | Kévin Vervier, Pierre Mahé, Jean-Philippe Vert |
Abstract | Metagenomics is the study of microbial community diversity, especially of uncultured microorganisms, by shotgun sequencing of environmental samples. As sequencer throughput and data volume increase, it becomes challenging to develop scalable bioinformatics tools that reconstruct microbiome structure by binning sequencing reads to reference genomes. Standard alignment-based methods, such as BWA-MEM, provide state-of-the-art performance, but we demonstrate in Vervier et al. (2016) that compositional approaches using nucleotide motifs offer faster analysis times at comparable accuracy. In this work, we describe how to use MetaVW, a scalable machine learning implementation for binning short sequencing reads based on their k-mer profiles. We provide a step-by-step guideline on how we trained the classification models and how the approach can easily generalize to user-defined reference genomes and specific applications. We also give additional details on the effect of the algorithm's parameters on performance. |
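
The abstract refers to binning reads based on their k-mer profiles. As a rough illustration of what such a profile is, the sketch below counts the overlapping k-mers in a single read. It is not MetaVW's code: the function name `kmer_profile`, the default `k`, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical helper for illustration only; not part of MetaVW.
def kmer_profile(read: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a read.

    Windows containing anything other than A, C, G, or T (for example N)
    are skipped; this handling is an assumption of the sketch.
    """
    valid = set("ACGT")
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts

# Example: 3-mer profile of a short toy read.
profile = kmer_profile("ACGTACGT", k=3)
```

A classifier of the compositional kind described in the abstract would take such counts as input features for each read; how MetaVW itself encodes them is described in the chapter, not here.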
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 40% |
Unknown | 3 | 60% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 4 | 80% |
Scientists | 1 | 20% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 53 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 15 | 28% |
Student > Ph. D. Student | 9 | 17% |
Student > Master | 8 | 15% |
Student > Bachelor | 4 | 8% |
Student > Doctoral Student | 3 | 6% |
Other | 5 | 9% |
Unknown | 9 | 17% |
Readers by discipline | Count | As % |
---|---|---|
Biochemistry, Genetics and Molecular Biology | 14 | 26% |
Agricultural and Biological Sciences | 11 | 21% |
Computer Science | 9 | 17% |
Medicine and Dentistry | 4 | 8% |
Chemistry | 2 | 4% |
Other | 3 | 6% |
Unknown | 10 | 19% |