Report for: Research in Computational Molecular Biology

Chapter title	Resolving Multicopy Duplications de novo Using Polyploid Phasing
Chapter number	8
Book title	Research in Computational Molecular Biology
Published in	Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005-), May 2017
DOI	10.1007/978-3-319-56970-3_8
Pubmed ID	28808695
Book ISBNs	978-3-319-56969-7, 978-3-319-56970-3
Authors	Mark J. Chaisson, Sudipto Mukherjee, Sreeram Kannan, Evan E. Eichler
Abstract	While single-molecule sequencing systems have enabled unprecedented progress in assembling complex regions of the genome, long segmental duplications remain a challenging frontier in assembly. Segmental duplications are both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variation in multicopy long segmental duplications by developing and applying algorithms for polyploid phasing. We develop two algorithms. The first maximizes the likelihood of observing the reads given the underlying haplotypes, using discrete matrix completion. The second is based on correlation clustering and exploits an assumption, often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., the fraction of reads that are clustered correctly. On both performance metrics, our algorithms dominate existing algorithms on more than 93% of the datasets. While discrete matrix completion performs better on likelihood score, the correlation clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets, whereas existing algorithms reconstruct fewer than 1 copy on average.
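
To illustrate the general idea of grouping reads by shared paralog-specific variants, the following is a minimal sketch of generic pivot-based (KwikCluster-style) correlation clustering. It is not the authors' algorithm. It assumes a hypothetical read representation (a dict mapping variant-site index to observed allele) and a hypothetical linking rule: two reads are joined by a positive edge when they agree at more shared sites than they disagree. The functions agreement and pivot_cluster are illustrative names, not part of any published tool.

```python
import random


def agreement(read_a, read_b):
    """Hypothetical score: +1 per shared site with the same allele, -1 per mismatch."""
    shared = read_a.keys() & read_b.keys()
    return sum(1 if read_a[s] == read_b[s] else -1 for s in shared)


def pivot_cluster(reads, seed=0):
    """Generic pivot correlation clustering (not the paper's method).

    Repeatedly picks the next read in a random order as a pivot and groups it
    with every remaining read that has a positive edge (agreement > 0) to it.
    """
    rng = random.Random(seed)
    remaining = list(range(len(reads)))
    rng.shuffle(remaining)
    clusters = []
    while remaining:
        pivot = remaining[0]
        cluster = [pivot] + [
            i for i in remaining[1:] if agreement(reads[pivot], reads[i]) > 0
        ]
        clusters.append(sorted(cluster))
        members = set(cluster)
        remaining = [i for i in remaining if i not in members]
    return clusters


# Toy example: reads 0-1 come from one paralog, reads 2-3 from another.
reads = [
    {0: "A", 1: "C", 2: "G"},
    {1: "C", 2: "G", 3: "T"},
    {0: "T", 1: "A", 2: "G"},
    {1: "A", 2: "G", 3: "C"},
]
print(sorted(pivot_cluster(reads)))  # [[0, 1], [2, 3]]
```

The shared site 2 carries the same allele in every read, so only the paralog-specific sites separate the two groups; this mirrors, in a very simplified way, why the abstract's assumption of many paralog-specific variants helps a clustering approach.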

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	1	50%
Unknown	1	50%

Demographic breakdown

Type	Count	As %
Scientists	1	50%
Members of the public	1	50%

Mendeley readers

The data shown below were compiled from readership statistics for 16 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Netherlands	1	6%
Unknown	15	94%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	5	31%
Student > Bachelor	4	25%
Student > Ph. D. Student	3	19%
Professor > Associate Professor	1	6%
Unknown	3	19%

Readers by discipline	Count	As %
Biochemistry, Genetics and Molecular Biology	5	31%
Agricultural and Biological Sciences	4	25%
Computer Science	2	13%
Veterinary Science and Veterinary Medicine	1	6%
Unknown	4	25%
