Homology-Based Annotation of Large Protein Datasets.
Data Mining Techniques for the Life Sciences
Methods in Molecular Biology, January 2016
Marco Punta, Jaina Mistry
Oliviero Carugo, Frank Eisenhaber
Advances in DNA sequencing technologies have led to an increasing amount of protein sequence data being generated, only a small fraction of which will have experimental annotation associated with it. Here, we describe a protocol for in silico homology-based annotation of large protein datasets that makes extensive use of manually curated collections of protein families. We focus on annotations provided by the Pfam database and suggest ways to identify family outliers and family variations. This protocol may be useful to people who are new to protein data analysis or who are unfamiliar with the computational tools currently available.
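As a rough illustration of what homology-based annotation against Pfam can look like in practice, the sketch below groups per-domain hits by protein and lists the proteins left without any annotation. It is not the chapter's protocol. It assumes the hits are in the whitespace-separated layout of HMMER's `--domtblout` output from an `hmmscan` search against Pfam profile HMMs. The sample rows, the identifiers (`FamA`, `protA`, `PF99991.1`, and so on), the helper names, and the E-value cutoff are all invented for the example. Pfam itself uses curated per-family gathering thresholds, so the single E-value cutoff here is a simplification.

```python
from collections import defaultdict

# Invented sample rows in the column layout of HMMER's --domtblout output.
# Field 0 = family name, 1 = family accession, 3 = protein (query) name,
# 12 = independent E-value, 19/20 = envelope start/end on the protein.
SAMPLE = """\
# target name accession tlen query name accession qlen ...
FamA PF99991.1 264 protA - 350 1e-60 200.1 0.0 1 1 2e-64 1e-60 199.8 0.0 1 264 20 280 18 282 0.95 Hypothetical family A
FamB PF99992.1 120 protA - 350 0.4 8.2 0.0 1 1 0.3 0.5 7.9 0.0 5 60 290 340 288 345 0.70 Hypothetical family B
FamA PF99991.1 264 protB - 410 3e-10 40.3 0.1 1 1 5e-14 3e-10 40.0 0.1 10 250 55 300 50 305 0.88 Hypothetical family A
"""


def parse_domain_hits(text, max_ievalue=1e-3):
    """Group domain hits by protein, keeping hits at or below the cutoff.

    max_ievalue is an illustrative assumption, not a Pfam recommendation.
    """
    hits = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The first 22 fields are single tokens; the rest is free-text description.
        fields = line.split(maxsplit=22)
        ievalue = float(fields[12])
        if ievalue > max_ievalue:
            continue
        hits[fields[3]].append(
            {
                "family": fields[0],
                "accession": fields[1],
                "ievalue": ievalue,
                "env_from": int(fields[19]),
                "env_to": int(fields[20]),
            }
        )
    return dict(hits)


def unannotated(all_proteins, hits):
    """Return proteins with no retained domain hit, in input order."""
    return [p for p in all_proteins if p not in hits]


hits = parse_domain_hits(SAMPLE)
missing = unannotated(["protA", "protB", "protC"], hits)
for protein, domains in hits.items():
    print(protein, [d["family"] for d in domains])
print("no retained family match:", missing)
```

Proteins that end up in the `missing` list, or whose only hits fall just short of the cutoff, would be natural candidates for closer inspection as possible family outliers.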
| Readers by professional status | Count | As % |
|---|---|---|
| Student > Bachelor | 2 | 29% |
| Student > Doctoral Student | 1 | 14% |
| Student > Ph. D. Student | 1 | 14% |

| Readers by discipline | Count | As % |
|---|---|---|
| Biochemistry, Genetics and Molecular Biology | 3 | 43% |
| Agricultural and Biological Sciences | 2 | 29% |