Data Mining of Macromolecular Structures.
Data Mining Techniques for the Life Sciences
Methods in molecular biology, January 2016
Bart van Beusekom, Anastassis Perrakis, Robbie P. Joosten
Oliviero Carugo, Frank Eisenhaber
The use of macromolecular structures is widespread for a variety of applications, from teaching protein structure principles all the way to ligand optimization in drug development. Applying data mining techniques to these experimentally determined structures requires a highly uniform, standardized structural data source. The Protein Data Bank (PDB) has evolved over the years toward becoming the standard resource for macromolecular structures. However, the process of selecting the data most suitable for specific applications is still very much based on personal preferences and understanding of the experimental techniques used to obtain these models. In this chapter, we will first explain the challenges with data standardization, annotation, and uniformity in the PDB entries determined by X-ray crystallography. We then discuss the specific effect that crystallographic data quality and model optimization methods have on structural models and how validation tools can be used to make informed choices. We also discuss specific advantages of using the PDB_REDO databank as a resource for structural data. Finally, we will provide guidelines on how to select the most suitable protein structure models for detailed analysis and how to select a set of structure models suitable for data mining.
| Readers by professional status | Count | As % |
|---|---|---|
| Student > Ph. D. Student | 3 | 16% |
| Student > Bachelor | 2 | 11% |
| Professor > Associate Professor | 2 | 11% |
| Student > Master | 1 | 5% |

| Readers by discipline | Count | As % |
|---|---|---|
| Biochemistry, Genetics and Molecular Biology | 7 | 37% |
| Medicine and Dentistry | 3 | 16% |
| Agricultural and Biological Sciences | 2 | 11% |