Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.
Data Mining Techniques for the Life Sciences
Methods in Molecular Biology, January 2016
Oliviero Carugo, Kristina Djinović-Carugo
Oliviero Carugo, Frank Eisenhaber
It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose, the subsets must be non-redundant and of high quality. The first requirement can be met relatively easily, at either the sequence level or the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider only the crystallographic resolution; other features must also be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged by adopting more tolerant selection criteria. The incessant growth of the Protein Data Bank, and especially of the number of high-resolution structures, is allowing the use of more stringent selection criteria, with a consequent improvement in the quality of the subsets of the Protein Data Bank.
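The two-stage idea described above can be illustrated with a minimal sketch. First, apply per-entry quality filters (resolution, missing residues, B-factors, a validation metric, data-collection temperature). Then remove redundancy. Everything here is assumed for illustration and is not the authors' procedure: the `Entry` and `Criteria` records, every threshold value, the use of R-free as a stand-in for "validation", and the `identity` function. That function uses `difflib`'s similarity ratio as a crude proxy. A real pipeline would compute alignment-based sequence identity or structural similarity instead. Electron-density quality is not modelled at all.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical per-chain summary of a PDB entry; field names are illustrative."""
    pdb_id: str
    sequence: str          # one-letter amino-acid sequence of the chain
    resolution: float      # crystallographic resolution (angstroms)
    r_free: float          # free R-factor, used here as a stand-in for validation
    missing_residues: int  # residues absent from the deposited model
    mean_b: float          # mean B-factor (square angstroms)
    temperature_k: float   # temperature of the diffraction experiment (kelvin)


@dataclass(frozen=True)
class Criteria:
    """Hypothetical thresholds; the default values are illustrative, not recommended."""
    max_resolution: float = 2.0
    max_r_free: float = 0.25
    max_missing: int = 0
    max_mean_b: float = 40.0
    max_temperature_k: float = 120.0
    max_identity: float = 0.30


def passes_quality(e: Entry, c: Criteria) -> bool:
    """True if the entry satisfies every per-entry quality threshold."""
    return (
        e.resolution <= c.max_resolution
        and e.r_free <= c.max_r_free
        and e.missing_residues <= c.max_missing
        and e.mean_b <= c.max_mean_b
        and e.temperature_k <= c.max_temperature_k
    )


def identity(a: str, b: str) -> float:
    """Crude similarity proxy (difflib ratio), NOT an alignment-based sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def select_subset(entries: Iterable[Entry], c: Criteria) -> List[Entry]:
    """Quality filter first, then greedy redundancy removal keeping the best-resolved chains."""
    candidates = sorted(
        (e for e in entries if passes_quality(e, c)),
        key=lambda e: (e.resolution, e.r_free),
    )
    chosen: List[Entry] = []
    for e in candidates:
        # Keep e only if it is sufficiently dissimilar from every chain already kept.
        if all(identity(e.sequence, k.sequence) <= c.max_identity for k in chosen):
            chosen.append(e)
    return chosen


# Toy data, invented purely for illustration.
ENTRIES = [
    Entry("1AAA", "ACDEFGHIK", 1.5, 0.20, 0, 25.0, 100.0),
    Entry("2BBB", "ACDEFGHIK", 1.8, 0.22, 0, 30.0, 100.0),   # same sequence as 1AAA
    Entry("3CCC", "LMNPQRSTVW", 2.4, 0.24, 3, 35.0, 100.0),  # lower resolution, gaps
    Entry("4DDD", "YYYY", 1.2, 0.18, 0, 20.0, 293.0),        # room-temperature data
]

STRICT = Criteria()
TOLERANT = Criteria(max_resolution=2.5, max_r_free=0.28, max_missing=5, max_mean_b=50.0)

print([e.pdb_id for e in select_subset(ENTRIES, STRICT)])    # ['1AAA']
print([e.pdb_id for e in select_subset(ENTRIES, TOLERANT)])  # ['1AAA', '3CCC']
```

Relaxing the thresholds from `STRICT` to `TOLERANT` admits `3CCC`. This mirrors the point that more tolerant criteria enlarge the subset. The redundant `2BBB` is dropped in both cases. The room-temperature `4DDD` never passes. Sorting by resolution before removing redundancy keeps the best-resolved member of each redundant group. This is a common choice, but not the only reasonable one.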