Chapter title | Text Mining to Support Gene Ontology Curation and Vice Versa |
---|---|
Chapter number | 6 |
Book title | The Gene Ontology Handbook |
Published in | Methods in molecular biology, January 2017 |
DOI | 10.1007/978-1-4939-3743-1_6 |
Pubmed ID | |
Book ISBNs | 978-1-4939-3741-7, 978-1-4939-3743-1 |
Authors | Patrick Ruch |
Editors | Christophe Dessimoz, Nives Škunca |
Abstract | In this chapter, we explain how text mining can support the curation of molecular biology databases dealing with protein functions. We also show how curated data can play a disruptive role in the development of text mining methods. We review a decade of efforts to improve the automatic assignment of descriptors from the Gene Ontology (GO), the reference ontology for the characterization of genes and gene products. To illustrate the high potential of this approach, we compare the performances of an automatic text categorizer and show a large improvement of +225% in both precision and recall on benchmarked data. We argue that automatic text categorization functions can ultimately be embedded into a Question-Answering (QA) system to answer questions related to protein functions. Because GO descriptors can be relatively long and specific, traditional QA systems cannot answer such questions. A new type of QA system, so-called Deep QA, which uses machine learning methods trained with curated contents, is thus emerging. Finally, future advances in text mining instruments depend directly on the availability of high-quality annotated contents at every curation step. Database workflows must start explicitly recording all the data they curate and, ideally, also some of the data they do not curate. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Mexico | 1 | 3% |
Unknown | 28 | 97% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 6 | 21% |
Researcher | 4 | 14% |
Student > Doctoral Student | 2 | 7% |
Other | 2 | 7% |
Student > Bachelor | 2 | 7% |
Other | 5 | 17% |
Unknown | 8 | 28% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 6 | 21% |
Biochemistry, Genetics and Molecular Biology | 3 | 10% |
Agricultural and Biological Sciences | 3 | 10% |
Medicine and Dentistry | 2 | 7% |
Social Sciences | 2 | 7% |
Other | 2 | 7% |
Unknown | 11 | 38% |