CSCBC 2009

Indirect Gene-Disease Association via Medical Subject Term Annotation of Literature Evidence

Final poster PDF (aka "There Is No Step 5"): cscbc2009-poster.pdf


Computational analysis of the interconnections between annotated biomedical literature and gene databases allows the prediction of novel linkages. This initial phase of the project focuses on human genes that play a previously unknown functional role in the pathology of one or more diseases. We link the 38 000 human genes in Entrez Gene to nearly 9 million PubMed articles annotated with disease-related Medical Subject Heading (MeSH) terms. To connect these data sources, we use both manually and automatically annotated linkages, such as the reviewed, user-submitted Gene Reference into Function (GeneRIF) entries and the Gene2PubMed annotations in Entrez Gene.
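As an illustration of how the gene-to-article and article-to-MeSH links might be combined, the following is a minimal sketch, not the project's actual pipeline. The function name `build_gene_profiles`, the gene symbols, the article identifiers, and the toy data are all hypothetical; in practice the two mappings would be loaded from Gene2PubMed/GeneRIF and from the PubMed MeSH annotations.

```python
from collections import Counter, defaultdict


def build_gene_profiles(gene_to_pmids, pmid_to_mesh):
    """Count, for each gene, how many linked articles carry each MeSH term.

    gene_to_pmids: dict mapping a gene identifier to an iterable of article IDs
    pmid_to_mesh:  dict mapping an article ID to a set of MeSH terms
    (Both inputs are hypothetical stand-ins for the real data sources.)
    """
    profiles = defaultdict(Counter)
    for gene, pmids in gene_to_pmids.items():
        # De-duplicate so an article linked twice is only counted once.
        for pmid in set(pmids):
            for term in pmid_to_mesh.get(pmid, ()):
                profiles[gene][term] += 1
    return dict(profiles)


# Toy data, invented purely for illustration.
gene_to_pmids = {"GENE_A": [1, 2, 3], "GENE_B": [3, 4]}
pmid_to_mesh = {
    1: {"Neoplasms"},
    2: {"Neoplasms", "Inflammation"},
    3: {"Inflammation"},
    4: {"Diabetes Mellitus"},
}

profiles = build_gene_profiles(gene_to_pmids, pmid_to_mesh)
```

Each resulting profile is a term-count mapping that records the direct gene-to-subject associations supported by the literature.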

By distilling this interconnected network of relationships into an integrated database, we provide a framework to identify known, direct relationships between genes and medical subjects. We then compare the direct association profiles of genes with the profiles of diseases, uncovering novel relationships by assessing the similarity of their subject-term profiles. We evaluate a variety of scoring methodologies, such as over-representation analysis, both to assess putative links between genes and diseases and to validate the effectiveness of our scoring functions. Predictions will be verified against curated sources on gene-related diseases, and the predicted gene-disease associations will be compared against newly discovered associations.
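The two kinds of scoring mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring functions used in the project: over-representation is modelled here with a one-sided hypergeometric test, and profile similarity with cosine similarity, which is only one of several plausible choices. The function names and the toy numbers are hypothetical.

```python
from math import comb, sqrt


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of articles in the corpus
    K: articles annotated with the MeSH term of interest
    n: articles linked to the gene
    k: articles linked to the gene AND annotated with the term
    Exact integer arithmetic; suitable for small toy inputs only.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse term-count profiles (dicts)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty profile is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


# Toy example: 10 articles, 4 carry the term, the gene is linked to 3,
# and 2 of those 3 carry the term.
p_value = hypergeom_upper_tail(k=2, K=4, n=3, N=10)

# Toy gene and disease profiles over MeSH terms (invented counts).
gene_profile = {"Inflammation": 2, "Neoplasms": 2}
disease_profile = {"Inflammation": 3}
similarity = cosine_similarity(gene_profile, disease_profile)
```

A small p-value suggests the term appears among a gene's articles more often than chance would predict, while a high similarity between a gene profile and a disease profile flags a candidate indirect association. At the scale of millions of articles, a numerically stable statistics library would be a more practical choice than exact binomial coefficients.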
