BTP / Genetics Graduate Retreat 2008

Presented preliminary results as slides mounted on cardboard.


Computational analysis of the interconnections between annotated biomedical literature and gene databases makes it possible to predict novel linkages. The initial phase of the project focuses on human genes that play a previously unknown functional role in the pathology of one or more diseases. We use the 38 000 human genes in Entrez Gene to connect to nearly 9 million PubMed articles annotated with disease-related Medical Subject Heading (MeSH) terms. To connect these data sources, we use both manually and automatically annotated linkages, such as the reviewed, user-submitted Gene References into Function (GeneRIFs) and the Gene2PubMed annotations in Entrez Gene.
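A minimal sketch of the direct linkage step, assuming the gene-to-article links come from a file laid out like NCBI's tab-separated gene2pubmed (tax_id, GeneID, PubMed_ID, with a `#` header) and that a mapping from each PubMed ID to its disease-related MeSH descriptors has already been extracted. The function names, the toy IDs and the `pmid_to_diseases` mapping are hypothetical illustrations, not the project's actual pipeline; GeneRIF-derived links could be merged into the same gene-to-articles mapping.

```python
from collections import defaultdict

HUMAN_TAX_ID = "9606"  # NCBI Taxonomy ID for Homo sapiens


def load_gene2pubmed(lines):
    """Map each human GeneID to the set of PubMed IDs linked to it.

    Assumes a gene2pubmed-style layout: tab-separated tax_id, GeneID,
    PubMed_ID, with a '#'-prefixed header line.
    """
    gene_to_pmids = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        tax_id, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        if tax_id == HUMAN_TAX_ID:
            gene_to_pmids[gene_id].add(pmid)
    return gene_to_pmids


def direct_links(gene_to_pmids, pmid_to_diseases):
    """Count the articles supporting each (GeneID, disease term) pair."""
    counts = defaultdict(int)
    for gene_id, pmids in gene_to_pmids.items():
        for pmid in pmids:
            for disease in pmid_to_diseases.get(pmid, ()):
                counts[(gene_id, disease)] += 1
    return dict(counts)


# Toy records with invented IDs, standing in for the real files.
sample = [
    "#tax_id\tGeneID\tPubMed_ID",
    "9606\t1001\t9001",
    "9606\t1001\t9002",
    "9606\t1002\t9002",
    "10090\t2001\t9003",  # mouse record, filtered out
]
# Hypothetical PMID -> disease-related MeSH descriptors mapping.
pmid_to_diseases = {
    "9001": {"DiseaseA"},
    "9002": {"DiseaseA", "DiseaseB"},
    "9003": {"DiseaseB"},
}

gene_to_pmids = load_gene2pubmed(sample)
links = direct_links(gene_to_pmids, pmid_to_diseases)
print(links)
```

Each count is the number of shared articles behind a known, direct gene-disease relationship; at full scale the same join runs over all human genes and the disease-annotated PubMed subset.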

By distilling this interconnected network of relationships into an integrated database, we will provide a framework for identifying known, direct relationships between genes and diseases. The same framework will let us extend this work to predicting novel relationships from indirect linkages, by assessing the similarity of the subject terms associated with each gene and with each disease. Scoring methods, such as over-representation analysis, will be developed to assess putative gene-disease links, and the effectiveness of these scoring functions will be validated. Predictions will also be checked against curated sources of gene-disease associations, such as Online Mendelian Inheritance in Man (OMIM).
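The scoring ideas above can be sketched as follows. This is a hedged illustration, not the project's method: it frames over-representation analysis as a one-sided hypergeometric test (a common choice), measures indirect similarity as the cosine between MeSH term-count profiles, and scores a ranked prediction list by precision at k against a reference set. All names, the toy corpus and the `known` set (a hypothetical stand-in for OMIM-derived pairs) are assumptions made for illustration.

```python
from collections import Counter
from math import comb, sqrt


def hypergeom_sf(k, N, K, n):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: N articles in the corpus, K annotated with the
    disease, n linked to the gene, k linked to both.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_profile(pmids, pmid_to_terms, exclude=()):
    """Count MeSH terms over a set of articles, skipping terms in `exclude`."""
    profile = Counter()
    for pmid in pmids:
        profile.update(t for t in pmid_to_terms.get(pmid, ()) if t not in exclude)
    return profile


def cosine(a, b):
    """Cosine similarity of two term-count profiles (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def precision_at_k(ranked_pairs, known_pairs, k):
    """Fraction of the top-k predicted pairs present in a curated reference set."""
    top = ranked_pairs[:k]
    return sum(pair in known_pairs for pair in top) / len(top) if top else 0.0


# Toy corpus (invented data): article IDs mapped to MeSH-style terms.
pmid_to_terms = {
    "9001": {"DiseaseA", "Apoptosis"},
    "9002": {"DiseaseA", "DNA Repair"},
    "9003": {"DiseaseB", "Apoptosis"},
    "9004": {"DNA Repair"},
}
gene_pmids = {"9002", "9004"}  # articles linked to a hypothetical gene
disease_pmids = {p for p, terms in pmid_to_terms.items() if "DiseaseA" in terms}

# Direct evidence: is DiseaseA over-represented among the gene's articles?
N = len(pmid_to_terms)
p_value = hypergeom_sf(len(gene_pmids & disease_pmids), N,
                       len(disease_pmids), len(gene_pmids))

# Indirect evidence: compare term profiles, leaving out the disease term itself.
similarity = cosine(term_profile(gene_pmids, pmid_to_terms, exclude={"DiseaseA"}),
                    term_profile(disease_pmids, pmid_to_terms, exclude={"DiseaseA"}))

# Validation: precision of a toy ranked list against a hypothetical reference set.
ranked = [("g1", "DiseaseA"), ("g2", "DiseaseB"), ("g3", "DiseaseA")]
known = {("g1", "DiseaseA"), ("g3", "DiseaseA")}
print(p_value, similarity, precision_at_k(ranked, known, 2))
```

The exact `math.comb` sums are fine for a toy corpus; with millions of articles a log-space implementation or a library routine such as SciPy's hypergeometric survival function would be the more practical choice.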

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.