Data Source Analysis

PubMed MeSH Annotation




  • morbidmap: all fields are free text. Links disease names with gene names and chromosomal locations
  • genemap: links disease names (text) with OMIM identifiers
  • omim.txt: links OMIM identifiers with everything. Names seem to have the identifier prepended. Looks like a text dump of the website (or, more likely, the website is generated from it)
  • Plan: put morbidmap into a table. Map disease names to MIM identifiers. Map gene IDs to Entrez Gene (the gene names should be HUGO symbols if everyone is sane)
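
The first step of the plan (morbidmap into a table) could be sketched roughly as follows. This is a minimal sketch, not a confirmed loader: the "|" delimiter, the column order, and the names `COLUMNS` and `load_morbidmap` are assumptions made for illustration, since the notes above only say that the fields are free text.

```python
import sqlite3

# Column names follow the notes above; their order and the "|" delimiter
# are assumptions and should be checked against the real file.
COLUMNS = ("disease", "gene_symbols", "location")


def load_morbidmap(lines, conn, delimiter="|"):
    """Insert well-formed rows into a morbidmap table; return (loaded, skipped)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS morbidmap "
        "(disease TEXT, gene_symbols TEXT, location TEXT)"
    )
    rows, skipped = [], 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.rstrip("\n").split(delimiter)]
        if len(fields) != len(COLUMNS):
            skipped += 1  # row does not match the assumed layout
            continue
        rows.append(tuple(fields))
    conn.executemany("INSERT INTO morbidmap VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows), skipped
```

Counting skipped rows gives a quick check on whether the assumed layout fits the file. Mapping disease names to MIM identifiers would then be a join against a table built from genemap, which is left out here because its layout is not described above.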


  • RRF files: columns separated by "|" character
  • MRCOLS.RRF: file lists the columns and which tables the column appears in
  • MRFILES.RRF: describes the contents of each file and the columns in the files (columns use abbreviations from MRCOLS.RRF)
  • MRDEF.RRF: definitions? Has listings for MeSH and GO terms
  • MRCONSO.RRF: Concepts and sources
    • need to keep the Concept UI, Source ID, Source Name, Source_ID_Type, Code
  • MRMAP.RRF: Mappings. These only include SNOMEDCT mappings
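
Reading an RRF file and keeping only a few columns might look like the sketch below. The column order in `MRCONSO_COLUMNS` is an assumption and should be checked against MRFILES.RRF, and the choice of `KEEP` is only a guess at which columns correspond to "Concept UI, Source ID, Source Name, Source_ID_Type, Code". The function name `read_rrf` is hypothetical.

```python
# Assumed MRCONSO.RRF column order; check it against MRFILES.RRF.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Guess at the columns behind the fields listed in the notes.
KEEP = ("CUI", "SCUI", "SAB", "TTY", "CODE")


def read_rrf(lines, columns, keep):
    """Yield one dict per row, restricted to the columns named in keep."""
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) < len(columns):
            continue  # malformed or truncated row
        # zip stops at the last column name, so the empty field left by
        # a trailing "|" at the end of the row is dropped
        row = dict(zip(columns, fields))
        yield {name: row[name] for name in keep}
```

Because the function takes any iterable of lines, it can be fed an open file handle directly, so a large file does not need to be loaded into memory at once.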

Grep for files containing "OMIM":

  • MRCONSO.RRF: OMIM seems to appear in MeSH concepts (only 3 occurrences)
  • MRDEF.RRF: Several MeSH and GO definitions reference OMIM
  • MRDOC.RRF: (typed key value metadata map)
  • MRSAB.RRF: OMIM shows up as a source, and in HUGO and NCI (attribute?)
  • MRSAT.RRF: Simple concept, term and string attributes. OMIM_ID values are linked to CUIs as an attribute, e.g. C0026574
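
The grep above can also be done from Python when per-file counts are wanted. This is a rough sketch; the function name `count_matches`, the `*.RRF` glob, and the idea that all files sit in one directory are assumptions.

```python
from pathlib import Path


def count_matches(directory, needle="OMIM", pattern="*.RRF"):
    """Return {file name: number of lines containing needle}, omitting files with no match."""
    counts = {}
    for path in sorted(Path(directory).glob(pattern)):
        matches = 0
        # errors="replace" keeps the scan going if a file has odd bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    matches += 1
        if matches:
            counts[path.name] = matches
    return counts
```

Note that this is a plain substring match, so it counts lines mentioning OMIM_ID as well as lines where OMIM appears as a source name.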

UMLS concept Alzheimer's Disease (C0002395) [Semantic Type: Disease or Syndrome] is linked to the synonym AD (C0002395), which is linked to OMIM ID 104300. Next step: try reconstituting this link from the data files.
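
One way to attempt that reconstitution is to scan MRSAT.RRF for the attribute rows of a given CUI. The sketch below assumes the column order in `MRSAT_COLUMNS` (to be checked against MRFILES.RRF) and assumes the attribute name is stored as `OMIM_ID`, as observed above. The function name `attribute_values` is hypothetical.

```python
# Assumed MRSAT.RRF column order; check it against MRFILES.RRF.
MRSAT_COLUMNS = [
    "CUI", "LUI", "SUI", "METAUI", "STYPE", "CODE", "ATUI",
    "SATUI", "ATN", "SAB", "ATV", "SUPPRESS", "CVF",
]


def attribute_values(lines, cui, attribute="OMIM_ID"):
    """Return the sorted, de-duplicated values of one attribute for one CUI."""
    cui_at = MRSAT_COLUMNS.index("CUI")
    atn_at = MRSAT_COLUMNS.index("ATN")  # attribute name
    atv_at = MRSAT_COLUMNS.index("ATV")  # attribute value
    values = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        if len(fields) < len(MRSAT_COLUMNS):
            continue  # malformed or truncated row
        if fields[cui_at] == cui and fields[atn_at] == attribute:
            values.add(fields[atv_at])
    return sorted(values)
```

If the link described above is present in the data, calling `attribute_values` on the lines of MRSAT.RRF with `"C0002395"` should include 104300 in the result.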

  • Discovery: OMIM was not included in the subset. Redoing the UMLS subset to include OMIM in addition to GO, MeSH, and SNOMEDCT