Adapt To Muscle

Loading Entrez Gene (Entire)

  • Too slow: fewer than 200,000 of the 1 million+ genes loaded after more than a week
  • TODO: Distribute the work by cutting the file into smaller pieces, then parsing/loading each piece independently
  • Implementation: Rewrite in C? Parse directly from text files rather than ASN.1/XML?
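
The "cut file into smaller pieces" idea could be sketched roughly as follows. This assumes a line-oriented text export with one gene record per line, which is not the ASN.1/XML format mentioned above; the function name `split_into_chunks` and the chunk file naming are hypothetical, chosen only for illustration.

```python
import os


def split_into_chunks(path, out_dir, lines_per_chunk):
    """Split a line-oriented text file into numbered chunk files.

    Assumes one record per line (an assumption, not the page's format).
    Returns the chunk file paths in order.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    try:
        with open(path, encoding="utf-8") as src:
            for i, line in enumerate(src):
                # Start a new chunk file every lines_per_chunk lines.
                if i % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    name = f"chunk_{len(chunk_paths):05d}.txt"
                    chunk_path = os.path.join(out_dir, name)
                    chunk_paths.append(chunk_path)
                    out = open(chunk_path, "w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each chunk file could then be handed to a separate parse/load process, so a slow or failed piece can be rerun without restarting the whole load.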

Steps

  • Download PubMed articles for the "muscles[MeSH]" query
  • Load new genes (NOT DONE)
  • Load PubMed articles (as the pubmed_muscle table)
  • Load PubMed MeSH terms (into pubmed_mesh)
  • Put the query terms into a table (here, a view over mesh):
CREATE VIEW muscle AS SELECT * FROM mesh WHERE tree_num LIKE 'A10.690.%' OR tree_num='A10.690';
  • Run the modified muscle query:
SELECT locus, muscle.term, COUNT(DISTINCT generif.pmid) AS pubmed_refs FROM gene, generif, pubmed_mesh, muscle WHERE generif.pmid=pubmed_mesh.pmid AND muscle.term=pubmed_mesh.term AND gene.gene_id=generif.gene_id GROUP BY locus, muscle.term ORDER BY pubmed_refs;
  • Get muscle-related PMIDs:
SELECT DISTINCT locus, muscle.term, generif.pmid FROM gene,generif, pubmed_mesh, muscle WHERE generif.pmid=pubmed_mesh.pmid AND muscle.term=pubmed_mesh.term AND gene.gene_id=generif.gene_id ORDER BY locus, muscle.term;
  • Sarcomere-related PMIDs (specific MeSH term):
SELECT DISTINCT locus, pubmed_mesh.term, generif.pmid FROM gene,generif, pubmed_mesh WHERE generif.pmid=pubmed_mesh.pmid AND pubmed_mesh.term='Sarcomeres' AND gene.gene_id=generif.gene_id ORDER BY locus, pubmed_mesh.term;
  • All terms containing the word fragment "muscle":
CREATE VIEW muscle2 AS SELECT * FROM mesh WHERE term LIKE '%muscle%';
SELECT DISTINCT locus, pubmed_mesh.term, generif.pmid FROM gene,generif, pubmed_mesh, muscle2 WHERE generif.pmid=pubmed_mesh.pmid AND pubmed_mesh.term=muscle2.term AND gene.gene_id=generif.gene_id ORDER BY locus, pubmed_mesh.term;
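
To try the muscle view and the count query without the full load, something like the following sketch may help. It uses an in-memory SQLite database as a stand-in for whatever database the page actually used. The table layouts are inferred only from the column names in the queries above, and every row (loci, PMIDs, terms, and all tree numbers except the 'A10.690' prefix from the view) is a made-up placeholder.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, locus TEXT);
        CREATE TABLE generif (gene_id INTEGER, pmid INTEGER);
        CREATE TABLE pubmed_mesh (pmid INTEGER, term TEXT);
        CREATE TABLE mesh (term TEXT, tree_num TEXT);
        CREATE VIEW muscle AS
            SELECT * FROM mesh
            WHERE tree_num LIKE 'A10.690.%' OR tree_num = 'A10.690';
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(1, "LOCUS_A"), (2, "LOCUS_B")])
    # (1, 102) appears twice to show that COUNT(DISTINCT pmid) ignores repeats.
    conn.executemany("INSERT INTO generif VALUES (?, ?)",
                     [(1, 101), (1, 102), (1, 102), (2, 103), (2, 104)])
    conn.executemany("INSERT INTO pubmed_mesh VALUES (?, ?)",
                     [(101, "Muscles"), (102, "Muscles"),
                      (103, "Muscle subterm"), (104, "Near-miss term")])
    # 'A10.6900' is outside the subtree: it matches neither branch of the view.
    conn.executemany("INSERT INTO mesh VALUES (?, ?)",
                     [("Muscles", "A10.690"),
                      ("Muscle subterm", "A10.690.1"),
                      ("Near-miss term", "A10.6900")])
    return conn


def muscle_ref_counts(conn):
    """Run the count query from the steps above; returns a list of tuples."""
    return conn.execute("""
        SELECT locus, muscle.term, COUNT(DISTINCT generif.pmid) AS pubmed_refs
        FROM gene, generif, pubmed_mesh, muscle
        WHERE generif.pmid = pubmed_mesh.pmid
          AND muscle.term = pubmed_mesh.term
          AND gene.gene_id = generif.gene_id
        GROUP BY locus, muscle.term
        ORDER BY pubmed_refs
    """).fetchall()


if __name__ == "__main__":
    for row in muscle_ref_counts(build_demo_db()):
        print(row)
```

The near-miss row illustrates why the view pairs `LIKE 'A10.690.%'` with an exact match on 'A10.690' rather than using a bare `LIKE 'A10.690%'`, which would also pick up unrelated tree numbers sharing the same leading characters.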
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.