Data Sources

SgTxGNN integrates data from multiple authoritative sources to provide comprehensive drug repurposing predictions and evidence.


Primary Sources

TxGNN Model

Source: Harvard Medical School, Zitnik Lab

The core prediction engine, published in Nature Medicine (2024).

Coverage:

  • 17,080 diseases
  • 7,957 drugs
  • 80,000+ drug-disease relationships

Singapore HSA

Source: Health Sciences Authority, Singapore

Drug registration data for Singapore-approved medications.

Coverage:

  • 11,466 registered drugs
  • 745 drugs mapped to DrugBank

Evidence Sources

ClinicalTrials.gov

Source: U.S. National Library of Medicine

Global registry of clinical trials.

Usage: Evidence collection for drug-disease pairs
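
As an illustration of how trial evidence for one drug-disease pair might be looked up, the sketch below builds a request URL for the ClinicalTrials.gov API v2 `studies` endpoint using its `query.intr` (intervention) and `query.cond` (condition) parameters. The function name, the example pair, and the page size are assumptions for illustration; this is not taken from the SgTxGNN pipeline, and the block only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_trials_query(drug: str, disease: str, page_size: int = 20) -> str:
    """Build a search URL for trials that involve both the drug and the disease.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {
        "query.intr": drug,      # intervention / treatment search
        "query.cond": disease,   # condition / disease search
        "pageSize": page_size,   # number of studies per page
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


# Example pair chosen for illustration only.
trials_url = build_trials_query("metformin", "breast cancer")
print(trials_url)
```

The resulting URL can be fetched with any HTTP client; the response is JSON with one entry per matching study.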

PubMed

Source: U.S. National Library of Medicine

Biomedical literature database.

Usage: Literature evidence for repurposing candidates
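
A literature search for a repurposing candidate could be sketched along the following lines, using the NCBI E-utilities `esearch` endpoint. The search term (both names restricted to title and abstract), the function name, and the result limit are assumptions for illustration, not the query SgTxGNN actually issues. The block only builds the URL.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(drug: str, disease: str, retmax: int = 20) -> str:
    """Build an esearch URL for articles mentioning both drug and disease.

    Hypothetical helper: the term construction is one possible choice.
    """
    term = f'"{drug}"[Title/Abstract] AND "{disease}"[Title/Abstract]'
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # Entrez query string
        "retmode": "json",   # ask for a JSON response
        "retmax": retmax,    # maximum number of PMIDs returned
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example pair chosen for illustration only.
pubmed_url = build_pubmed_query("metformin", "breast cancer")
print(pubmed_url)
```

Because the search index lags behind publication, a query like this can miss very recent articles.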

DrugBank

Source: University of Alberta

Comprehensive drug and target database.

  • Data used under academic license

Coverage:

  • Drug identifiers and mappings
  • Drug-target interactions
  • Drug-drug interactions

Safety Data Sources

DDInter

Source: Central South University

Drug-drug interaction database.

Coverage: 240,000+ interaction pairs

SIDER

Source: EMBL

Side effect database.


Data Processing

Drug Mapping

  1. HSA drug names extracted from registration data
  2. Normalised using chemical name standardisation
  3. Mapped to DrugBank identifiers
  4. Successfully mapped: 745 drugs (73.87% of unique ingredients)
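
The mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the salt-suffix list, the tiny lookup table, and all function names are assumptions, and real normalisation would need many more rules.

```python
import re

# Hypothetical lookup table: normalised ingredient name -> DrugBank ID.
# A real pipeline would load this from the licensed DrugBank data.
DRUGBANK_LOOKUP = {
    "metformin": "DB00331",
    "acetylsalicylic acid": "DB00945",
}

# Assumed list of trailing salt/form words to strip during normalisation.
SALT_SUFFIXES = {"hydrochloride", "hcl", "sodium", "sulfate", "sulphate"}


def normalise(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing salt words."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while len(tokens) > 1 and tokens[-1] in SALT_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def map_ingredients(names):
    """Return (mapped, unmapped): raw name -> DrugBank ID, and the leftovers."""
    mapped, unmapped = {}, []
    for raw in names:
        drugbank_id = DRUGBANK_LOOKUP.get(normalise(raw))
        if drugbank_id:
            mapped[raw] = drugbank_id
        else:
            unmapped.append(raw)
    return mapped, unmapped


def mapping_rate(mapped, unmapped) -> float:
    """Percentage of unique ingredients that received a DrugBank ID."""
    total = len(mapped) + len(unmapped)
    return 100 * len(mapped) / total if total else 0.0


# Made-up input names for illustration.
mapped, unmapped = map_ingredients(
    ["Metformin Hydrochloride", "ACETYLSALICYLIC ACID", "Unknownium"]
)
print(mapped, unmapped, round(mapping_rate(mapped, unmapped), 2))
```

Keeping the unmapped names in a separate list makes it easy to review which ingredients fell through and why.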

Prediction Generation

  1. DrugBank IDs matched to TxGNN knowledge graph
  2. Knowledge Graph predictions generated: 22,136
  3. Deep Learning predictions generated: 29,100
  4. Results unified and deduplicated: 31,543
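
The unification step can be pictured as a keyed merge of the two prediction lists. The sketch below assumes each prediction is a (DrugBank ID, disease ID) pair and that duplicates are identified by that pair alone; the function name and the sample pairs are hypothetical. Under the further assumption that neither list contains internal duplicates, the counts above imply 22,136 + 29,100 - 31,543 = 19,693 pairs proposed by both methods.

```python
def unify(kg_predictions, dl_predictions):
    """Merge two prediction lists, keeping one entry per (drug, disease) pair.

    Returns a dict mapping each pair to the set of methods that proposed it.
    """
    unified = {}
    for method, predictions in (("kg", kg_predictions), ("dl", dl_predictions)):
        for pair in predictions:
            unified.setdefault(pair, set()).add(method)
    return unified


# Made-up pairs for illustration; disease IDs are placeholders.
kg = [("DB00331", "disease_1"), ("DB00945", "disease_2")]
dl = [("DB00331", "disease_1"), ("DB00316", "disease_3")]

unified = unify(kg, dl)
both = [pair for pair, methods in unified.items() if methods == {"kg", "dl"}]
print(len(unified), both)

# Overlap implied by the reported counts, assuming no within-list duplicates.
implied_overlap = 22_136 + 29_100 - 31_543
print(implied_overlap)
```

Recording which method produced each pair, rather than discarding the duplicate, keeps the information that a prediction was supported by both approaches.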

Update Schedule

| Source      | Last Updated | Frequency    |
|-------------|--------------|--------------|
| TxGNN Model | 2023         | As published |
| HSA Data    | March 2026   | Quarterly    |
| DrugBank    | 2025         | Annual       |
| DDInter     | 2025         | Annual       |

Licensing

| Source                | License                 |
|-----------------------|-------------------------|
| TxGNN                 | Academic use permitted  |
| HSA Data              | Open Government License |
| DrugBank              | Academic license        |
| PubMed/ClinicalTrials | Public domain           |
| DDInter               | Academic use            |

Data Quality

Validation Steps

  1. Drug name verification: Cross-referenced with multiple sources
  2. ID mapping validation: Verified against DrugBank
  3. Prediction deduplication: Removed duplicate entries
  4. Evidence verification: Checked source availability
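
One inexpensive part of ID mapping validation is a format check. DrugBank accession numbers consist of the prefix "DB" followed by five digits, so malformed identifiers can be caught before any lookup. The sketch below is an assumption about how such a check might look; it tests the format only and does not confirm that an ID exists in DrugBank.

```python
import re

# DrugBank accession format: "DB" followed by exactly five digits.
DRUGBANK_ID_PATTERN = re.compile(r"DB\d{5}")


def split_by_format(ids):
    """Separate well-formed DrugBank IDs from malformed ones.

    Hypothetical helper: checks format only, not existence in DrugBank.
    """
    valid, invalid = [], []
    for candidate in ids:
        if DRUGBANK_ID_PATTERN.fullmatch(candidate):
            valid.append(candidate)
        else:
            invalid.append(candidate)
    return valid, invalid


valid_ids, invalid_ids = split_by_format(["DB00331", "db00945", "DB123", "DB00316"])
print(valid_ids, invalid_ids)
```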

Known Limitations

  • Not all HSA drugs have DrugBank mappings
  • Some generic names have multiple variants
  • TxGNN model trained on Western drug databases
  • Evidence collection may miss recent publications

Contact

For data-related questions:



Copyright © 2026 Yao.Care. For research purposes only. Not medical advice.
