Data Sources

SgTxGNN combines a prediction model, Singapore drug registration data, and external evidence and safety databases to produce drug repurposing predictions with supporting evidence.

Primary Sources

TxGNN Model

Source: Harvard Medical School, Zitnik Lab

The core prediction engine, described in a paper published in Nature Medicine (2024).

Coverage:

17,080 diseases
7,957 drugs
80,000+ drug-disease relationships

Singapore HSA

Source: Health Sciences Authority, Singapore

Drug registration data for Singapore-approved medications.

Coverage:

11,466 registered drugs
745 drugs mapped to DrugBank

Evidence Sources

ClinicalTrials.gov

Source: U.S. National Library of Medicine

Global registry of clinical trials.

Usage: Evidence collection for drug-disease pairs
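
As a rough illustration of how trial evidence for one drug-disease pair could be looked up, the sketch below builds a request URL for the ClinicalTrials.gov API v2 studies endpoint. The function name and the choice of query fields are assumptions made for this example; this page does not describe how SgTxGNN actually queries the registry.

```python
from urllib.parse import urlencode

# Public ClinicalTrials.gov API v2 endpoint for searching studies
CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def build_trials_query(drug: str, disease: str, page_size: int = 20) -> str:
    """Build a search URL for trials matching an intervention and a condition.

    Hypothetical helper: the real evidence collector may use other fields.
    """
    params = {
        "query.intr": drug,      # intervention / treatment search
        "query.cond": disease,   # condition / disease search
        "pageSize": page_size,   # number of studies per page
        "format": "json",
    }
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


url = build_trials_query("metformin", "breast cancer")
print(url)
```

The URL can then be fetched with any HTTP client. Keeping URL construction separate from the network call makes the query logic easy to test offline.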

PubMed

Source: U.S. National Library of Medicine

Biomedical literature database.

Usage: Literature evidence for repurposing candidates
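
A literature lookup for a repurposing candidate might be sketched as follows, using the NCBI E-utilities esearch endpoint. The search term format (both names restricted to title and abstract) and the helper name are assumptions for illustration, not the project's documented query strategy.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# NCBI E-utilities search endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(drug: str, disease: str, retmax: int = 20) -> str:
    """Build an esearch URL for articles mentioning both drug and disease.

    Hypothetical helper: the field tags used here are one possible choice.
    """
    term = f'"{drug}"[Title/Abstract] AND "{disease}"[Title/Abstract]'
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = build_pubmed_query("metformin", "breast cancer")
# Decode the query string again to confirm what will be sent
query = parse_qs(urlparse(url).query)
print(query["term"][0])
```

The response lists matching PubMed IDs, which can be passed to the esummary or efetch endpoints for article details.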

DrugBank

Source: University of Alberta

Comprehensive drug and target database.

Data used under academic license

Coverage:

Drug identifiers and mappings
Drug-target interactions
Drug-drug interactions

Safety Data Sources

DDInter

Source: Central South University

Drug-drug interaction database.

Coverage: 240,000+ interaction pairs

SIDER

Source: EMBL

Side effect database.

Data Processing

Drug Mapping

HSA drug names extracted from registration data
Normalised using chemical name standardisation
Mapped to DrugBank identifiers
Successfully mapped: 745 drugs (73.87% of unique active ingredients, not of the 11,466 registered products)
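
The mapping steps above can be sketched roughly as below. This is a minimal illustration, not the project's pipeline: the salt suffix list, the function names, and the toy lookup table are all assumptions, and the real normalisation rules are not documented on this page.

```python
import re
import unicodedata

# Hypothetical list of salt words to strip; the real list is not documented
SALT_SUFFIXES = ("hydrochloride", "sodium", "sulfate", "sulphate", "maleate")


def normalise_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and drop trailing salt words."""
    text = unicodedata.normalize("NFKD", raw)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    words = text.split()
    while len(words) > 1 and words[-1] in SALT_SUFFIXES:
        words.pop()
    return " ".join(words)


def map_to_drugbank(ingredients, lookup):
    """Return ({normalised name: DrugBank ID}, sorted list of unmapped names)."""
    mapped, unmapped = {}, set()
    for raw in ingredients:
        name = normalise_name(raw)
        if name in lookup:
            mapped[name] = lookup[name]
        else:
            unmapped.add(name)
    return mapped, sorted(unmapped)


# Toy data for illustration only; "Examplumab" is a made-up name
lookup = {"metformin": "DB00331", "paracetamol": "DB00316"}
ingredients = ["Metformin Hydrochloride", "PARACETAMOL",
               "metformin hydrochloride", "Examplumab"]

mapped, unmapped = map_to_drugbank(ingredients, lookup)
# Mapping rate is computed over unique normalised ingredients
rate = 100 * len(mapped) / (len(mapped) + len(unmapped))
print(mapped, unmapped, f"{rate:.2f}%")
```

Computing the rate over unique normalised ingredients, rather than over raw registration rows, is what makes a figure such as 73.87% compatible with a much larger count of registered products.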

Prediction Generation

DrugBank IDs matched to TxGNN knowledge graph
Knowledge Graph predictions generated: 22,136
Deep Learning predictions generated: 29,100
Results unified and deduplicated: 31,543
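
One way the unification step could work is sketched below: predictions from both methods are keyed by (drug, disease) pair so that a pair found by both is stored once. The record layout, the function name, and the assumption that only the deep learning method carries a numeric score are illustrative choices, not details confirmed by this page.

```python
def unify_predictions(kg_preds, dl_preds):
    """Merge two prediction lists, keeping one record per (drug, disease) pair."""
    unified = {}
    for source, preds in (("KG", kg_preds), ("DL", dl_preds)):
        for drug_id, disease, score in preds:
            entry = unified.setdefault((drug_id, disease),
                                       {"sources": [], "score": None})
            if source not in entry["sources"]:
                entry["sources"].append(source)
            # Assumption: only DL predictions carry a numeric score
            if score is not None:
                entry["score"] = score
    return unified


# Toy predictions for illustration only
kg = [("DB00331", "disease A", None), ("DB00316", "disease B", None)]
dl = [("DB00331", "disease A", 0.97), ("DB00945", "disease C", 0.91)]

unified = unify_predictions(kg, dl)
# Pairs found by both methods = total inputs minus unique pairs
overlap = len(kg) + len(dl) - len(unified)
print(len(unified), overlap)
```

If deduplication removes only pairs predicted by both methods, the counts above imply 22,136 + 29,100 - 31,543 = 19,693 shared pairs. That reading is an inference from the reported totals, not a figure stated by the project.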

Update Schedule

Source	Last Updated	Frequency
TxGNN Model	2023	As published
HSA Data	March 2026	Quarterly
DrugBank	2025	Annual
DDInter	2025	Annual

Licensing

Source	License
TxGNN	Academic use permitted
HSA Data	Open Government License
DrugBank	Academic license
PubMed/ClinicalTrials.gov	Public domain
DDInter	Academic use

Data Quality

Validation Steps

Drug name verification: Cross-referenced with multiple sources
ID mapping validation: Verified against DrugBank
Prediction deduplication: Removed duplicate entries
Evidence verification: Checked source availability
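
Two of these checks lend themselves to a short sketch: confirming that mapped identifiers follow the DrugBank accession format ("DB" followed by five digits) and finding repeated drug-disease pairs. The function names and sample data are hypothetical, and the project's actual validation code may differ.

```python
import re
from collections import Counter

# DrugBank primary accession numbers: "DB" followed by five digits
DRUGBANK_ID = re.compile(r"DB\d{5}")


def invalid_ids(mapping):
    """Return the drug names whose mapped ID is not a well-formed DrugBank ID."""
    return sorted(name for name, db_id in mapping.items()
                  if not DRUGBANK_ID.fullmatch(db_id))


def duplicate_pairs(predictions):
    """Return the (drug, disease) pairs that occur more than once."""
    counts = Counter(predictions)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Toy inputs for illustration only
mapping = {"metformin": "DB00331", "broken entry": "DB331"}
predictions = [("DB00331", "disease A"), ("DB00316", "disease B"),
               ("DB00331", "disease A")]

print(invalid_ids(mapping))
print(duplicate_pairs(predictions))
```

A format check only catches malformed identifiers. Confirming that an ID refers to the intended drug still requires a lookup against DrugBank itself.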

Known Limitations

Not all HSA drugs have DrugBank mappings
Some generic names have multiple variants
TxGNN model trained on Western drug databases
Evidence collection may miss recent publications

Contact

For data-related questions, open an issue on the project's GitHub Issues page.