Data Sources
SgTxGNN integrates data from multiple authoritative sources to provide comprehensive drug repurposing predictions and evidence.
Primary Sources
TxGNN Model
Source: Harvard Medical School, Zitnik Lab
The core prediction engine, published in Nature Medicine (2023).
Coverage:
- 17,080 diseases
- 7,957 drugs
- 80,000+ drug-disease relationships
Singapore HSA
Source: Health Sciences Authority, Singapore
Drug registration data for Singapore-approved medications.
Coverage:
- 11,466 registered drugs
- 745 drugs mapped to DrugBank
Evidence Sources
ClinicalTrials.gov
Source: U.S. National Library of Medicine
Global registry of clinical trials.
Usage: Evidence collection for drug-disease pairs
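Evidence collection for one drug-disease pair could query the public ClinicalTrials.gov REST API (version 2). The minimal sketch below only builds the search URL, so it runs offline. The helper name `build_ctgov_query` and the default page size are illustrative assumptions, not SgTxGNN's actual code.

```python
from urllib.parse import urlencode

# Study-search endpoint of the public ClinicalTrials.gov API, version 2.
CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_ctgov_query(drug: str, disease: str, page_size: int = 50) -> str:
    """Build a study-search URL for one drug-disease pair (hypothetical helper).

    query.intr matches the intervention (the drug), query.cond matches the
    condition (the disease), and pageSize caps the studies per response page.
    """
    params = {"query.intr": drug, "query.cond": disease, "pageSize": page_size}
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


print(build_ctgov_query("metformin", "breast cancer"))
# https://clinicaltrials.gov/api/v2/studies?query.intr=metformin&query.cond=breast+cancer&pageSize=50
```

A GET request to that URL returns JSON with a `studies` array; larger result sets are paged using the `nextPageToken` value in the response.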
PubMed
Source: U.S. National Library of Medicine
Biomedical literature database.
Usage: Literature evidence for repurposing candidates
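Literature evidence can be gathered through NCBI's E-utilities. This sketch builds an ESearch request for PubMed records mentioning both a drug and a disease. The helper `build_pubmed_search`, the phrase-quoting of terms, and the `retmax` default are assumptions for illustration, not the project's documented query.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search(drug: str, disease: str, retmax: int = 20) -> str:
    """Build an ESearch URL for PubMed records mentioning both terms (hypothetical helper)."""
    # Quote each term so multi-word names are searched as phrases.
    term = f'"{drug}" AND "{disease}"'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_pubmed_search("metformin", "breast cancer"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22metformin%22+AND+%22breast+cancer%22&retmode=json&retmax=20
```

With `retmode=json`, the matching PubMed IDs are returned under `esearchresult.idlist`.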
DrugBank
Source: University of Alberta
Comprehensive drug and target database.
- Data used under academic license
Coverage:
- Drug identifiers and mappings
- Drug-target interactions
- Drug-drug interactions
Safety Data Sources
DDInter
Source: Central South University, China
Drug-drug interaction database.
Coverage: 240,000+ interaction pairs
SIDER
Source: EMBL
Side Effect Resource: a database of marketed medicines and their recorded adverse drug reactions.
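Drug-drug interaction sources such as DDInter describe unordered pairs, so a lookup has to treat (A, B) and (B, A) as the same entry. A minimal sketch, assuming a hypothetical `(drug_a, drug_b, severity)` row format rather than DDInter's actual schema; the drug names and severities below are placeholders.

```python
from itertools import combinations


def make_ddi_index(rows):
    """Index hypothetical (drug_a, drug_b, severity) rows by unordered pair."""
    # frozenset makes (A, B) and (B, A) the same key; names are lower-cased.
    return {frozenset((a.lower(), b.lower())): sev for a, b, sev in rows}


def find_interactions(drugs, index):
    """Return (drug_a, drug_b, severity) for every indexed pair among drugs."""
    names = sorted({d.lower() for d in drugs})
    hits = []
    for a, b in combinations(names, 2):
        severity = index.get(frozenset((a, b)))
        if severity is not None:
            hits.append((a, b, severity))
    return hits


# Placeholder drug names and severities, not real DDInter records.
index = make_ddi_index([("Drug_X", "Drug_Y", "Major"), ("Drug_Y", "Drug_Z", "Minor")])
print(find_interactions(["drug_y", "DRUG_X", "drug_w"], index))
# [('drug_x', 'drug_y', 'Major')]
```

Checking every pair is quadratic in the number of drugs, which is acceptable for a single patient's medication list.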
Data Processing
Drug Mapping
- HSA drug names extracted from registration data
- Normalised using chemical name standardisation
- Mapped to DrugBank identifiers
- Successfully mapped: 745 drugs (73.87% of unique ingredients)
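The mapping steps above can be sketched as: normalise each ingredient name, look it up in a name-to-DrugBank table, and report the share of unique ingredients that mapped. The salt-word list, the helper names, and the two-entry lookup are hypothetical; the real pipeline's standardisation rules are not described here.

```python
import re

# Hypothetical list of salt descriptors removed during normalisation;
# a production pipeline would use a curated vocabulary.
SALT_WORDS = {"hydrochloride", "hcl", "maleate", "mesylate", "besylate"}


def normalise(name: str) -> str:
    """Lower-case a name, drop punctuation, and strip salt descriptors."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in SALT_WORDS]
    # Keep the original tokens if stripping would leave nothing.
    return " ".join(kept or tokens)


def map_to_drugbank(ingredients, lookup):
    """Map ingredient names via a {normalised_name: drugbank_id} lookup.

    Returns (mapped, unmapped, rate), where rate is the mapped share of
    unique normalised ingredients.
    """
    unique = {normalise(n) for n in ingredients}
    mapped = {n: lookup[n] for n in unique if n in lookup}
    unmapped = sorted(unique - set(mapped))
    rate = len(mapped) / len(unique) if unique else 0.0
    return mapped, unmapped, rate


lookup = {"metformin": "DB00331", "aspirin": "DB00945"}  # illustrative entries
mapped, unmapped, rate = map_to_drugbank(
    ["Metformin Hydrochloride", "metformin HCl", "Aspirin", "Unknownium"], lookup
)
print(unmapped, f"{rate:.2%}")
# ['unknownium'] 66.67%
```

Both metformin spellings collapse to one unique ingredient, which is why the rate is computed over unique normalised names rather than raw registration entries.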
Prediction Generation
- DrugBank IDs matched to TxGNN knowledge graph
- Knowledge Graph predictions generated: 22,136
- Deep Learning predictions generated: 29,100
- Results unified and deduplicated: 31,543
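If deduplication is keyed on the drug-disease pair and neither method emits the same pair twice, the counts above imply 22,136 + 29,100 - 31,543 = 19,693 pairs predicted by both methods. The sketch below shows one way to unify the two sets; keeping the highest score per pair and tagging which method produced it are assumptions, not the documented merge policy.

```python
def unify_predictions(kg_preds, dl_preds):
    """Merge (drug_id, disease, score) lists and deduplicate by drug-disease pair.

    Assumed policy: a pair keeps its highest score and records which
    method(s) ("KG", "DL") predicted it.
    """
    merged = {}
    for method, preds in (("KG", kg_preds), ("DL", dl_preds)):
        for drug_id, disease, score in preds:
            key = (drug_id, disease.lower())
            entry = merged.setdefault(key, {"score": score, "methods": set()})
            entry["score"] = max(entry["score"], score)
            entry["methods"].add(method)
    return merged


# Placeholder diseases and scores, not SgTxGNN output.
kg = [("DB00331", "Disease A", 0.8), ("DB00945", "Disease B", 0.6)]
dl = [("DB00331", "disease a", 0.9), ("DB00945", "Disease C", 0.7)]
merged = unify_predictions(kg, dl)
print(len(merged))  # 3: the shared pair is counted once
```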
Update Schedule
| Source | Last Updated | Frequency |
|---|---|---|
| TxGNN Model | 2023 | As published |
| HSA Data | March 2026 | Quarterly |
| DrugBank | 2025 | Annual |
| DDInter | 2025 | Annual |
Licensing
| Source | License |
|---|---|
| TxGNN | Academic use permitted |
| HSA Data | Open Government License |
| DrugBank | Academic license |
| PubMed/ClinicalTrials.gov | Public domain |
| DDInter | Academic use |
Data Quality
Validation Steps
- Drug name verification: Cross-referenced with multiple sources
- ID mapping validation: Verified against DrugBank
- Prediction deduplication: Removed duplicate entries
- Evidence verification: Checked source availability
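Part of the ID mapping validation can be done cheaply before any lookup: DrugBank accession numbers have the form "DB" followed by five digits. This sketch, with a hypothetical `validate_mappings` helper, flags malformed or missing IDs; it checks format only, and confirming that an ID actually exists would need a lookup against a DrugBank export.

```python
import re

# DrugBank accession numbers: "DB" followed by five digits (e.g. DB00945).
DRUGBANK_ID = re.compile(r"DB\d{5}")


def validate_mappings(mappings):
    """Split {drug_name: drugbank_id} into well-formed and malformed entries.

    Format check only; it does not confirm the ID exists in DrugBank.
    """
    valid, malformed = {}, {}
    for name, db_id in mappings.items():
        target = valid if DRUGBANK_ID.fullmatch(db_id or "") else malformed
        target[name] = db_id
    return valid, malformed


valid, malformed = validate_mappings(
    {"metformin": "DB00331", "aspirin": "db945", "unknownium": None}
)
print(sorted(malformed))  # ['aspirin', 'unknownium']
```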
Known Limitations
- Not all HSA drugs have DrugBank mappings
- Some generic names have multiple variants
- TxGNN model trained on Western drug databases
- Evidence collection may miss recent publications
Contact
For data-related questions: