Methodology

How predictions are generated and validated

Overview

SgTxGNN uses a dual-method approach that combines Knowledge Graph (KG) predictions with Deep Learning (DL) predictions to identify drug repurposing candidates. Predictions supported by both methods (KG+DL) are treated as higher confidence.

Prediction Pipeline

Step 1: Knowledge Graph Prediction (KG)

The Knowledge Graph method uses TxGNN’s biomedical knowledge graph containing:

17,080 biomedical entities (drugs, diseases, genes, proteins)
80,127 drug-disease relationships
Biological pathway connections

KG predictions identify drugs that share biological pathways or targets with diseases.
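
As a rough illustration of the shared-target idea, the sketch below ranks drugs by how many gene targets they share with a disease. The toy data and the function name `rank_by_shared_targets` are hypothetical, and this is not TxGNN's actual graph traversal; it only shows the kind of overlap reasoning described above.

```python
def rank_by_shared_targets(drug_targets, disease_genes):
    """Rank drugs by the number of gene targets they share with a disease."""
    shared = {
        drug: targets & disease_genes
        for drug, targets in drug_targets.items()
    }
    # Keep drugs with at least one shared target; most overlap first,
    # ties broken alphabetically by drug name.
    return sorted(
        ((drug, genes) for drug, genes in shared.items() if genes),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Toy data (hypothetical identifiers, not taken from the TxGNN graph)
drug_targets = {
    "drug_a": {"GENE1", "GENE2"},
    "drug_b": {"GENE2"},
    "drug_c": {"GENE9"},
}
disease_genes = {"GENE1", "GENE2", "GENE3"}

candidates = rank_by_shared_targets(drug_targets, disease_genes)
```

Here `drug_c` is dropped because it shares no target with the disease, while `drug_a` ranks first with two shared targets.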

Step 2: Deep Learning Prediction (DL)

The Deep Learning method uses TxGNN’s graph neural network model:

Trained on known drug-disease relationships
Learns complex patterns in the knowledge graph
Outputs confidence scores (0.0-1.0) for each drug-disease pair

Step 3: Dual Validation (KG+DL)

Predictions that appear in both the KG and DL results are marked “KG+DL” and treated as higher confidence:

1,217 dual-validated predictions in SgTxGNN
These predictions have convergent evidence from two methods (both draw on the same TxGNN knowledge graph, so they are complementary rather than fully independent)
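
The dual-validation step amounts to intersecting the two result sets. A minimal sketch follows, assuming KG results arrive as a set of (drug, disease) pairs and DL results as a dictionary of scores that has already been thresholded. The function name `label_predictions` and the data shapes are assumptions for illustration, not SgTxGNN's actual code.

```python
def label_predictions(kg_pairs, dl_scores):
    """Label each (drug, disease) pair by the method(s) that produced it."""
    labels = {}
    for pair in set(kg_pairs) | set(dl_scores):
        in_kg = pair in kg_pairs
        in_dl = pair in dl_scores
        if in_kg and in_dl:
            labels[pair] = "KG+DL"  # supported by both methods
        elif in_kg:
            labels[pair] = "KG"
        else:
            labels[pair] = "DL"
    return labels


# Toy inputs (hypothetical)
kg_pairs = {("drug_a", "disease_x"), ("drug_b", "disease_y")}
dl_scores = {("drug_a", "disease_x"): 0.97, ("drug_c", "disease_z"): 0.81}

labels = label_predictions(kg_pairs, dl_scores)
dual_validated = sorted(pair for pair, label in labels.items() if label == "KG+DL")
```

Only `("drug_a", "disease_x")` appears in both inputs, so it is the only pair labeled “KG+DL”.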

Evidence Classification

L1-L5 Evidence Levels

Level	Definition	Criteria
L1	Multiple Phase 3 RCTs	≥2 completed Phase 3 trials with positive results
L2	Single RCT or Phase 2	1 RCT or ≥2 Phase 2 trials
L3	Observational Studies	Cohort or case-control studies
L4	Preclinical/Mechanistic	In vitro, animal studies, or mechanistic evidence
L5	Prediction Only	AI prediction without clinical evidence
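
The table can be read as a top-down decision rule: check the strongest evidence first and fall through to L5. The sketch below encodes that reading. The `EvidenceSummary` fields and the `evidence_level` function are hypothetical, `rcts` is assumed to count any randomized controlled trial (including Phase 3), and cases the table does not cover, such as a single non-randomized Phase 2 trial, simply fall through to lower levels.

```python
from dataclasses import dataclass


@dataclass
class EvidenceSummary:
    """Counts of supporting studies for one drug-disease pair (hypothetical schema)."""
    positive_phase3_trials: int = 0   # completed Phase 3 trials with positive results
    rcts: int = 0                     # any randomized controlled trial
    phase2_trials: int = 0
    observational_studies: int = 0    # cohort or case-control studies
    preclinical_studies: int = 0      # in vitro, animal, or mechanistic evidence


def evidence_level(e: EvidenceSummary) -> str:
    """Return the strongest level (L1 to L5) whose criteria are met."""
    if e.positive_phase3_trials >= 2:
        return "L1"
    if e.rcts >= 1 or e.phase2_trials >= 2:
        return "L2"
    if e.observational_studies >= 1:
        return "L3"
    if e.preclinical_studies >= 1:
        return "L4"
    return "L5"  # AI prediction without clinical evidence
```

For example, `evidence_level(EvidenceSummary(observational_studies=3, preclinical_studies=1))` returns `"L3"`, because the observational evidence outranks the preclinical evidence.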

Evidence Sources

Evidence is collected from:

ClinicalTrials.gov - Clinical trial registry
PubMed - Biomedical literature
DrugBank - Drug mechanism and interaction data
Singapore HSA - Local registration status

Prediction Quality

Confidence Scores

DL predictions include confidence scores:

>0.99: Very high confidence
0.95-0.99: High confidence
0.90-0.95: Moderate confidence
0.50-0.90: Lower confidence (still above threshold)
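
The tiers above can be expressed as a simple lookup. This is a sketch only: the listed ranges share their endpoints (0.95 and 0.90 each appear in two ranges), so it assumes a boundary score belongs to the higher tier, and the function name `confidence_tier` is hypothetical.

```python
def confidence_tier(score: float) -> str:
    """Map a DL confidence score in [0.0, 1.0] to its tier label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    if score > 0.99:
        return "Very high confidence"
    if score >= 0.95:   # assumption: 0.95 itself counts as High
        return "High confidence"
    if score >= 0.90:   # assumption: 0.90 itself counts as Moderate
        return "Moderate confidence"
    if score >= 0.50:
        return "Lower confidence"
    return "Below threshold"
```

Under these assumptions a score of exactly 0.99 is labeled “High confidence”, since only scores strictly above 0.99 reach the top tier.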

Filtering Criteria

All predictions meet these minimum criteria:

DL score ≥ 0.50 (above random chance)
Drug is registered with Singapore HSA
Drug has valid DrugBank mapping
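
The three criteria combine into a single predicate. A minimal sketch, assuming each prediction record carries its DL score, HSA registration flag, and DrugBank ID; the `Prediction` fields and `passes_filters` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DL_SCORE = 0.50  # threshold stated in the criteria above


@dataclass
class Prediction:
    """One drug-disease prediction (hypothetical record layout)."""
    drug_name: str
    disease: str
    dl_score: float
    hsa_registered: bool
    drugbank_id: Optional[str]


def passes_filters(p: Prediction) -> bool:
    """True only if the prediction meets all three minimum criteria."""
    return (
        p.dl_score >= MIN_DL_SCORE
        and p.hsa_registered
        and bool(p.drugbank_id)  # None or empty string means no valid mapping
    )


# Toy records with placeholder DrugBank IDs (not real identifiers)
predictions = [
    Prediction("drug_a", "disease_x", 0.97, True, "DB-EXAMPLE-1"),
    Prediction("drug_b", "disease_y", 0.42, True, "DB-EXAMPLE-2"),   # score too low
    Prediction("drug_c", "disease_z", 0.88, False, "DB-EXAMPLE-3"),  # not registered
    Prediction("drug_d", "disease_x", 0.91, True, None),             # no mapping
]
kept = [p.drug_name for p in predictions if passes_filters(p)]
```

Only `drug_a` survives; each of the other records fails exactly one criterion.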

Data Processing

Singapore HSA Data

Drug registration data from data.gov.sg
5,485 registered products processed
Active ingredients mapped to DrugBank IDs
745 unique drugs with successful mapping
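
The ingredient-mapping step might look like the sketch below: normalize each active-ingredient name and look it up in a name-to-DrugBank-ID table. The record layout, the `;` separator, the lookup table, and the placeholder IDs are all assumptions for illustration; the actual HSA export format and matching rules may differ.

```python
def normalize(name: str) -> str:
    """Lowercase a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def map_ingredients(products, drugbank_lookup, separator=";"):
    """Map active ingredients to DrugBank IDs.

    Returns (mapped, unmapped): a dict of normalized ingredient name to ID,
    and the set of normalized names with no match in the lookup table.
    """
    mapped = {}
    unmapped = set()
    for product in products:
        for raw in product["active_ingredients"].split(separator):
            name = normalize(raw)
            if not name:
                continue  # skip empty fragments, e.g. from a trailing separator
            drugbank_id = drugbank_lookup.get(name)
            if drugbank_id:
                mapped[name] = drugbank_id
            else:
                unmapped.add(name)
    return mapped, unmapped


# Toy inputs (placeholder IDs, not real DrugBank identifiers)
lookup = {
    "metformin hydrochloride": "DB-EXAMPLE-1",
    "paracetamol": "DB-EXAMPLE-2",
}
products = [
    {"product_name": "Product A", "active_ingredients": "Metformin  Hydrochloride"},
    {"product_name": "Product B", "active_ingredients": "PARACETAMOL; Unlisted Compound"},
]
mapped, unmapped = map_ingredients(products, lookup)
unique_drugs = len(set(mapped.values()))
```

Counting distinct IDs, as in `unique_drugs`, corresponds to the "unique drugs with successful mapping" figure; names left in `unmapped` are the ones that would need manual review.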

TxGNN Integration

DrugBank IDs matched to TxGNN knowledge graph
Predictions generated for all mapped drugs
Results filtered by confidence threshold
Final dataset: 31,543 predictions

Limitations

Model Limitations

TxGNN was trained on historical data and may miss recent discoveries
Some drugs and diseases are not in the knowledge graph
Predictions are computational hypotheses, not clinical evidence

Data Limitations

HSA data may not include all marketed products
Some ingredient mappings may be imprecise
Evidence collection is limited to English-language literature

Interpretation

L5 predictions require clinical validation
High DL scores don’t guarantee clinical efficacy
Always consult healthcare professionals

Reproducibility

Code & Data

Source code: GitHub
TxGNN model: Harvard Dataverse
HSA data: data.gov.sg

Version Information

Component	Version
TxGNN Model	v1.0 (Nature Medicine 2024)
HSA Data	March 2026
SgTxGNN	v1.0.0

Disclaimer
Predictions are computational hypotheses for research purposes only. Clinical validation is required before any therapeutic application.