2026-05-19 · data

Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates

Saurabh Bhandari, Parveen Bhatti, Brian C. -H. Chiu, Yuan Ji

Key claim

spBART achieves high discrimination in genomic risk prediction.

The spBART model combines interpretable low-dimensional covariates with complex high-dimensional predictors. It identifies important genomic loci and achieves high out-of-sample discrimination (AUC = 0.96) in multiple myeloma studies.

Novelty
8.0/10

The proposed spBART model integrates interpretable covariate inference with flexible high-dimensional modeling.
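
One way to picture this integration is the generic semi-parametric BART form below. It is a sketch of the general model class, not the paper's exact specification: the symbols $x_i$ (low-dimensional covariates), $z_i$ (high-dimensional epigenetic predictors), $\beta$, the tree function $h$ with tree structures $T_m$ and leaf parameters $\Lambda_m$, the number of trees $M$, and the link function $g$ are all notation assumed here for illustration. A probit link is common for binary outcomes in BART, but this summary does not state which link the authors use.

```latex
\[
  g\bigl(\Pr(y_i = 1 \mid x_i, z_i)\bigr)
  = \underbrace{x_i^{\top}\beta}_{\text{interpretable linear part}}
  + \underbrace{\sum_{m=1}^{M} h(z_i;\, T_m, \Lambda_m)}_{\text{sum of regression trees}}
\]
```

Under this form, $\beta$ carries the interpretable covariate effects, while the sum of trees absorbs nonlinear structure in the high-dimensional predictors.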

Reliability
7.0/10

The methodology includes a robust variable selection procedure and demonstrates strong performance in a real-world biomedical study.

Deep reliability assessment

The methodology supports the identification of epigenetic signatures associated with multiple myeloma risk while providing interpretable estimates for demographic covariates. However, claims of superior predictive performance may overstate the model's generalizability due to the pooling of data from studies with different designs.

Reproducibility

The paper does not mention open-source code or dataset availability.

Discussion questions

  1. What assumptions about the relationship between epigenetic markers and disease risk might be overly simplistic?
  2. How can builders leverage the findings of this study in practical applications for risk prediction in other diseases?
  3. What alternative explanations could account for the observed associations between the identified genes and multiple myeloma risk?

Key figure

Figure 1 illustrates a single regression tree that partitions the covariate space through binary splits, assigning constant predictions in each terminal leaf.
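
The kind of tree described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration only: the `Node` class, the `predict` function, and every split and leaf value in the toy tree are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree (hypothetical structure)."""
    feature: Optional[int] = None      # index of the covariate to split on
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None      # constant prediction; set only on leaves


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow binary splits from the root until a terminal leaf is reached."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Toy tree with two splits and three leaves; all numbers are made up.
tree = Node(
    feature=0, threshold=0.5,
    left=Node(value=-1.0),
    right=Node(
        feature=1, threshold=2.0,
        left=Node(value=0.3),
        right=Node(value=1.2),
    ),
)
```

Each input falls into exactly one leaf, so the tree defines a piecewise-constant function over the covariate space. BART-style models sum many such trees rather than relying on a single one.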

Benchmark results

Multiple myeloma studies: AUC 0.96 (+0.02 vs. standard BART), SOTA
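
For readers less familiar with the metric, AUC is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it directly from that definition; the function name `auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 2 controls (label 0) and 2 cases (label 1).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.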
Read on arXiv →