Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates
Saurabh Bhandari, Parveen Bhatti, Brian C.-H. Chiu, Yuan Ji
Key claim
spBART achieves high discrimination in genomic risk prediction.
The spBART model combines interpretable low-dimensional covariates with complex high-dimensional predictors. It identifies important genomic loci and reaches an out-of-sample AUC of 0.96 in multiple myeloma studies.
The proposed spBART model integrates interpretable covariate inference with flexible high-dimensional modeling.
The methodology includes a robust variable selection procedure and demonstrates strong performance in a real-world biomedical study.
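For readers less familiar with the discrimination metric cited above, AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `pairwise_auc` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Hypothetical helper for illustration; labels are 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # A pair counts 1 if the case outranks the control, 0.5 on a tie, 0 otherwise.
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy predicted risks (made up for this example): two cases, two controls.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_auc = pairwise_auc(toy_scores, toy_labels)
```

The double loop is quadratic in the sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on large data.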
Deep reliability assessment
The methodology supports the identification of epigenetic signatures associated with multiple myeloma risk while providing interpretable estimates for demographic covariates. However, because the data were pooled from studies with different designs, claims of superior predictive performance may overstate the model's generalizability.
Reproducibility
The paper does not mention open-source code or dataset availability.
Discussion questions
- What assumptions about the relationship between epigenetic markers and disease risk might be overly simplistic?
- How can practitioners apply the findings of this study to risk prediction in other diseases?
- What alternative explanations could account for the observed associations between the identified genes and multiple myeloma risk?
Key figure
Figure 1 illustrates a single regression tree that partitions the covariate space through binary splits, assigning constant predictions in each terminal leaf.
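The structure described for Figure 1 can be sketched in a few lines: internal nodes hold a binary split on one covariate, and each terminal leaf holds a constant prediction. This is a minimal illustration of a single regression tree, not the paper's implementation; the `Node` class, the split thresholds, and the leaf values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree (hypothetical structure)."""
    feature: Optional[int] = None      # index of the covariate to split on
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None      # constant prediction; set only on leaves


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow binary splits from the root until a terminal leaf is reached."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# A toy tree with two splits and three leaves (all numbers are made up).
tree = Node(
    feature=0, threshold=0.5,
    left=Node(value=-1.0),
    right=Node(
        feature=1, threshold=2.0,
        left=Node(value=0.25),
        right=Node(value=1.5),
    ),
)

prediction = predict(tree, [0.9, 1.0])  # right at the root, then left
```

Every point in the covariate space falls into exactly one leaf, so the tree defines a piecewise-constant function; additive tree models such as BART sum many such functions.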