Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples
Yufang Dai, Shihua Luo, Wendy Lou, Zilin Wang, Xuewen Lu
Read on arXiv →
Key claim
A DNN-based framework makes estimators that combine probability and nonprobability survey samples more robust to model misspecification.
This paper presents a deep neural network-based method for combining probability and nonprobability survey samples to estimate population means. The key result is that the proposed estimators are more robust to parametric misspecification, particularly when the selection mechanism is nonlinear.
In plain English
The proposed DNN-assisted framework significantly extends existing methods for integrating survey samples.
The study provides empirical validation through simulations and real-world data, supporting its claims.
Deep reliability assessment
The methodology supports a theoretically motivated DNN-assisted IPW and doubly robust estimator for finite-population means when nonprobability sample selection can be explained by observed covariates and the probability reference sample is valid. Claims of improved robustness are plausible for nonlinear selection mechanisms, but the method does not remove bias from unobserved confounding, poor covariate overlap, invalid survey weights, or simultaneous misspecification of both sampling-score and outcome components.
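To make the two estimator types concrete, here is a minimal sketch of a Hajek-type IPW mean and a doubly robust mean in the form commonly used in the nonprobability-sampling literature. It is not taken from the paper: the function names and arguments (`ipw_mean`, `dr_mean`, `pi_a`, `m_a`, `m_b`, `d_b`) are hypothetical, and the sampling scores and outcome predictions are assumed to come from models that have already been fitted (a DNN or otherwise).

```python
from typing import Sequence


def ipw_mean(y_a: Sequence[float], pi_a: Sequence[float]) -> float:
    """Hajek-type IPW mean over the nonprobability sample.

    y_a  : outcomes observed in the nonprobability sample
    pi_a : estimated sampling scores (inclusion propensities) for those units
    """
    weights = [1.0 / p for p in pi_a]
    return sum(w * y for w, y in zip(weights, y_a)) / sum(weights)


def dr_mean(
    y_a: Sequence[float],
    m_a: Sequence[float],
    pi_a: Sequence[float],
    m_b: Sequence[float],
    d_b: Sequence[float],
) -> float:
    """Doubly robust mean (assumed standard form, not the paper's exact notation).

    m_a : outcome-model predictions for nonprobability-sample units
    m_b : outcome-model predictions for probability-sample units
    d_b : design weights of the probability (reference) sample
    """
    w_a = [1.0 / p for p in pi_a]
    # Inverse-probability-weighted residuals from the nonprobability sample.
    residual_term = sum(w * (y - m) for w, y, m in zip(w_a, y_a, m_a)) / sum(w_a)
    # Design-weighted mean of predictions over the reference sample.
    prediction_term = sum(d * m for d, m in zip(d_b, m_b)) / sum(d_b)
    return residual_term + prediction_term
```

If the outcome predictions match the observed outcomes exactly, the residual term vanishes and the estimate depends only on the reference sample; if the predictions are all zero, `dr_mean` reduces to `ipw_mean`. Neither term helps when both the sampling scores and the outcome predictions are wrong, which is the simultaneous-misspecification caveat noted above.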
Reproducibility
No open-source code or repository is mentioned in the provided abstract, introduction, conclusion, or footnotes. The empirical application uses Pew Research Center and Behavioral Risk Factor Surveillance System data, but no specific dataset release links or replication package details are provided.
Discussion questions
- 1. The core assumption is that nonprobability sample participation is ignorable conditional on observed auxiliary variables X; in real online panels or app-based datasets, how defensible is that assumption when motivation, access, and survey fatigue are unobserved?
- 2. For builders using customer, web, or mobile telemetry data in SEA, when is adding a probability reference sample worth the cost compared with simpler calibration, post-stratification, or targeted data collection?
- 3. What empirical diagnostic or stress test would falsify the claimed advantage of the DNN doubly robust estimator (for example, lack of covariate overlap, worse performance under held-out probability-sample outcomes, or instability across architectures and random seeds)?
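Two of the stress tests raised in the last question can be sketched in a few lines. This is one possible set of diagnostics, not something the paper prescribes: Kish's effective sample size is used here as a rough proxy for overlap problems (a few near-zero sampling scores produce extreme weights), and the spread of point estimates across refits stands in for instability across seeds or architectures. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def effective_sample_size(pi_hat: Sequence[float]) -> float:
    """Kish effective sample size of the inverse-probability weights 1 / pi_hat.

    A value far below len(pi_hat) means a few units dominate the estimate,
    which often signals poor covariate overlap.
    """
    weights = [1.0 / p for p in pi_hat]
    return sum(weights) ** 2 / sum(w * w for w in weights)


def seed_spread(estimates: Sequence[float]) -> float:
    """Sample standard deviation of point estimates obtained from refits
    that differ only in random seed (or architecture)."""
    return statistics.stdev(estimates)
```

With equal scores the effective sample size equals the sample size; a single very small score can pull it close to 1. A seed spread that is large relative to the reported standard error would suggest the estimate depends on training noise rather than on the data.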
Key figure
The key architecture is a feedforward DNN that maps auxiliary covariates X to a logit sampling-score estimate for nonprobability sample inclusion, which is then used in inverse-probability weighted and doubly robust finite-population mean estimators alongside probability-sample design weights.
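The mapping described above, from covariates X to a logit and then to a sampling score, can be illustrated with a small forward pass. This is a sketch under stated assumptions only: the layer sizes, ReLU activations, and random initialization are illustrative choices, the names `init_mlp`, `logit_score`, and `sampling_score` are hypothetical, and no training step is shown because the summary does not specify the paper's loss function.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_mlp(layer_sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Randomly initialize a feedforward network, e.g. layer_sizes=[3, 8, 8, 1]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def logit_score(layers: List[Layer], x: Sequence[float]) -> float:
    """Forward pass: ReLU on hidden layers, linear output read as a logit."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        pre = [
            sum(w * a for w, a in zip(row, activation)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = index == len(layers) - 1
        activation = pre if is_output else [max(0.0, v) for v in pre]
    return activation[0]


def sampling_score(layers: List[Layer], x: Sequence[float]) -> float:
    """Numerically stable sigmoid of the logit: estimated inclusion propensity."""
    z = logit_score(layers, x)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For example, `sampling_score(init_mlp([3, 8, 8, 1]), [0.2, -1.0, 0.5])` returns a value strictly between 0 and 1, whose reciprocal would serve as the weight for that unit in an IPW or doubly robust estimator.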
