Objective: To explore the use of Bootstrap-based EM estimation for multiple imputation of missing data, and to demonstrate missing-data analysis in R. Methods: The epicalc and Amelia II packages in R were used to analyze missing data from a men's health survey. Samples were drawn with replacement by the Bootstrap method, the EM algorithm was run iteratively on each of the m resulting bootstrap samples, and the "plot" and "disperse" functions in R were then used to examine the distributions of the observed values and of the imputed (missing) values, and the convergence of the EM iterations from their starting values. Results: When m = 5, the distributions of the observed values and of the imputed values in the men's health data were closest, and the iterations converged from all starting values. Conclusion: The multiply imputed datasets obtained by the EM algorithm with Bootstrap sampling represent the observed data well and can be used to predict the missing values in an incomplete dataset.
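
The Bootstrap-then-EM procedure described in the Methods can be illustrated with a minimal sketch. It is not the Amelia II implementation, which fits a multivariate normal model in R. It assumes a much simpler bivariate normal setting in which x is always observed and only y may be missing, and it runs on simulated data rather than the men's health survey. The function names (`make_demo_data`, `em_bivariate`, `bootstrap_em_impute`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_demo_data(n=200, seed=1, p_missing=0.3):
    """Simulated (x, y) rows; y is set to None at random. Not survey data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)
        rows.append((x, None if rng.random() < p_missing else y))
    return rows


def em_bivariate(rows, tol=1e-8, max_iter=500):
    """EM estimates (mu_x, mu_y, s_xx, s_xy, s_yy) when only y can be missing."""
    n = len(rows)
    xs = [x for x, _ in rows]
    obs_y = [y for _, y in rows if y is not None]
    if len(obs_y) < 2:
        raise ValueError("need at least two observed y values")
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    if s_xx == 0.0:
        raise ValueError("x has zero variance")
    # Starting values: moments of the observed y, covariance set to zero.
    mu_y = sum(obs_y) / len(obs_y)
    s_yy = sum((y - mu_y) ** 2 for y in obs_y) / len(obs_y)
    s_xy = 0.0
    for _ in range(max_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        # E-step: expected y and y^2 given x for rows where y is missing.
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in rows:
            if y is None:
                ey = mu_y + beta * (x - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            sum_y += ey
            sum_yy += eyy
            sum_xy += x * ey
        # M-step: update the moments from the expected sufficient statistics.
        new_mu_y = sum_y / n
        new_s_yy = sum_yy / n - new_mu_y ** 2
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_s_yy - s_yy),
                     abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy


def bootstrap_em_impute(rows, m=5, seed=0):
    """Return m completed copies of rows, one per bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    completed = []
    for _ in range(m):
        # Resample rows with replacement, then run EM on the resample.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        mu_x, mu_y, s_xx, s_xy, s_yy = em_bivariate(sample)
        beta = s_xy / s_xx
        resid_sd = math.sqrt(max(s_yy - beta * s_xy, 0.0))
        # Fill each missing y in the ORIGINAL data with a draw from y | x.
        completed.append([
            (x, y if y is not None
             else rng.gauss(mu_y + beta * (x - mu_x), resid_sd))
            for x, y in rows
        ])
    return completed


data = make_demo_data()
imputed = bootstrap_em_impute(data, m=5)
print(len(imputed), "completed datasets")
```

Each of the m completed datasets keeps the observed values unchanged and differs only in the imputed entries, so comparing the imputed values with the observed ones across the m copies plays roughly the role that the density comparison plays in the abstract. Running `em_bivariate` from several different starting values would be the analogue of the convergence check, although this sketch does not implement it.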