In discriminant analysis with qualitative (categorical) variables, the explanatory variables often need to be screened. Several screening methods have been published in recent years, but all of them are designed for dichotomous response variables and cannot be applied directly to multicategory response variables (more than two categories). Starting from the goal of reducing the misclassification rate, this paper analyzes how effectively an entered variable reduces the misclassification rate and derives a screening procedure suitable for multicategory response variables. The procedure combines two statistics, the actual effectiveness of the candidate variable and the likelihood-ratio criterion $X_L^2$, and uses both jointly as the criterion for entering a variable; the calculation of the degrees of freedom of $X_L^2$ is adjusted accordingly. In this way, the effectiveness of each candidate variable is reflected correctly at every stage of the screening process. To stop the entry process, the procedure does not follow the usual practice of using a probability level as the stopping criterion. Instead, it uses a capacity ratio, the ratio of the sample size to the number of nonzero states in the sample space. This appears both to safeguard the reliability of prediction reasonably well and, on that basis, to reduce the misclassification rate as much as possible. Finally, a worked example compares the procedure with Lachin's screening procedure.
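The abstract does not give the formulas, so the following Python sketch only illustrates its three ingredients under stated assumptions; it is not the paper's exact procedure. It assumes that each state (a combination of the selected predictors' values) predicts its most frequent class, so the misclassification rate is the resubstitution error. It uses the standard conditional-independence statistic $G^2$ in place of $X_L^2$, with degrees of freedom that count only the predictor levels and classes actually observed in each state. This is a common zero-cell convention and not necessarily the paper's correction. Combining the two statistics by ranking candidates on error reduction and breaking ties with $G^2/\text{df}$ is likewise an assumption. The function names (`misclassification_rate`, `lr_statistic`, `capacity_ratio`, `forward_select`) and the threshold `min_ratio=5.0` are hypothetical.

```python
# Hypothetical sketch of forward screening for a multicategory response;
# the combination rule, df rule and threshold are assumptions, not the paper's.
import math
from collections import Counter, defaultdict


def states(X, cols):
    """Return each sample's state: the tuple of its values on the selected predictors."""
    return [tuple(row[c] for c in cols) for row in X]


def misclassification_rate(X, y, cols):
    """Resubstitution error when every state predicts its most frequent class."""
    counts = defaultdict(Counter)
    for s, k in zip(states(X, cols), y):
        counts[s][k] += 1
    correct = sum(max(c.values()) for c in counts.values())
    return 1 - correct / len(y)


def lr_statistic(X, y, cols, new):
    """G^2 for 'y is independent of predictor `new` given the current state'.

    The df counts only levels and classes observed in each state (an assumed
    zero-cell adjustment, not necessarily the paper's correction).
    """
    n_svk, n_sv, n_sk, n_s = Counter(), Counter(), Counter(), Counter()
    for s, row, k in zip(states(X, cols), X, y):
        v = row[new]
        n_svk[s, v, k] += 1
        n_sv[s, v] += 1
        n_sk[s, k] += 1
        n_s[s] += 1
    g2 = 2 * sum(c * math.log(c * n_s[s] / (n_sv[s, v] * n_sk[s, k]))
                 for (s, v, k), c in n_svk.items())
    df = 0
    for s in n_s:
        levels = sum(1 for (t, _) in n_sv if t == s)
        classes = sum(1 for (t, _) in n_sk if t == s)
        df += (levels - 1) * (classes - 1)
    return g2, df


def capacity_ratio(X, cols):
    """Sample size divided by the number of nonzero (observed) states."""
    return len(X) / len(set(states(X, cols)))


def forward_select(X, y, min_ratio=5.0):
    """Greedy forward entry of categorical predictors (hypothetical helper)."""
    selected, remaining = [], list(range(len(X[0])))
    err = misclassification_rate(X, y, selected)
    while remaining:
        best = None
        for c in remaining:
            trial = selected + [c]
            if capacity_ratio(X, trial) < min_ratio:
                continue  # stopping rule: too few samples per nonzero state
            gain = err - misclassification_rate(X, y, trial)
            g2, df = lr_statistic(X, y, selected, c)
            key = (gain, g2 / df if df else 0.0)  # rank by gain, then G^2/df
            if best is None or key > best[0]:
                best = (key, c)
        if best is None or best[0][0] <= 0:
            break  # no admissible candidate lowers the misclassification rate
        selected.append(best[1])
        remaining.remove(best[1])
        err = misclassification_rate(X, y, selected)
    return selected, err


if __name__ == "__main__":
    # Toy data: y equals predictor 0; predictors 1 and 2 are pure noise.
    X = [(a, b, r % 2) for a in range(3) for b in range(2) for r in range(10)]
    y = [row[0] for row in X]
    print(forward_select(X, y))  # ([0], 0.0)
```

On the toy data, where the response equals predictor 0 and the other two predictors are noise, the sketch enters only predictor 0 and then stops because no remaining candidate lowers the error. Raising `min_ratio` to 25 blocks predictor 0 as well, since its three states leave only 20 samples per state.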