In the human sciences, the way people think, feel and act is thought to reflect their psychological properties. Data is often gathered using questionnaires, and different forms of factor analysis are used as the default analysis method. Factor analysis is based on the idea that the covariation of variables related to different forms of human behavior reflects the underlying psychological properties. But could we do without latent variables? In this presentation, I discuss the results of modeling data from a psychometric questionnaire as a Markov Random Field (MRF), in which the different forms of behavior function as random variables. The analysis is based on forming an undirected, weighted network that is encoded in a weights matrix. As a first stage of the analysis - and not yet related to MRFs - it is beneficial to use raw correlations among the variables as weights and to represent the weights matrix as a graph. Correlations can be used as weights because correlations are undirected and weighted entities, with zero correlation representing no relationship. This first step is useful for obtaining an overall understanding of the interrelationships of the variables in the data. The correlation matrix, however, suffers from the problem of confounding: the observed correlation between any two variables may be due to both of them being related to one or more other variables in the data. Because of this, as the second stage of the analysis, a partial correlation matrix is calculated. This matrix is directly related to the inverse of the correlation matrix, and can be calculated through the use of several linear regressions. Under the multivariate normality assumption, two variables are conditionally independent if their partial correlation (conditioning on all other variables included in the network) is zero. This applies at the population level, but in the sample data exact zeroes are unlikely. 
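The link between partial correlations and the inverse of the correlation matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three simulated variables `x`, `y`, `z` and the helper name `partial_correlations` are hypothetical and do not come from the questionnaire data discussed here. The variables `x` and `z` are both generated from `y`, so they are marginally correlated but conditionally independent given `y`.

```python
import numpy as np


def partial_correlations(corr):
    """Return the partial correlation matrix implied by a correlation matrix.

    Each off-diagonal entry is the correlation between two variables after
    conditioning on all remaining variables; it is obtained by rescaling the
    inverse (precision) matrix and flipping the sign.
    """
    precision = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical simulated data: x and z are related only through y.
rng = np.random.default_rng(0)
n = 5000
y = rng.normal(size=n)
x = y + rng.normal(size=n)
z = y + rng.normal(size=n)

data = np.column_stack([x, y, z])
corr = np.corrcoef(data, rowvar=False)   # raw correlations (first stage)
pcor = partial_correlations(corr)        # partial correlations (second stage)

print("raw correlation x-z:    ", round(corr[0, 2], 3))
print("partial correlation x-z:", round(pcor[0, 2], 3))
```

In a run like this, the raw correlation between `x` and `z` is substantial, while their estimated partial correlation is close to zero without being exactly zero.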
For this reason, the partial correlations are calculated by estimating the corresponding regression models with the adaptive least absolute shrinkage and selection operator (adaptive LASSO) estimator. In adaptive LASSO estimation, a penalized likelihood is maximized, with the strength of the penalty selected using the extended Bayesian Information Criterion (eBIC). Small partial correlations are thereby shrunk to zero, with the aim of converging on the hypothesized population-level Markov Random Field model. The network thus formed can be described using various centrality and clustering coefficients. When working with weighted networks, many of these coefficients can be calculated based on the connection weights, even though they were originally formulated for the unweighted case. The issue of calculating the value of such coefficients based on either the presence of connections or their weights is discussed. Finally, from the teacher's perspective, the network models show promise as a teaching tool in the behavioral sciences: the graphs can be used as a visual tool that allows students to obtain an intuitive understanding of high-dimensional, complex data.
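The difference between coefficients based on the presence of connections and coefficients based on their weights can be illustrated with the simplest centrality measure. The sketch below is a minimal example under assumptions: the five-node weights matrix is made up for illustration, and degree (number of nonzero connections) and strength (sum of absolute weights) stand in for the broader family of centrality and clustering coefficients mentioned in the text.

```python
import numpy as np

# Hypothetical undirected, weighted network on five nodes.
# Node 0 has three weak connections; node 3 has one weak and one strong.
edges = [(0, 1, 0.1), (0, 2, 0.1), (0, 3, -0.1), (3, 4, 0.6)]

n_nodes = 5
weights = np.zeros((n_nodes, n_nodes))
for i, j, w in edges:
    weights[i, j] = w
    weights[j, i] = w  # undirected: the weights matrix is symmetric


def degree(weights):
    """Unweighted centrality: number of nonzero connections per node."""
    return np.count_nonzero(weights, axis=1)


def strength(weights):
    """Weighted centrality: sum of absolute connection weights per node."""
    return np.abs(weights).sum(axis=1)


deg = degree(weights)
stren = strength(weights)

print("degree:  ", deg)
print("strength:", np.round(stren, 2))
print("most central by degree:  node", int(np.argmax(deg)))
print("most central by strength: node", int(np.argmax(stren)))
```

In this toy network the two definitions disagree about which node is most central: node 0 has the most connections, whereas node 3 has the largest total connection weight. Absolute values are used for strength so that negative partial correlations count as connections rather than cancelling positive ones; that choice is itself an assumption of the sketch.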