Data mining and quantitative structure-activity relationship (QSAR) analysis are important research methods in drug discovery. This paper first introduces the k-nearest-neighbor (kNN) method, a highly practical basic technique for model-free classification and pattern recognition. Applied to QSAR studies, the method computes Euclidean distances between molecules from a set of electronic and topological structural descriptors to compare their similarity, and estimates the activity of a new structure as the weighted average of the known activities of its most similar molecules. Throughout the computation, simulated annealing, combined with a genetic algorithm and leave-one-out cross-validation, is used to select a suitable set of structural descriptors and a suitable number of nearest neighbors. Taking a set of mononucleotide molecules with known activity as an example, the kNN method is then used to mine the NCI and Maybridge chemical databases: database molecules are screened and their activities predicted by computing their similarity to the known active molecules, yielding the molecules most similar to the known structures and providing important reference information for the search for potential lead compounds.
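The kNN activity estimate and its leave-one-out validation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the weighting scheme, so inverse-distance weights are assumed here, and the function names `knn_predict` and `loo_q2`, the small constant `eps`, and the toy descriptor values are all hypothetical. The simulated annealing and genetic algorithm search over descriptor subsets is omitted; in such a search, the cross-validated q² computed below would typically serve as the score to maximize.

```python
import math

def knn_predict(train_X, train_y, query, k, eps=1e-9):
    """Estimate the activity of `query` from its k nearest training molecules.

    Assumption: neighbours are weighted by inverse Euclidean distance;
    the source only says "weighted average".
    """
    # Euclidean distance in descriptor space to every training molecule
    dists = [math.dist(x, query) for x in train_X]
    # Indices of the k most similar (closest) molecules
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # eps avoids division by zero when a neighbour coincides with the query
    weights = [1.0 / (dists[i] + eps) for i in nearest]
    weighted_sum = sum(w * train_y[i] for w, i in zip(weights, nearest))
    return weighted_sum / sum(weights)

def loo_q2(X, y, k):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    preds = []
    for i in range(len(X)):
        # Hold out molecule i and predict it from all the others
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_X, rest_y, X[i], k))
    mean_y = sum(y) / len(y)
    press = sum((obs - p) ** 2 for obs, p in zip(y, preds))
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / ss_total

# Toy data, invented for illustration only: two descriptors per molecule
descriptors = [[0.1, 1.2], [0.3, 1.0], [0.8, 0.4], [0.9, 0.2], [0.5, 0.7]]
activities = [5.1, 5.4, 6.8, 7.0, 6.0]

# Compare candidate neighbour counts by their cross-validated q^2
for k in (1, 2, 3):
    print(k, loo_q2(descriptors, activities, k))
```

In practice the descriptors would usually be scaled to comparable ranges before distances are computed, since otherwise the descriptor with the largest numeric range dominates the Euclidean distance.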