This paper studies haplotyping and haplotype frequency estimation from trio-family genotype data under the Mendelian law of inheritance and the assumption of Hardy-Weinberg equilibrium. Previous work has focused on genotype data from unrelated individuals or from general pedigrees, and has paid too little attention to the special case of trios. Because part of the data in the HAPMAP database is based on such trios, there is growing demand for methods that analyze this pedigree structure directly. A two-stage method for estimating haplotype frequencies in trios is proposed: i) a haplotyping stage, which finds the zero-recombinant haplotype configurations of each trio; ii) a frequency estimation stage, which applies the EM algorithm to the configurations obtained in the first stage to estimate the haplotype frequencies. Both the haplotyping algorithm and the EM algorithm are implemented in C in the package TRIOHAP, and the effectiveness and efficiency of TRIOHAP are tested on simulated and real data. The experimental results show that TRIOHAP runs much faster than common haplotype frequency estimation software that ignores trio information. Moreover, because TRIOHAP exploits this information, its estimates are more reliable.
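The frequency estimation stage can be illustrated with a minimal sketch. It is not TRIOHAP's code (which is written in C), and the data layout is an assumption made for illustration: each family is given as the list of parental haplotype configurations left over from the haplotyping stage, and each configuration is a tuple of the four parental haplotypes. Under Hardy-Weinberg equilibrium the sketch takes the probability of a configuration to be proportional to the product of the four haplotype frequencies, and it ignores ordering constants. The function name `em_haplotype_frequencies` and its parameters are hypothetical.

```python
from collections import defaultdict
from math import prod


def em_haplotype_frequencies(families, n_iter=100, tol=1e-9):
    """Estimate haplotype frequencies with EM (illustrative sketch).

    families: list of families; each family is a list of candidate
    configurations; each configuration is a tuple of the four
    parental haplotypes (assumed layout, not TRIOHAP's format).
    """
    haplotypes = sorted({h for configs in families for c in configs for h in c})
    # Start from uniform frequencies over all observed haplotypes.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}

    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: weight each configuration by its probability under
        # the current frequencies (product over the four haplotypes).
        for configs in families:
            weights = [prod(freq[h] for h in c) for c in configs]
            total = sum(weights)
            for c, w in zip(configs, weights):
                for h in c:
                    counts[h] += w / total
        # M-step: each family contributes four parental haplotypes.
        new_freq = {h: counts[h] / (4 * len(families)) for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Two loci with alleles A/a and B/b. The first family has a single
# configuration; in the second, all three members are heterozygous at
# both loci, so two zero-recombinant configurations remain.
families = [
    [("AB", "AB", "ab", "ab")],
    [("AB", "ab", "AB", "ab"), ("Ab", "aB", "Ab", "aB")],
]
freq = em_haplotype_frequencies(families)
```

In this toy input the unambiguous first family pulls the estimate toward `AB` and `ab`, so the EM iterations resolve the second family in favor of the configuration built from those two haplotypes.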