Hepatitis B is a serious infectious disease worldwide, and hepatitis B virus (HBV) is its direct cause. Mutation is an important part of HBV evolution and has been studied extensively in recent years, both domestically and internationally. Research on conserved sequences within HBV sequences, however, is still at an early stage. In this paper, we first use the MEME (Multiple EM for Motif Elicitation) algorithm to mine HBV motifs, that is, conserved sequence fragments in biological sequences, and propose a new metric, the conserved index (CI). We then perform a phylogenetic analysis of the HBV sequences and, finally, evaluate the reliability of the constructed phylogenetic tree. The results show that, with the new metric CI, the multiple conserved sequences mined by MEME can be used effectively to construct a phylogenetic tree of HBV sequences, to analyze the evolutionary relationships among those sequences, and to identify possible ancestral sequences of the samples. The experimental approach in this paper offers useful guidance for research on methods for analyzing large HBV datasets.
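The abstract does not give the formula for CI, so the following is only an illustrative sketch of what a conservation score over aligned motif occurrences can look like. It is not the paper's CI. It assumes equal-length, gap-free DNA sites and scores each alignment column as one minus its normalized Shannon entropy; the function names and the example sites are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Score one alignment column as 1 - H/2, with H the Shannon entropy in bits.

    Assumes only the four nucleotides A, C, G, T appear, so H lies between
    0 (fully conserved) and 2 (uniform) and the score lies between 1 and 0.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / 2.0


def motif_conservation(sites):
    """Average the column scores over equal-length aligned motif sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    # zip(*sites) transposes the sites into alignment columns.
    return sum(column_conservation(col) for col in zip(*sites)) / width


# Made-up motif occurrences for illustration only (not HBV data).
example_sites = ["ACGT", "ACGT", "ACGA", "ACGC"]
example_score = motif_conservation(example_sites)
```

In this toy input the first three columns are identical across sites and only the last column varies, so the score is high but below 1. A metric of this kind could be used to rank candidate motifs before tree construction, though whether the paper's CI works this way is not stated in the abstract.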