Objective: To obtain metabolic pathway gene sequences, simple sequence repeats (SSRs), and transposon information from a transcriptome database of Stellera chamaejasme. Methods: Roots of S. chamaejasme were used as the test material. The transcriptome was sequenced on the Illumina HiSeq 2000, a second-generation sequencing platform, and the data were analyzed with a systematic bioinformatics workflow. Results: A total of 26,785,872 clean reads were obtained and assembled into 47,053 unigenes with an average length of 419 nt. The assembled unigene sequences were aligned against the Nr, Swiss-Prot, KEGG, COG, and GO databases using BLAST; 11,138 and 24,744 unigenes were annotated in Nr and Swiss-Prot, respectively. The unigenes fell into 36 GO categories and mapped to 119 KEGG standard metabolic pathways, and further analysis identified 15 key enzyme genes of terpenoid biosynthesis pathways. MISA detected 3,480 SSRs. Mononucleotide repeats were the most abundant type (1,986; 57.07%), and hexanucleotide repeats were the rarest (5; 0.14%). Transposon prediction on the transcriptome sequences with the RepeatMasker online tool found 1,497 transposons, of which 827 had an E value < 1 × 10⁻⁵. These 827 sequences covered 22 transposon types: LINE/L1 was the most common (405; 48.97%), while DNA/Ginger, DNA/hAT, DNA/PIF-ISL2EU, LINE/Jockey, and LTR/Lenti were the least common, with one sequence each. Conclusion: High-throughput sequencing of S. chamaejasme yielded a large amount of gene sequence information together with SSR and transposon data. These provide data resources and a theoretical basis for the future isolation and cloning of key enzyme genes in the biosynthesis of phorbol esters and other active constituents of S. chamaejasme, and for research into the related molecular mechanisms.
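The frequencies quoted in the abstract can be recomputed from the reported counts. The short Python sketch below does that as a consistency check. It assumes the SSR percentages are taken over all 3,480 SSRs and that the LINE/L1 share is taken over the 827 transposon sequences that passed the E-value filter, not over all 1,497 predictions. The helper name `percent` and the constant names are illustrative and do not come from the paper.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100 * part / total, 2)

# Counts as reported in the abstract.
TOTAL_SSR = 3480     # all SSRs detected by MISA
MONO_SSR = 1986      # mononucleotide repeats
HEXA_SSR = 5         # hexanucleotide repeats
FILTERED_TE = 827    # transposon sequences with E value < 1e-5 (assumed denominator)
LINE_L1 = 405        # LINE/L1 sequences

mono_pct = percent(MONO_SSR, TOTAL_SSR)
hexa_pct = percent(HEXA_SSR, TOTAL_SSR)
l1_pct = percent(LINE_L1, FILTERED_TE)

print(mono_pct, hexa_pct, l1_pct)  # 57.07 0.14 48.97
```

All three values match the abstract, which supports reading the 48.97% figure as a share of the 827 filtered sequences.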