Background: Over the past decade, owing to rapid advances in next-generation sequencing (NGS) technology, the cost of DNA sequencing has fallen by several orders of magnitude. NGS can sequence DNA from many different samples simultaneously. To use sequencer capacity efficiently and to reduce the cost of sequencing library construction in large-scale projects, multiple individuals can be pooled and sequenced together, an approach called pooled sequencing (pool-seq). Pool-seq is cost-effective for sample preparation, especially in targeted sequencing projects, because the cost of target capture is proportional to the number of samples. Among the various pool-seq strategies, overlapping pooling or disjoint pooling can be used to identify carriers of rare variants, in addition to estimating SNP (single nucleotide polymorphism) frequencies in population studies [1]. The identity of each sample is encoded in the pooling pattern rather than by direct association with a particular sequence tag or barcode [2]. Our work focuses mainly on the informatics problems of pool-seq, especially overlapping pool-seq, including sequencing experiment design and data analysis.
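To illustrate how a pooling pattern can encode sample identity, the following is a minimal sketch of a toy overlapping design. It is not the design used in this work: it assumes a simple binary-code layout, exactly one carrier of the rare variant, and error-free detection in each pool. The function names `binary_pool_design` and `decode_single_carrier` are hypothetical and introduced only for this example.

```python
def binary_pool_design(n_samples):
    """Toy overlapping design (an assumption, not the paper's method).

    Pool j contains every sample whose 1-based index has bit j set,
    so each sample has a unique pattern of pool memberships.
    """
    n_pools = n_samples.bit_length()  # bits needed to write the largest index
    return [
        [s for s in range(1, n_samples + 1) if (s >> j) & 1]
        for j in range(n_pools)
    ]


def decode_single_carrier(positive_pools):
    """Recover the 1-based carrier index from per-pool variant calls.

    Assumes exactly one carrier and no false positives or negatives.
    Returns 0 when no pool is positive.
    """
    return sum(1 << j for j, hit in enumerate(positive_pools) if hit)


# Seven samples are covered by three overlapping pools.
pools = binary_pool_design(7)

# Simulate sample 5 carrying the rare variant: a pool is positive
# if and only if it contains the carrier.
carrier = 5
observed = [carrier in pool for pool in pools]
decoded = decode_single_carrier(observed)
```

Here seven samples need only three pools instead of seven individual libraries, and the set of positive pools identifies the carrier without any per-sample barcode. Real designs must also tolerate multiple carriers and sequencing errors, which this sketch does not address.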