This paper presents a parallel QR decomposition for dense matrices based on the tile algorithm, together with its implementation. The tile algorithm partitions the full matrix into blocks (tiles) and stores the data of each tile contiguously in memory. Each tile is factored on its own, and the other tiles receive the data produced by that factorization and use it to update their own entries. We implement both a serial and a parallel version of the tile algorithm. The parallel version uses a hybrid MPI and OpenMP programming model and was validated on the "Yuan" ("元") supercomputer. Compared with the PLASMA software package, our program achieves better efficiency and scalability, and it also scales well when run on multiple nodes.
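The serial tile algorithm described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the helper names (`to_tiles`, `from_tiles`, `householder_qt_r`, `tile_qr`) are hypothetical, the matrix is assumed square with a size divisible by the tile size, and the orthogonal factors are formed explicitly for readability, whereas production kernels such as those in PLASMA keep the Householder reflectors in compact form and exploit the triangular structure.

```python
import math

def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def to_tiles(A, b):
    """Split an n x n matrix into a grid of b x b tiles, each stored as its own object."""
    nt = len(A) // b
    return [[[A[i * b + r][j * b:(j + 1) * b] for r in range(b)]
             for j in range(nt)] for i in range(nt)]

def from_tiles(T):
    """Reassemble the tile grid into one row-major matrix."""
    b = len(T[0][0])
    return [sum((T[i][j][r] for j in range(len(T[i]))), [])
            for i in range(len(T)) for r in range(b)]

def householder_qt_r(M):
    """Return (Qt, R) with Qt @ M = R, Qt orthogonal and R upper triangular."""
    m, n = len(M), len(M[0])
    R = [row[:] for row in M]
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]
    for c in range(min(m - 1, n)):
        x = [R[i][c] for i in range(c, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue  # column is already zero below the diagonal
        v = x[:]
        v[0] += math.copysign(norm, x[0])
        vv = sum(t * t for t in v)
        # Apply the reflector H = I - 2 v v^T / (v^T v) to rows c.. of R and Qt.
        for X in (R, Qt):
            for j in range(len(X[0])):
                s = 2.0 * sum(v[i] * X[c + i][j] for i in range(len(v))) / vv
                for i in range(len(v)):
                    X[c + i][j] -= s * v[i]
    return Qt, R

def tile_qr(T):
    """Serial tile QR, in place: on return the tile grid holds the factor R."""
    nt, b = len(T), len(T[0][0])
    for k in range(nt):
        # Step 1: factor the diagonal tile.
        Qt, T[k][k] = householder_qt_r(T[k][k])
        # Step 2: update the tiles to its right with the same transformation.
        for j in range(k + 1, nt):
            T[k][j] = matmul(Qt, T[k][j])
        for i in range(k + 1, nt):
            # Step 3: eliminate tile (i, k) against the R held in tile (k, k).
            Qt, R = householder_qt_r(T[k][k] + T[i][k])  # rows stacked vertically
            T[k][k], T[i][k] = R[:b], R[b:]
            # Step 4: apply that transformation to the tile pairs in later columns.
            for j in range(k + 1, nt):
                S = matmul(Qt, T[k][j] + T[i][j])
                T[k][j], T[i][j] = S[:b], S[b:]
    return T

# Example: a 4 x 4 matrix split into 2 x 2 tiles.
A = [[4.0, 1.0, 2.0, 0.5],
     [1.0, 3.0, 0.0, 1.5],
     [2.0, 0.0, 5.0, 1.0],
     [0.5, 1.5, 1.0, 2.0]]
R = from_tiles(tile_qr(to_tiles(A, 2)))
```

Because every step applies an orthogonal transformation, the result can be checked through the identity R^T R = A^T A. The updates in steps 2 and 4 touch different tiles for different column indices `j`, so they are independent of one another; this is the kind of independence a parallel version could exploit, although the MPI and OpenMP decomposition itself is not shown here.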