Owing to their high cost-effectiveness and strong computing power, multi-core clusters have become the mainstream platform for high-performance computing. However, the differing memory mechanisms and communication latencies within a multi-core cluster also make efficient parallel algorithms harder to design. To make full use of the hardware resources of multi-core clusters and obtain optimal performance, this paper designs a hierarchical load-balancing parallel computing method for finite element structural analysis. The method is built on fully exploiting the hierarchy and granularity of the computing tasks. To match the hardware topology of multi-core clusters, the computing tasks are divided into three levels: inter-node parallelism, inter-chip parallelism, and inter-core parallelism. Inter-node and inter-chip parallelism use coarse-grained parallel computing, while inter-core parallelism uses fine-grained parallel computing. By mapping the computing tasks onto different hardware levels of the multi-core cluster, the method not only achieves load balancing at each level but also greatly reduces the system's communication overhead. In addition, it greatly reduces the number of subdomains, which effectively improves the numerical convergence of the interface equations. To verify the effectiveness of the algorithm, large-scale parallel tests of linear static finite element structural analysis were carried out on the Tianhe-2 supercomputer. The results show that, compared with the conventional domain decomposition method, the hierarchical load-balancing parallel computing method achieves a higher speedup and parallel efficiency. This study focuses mainly on linear statics problems. Nonlinear or dynamic problems involve multiple iteration steps, so the algorithm presented here can be wrapped as a subroutine and called.
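The three-level task mapping described above (nodes, then chips, then cores) can be illustrated with a toy sketch. This is not the paper's partitioner: it simply splits a contiguous range of element indices as evenly as possible at each level, and the names `split_evenly` and `hierarchical_partition`, together with the machine shape used in the example, are hypothetical choices made for illustration.

```python
def split_evenly(start, stop, parts):
    """Split the half-open range [start, stop) into `parts` contiguous
    chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks each take one leftover item.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def hierarchical_partition(n_elements, n_nodes, chips_per_node, cores_per_chip):
    """Map element index ranges to (node, chip, core) triples.

    Assumption for illustration: elements are numbered 0..n_elements-1
    and any contiguous range is an acceptable unit of work.
    """
    mapping = {}
    # Level 1: coarse-grained split across nodes.
    for node, (n_lo, n_hi) in enumerate(split_evenly(0, n_elements, n_nodes)):
        # Level 2: coarse-grained split across the chips of one node.
        for chip, (c_lo, c_hi) in enumerate(split_evenly(n_lo, n_hi, chips_per_node)):
            # Level 3: fine-grained split across the cores of one chip.
            for core, rng in enumerate(split_evenly(c_lo, c_hi, cores_per_chip)):
                mapping[(node, chip, core)] = rng
    return mapping


# Hypothetical machine: 2 nodes, 2 chips per node, 3 cores per chip.
mapping = hierarchical_partition(100, 2, 2, 3)
sizes = [hi - lo for lo, hi in mapping.values()]
```

In this example every core receives 8 or 9 of the 100 elements. If one assumes a single subdomain per chip, with the cores cooperating inside it, the coarse-grained levels see 4 subdomains rather than the 12 that a one-subdomain-per-core decomposition would create, which is the kind of reduction in subdomain count that the abstract refers to.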