Accelerating the discontinuous Galerkin method for seismic wave propagation simulations using multip

Source: Earthquake Science
We have ported an arbitrary high-order discontinuous Galerkin method for solving the three-dimensional isotropic elastic wave equation on unstructured tetrahedral meshes to multiple Graphics Processing Units (GPUs) using NVIDIA's Compute Unified Device Architecture (CUDA) and the Message Passing Interface (MPI). We obtained a speedup factor of about 28.3 for the single-precision version of our codes and about 14.9 for the double-precision version. The GPU used in the comparisons is an NVIDIA Tesla C2070 Fermi, and the CPU is an Intel Xeon W5660.
To effectively overlap inter-process communication with computation, we separate the elements on each subdomain into inner and outer elements, and we complete the computation on outer elements and fill the MPI buffer first. While the MPI messages travel across the network, the GPU performs computation on inner elements and all other calculations that do not use information of outer elements from neighboring subdomains. A significant portion of the speedup also comes from a customized matrix-matrix multiplication kernel, which is used extensively throughout our program. Preliminary performance analysis of our parallel GPU codes shows favorable strong and weak scalability.
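The inner/outer split described above can be illustrated with a small sketch. It is not the authors' code: the function names `split_inner_outer` and `step_schedule`, the face-neighbor table layout, and the final wait step are assumptions made for illustration, and plain Python stands in for the CUDA and MPI calls. An element counts as outer when at least one face neighbor belongs to another subdomain.

```python
from typing import List, Sequence, Tuple


def split_inner_outer(
    neighbors: Sequence[Sequence[int]],
    owner: Sequence[int],
    rank: int,
) -> Tuple[List[int], List[int]]:
    """Return (inner, outer) element ids owned by subdomain `rank`.

    neighbors[e] lists the face neighbors of element e (-1 marks a
    physical boundary face); owner[e] is the subdomain that owns e.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    inner: List[int] = []
    outer: List[int] = []
    for e, faces in enumerate(neighbors):
        if owner[e] != rank:
            continue
        # Outer: some face neighbor lives on a different subdomain.
        is_outer = any(n >= 0 and owner[n] != rank for n in faces)
        (outer if is_outer else inner).append(e)
    return inner, outer


def step_schedule(inner: List[int], outer: List[int]) -> List[Tuple[str, List[int]]]:
    """Order of operations for one time step, following the text.

    The final wait is an assumption: the text does not say when the
    received data are consumed.
    """
    return [
        ("compute_outer", outer),      # outer elements first
        ("fill_mpi_buffer", outer),    # then hand their data to MPI
        ("compute_inner", inner),      # overlaps with messages in flight
        ("wait_for_messages", []),     # assumed synchronization point
    ]


# Toy example: a 1-D chain of six elements split across two subdomains.
neighbors = [[-1, 1], [0, 2], [1, 3], [2, 4], [3, 5], [4, -1]]
owner = [0, 0, 0, 1, 1, 1]
inner0, outer0 = split_inner_outer(neighbors, owner, rank=0)
print(inner0, outer0)  # [0, 1] [2]
```

In this toy chain only element 2 of subdomain 0 touches subdomain 1, so it is the only outer element there. On a real tetrahedral mesh the outer set forms a layer one element thick along each subdomain interface, so the inner work available to hide communication grows with subdomain size.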