An MPI+OpenACC-Based PRM Scalar Advection Scheme in the GRAPES Model over a Cluster with Multiple CP

Source: Journal of Tsinghua University, Natural Science Edition (English edition)
A moisture advection scheme is an essential module of a numerical weather/climate model representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computation of the scalar advection involves boundary exchange, and computation of higher bandwidth requirements is complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme with Message Passing Interface (MPI) and OpenACC to fully exploit GPUs' power over a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimization of various parameters such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show that about 3.5 times speedup is obtained for the entire model running at medium resolution with double precision when comparing the scheme's elapsed time on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, results obtained from experiments of a higher resolution model with multiple GPUs show excellent scalability.
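The boundary exchange mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the PRM scheme and not the GRAPES code: it assumes a first-order upwind update on a periodic 1D grid, and the function names (`upwind_step`, `step_decomposed`, `step_global`) are hypothetical. A plain list lookup stands in for the MPI messages that would carry halo values between ranks.

```python
def upwind_step(q, left_halo, c):
    """One first-order upwind step for a Courant number c with 0 <= c <= 1.

    q is the list of cell values owned by one subdomain; left_halo is the
    value of the cell just to the left of q[0] (owned by a neighbour).
    Assumption: flow is in the positive direction, so only a left halo is needed.
    """
    padded = [left_halo] + q
    return [padded[i + 1] - c * (padded[i + 1] - padded[i]) for i in range(len(q))]


def step_decomposed(subdomains, c):
    """Advance every subdomain by one step on a periodic domain.

    The halo "exchange" is a list lookup here; in an MPI code it would be a
    message from the left neighbour (assumption for illustration only).
    """
    n = len(subdomains)
    # Collect all halos first so every subdomain reads the same time level.
    halos = [subdomains[(r - 1) % n][-1] for r in range(n)]
    return [upwind_step(subdomains[r], halos[r], c) for r in range(n)]


def step_global(q, c):
    """Reference update on the undivided periodic domain."""
    return upwind_step(q, q[-1], c)
```

With the halo values in place, the decomposed update reproduces the single-domain update cell for cell, which is the property a boundary exchange has to preserve. In an MPI+OpenACC setting the inner loop would run on the GPU and the halo values would travel between ranks; how the paper arranges this is not shown here.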
Other literature
Road pricing is an urban traffic management mechanism to reduce traffic congestion. Currently, most of the road pricing systems based on predefined charging tolls fail to consider the dynamics of urban traffic flows and travelers' demands on the arrival
Integer overflow is a common vulnerability in Ethereum Smart Contracts (ESCs) and often causes huge economic losses. Smart contracts cannot be changed once they are deployed on the blockchain and thus demand further testing. Mutation testing is a fault-based t
Lesion detection in Computed Tomography (CT) images is a challenging task in the field of computer-aided diagnosis. An important issue is to locate the area of lesion accurately. As a branch of Convolutional Neural Networks (CNNs), 3D Context-Enhanced (3DCE)
In distributed training, increasing batch size can improve parallelism, but it can also bring many difficulties to the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on
Identifying the association between metabolites and diseases will help us understand the pathogenesis of diseases, which has great significance in diagnosing and treating diseases. However, traditional biometric methods are time consuming and expensive. Accor
N400 is an objective electrophysiological index of semantic processing in the brain. This study focuses on the sensitivity of the N400 effect during speech comprehension under the uni- and bi-modality conditions. Varying the Signal-to-Noise Ratio (SNR) of speech si
The prevalence of missing values in the data streams collected in real environments makes them impossible to ignore in the privacy preservation of data streams. However, the development of most privacy preservation methods does not consider missing values. A
Online advertising click-through rate (CTR) prediction is aimed at predicting the probability of a user clicking an ad, and it has undergone considerable development in recent years. One of the hot topics in this area is the construction of feature interact
The proliferation of massive datasets has led to significant interest in distributed algorithms for solving large-scale machine learning problems. However, the communication overhead is a major bottleneck that hampers the scalability of distributed machine
Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. The large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality