A Static Analytical Performance Model for GPU Kernel

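Source: The 10th National Conference on Computer Supported Cooperative Work and Annual Meeting of the China Computer Federation (CCF) Technical Committee on Cooperative Computing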
  Graphics processing units (GPUs) have grown in popularity and play an important role as coprocessors in heterogeneous co-processing environments. In the GPU architecture, tens of thousands of threads work collaboratively in parallel to solve heavily data-parallel problems efficiently. The achieved performance therefore depends on how well multiple threads collaborate in parallel when executing the algorithm, on the effectiveness of latency hiding, and on the utilization of the multiprocessors. In this paper, we propose a static analytical kernel performance (SAKP) model for GPU kernels. The model considers three important factors that affect GPU kernel performance: the costs of compute instructions, memory accesses, and synchronization. In the proposed model, a set of kernel and device features is generated for the target GPU. Combining the kernel and device features, we determine the performance-limiting factor and generate an estimate of the kernel's execution time. We performed experiments on matrix multiplication (MM) and histogram generation (HG) on an NVIDIA GTX680 GPU card and observed an absolute prediction error of less than 6.8%. By comparison with other current kernel models, we also validated that the proposed model is more accurate and simpler.
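  The abstract does not give the SAKP equations, so the following is only a minimal sketch of the general idea it describes: combine kernel features with device features, find the performance-limiting factor, and estimate execution time. Every name here (`KernelFeatures`, `DeviceFeatures`, `estimate_kernel_time`, `relative_error`), the formula itself, and all numeric values in the usage note are assumptions made for illustration, not the authors' model or measured GTX680 figures. The sketch assumes compute and memory costs overlap perfectly through latency hiding and that synchronization cost is added serially.

```python
from dataclasses import dataclass


@dataclass
class KernelFeatures:
    """Hypothetical static features extracted from a kernel."""
    compute_instructions: float  # total arithmetic instructions executed
    memory_bytes: float          # total bytes moved to or from global memory
    sync_count: float            # total number of barrier synchronizations


@dataclass
class DeviceFeatures:
    """Hypothetical throughput figures for the target GPU."""
    instructions_per_second: float  # sustained instruction throughput
    bytes_per_second: float         # sustained global memory bandwidth
    seconds_per_sync: float         # cost of one barrier synchronization


def estimate_kernel_time(kernel, device):
    """Return (limiting_factor, estimated_seconds, per_factor_costs)."""
    costs = {
        "compute": kernel.compute_instructions / device.instructions_per_second,
        "memory": kernel.memory_bytes / device.bytes_per_second,
        "synchronization": kernel.sync_count * device.seconds_per_sync,
    }
    # The limiting factor is simply the largest of the three cost terms.
    limiting = max(costs, key=costs.get)
    # Assumption: compute and memory overlap fully (ideal latency hiding),
    # while synchronization stalls add on top of the overlapped time.
    estimated = max(costs["compute"], costs["memory"]) + costs["synchronization"]
    return limiting, estimated, costs


def relative_error(predicted, measured):
    """Absolute prediction error as a fraction of the measured time."""
    return abs(predicted - measured) / measured
```

  As a usage example with made-up numbers, a kernel with 2e9 instructions, 1e9 bytes of traffic, and 100 barriers on a device rated at 1e12 instructions/s, 1e11 bytes/s, and 1e-6 s per barrier is classified as memory-limited, since the memory term (0.01 s) dominates the compute term (0.002 s) and the synchronization term (0.0001 s). Comparing the resulting estimate against a measured time with `relative_error` gives the kind of percentage error the abstract reports.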
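Other literature
To address the question of which kinds of recommended friends users in a social network are more likely to accept, this paper proposes a friend recommendation method based on role activity in social networks. The method combines three elements to make friend recommendations: the community roles formed by the topologies of different communities (teams) in the social network, the differences in role activity that arise from the behavior of different users holding the same role within a community, and user interests. The paper first finds communities of interest for a user through text similarity, then uses the E-GARGO model to define role activity within the community topology and gives a method for computing activity; based on this computation, for the target user ...
The SPH (Smoothed Particle Hydrodynamics) model suffers from high computational cost and tends to lose high-precision fluid detail when generating fluid animation. Multi-resolution methods place fine particles in detailed regions and coarse particles in flat regions, greatly reducing the particle count while maintaining accuracy, and are an effective way to address these problems. This paper proposes a multi-resolution fluid animation generation method based on the SPH model that jointly considers two key factors, fluid computation and surface construction, and builds from the bottom up a multi-...
Connected-component labeling of binary images is used in both computer vision and image processing. This paper proposes a new connected-component labeling algorithm for binary images based on CUDA (Compute Unified Device Architecture). The algorithm first performs an initial labeling by searching for the minimum label within each neighborhood. Then, based on the structuring element, it locates positions in the label matrix where the same connected component may carry different labels and uses atomic operations to merge the root elements, with the CPU (central processing unit) and GPU (graphics processing unit) working together to assess the degree of merging and iterate the corrections. Finally, a backtracking method is used to handle complex-shaped ...
With the rapid development of computer vision and its wide application in robotics, binocular visual cooperative localization (also known as visual odometry) has become a key technology for robots to perceive their environment and recognize targets. A binocular stereo vision cooperative localization algorithm is proposed. Scale-invariant features are extracted from the stereo image sequence, circle matching is used to match and track the features, the quaternion method is used to solve for the rotation and translation between adjacent frames, and the optimal motion parameters are obtained through repeated iterations of minimum reprojection error optimization. Experiments and evaluation on the KITTI dataset show that the method is more accurate than traditional methods, and its robustness ...
With the development of communication and embedded technologies, wireless sensor networks (WSNs) have been widely applied in military, agricultural, transportation, and other fields. WSNs are also the research foundation for recently emerged technologies such as the Internet of Things and cyber-physical systems. Although stable time synchronization (TS) protocols already exist for wired networks, factors such as the low power, low cost, and network instability of WSNs mean that the time synchronization protocols used on the Internet ...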
The verification of business process models is an important step in the design phase of process-aware information systems. Currently, in the area of artifact-centric process model verification, the focus
Service-Oriented Architecture (SOA) can revive enterprise legacy systems so that they continue to serve the enterprise in innovative and better ways. In the process of reengineering legacy system
In this paper, the dissemination effect of cooperative information and exit-choosing rules are introduced into the social force model to exhibit pedestrians' behaviors during emergency evacuation, espec
In cloud computing, execution times of tasks or jobs on virtual machines are usually uncertain. To obtain accurate execution times, an integrated learning effects model is developed which makes use of ex
Aiming at the problem that the integrated scheduling algorithm for two asymmetric workshops, which takes the finishing time of migration into account, did not consider the integrated scheduling problems