Search results: resource list
Automatic-parallel-compiled
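- This is a valuable doctoral dissertation that studies in depth the automatic generation of parallel programs and the performance-optimization techniques used in parallelizing compilers. The ultimate goal of parallelization is to generate efficient parallel programs suited to the architecture of the target machine, so producing efficient parallel code is a central topic in parallelizing-compiler research. Taking the parallelizing compiler KAP as its research setting, the dissertation targets distributed-memory architectures to study communication optimization and the automatic generation of message-passing parallel programs, and targets shared-memory architectures to study compiler optimization of the OpenMP programs that parallelization produces. Through testing, it identifies the main factors affecting OpenMP program performance and, from the parallelization stage that generates OpenMP par…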
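The dissertation's own list of performance factors is not reproduced in this listing. One factor commonly discussed when tuning compiler-generated OpenMP code is the fork/join overhead of each parallel region; a minimal sketch, assuming that factor, contrasts two separate regions with one merged region. The function names are hypothetical and this is not presented as KAP's transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Two separate parallel regions: the thread team is forked and joined twice. */
void scale_then_shift_split(double *a, size_t n, double s, double t)
{
    assert(a != NULL || n == 0);

    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] *= s;

    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        a[i] += t;
}

/* One merged parallel region with two worksharing loops: the team is forked
   once; the implicit barrier after the first `omp for` still guarantees that
   every a[i] is scaled before any thread starts shifting. */
void scale_then_shift_merged(double *a, size_t n, double s, double t)
{
    assert(a != NULL || n == 0);

    #pragma omp parallel
    {
        #pragma omp for
        for (size_t i = 0; i < n; i++)
            a[i] *= s;

        #pragma omp for
        for (size_t i = 0; i < n; i++)
            a[i] += t;
    }
}
```

Both functions compute the same result; only the number of team start-ups differs. Compile with `gcc -fopenmp`; without that flag the pragmas are ignored and both run serially.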
matrix-multiplication-based-OpenMp-
- A C implementation of matrix multiplication using OpenMP on a shared-memory parallel machine.
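The resource's source is not shown here; a minimal sketch of the usual approach, assuming square row-major matrices of `double`, splits the rows of the result across threads, since each row of C depends only on one row of A and all of B. The name `matmul_omp` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C = A * B, all n x n, row-major. Rows of C are independent, so the outer
   loop is split across threads; each thread writes only its own rows. */
void matmul_omp(const double *A, const double *B, double *C, int n)
{
    assert(n >= 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            C[(size_t)i * n + j] = 0.0;
        /* i-k-j loop order walks B and C row by row (unit stride). */
        for (int k = 0; k < n; k++) {
            double aik = A[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                C[(size_t)i * n + j] += aik * B[(size_t)k * n + j];
        }
    }
}
```

Build with `gcc -fopenmp`; the thread count can be set through the standard `OMP_NUM_THREADS` environment variable.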
simpleTemplates.tar
- This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays.
Sort
- Uses a shared-memory model: host-side data is distributed to each GPU for sorting, and the parallel results are then returned and combined. Includes the official C++ AMP documentation describing the model.
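The listing does not say how the per-GPU results are combined. A minimal sketch, assuming a split / sort-each-chunk / k-way-merge design, is shown below in plain C: OpenMP threads stand in for the GPUs, and no C++ AMP API is used. The names `split_sort_merge` and `cmp_int` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts data[0..n) by splitting it into `parts` chunks, sorting each chunk
   independently, then merging on the "host". Each chunk stands in for the
   slice a host would send to one device; here the devices are threads. */
void split_sort_merge(int *data, size_t n, int parts)
{
    assert(parts > 0);
    if (n == 0)
        return;

    size_t *start = malloc(((size_t)parts + 1) * sizeof *start);
    size_t *head = malloc((size_t)parts * sizeof *head);
    int *out = malloc(n * sizeof *out);
    assert(start && head && out);

    /* Chunk boundaries: the first n % parts chunks get one extra element. */
    size_t base = n / (size_t)parts, extra = n % (size_t)parts;
    start[0] = 0;
    for (int p = 0; p < parts; p++)
        start[p + 1] = start[p] + base + ((size_t)p < extra ? 1 : 0);

    /* Phase 1: sort each chunk independently (one per "device"). */
    #pragma omp parallel for
    for (int p = 0; p < parts; p++)
        qsort(data + start[p], start[p + 1] - start[p], sizeof *data, cmp_int);

    /* Phase 2: k-way merge, repeatedly taking the smallest chunk head. */
    for (int p = 0; p < parts; p++)
        head[p] = start[p];
    for (size_t k = 0; k < n; k++) {
        int best = -1;
        for (int p = 0; p < parts; p++)
            if (head[p] < start[p + 1] &&
                (best < 0 || data[head[p]] < data[head[best]]))
                best = p;
        out[k] = data[head[best]++];
    }

    memcpy(data, out, n * sizeof *data);
    free(out);
    free(head);
    free(start);
}
```

The linear scan over chunk heads is adequate for a handful of devices; with many chunks a min-heap would be the usual replacement.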
Distributed-Shared-Memory
- An implementation of distributed shared memory on clusters of workstations connected via an IP-based network.
cudamarix-mul
- Matrix multiplication written in CUDA, including a kernel that uses tiling (blocking) and shared memory. Provided for reference.
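To illustrate the tiling idea without a GPU, here is a minimal CPU sketch in C, assuming square row-major matrices: the local arrays `tA` and `tB` play the role that `__shared__` tiles play in a CUDA kernel, so each loaded tile is reused `TILE` times. `TILE` and `matmul_tiled` are hypothetical names, and this is not the resource's kernel.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 16 /* hypothetical tile size, analogous to a CUDA block width */

/* C = A * B (n x n, row-major), computed one TILE x TILE output tile at a time. */
void matmul_tiled(const double *A, const double *B, double *C, int n)
{
    assert(n >= 0);
    for (int bi = 0; bi < n; bi += TILE)
        for (int bj = 0; bj < n; bj += TILE) {
            double acc[TILE][TILE] = {{0}};
            for (int bk = 0; bk < n; bk += TILE) {
                double tA[TILE][TILE], tB[TILE][TILE];
                /* Load phase: copy one tile of A and one of B, zero-padding
                   past the matrix edge (like guarded shared-memory loads). */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++) {
                        int r = bi + i, c = bk + j;
                        tA[i][j] = (r < n && c < n) ? A[(size_t)r * n + c] : 0.0;
                        r = bk + i;
                        c = bj + j;
                        tB[i][j] = (r < n && c < n) ? B[(size_t)r * n + c] : 0.0;
                    }
                /* Compute phase: accumulate the product of the two tiles. */
                for (int i = 0; i < TILE; i++)
                    for (int k = 0; k < TILE; k++)
                        for (int j = 0; j < TILE; j++)
                            acc[i][j] += tA[i][k] * tB[k][j];
            }
            /* Write back only the in-range part of the output tile. */
            for (int i = 0; i < TILE && bi + i < n; i++)
                for (int j = 0; j < TILE && bj + j < n; j++)
                    C[(size_t)(bi + i) * n + (bj + j)] = acc[i][j];
        }
}
```

In the CUDA version, the load phase is done cooperatively by the threads of a block, followed by `__syncthreads()` before the compute phase.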
Colfax-HOW-Day-02
- We focus on the use of the Intel Xeon Phi platform as a coprocessor in the offload programming model. We discuss the explicit offload model based on compiler pragmas, explaining how to offload functions, local scalars and arrays of known size, a…
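The course covers Intel's own offload pragmas, which are not reproduced here. As a swapped-in, portable stand-in, this sketch uses the standard OpenMP `target` construct to show the same three ingredients: a function compiled for the device, a local scalar, and arrays of compile-time-known size. `N`, `axpy_elem` and `offload_axpy` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define N 1000 /* array size known at compile time */

/* Function compiled for both the host and the offload device. */
#pragma omp declare target
static double axpy_elem(double a, double x, double y) { return a * x + y; }
#pragma omp end declare target

/* Offloads y = a*x + y. In OpenMP 4.5 and later the scalar `a` is
   firstprivate in the target region by default; x is copied to the device
   and y is copied in and back out. Without an available device the region
   typically falls back to running on the host. */
void offload_axpy(double a, const double x[N], double y[N])
{
    assert(x != NULL && y != NULL);

    #pragma omp target map(to: x[0:N]) map(tofrom: y[0:N])
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = axpy_elem(a, x[i], y[i]);
}
```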
combation caculation
- The automatic parallelization feature of the Intel compiler can automatically convert parts of a serial program into threaded code, suited to multicore or multiprocessor shared-memory systems. OpenMP is an application programming interface for C/C++ and Fortran that has been standardized by most computer hardware and software vendors.
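A minimal sketch contrasting the two approaches, assuming a simple dot product: the serial loop is the kind of reduction an auto-parallelizer (enabled in Intel's compilers with `-parallel` on Linux, per Intel's documentation; check your compiler version) may be able to thread by itself, while the OpenMP version states the parallelism and the reduction explicitly. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Serial version: the only loop-carried dependence is the sum reduction,
   which is the kind of pattern an auto-parallelizer can recognize. */
double dot_serial(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Explicit OpenMP version: the programmer declares the parallel loop and
   the reduction instead of relying on the compiler's dependence analysis. */
double dot_omp(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```

Because a parallel reduction sums in a different order, floating-point results can differ slightly from the serial loop for general inputs.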