Introduction to OpenCL
OpenCL at NERSC
OpenCL is an open standard for programming heterogeneous computers composed of CPUs, GPUs and other processors. OpenCL consists of a framework that defines the platform as a host (typically a CPU) and one or more compute devices (e.g. a GPU), plus a C-based programming language for writing programs for the compute devices. Using OpenCL, a programmer can write parallel programs that use all the resources of the heterogeneous computer. We give an example using the C++ API, but the concepts are relevant to OpenCL as a whole. OpenCL is currently available on NERSC's Dirac testbed.
From: SC13 Tutorial -- OpenCL: A Hands-on Introduction
Tim Mattson, Intel Corp.; Alice Koniges, Berkeley Lab; Simon McIntosh-Smith, University of Bristol.
SC13 attendees are shown working on Dirac.
Tutorial Credits: This content is based on slides produced by Tom Deakin and Simon McIntosh-Smith, which were in turn based on slides by Tim Mattson and Simon McIntosh-Smith with help from Ben Gaster (Qualcomm).
Dirac Instructions (comments are denoted with a #):
# Log in to a Carver login node to access Dirac
ssh -Y username@carver.nersc.gov
# Request an interactive session on one Dirac node
qsub -I -V -q dirac_int -l nodes=1:ppn=8
# Wait to be taken to a node
# You are there when your prompt reads, for example, [username@dirac37]
# Note: module setup is subject to default changes
module unload cuda
module load cuda/5.5
module unload pgi
module load gcc-sl6
# Go to the previous working directory if you are in a newly started PBS shell
cd $PBS_O_WORKDIR
# Make a directory for your exercises and fetch them if you have not already
mkdir OpenCL_exercises
cd OpenCL_exercises
svn export http://portal.nersc.gov/svn/training/SC13/opencl
# Compilation and first execution
make; ./vadd
Example: vector addition
The "hello world" program of data-parallel programming adds two vectors:
C[i] = A[i] + B[i] for i=0 to N-1
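As a baseline for comparison, the loop above can be written as ordinary serial C++. This is a minimal sketch and not part of the tutorial's exercise code; the function name vadd_serial is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference (not from the tutorial sources):
// computes c[i] = a[i] + b[i] for i = 0 .. N-1 on the host.
std::vector<float> vadd_serial(const std::vector<float>& a,
                               const std::vector<float>& b)
{
    assert(a.size() == b.size());  // both inputs must have the same length N
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

Every iteration of the loop is independent of the others, which is what makes the problem a natural fit for a data-parallel kernel.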
For the OpenCL solution, there are two parts:
– Kernel code
– Host code
Vector Addition – Kernel
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
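To make the role of get_global_id(0) concrete, the sketch below emulates the kernel's execution in plain C++. The kernel body runs once per work-item, and each work-item handles the element at its own global index; here that is modeled by a host loop that calls the body once for every gid. This is only an illustration and not OpenCL code: the names vadd_work_item and run_ndrange are hypothetical, and on a real device the work-items may run concurrently and in any order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel body: one call handles one work-item.
// In OpenCL the index would come from get_global_id(0); here it is passed in.
void vadd_work_item(std::size_t gid, const float* a, const float* b, float* c)
{
    c[gid] = a[gid] + b[gid];
}

// Hypothetical emulation of a 1D launch with `global_size` work-items.
// A serial loop is sufficient because each work-item writes a distinct
// element, so the result does not depend on execution order.
void run_ndrange(std::size_t global_size,
                 const std::vector<float>& a,
                 const std::vector<float>& b,
                 std::vector<float>& c)
{
    assert(a.size() >= global_size && b.size() >= global_size
           && c.size() >= global_size);
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        vadd_work_item(gid, a.data(), b.data(), c.data());
    }
}
```

The explicit loop over i in the serial formulation disappears from the kernel itself: the launch size plays the role of N, and the global ID plays the role of i.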
– Take the vadd program we provide. It runs a simple kernel that adds two vectors together.
– Look at the host code and identify the API calls it makes. Compare them against the API descriptions on the OpenCL C++ reference card.
• Expected output:
– A message verifying that the program completed successfully
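One plausible way to produce such a verification message is to recompute the sum on the host and compare it with the result read back from the device. The sketch below assumes this approach; the helper name count_mismatches and the tolerance parameter are hypothetical and are not taken from the provided exercise code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical checker: counts elements where the computed result c[i]
// differs from the host-side sum a[i] + b[i] by more than `tol`.
std::size_t count_mismatches(const std::vector<float>& a,
                             const std::vector<float>& b,
                             const std::vector<float>& c,
                             float tol)
{
    assert(a.size() == b.size() && b.size() == c.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        const float expected = a[i] + b[i];
        if (std::fabs(c[i] - expected) > tol) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A host program could report success when the returned count is zero. A small tolerance is used rather than exact equality because host and device floating-point results are not guaranteed to match bit for bit in general.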
Vector Addition – Host
• The host program is the code that runs on the host to:
– Set up the environment for the OpenCL program
– Create and manage kernels
• Five simple steps in a basic host program:
– Define the platform … platform = devices+context+queues
– Create and build the program (dynamic library for kernels)
– Set up memory objects
– Define the kernel (attach arguments to kernel function)
– Submit commands … transfer memory objects and execute kernels
The C++ Interface
• Khronos has defined a common C++ header file containing a high level interface to OpenCL, cl.hpp
• This interface is dramatically easier to work with, especially for C++ programmers
• Key features:
– Uses common defaults for the platform and command-queue, saving the programmer from extra coding for the most common use cases
– Simplifies the basic API by bundling key parameters with the objects rather than requiring verbose and repetitive argument lists
– Ability to “call” a kernel from the host, like a regular function
– Error checking can be performed with C++ exceptions