CUDA and OpenACC Compilers

This page contains information about compiling GPU-based codes with NVidia’s CUDA compiler and PGI’s OpenACC compiler directives. For information on how to run GPU Computing jobs in Midway, see GPU Computing Jobs

Compiling CUDA GPU code on Midway

To view available CUDA versions on Midway, use the command:

module avail cuda

At time of writing, the available CUDA versions are:

cuda/4.2(default)
cuda/5.0
cuda/5.5

A very basic CUDA example code is provided below cudamemset.cu:

#include <stdio.h>
#include <cuda.h>

int main(){

	int n = 16;
	
	// host and device memory pointers
	int *h_a;
	int *d_a;
	
	// allocate host memory
	h_a = (int*)malloc(n * sizeof(int));
	
	// allocate device memory
	cudaMalloc((void**)&d_a, n * sizeof(int));		
	
	// set device memory to all zero's
	cudaMemset(d_a, 0, n * sizeof(int));
	
	// copy device memory back to host
	cudaMemcpy(h_a, d_a, n * sizeof(int), cudaMemcpyDeviceToHost);	
	
	// print host memory
	for (int i = 0; i < n; i++){
		printf("%d ", h_a[i]);
	}
	printf("\n");
	
	// free buffers
	free(h_a);
	cudaFree(d_a);
	
	return 0;
	
}

CUDA code must be compiled with Nvidia’s nvcc compiler which is part of the cuda software module. To build a CUDA executable, first load the desired CUDA module and compile with:

nvcc source_code.cu

Compiling OpenACC GPU code on Midway

OpenACC is supported on Midway through the PGI 2013 compiler suite. To load the OpenACC compiler, use the command:

module load pgi/2013

A very basic OpenACC example code is provided below stencil.c:

#include <stdio.h>
#include <stdlib.h>

int main(){

	int i,j,it;
	
	// set the size of our test arrays
	int numel = 2000;
	
	// allocate and initialize test arrays
	float A[numel][numel];
	float Anew[numel][numel];
	for (i = 0; i < numel; i++){
		for ( j = 0; j < numel; j++){
			A[i][j] = drand48();
		}	
	}
	
	// apply stencil 1000 times
	#pragma acc data copy(A), create(Anew)
	for (it = 0; it < 1000; it++){
		
		#pragma acc parallel loop
		for (i = 1; i < numel-1; i++){
			for (j = 1; j < numel-1; j++){
				
				Anew[i][j] = 0.25f * (A[i][j-1] + A[i][j+1] + A[i-1][j] + A[i+1][j]);
				
			}	
		}
		
		#pragma acc parallel loop
		for (i = 1; i < numel-1; i++){
			for (j = 1; j < numel-1; j++){
				
				A[i][j] = Anew[i][j];
				
			}	
		}
		
	}
	
	// do something with A[][]
	
	return 0;

}

OpenACC code targeted at an Nvidia GPU must be compiled with the PGI compiler using at least the following options:

pgcc source_code.c -ta=nvidia -acc