UNIVERSITY OF NORTH CAROLINA AT CHARLOTTE
Department of Computer Science

ITCS 6010/8010 Topics in Computer Science:
GPU Programming for High Performance Computing  (CUDA programming)

Spring 2011

Tuesday/Thursday 5:00 pm - 6:15 pm, Woodward 154

Dr. Barry Wilkinson

This page is continually updated as the course proceeds. Watch for announcements. Modification date: May 5, 2011


ANNOUNCEMENTS

May 5, 2011:  Class Test 1 posted here for reference.

Previous announcements
Follow-up from class discussions (modification date: Feb 16)

Lecture Materials

The following lecture materials are provided as PowerPoint slides. You may wish to print these slides out as 1 x 2 or 2 x 3 thumbnails. The slides are not ready for use until the day of the class.
Lecture slides

Week | Date | Slides | Topics
1/2 | Jan 13/18, 2011 | Outline; Assignment Preliminaries; Demos; Brief GPU History; CUDA Prog. Model | Outline: Course contents, textbook, assessment, office hours. Assignment preliminaries: Computer systems used, remote access, assignment submission. Demos: Heat distribution and N-body problems running on GPU server with graphics, speed-up factor. History of GPUs leading to their use and design for HPC. Basic CUDA program structure, kernel calls, threads, blocks, grid, example code: vector addition.
2 | Jan 18/20, 2011 | Multidimensional thread structure | Multidimensional grid and blocks, threads, blocks, grid, thread addressing, predefined variables, example code: matrix addition/multiplication (demos).
2 | Jan 20, 2011 | Assignment 1 | Assignment 1 using Windows and Linux environments to compile and execute simple CUDA programs. Visual Studio, Linux make files.
3 | Jan 25/27, 2011 | Performance measurements; Thread synchronization; Device routines; Memory structures | Measuring performance, timing program execution, CUDA “events”, synchronous and asynchronous CUDA routines, bandwidth measures, computation measures – floating point operations/sec. Ways to achieve thread synchronization. Declaring routines to be called from device and local device variables. Memory structures and bandwidth optimization.
4 | Feb 1, 2011 | Assignment 2; Graphical Output | Notes on creating graphical output (for Assignment 2).
4 | Feb 1/3, 2011 | Memory Coalescing Demo; Shared Memory Demo; Matrix Multiplication | Demonstration of memory coalescing, code, performance improvements. Demonstration of using shared memory, code, performance improvements. Matrix multiplication performance improvements.
5 | Feb 8/10, 2011 | Atomics; Streams | Accessing shared data by multiple threads, atomics, critical sections, compare and swap instruction and usage, memory fence instruction and usage. Computation/memory transfer overlap using streams.
6 | Feb 15, 2011 | Zero copy memory | Zero copy memory
6 | Feb 17, 2011 | | No formal class. Available in office.
7 | Feb 22, 2011 | | Review
7 | Feb 24, 2011 | | Class test
8 | March 1/3, 2011 | Const. mem. experiment; 2-D grid and 3-D blocks; Detecting CUDA Errors; Discussion on class project | Various topics
 | March 7 - 12, 2011 | | Spring break -- no classes
9 | March 15/17, 2011 | SIMD image processing algorithms | Study of SIMD algorithms specifically suitable for GPUs. Image processing.
10 | March 22/24, 2011 | Thread divergence | Effect of control flow instructions, streaming multiprocessors, warps, loop unrolling, predicated instructions.
11 | March 29/31, 2011 | | Progress presentations and reports
12 | April 5/7, 2011 | OpenCL | Outline of OpenCL code, example adding two vectors.
13 | April 12/14, 2011 | | Brief discussion on project submission
14 | April 19/21, 2011 | | Class project presentations
15 | April 26/28, 2011 | | Class project presentations. Student evaluation of course after last presentation on April 28th.
 | Tues May 3, 2011 | | Last class. Review for final.


Source materials

CUDA C Quick Reference
NVIDIA CUDA C Programming Guide version 3.2 (see also NVIDIA site)

Videos

OpenCL tutorial from NVIDIA GTC 2010 conference (1 1/2 hrs duration)
For other conference videos, see http://www.nvidia.com/object/gtc2010-presentation-archive.html#session2018


GPU News Items (for discussion)

Amazon cluster GPU instance
(Dec 8, 2010): http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html
AMD: http://www.hpcwire.com/blogs/AMDs-Next-GPU-Computing-Move-114318849.html

Papers

Assignments

An assignment is not ready for use until its set date.
 
Date set | Assignment | Topic | Date due (midnight)
Thurs Jan 20, 2011 | Assignment 1 | First CUDA programming assignment - vector addition on Windows and Linux systems | Monday Jan 31, 2011
Tues Feb 1, 2011 | Assignment 2; Graphical Output | Second CUDA assignment, to include synchronization, local variables and graphics | Monday Feb 14, 2011
Tues Feb 15, 2011 | Assignment 3 | Third CUDA assignment - Using atomics and shared memory | Wed. Feb 23, 2011
Tues March 1, 2011 | Project Instructions; Project ideas (continually updated) | Project | Various dates, see project instructions



Tests

Class test 1:  Thursday February 24, 2011

Topics:  All materials up to Feb 15, 2011 inclusive (including assignments)


Final Exam: Tuesday May 10, 2011, 5:00 pm - 7:30 pm

Topics: Comprehensive.

