GitHub - karnajitsen/NVLink-benchmark: NVLink microbenchmark with IBM Power8 and NVIDIA P100 GPU

Configuration:

/config/parameters.config file contains all the parameters of measurement to be configured. In our parameter file, we have used following letters to denote unit of data size, B = Byte K = Kilo Bytes M = Mega Bytes G = Giga Bytes

An example of parameters and there description are as follows,

TPDATATYPE = double : data type for througput experiement LATDATATYPE = int : data type for latency experiement SIZE = 32M-8G,2 : Size of the array, data size. format - 32 = Starting minimum data size, 8 = Last maximum data size, "-" = Minimum and maximum data size separator, "," = data size and incrementing factor separator, "2" = incrementing factor multiple of 2, ACCESSPATTERN = LINEAR : Access pattern, not used so far THROUGHPUT = N : "Y" = throughput experiment will takes place, if "N" no throughput experiements CUDAMEMCOPY = N : "Y" = cudamemcpy experiment will takes place, if "N" no cudamemcpy experiements KER2MEMLOC = C0-C0;C0-G0;G0-G1; C0-G0-G1 : "Kernel location"-Memory location 1, Memory Location 2, ... ", Mapping Separator is ";" STREAM = READ,WRITE,COPY,ADD,TRIAD,QUADAD,PENTAD,HEXAD : STREAM type: One can remove any of these stream, but the order should be maintaned as of now. MEMORYMODE = NONUM : Memory models , NONUM = Zero-Copy , UM = Unified Memory REPETATIONS = 1 : # of Repeatations for the experiments to avoid measurement noise LATENCY = Y : Latency experiements will take place if "Y" else no latency experiements. Both THROUGHPUT and LATENCY flag can not be "Y" at the same time. ITERATIONS = 1 : # of iterations for pointer chasing, kept 1 by default for our experiments STRIDE = 4B-512K,2 : Starting minimum stride size - Last Maximum stride size, Incrementing value, Incrementing operator ( * - multiplications, + - Addition ) CHASE_ELEMENTS = 128-16384,2,* : Starting minimum # of elements chased - Last Maximum # of elements chased, Incrementing value, Incrementing operator ( * - multiplications, + - Addition ) START_THREAD = 1024 : Starting gpu block size END_THREAD = 1024 : Maximum GPU block size OMP_THREADS = 1-1,4 : Starting minimum Openmp thread - Last Maximum Openmp threads, Incrementing value to be multiplied MEM_ADVISE = N : cuMemAdvise() experiment is performed if "Y" with MEMORYMODE = UM , if "N" , it will not take place. THREAD_EXPERIMENT = N : Reduction of Redudant page fault experiments will take place if "Y" with MEMORYMODE = UM else no.

Execution:

Execution can be done executing simple make command after correct configuration. Different make commands are,

make nvl0 : Complete compilation and execution of the experiments for all the range of data size given in configuration file for NVLINK system.
make pcie0 : Complete compilation and execution of the experiments for all the range of data size given in configuration file for PCIe system.
make cmp0: Only compilation for NVLINK System.
make cmp1: Only compilation for PCIe system.

For individual execution, one needs to run following command,

./nvprog :

A : Data size for throughput / # of chasing elements for latency B : Stride length in elements C : # of Open mp threads D : 1 for throughput , 0 for latency

Output:

After execution, output will be stored inside /data/ folder as csv and text file with following naming conventions.

result-pchase-lat-nonum.txt or result-pchase-lat-nonum.csv - for zero copy latency experiment.

result-pchase-lat-um.txt or result-pchase-lat-um.csv - for Unified memory atency experiments.

result-stream-bw-nonmum.txt or result-stream-bw-nonmum.csv - zero-copy memory throughput output.

result-stream-bw-um.txt or result-stream-bw-um.csv - Unified memory throughput output.

result-stream-bw-memcpy.txt or result-stream-bw-memcpy.csv - cuda mem copy output.

For complete execution, data file from previous executions are always backed up.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
config		config
data		data
CUDAPtrchase.cu		CUDAPtrchase.cu
CUDAPtrchase.h		CUDAPtrchase.h
CUDAStream.cu		CUDAStream.cu
CUDAStream.h		CUDAStream.h
LICENSE		LICENSE
Makefile		Makefile
Ptrchase.h		Ptrchase.h
README.md		README.md
Stream.h		Stream.h
common.h		common.h
execute.c		execute.c
execute.sh		execute.sh
jobscr.sh		jobscr.sh
main.cpp		main.cpp
nvprog		nvprog
parameters.h		parameters.h
repeat.h		repeat.h

License

karnajitsen/NVLink-benchmark

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages