"Compute-bound" and "memory-bound" kernels

A "compute-bound" kernel spends most of its time performing calculations rather than accessing memory, so its speed is limited by the arithmetic throughput of the device.
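One common way to decide which side of this line a kernel falls on is the roofline model. The notes above do not name it; it is used here only as an illustration. A kernel's arithmetic intensity is the number of FLOPs it performs per byte it moves to or from global memory. If that intensity reaches the device's "machine balance" (peak FLOP/s divided by peak bandwidth), the kernel is compute-bound; otherwise it is memory-bound. The sketch below is a back-of-envelope calculation under stated assumptions. The peak numbers are hypothetical placeholders, not the specs of any real GPU. The byte counts assume each array crosses the memory bus exactly once.

```python
# Roofline-style classification (a sketch; hardware numbers are hypothetical).

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved between the processors and global memory."""
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, peak_bw: float) -> float:
    """Roofline bound: performance is capped by compute or by bandwidth * intensity."""
    return min(peak_flops, intensity * peak_bw)


def classify(flops: float, bytes_moved: float, peak_flops: float, peak_bw: float) -> str:
    """'compute-bound' if intensity reaches the machine balance, else 'memory-bound'."""
    machine_balance = peak_flops / peak_bw  # FLOP/byte at the roofline's ridge point
    if arithmetic_intensity(flops, bytes_moved) >= machine_balance:
        return "compute-bound"
    return "memory-bound"


# Hypothetical device: 10 TFLOP/s peak arithmetic, 500 GB/s global-memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 500e9

# SAXPY (y = a*x + y) on float32: 2 FLOPs per element; reads x and y, writes y (12 bytes).
n = 1 << 20
saxpy = classify(2 * n, 12 * n, PEAK_FLOPS, PEAK_BW)

# N x N float32 matrix multiply, assuming ideal reuse: A and B read once, C written once.
N = 4096
matmul = classify(2 * N**3, 3 * 4 * N**2, PEAK_FLOPS, PEAK_BW)

print("saxpy:", saxpy)    # memory-bound  (about 0.17 FLOP/byte < 20 FLOP/byte)
print("matmul:", matmul)  # compute-bound (about 683 FLOP/byte > 20 FLOP/byte)
```

With these assumed peaks the ridge point is 20 FLOP/byte. SAXPY sits far below it, so extra arithmetic units would not speed it up. An ideally tiled matrix multiply sits far above it. Real kernels move more bytes than this ideal count, which lowers their intensity.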

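"Memory-bound" kernels spend most of their time waiting on memory. They fall into two kinds:
a) "bandwidth-bound": the data transfer between the device's processors and global memory is close to the hardware's peak memory bandwidth;
b) "latency-bound": the bottleneck is the latency of individual fetches from memory, not the bandwidth. Too few memory requests are in flight to hide that latency, so the achieved bandwidth stays well below the peak.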
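A first-order way to tell these two cases apart is Little's law, a general queueing result applied here as an illustration. To sustain bandwidth B when each access takes latency L, roughly B × L bytes must be outstanding at any moment. A kernel that keeps fewer bytes in flight cannot reach peak bandwidth, so it is latency-bound. The sketch below assumes a hypothetical device (500 GB/s, 500 ns latency, 80 SMs). It ignores caches, occupancy limits, and per-SM request limits, so treat it as an estimate, not a model of any specific GPU.

```python
# First-order Little's-law estimate (a sketch; hardware numbers are hypothetical).

def required_bytes_in_flight(peak_bw: float, latency_s: float) -> float:
    """Bytes that must be outstanding at once to keep the memory system saturated."""
    return peak_bw * latency_s


def achieved_bandwidth(bytes_in_flight: float, latency_s: float, peak_bw: float) -> float:
    """Little's law: throughput = concurrency / latency, capped at the hardware peak."""
    return min(peak_bw, bytes_in_flight / latency_s)


def classify_memory_bound(bytes_in_flight: float, latency_s: float, peak_bw: float) -> str:
    """'bandwidth-bound' if enough bytes are in flight to saturate memory, else 'latency-bound'."""
    if bytes_in_flight >= required_bytes_in_flight(peak_bw, latency_s):
        return "bandwidth-bound"
    return "latency-bound"


PEAK_BW = 500e9    # hypothetical 500 GB/s global-memory bandwidth
LATENCY = 500e-9   # hypothetical 500 ns global-memory latency
# Saturating this device needs about 500e9 * 500e-9 = 250 KB in flight.

# Few outstanding loads: 2048 threads, each waiting on one 4-byte load.
few = 2048 * 4                 # 8 KiB in flight
# Many outstanding loads: 80 hypothetical SMs x 2048 threads x 4 independent 4-byte loads.
many = 80 * 2048 * 4 * 4       # about 2.6 MB in flight

for name, inflight in (("few", few), ("many", many)):
    bw = achieved_bandwidth(inflight, LATENCY, PEAK_BW)
    print(name, classify_memory_bound(inflight, LATENCY, PEAK_BW), f"{bw / 1e9:.1f} GB/s")
```

Under these assumptions the "few" case reaches only about 16 GB/s, even though 500 GB/s is available: it is latency-bound. The "many" case saturates the bus and becomes bandwidth-bound. More independent loads per thread or more resident threads is the usual lever for moving from the first case to the second.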

Please refer to the following diagram: