Nvidia Nsight
Nsight Systems is a system-wide performance analysis tool that allows us to capture activity across the CPU, GPU, and OS! It uses low-overhead tracing and sampling to locate bottlenecks: it can't tell you exactly why a slowdown occurs, but it shows where and when it occurs.
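A capture is usually started from the command line with nsys profile. As a minimal sketch, the Python helper below only assembles that command and runs it when nsys is found on PATH. The -o (report name) and --trace (which APIs to trace: CUDA, NVTX ranges, OS runtime) options are real nsys options; the helper names and the train.py target are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_nsys_profile_cmd(
    app_argv: Sequence[str],
    output: str = "report",
    traces: Sequence[str] = ("cuda", "nvtx", "osrt"),
) -> List[str]:
    """Assemble an `nsys profile` command line for the given application.

    -o names the report (nsys adds the .nsys-rep extension itself);
    --trace lists the APIs to trace, comma-separated.
    """
    if not app_argv:
        raise ValueError("app_argv must contain the program to profile")
    return ["nsys", "profile", "-o", output, "--trace=" + ",".join(traces), *app_argv]


def run_profile(app_argv: Sequence[str], output: str = "report") -> Optional[int]:
    """Run the capture if nsys is installed; return its exit code, or None if nsys is missing."""
    if shutil.which("nsys") is None:
        return None
    return subprocess.run(build_nsys_profile_cmd(app_argv, output), check=False).returncode
```

For example, build_nsys_profile_cmd(["python", "train.py"]) produces the equivalent of running nsys profile -o report --trace=cuda,nvtx,osrt python train.py in a shell. Keeping command construction separate from execution makes the flags easy to inspect before launching a long capture.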
Downloadable content
Report file: .nsys-rep (opened in the Nsight Systems GUI)
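Besides opening it in the GUI, a .nsys-rep file can be converted to SQLite (for example with nsys export --type sqlite report.nsys-rep) and queried directly. A minimal sketch, assuming the exported schema stores kernel launches in a CUPTI_ACTIVITY_KIND_KERNEL table (start and end in nanoseconds, shortName pointing at StringIds.id); these table and column names can vary between nsys versions, and top_kernels is a hypothetical helper.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple


def top_kernels(db_path: str, limit: int = 10) -> List[Tuple[str, int, int]]:
    """Return (kernel name, launch count, total GPU time in ns), largest total first.

    Assumes the nsys SQLite export schema described above; "end" is quoted
    because END is an SQL keyword.
    """
    query = """
        SELECT s.value AS name,
               COUNT(*) AS launches,
               SUM(k."end" - k.start) AS total_ns
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON k.shortName = s.id
        GROUP BY s.value
        ORDER BY total_ns DESC
        LIMIT ?
    """
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(query, (limit,)).fetchall()
    return [(name, int(n), int(t)) for name, n, t in rows]
```

This answers the "where and when" question for the GPU side: the kernels at the top of the list are the first candidates to inspect more closely on the timeline.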