GPU Kernels

Feb 6, 2025

In Julia, CUDA.jl already provides optimized GPU implementations of many common operations. Simply applying a function to a CuArray runs operations such as broadcasting and map-reduce as GPU-specialized code. More complex tasks, such as the self-attention operation used in machine learning algorithms, have optimized implementations through cuDNN.jl.
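As a small illustration of the point above, here is a minimal sketch of broadcasting and map-reduce on a CuArray. It assumes a CUDA-capable GPU and an installed CUDA.jl; the array names and sizes are arbitrary choices for this example.

```julia
using CUDA

# Assumption: a functional CUDA GPU is available on this machine.
n = 1024
x = CUDA.ones(Float32, n)        # CuArray filled with 1.0f0, stored on the GPU
y = CUDA.fill(3f0, n)            # CuArray filled with 3.0f0

# Broadcasting: the dotted expression is fused and run as GPU code.
z = x .+ 2f0 .* y                # every element equals 7.0f0

# Map-reduce: abs2 is applied elementwise and summed on the GPU.
s = sum(abs2, z)                 # scalar result returned to the host
```

No kernel is written by hand here: the same high-level array code that works on an `Array` is dispatched to GPU implementations when the arguments are `CuArray`s.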

These operations are carried out by GPU kernels: functions compiled to run on the GPU, where many threads execute them in parallel to exploit the CUDA architecture.

Today, we will show how to write such kernels ourselves, for cases where an optimized kernel does not already exist.
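To give a first idea of what a hand-written kernel looks like, here is a minimal sketch using the `@cuda` macro from CUDA.jl. The kernel name `axpy_kernel!`, the problem size, and the launch configuration of 256 threads per block are illustrative assumptions, not values taken from the accompanying notebook.

```julia
using CUDA

# Hypothetical example kernel: computes y[i] += a * x[i] for each index i.
# Each GPU thread handles one element.
function axpy_kernel!(y, a, x)
    # Global 1-based index of this thread across all blocks.
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    # Guard: the last block may contain threads past the end of the array.
    if i <= length(y)
        @inbounds y[i] += a * x[i]
    end
    return nothing               # kernels must not return a value
end

n = 10_000
x = CUDA.ones(Float32, n)
y = CUDA.zeros(Float32, n)

threads = 256                    # assumed block size for this sketch
blocks = cld(n, threads)         # enough blocks to cover all n elements

@cuda threads=threads blocks=blocks axpy_kernel!(y, 2f0, x)

result = Array(y)                # copying to the host waits for the kernel
```

The bounds check matters because `blocks * threads` is generally larger than `n`, so some threads in the final block have no element to process.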

Downloadable content

Notebook: julia | pdf | html

Environment files: Manifest.toml | Project.toml

Julia scripts: demo_sandbox.jl | softmax_examples