Throttling processes by GPU temperature

While rendering some GPU-intensive OpenGL stuff, I got scared when my graphics card hit 90 °C, so I paused the process until it had cooled down. I got fed up with pausing and restarting it by hand, so I wrote this small script:

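#!/bin/bash
# Usage: pass the PIDs of the processes to throttle as arguments.
# Needs nvidia-smi, plus ts (from moreutils) to timestamp the log.

# Start with the processes paused; the loop resumes them once the GPU is cool.
kill -s SIGSTOP "${@}"
running=0
# Two thresholds (in degrees C), so the processes don't flap around one value.
stop_threshold=85
cont_threshold=75
while true
do
  # Extract the number from a line such as "GPU Current Temp : 70 C".
  temperature="$(nvidia-smi -q -d TEMPERATURE | grep 'GPU Current Temp' | sed 's/^.*: \(.*\) C$/\1/')"
  if (( running ))
  then
    # Running: pause once the GPU gets too hot.
    if (( temperature > stop_threshold ))
    then
      echo "STOP ${temperature} > ${stop_threshold}"
      kill -s SIGSTOP "${@}"
      running=0
    fi
  else
    # Paused: resume once the GPU has cooled down enough.
    if (( temperature < cont_threshold ))
    then
      echo "CONT ${temperature} < ${cont_threshold}"
      kill -s SIGCONT "${@}"
      running=1
    fi
  fi
  sleep 1
done |
ts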
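
The script uses two thresholds rather than one, so between 75 and 85 degrees the processes simply stay in whatever state they were already in. Here is a minimal sketch of that decision pulled out into a function, which makes it easy to try with made-up readings. The name next_state and its argument order are my own invention for illustration; they are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script):
#   next_state RUNNING TEMPERATURE STOP_THRESHOLD CONT_THRESHOLD
# Prints 1 if the processes should be running after this reading, else 0.
next_state() {
  local running="$1" temperature="$2" stop_threshold="$3" cont_threshold="$4"
  if (( running ))
  then
    # Running: only a reading above the stop threshold pauses.
    (( temperature > stop_threshold )) && running=0
  else
    # Paused: only a reading below the continue threshold resumes.
    (( temperature < cont_threshold )) && running=1
  fi
  echo "${running}"
}
```

For example, next_state 1 80 85 75 and next_state 0 80 85 75 both leave the state unchanged, because 80 lies inside the band. With a single threshold, a GPU hovering around that value would be stopped and continued every second.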

If you want to run it yourself, I advise checking the output from nvidia-smi on your system first, because its manual page says the format isn't stable. I also suggest monitoring the temperature yourself, at least until you're sure the script is working OK for you. Usage is simple: pass the PIDs of the processes you want to throttle by GPU temperature on the command line. Typically these would be OpenGL applications (or Vulkan / OpenCL / CUDA / whatever else they come up with next).
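
Since the human-readable output may change, one possible alternative is nvidia-smi's CSV query mode (--query-gpu=temperature.gpu --format=csv,noheader,nounits), which prints one bare number per GPU. Below is a rough sketch of a helper that checks such output before it is used in arithmetic. The function name max_gpu_temperature is hypothetical, and picking the hottest GPU on a multi-GPU machine is an assumption of mine, not something the script above does.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): reads one temperature
# per line on stdin, prints the highest, and fails if there is no reading
# or if any line is not a plain non-negative integer (e.g. "N/A").
max_gpu_temperature() {
  local line max=""
  while IFS= read -r line || [[ -n "${line}" ]]
  do
    # Drop any padding around the number.
    line="${line//[[:space:]]/}"
    [[ "${line}" =~ ^[0-9]+$ ]] || return 1
    # Force base 10 so a leading zero is not parsed as octal.
    if [[ -z "${max}" ]] || (( 10#${line} > 10#${max} ))
    then
      max="${line}"
    fi
  done
  [[ -n "${max}" ]] && printf '%s\n' "${max}"
}

# Assumed usage inside the loop, replacing the grep/sed pipeline:
#   if ! temperature="$(nvidia-smi --query-gpu=temperature.gpu \
#          --format=csv,noheader,nounits | max_gpu_temperature)"
#   then
#     echo "could not read GPU temperature"
#   fi
```

The point of failing loudly is that in bash arithmetic an empty variable counts as 0. So if the query ever printed nothing, the original loop would see a temperature below the continue threshold and resume the processes. What to do on a failed reading (keep them stopped, or just skip that second) is a choice to make for your own setup.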