
Low precision high range numerics

Fractal rendering using perturbation techniques for deep zooming needs floating point number types with a large exponent range. The number types typically available are float (8-bit exponent), double (11-bit exponent), and long double (usually either x87 80-bit extended precision or 128-bit quad precision, depending on CPU architecture; both have a 15-bit exponent). Even 15 bits is not enough for very deep zooms, so software implementations are useful. Two are analyzed here: floatexp (a float with a separate 32-bit exponent), and softfloat (two unsigned 32-bit ints, one of which contains the sign and a 31-bit exponent).
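To make the floatexp idea concrete, here is a minimal sketch of a float mantissa paired with a separate 32-bit exponent. It is not the implementation that was benchmarked: the names `FloatExp`, `normalize`, `from_double`, `to_double`, `mul` and `add` are hypothetical, and exponent overflow of the 32-bit field is not handled.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical floatexp-like type: value = mantissa * 2^exponent.
// After normalize(), |mantissa| is in [0.5, 1), or the value is exactly zero.
struct FloatExp {
  float mantissa;
  std::int32_t exponent;
};

// Renormalize m * 2^e so the mantissa is back in [0.5, 1).
// Assumption: the resulting exponent fits in 32 bits (no overflow check).
inline FloatExp normalize(float m, std::int64_t e) {
  if (m == 0.0f) return {0.0f, 0};
  int shift = 0;
  float f = std::frexp(m, &shift);  // m == f * 2^shift with 0.5 <= |f| < 1
  return {f, static_cast<std::int32_t>(e + shift)};
}

inline FloatExp from_double(double x) {
  int e = 0;
  double f = std::frexp(x, &e);
  return normalize(static_cast<float>(f), e);
}

// Only meaningful when the value fits in the range of double.
inline double to_double(FloatExp a) {
  return std::ldexp(static_cast<double>(a.mantissa), a.exponent);
}

// Multiply mantissas, add exponents: the product of two mantissas in
// [0.5, 1) lies in [0.25, 1), so the float itself never over/underflows.
inline FloatExp mul(FloatExp a, FloatExp b) {
  return normalize(a.mantissa * b.mantissa,
                   static_cast<std::int64_t>(a.exponent) + b.exponent);
}

// Align the smaller operand to the larger exponent, then add mantissas.
inline FloatExp add(FloatExp a, FloatExp b) {
  if (a.mantissa == 0.0f) return b;
  if (b.mantissa == 0.0f) return a;
  if (a.exponent < b.exponent) std::swap(a, b);
  std::int64_t d = static_cast<std::int64_t>(a.exponent) - b.exponent;
  if (d > 64) return a;  // b is far below the 24-bit precision of a
  float bm = std::ldexp(b.mantissa, -static_cast<int>(d));
  return normalize(a.mantissa + bm, a.exponent);
}
```

The renormalization after every operation is extra work compared to a hardware float, which is one plausible reason such a type is several times slower than float in the timings.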

I measured the CPU time in seconds to render an unzoomed 1920x1080 view of the Mandelbrot set with 100 iterations and 16 subframes, using a C++ perturbation technique inner loop templated on number type, on a selection of CPU architectures:
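For readers unfamiliar with the shape of such a loop, here is a rough sketch of a perturbation inner loop templated on the number type. It assumes the standard perturbation recurrence d' = 2 Z d + d^2 + dc around a reference orbit Z; it is not the benchmarked code, and the names `reference_orbit` and `perturbed_iterations` are made up for this example. The type `T` only needs construction from a number, `+`, `-`, `*` and `>`.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Reference orbit Z_0 = 0, Z_{n+1} = Z_n^2 + C, stored in double.
// The orbit is truncated if the reference itself escapes.
inline std::vector<std::complex<double>>
reference_orbit(std::complex<double> C, std::size_t count) {
  std::vector<std::complex<double>> Z;
  std::complex<double> z(0.0, 0.0);
  for (std::size_t n = 0; n < count; ++n) {
    if (std::norm(z) > 4.0) break;
    Z.push_back(z);
    z = z * z + C;
  }
  return Z;
}

// Iterate the offset d of a pixel from the reference orbit.
// (dcx, dcy) is the pixel's offset from the reference point C.
// Returns the iteration at which |Z + d| > 2, or Z.size() if none.
template <typename T>
std::size_t perturbed_iterations(
    const std::vector<std::complex<double>>& Z, T dcx, T dcy) {
  T dx = T(0), dy = T(0);
  for (std::size_t n = 0; n < Z.size(); ++n) {
    T Zx = T(Z[n].real()), Zy = T(Z[n].imag());
    // Escape test on the full value z = Z + d.
    T zx = Zx + dx, zy = Zy + dy;
    if (zx * zx + zy * zy > T(4)) return n;
    // d' = 2 Z d + d^2 + dc, written as (2 Z + d) d + dc.
    T ax = T(2) * Zx + dx, ay = T(2) * Zy + dy;
    T nx = ax * dx - ay * dy + dcx;
    T ny = ax * dy + ay * dx + dcy;
    dx = nx;
    dy = ny;
  }
  return Z.size();
}
```

Instantiating `perturbed_iterations<float>`, `perturbed_iterations<double>` and so on gives one loop per number type, which is the kind of comparison the timings below make.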

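| Type | Ryzen 2700x desktop, clang-11 | Ryzen 2700x desktop, gcc-10 | Core 2 Duo laptop, clang-11 | AArch64 tablet, clang-11 | AArch64 tablet, gcc-10 | AArch64 raspi3, clang-11 | AArch64 raspi3, gcc-10 | ArmHF raspi4, clang-11 | ArmHF raspi4, gcc-10 |
|---|---|---|---|---|---|---|---|---|---|
| float | 35.0 | 33.1 | 25.5 | 73.8 | 73.1 | 96.7 | 88.0 | 42.5 | 57.2 |
| double | 42.46 | 33.6 | 26.7 | 82.4 | 78.0 | 96.7 | 87.8 | 42.1 | 67.5 |
| long double | 88.8 | 42.1 | 49.9 | 1324 | 1343 | 1464 | 1464 | n/a | n/a |
| floatexp | 216 | 209 | 453 | 936 | 1138 | 1081 | 1332 | 655 | 1131 |
| softfloat | 172 | 120 | 265 | 634 | 670 | 762 | 770 | 365 | 606 |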

Relative time vs float for each column:

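| Type | Ryzen 2700x desktop, clang-11 | Ryzen 2700x desktop, gcc-10 | Core 2 Duo laptop, clang-11 | AArch64 tablet, clang-11 | AArch64 tablet, gcc-10 | AArch64 raspi3, clang-11 | AArch64 raspi3, gcc-10 | ArmHF raspi4, clang-11 | ArmHF raspi4, gcc-10 |
|---|---|---|---|---|---|---|---|---|---|
| float | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| double | 1.21 | 1.02 | 1.05 | 1.12 | 1.07 | 1 | 1 | 0.99 | 1.18 |
| long double | 2.54 | 1.27 | 1.96 | 17.9 | 18.4 | 15.1 | 16.6 | n/a | n/a |
| floatexp | 6.17 | 6.31 | 17.8 | 12.7 | 15.6 | 11.2 | 15.1 | 15.4 | 16.8 |
| softfloat | 4.91 | 3.63 | 10.4 | 8.59 | 9.17 | 7.88 | 8.75 | 8.59 | 10.6 |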

Conclusions:

  • Use the fastest type whose exponent range is big enough.
  • On x86_64, the best types are float, double, long double, softfloat.
  • On aarch64, the best types are float, double, softfloat. Here long double is very slow.
  • On armhf, the best types are float, double, softfloat. Here long double appears to be an alias for double.
  • On no architecture is floatexp useful.
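  • Code compiled with gcc is generally faster than clang on x86_64 and, for float and double, on aarch64. On armhf clang is faster throughout, and on aarch64 clang is faster for floatexp and slightly faster for softfloat.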
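The first conclusion can be sketched as a small selection function. This is only an illustration under simplifying assumptions: the enum `NumberType`, the function `choose_number_type`, and the `long_double_is_fast` flag (true on x86_64, false on aarch64 going by the timings above) are hypothetical names, and the needed exponent is taken as a given input rather than derived from the zoom depth.

```cpp
#include <limits>

enum class NumberType { Float, Double, LongDouble, SoftFloat };

// needed_exp2: the most negative binary exponent the renderer must be
// able to represent as a normalized value.
// long_double_is_fast: whether long double has hardware support worth using.
inline NumberType choose_number_type(int needed_exp2, bool long_double_is_fast) {
  // numeric_limits<T>::min_exponent is one more than the lowest exponent
  // of a normalized value, so these comparisons are slightly conservative.
  if (needed_exp2 >= std::numeric_limits<float>::min_exponent)
    return NumberType::Float;
  if (needed_exp2 >= std::numeric_limits<double>::min_exponent)
    return NumberType::Double;
  // Skip long double where it is slow, or where it is just an alias for
  // double and so adds no range.
  if (long_double_is_fast &&
      std::numeric_limits<long double>::min_exponent <
          std::numeric_limits<double>::min_exponent &&
      needed_exp2 >= std::numeric_limits<long double>::min_exponent)
    return NumberType::LongDouble;
  // floatexp is never chosen: softfloat was faster in every column.
  return NumberType::SoftFloat;
}
```

A real renderer would also want some safety margin, since intermediate products in the inner loop can be much smaller than the values that are finally kept.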

Still to be investigated is the relative performance when using OpenCL (on both CPU and GPU devices), and when compiled to WebAssembly using Emscripten.