Low precision high range numerics

Fractal rendering using perturbation techniques for deep zooming needs floating point number types with a large exponent range. The number types typically available are float (8 bit exponent), double (11 bit exponent), and long double (15 bit exponent; usually either x87 80 bit extended precision or 128 bit quad precision, depending on CPU architecture). Even 15 bits is not enough for very deep zooms, so software implementations are useful. Two are analyzed here: floatexp (a float with a separate 32 bit exponent) and softfloat (two unsigned 32 bit ints, one containing the sign and a 31 bit exponent, leaving the other for the mantissa).
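To make the floatexp idea concrete, here is a minimal sketch of a float paired with a separate 32 bit exponent. The name FloatExp, its fields and the operators are assumptions for illustration only, not the code that was benchmarked, and exponent overflow is deliberately ignored.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical sketch, not the benchmarked code: value = mantissa * 2^exponent.
struct FloatExp {
    float mantissa;         // zero, or magnitude in [0.5, 1)
    std::int32_t exponent;  // wide exponent kept outside the float

    FloatExp(float m = 0.0f, std::int32_t e = 0) {
        int shift = 0;
        mantissa = std::frexp(m, &shift);        // renormalise into [0.5, 1)
        exponent = (m == 0.0f) ? 0 : e + shift;  // zero gets a canonical exponent
    }
};

inline FloatExp operator*(FloatExp a, FloatExp b) {
    // Mantissas multiply, exponents add; the constructor renormalises.
    return FloatExp(a.mantissa * b.mantissa, a.exponent + b.exponent);
}

inline FloatExp operator+(FloatExp a, FloatExp b) {
    if (a.mantissa == 0.0f) return b;
    if (b.mantissa == 0.0f) return a;
    if (a.exponent < b.exponent) { FloatExp t = a; a = b; b = t; }
    // Exponent gap, computed in 64 bits so the subtraction cannot overflow.
    std::int64_t d = std::int64_t(a.exponent) - b.exponent;
    assert(d >= 0);
    if (d > 32) return a;  // b is too small to affect a 24 bit mantissa
    return FloatExp(a.mantissa + std::ldexp(b.mantissa, -static_cast<int>(d)),
                    a.exponent);
}
```

With this sketch, squaring 2^-1000000 (far below the range of long double) gives a mantissa of 0.5 and an exponent of -1999999, which is 2^-2000000. The renormalisation after every operation is extra work that hardware floating point does not need.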

I measured the CPU time in seconds to render a 1920x1080 image of the unzoomed Mandelbrot set with 100 iterations and 16 subframes, using a C++ perturbation technique inner loop templated on the number type, on a selection of CPU architectures and compilers:
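For readers unfamiliar with the technique, the following is a rough sketch of what a perturbation inner loop templated on the number type can look like. It is not the benchmarked code: the names Orbit, reference_orbit and perturbed_iterations are hypothetical, the reference orbit is kept in plain double for simplicity, and T is assumed to provide +, -, *, < and construction from double. The loop iterates the standard perturbation recurrence delta' = 2 Z delta + delta^2 + dc, where Z is the reference orbit and dc is the pixel's offset from the reference point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a reference orbit Z_0, Z_1, ... (real and imaginary parts).
struct Orbit { std::vector<double> re, im; };

// Reference orbit Z_{n+1} = Z_n^2 + C, starting from Z_0 = 0.
inline Orbit reference_orbit(double cre, double cim, std::size_t count) {
    Orbit o;
    double x = 0.0, y = 0.0;
    for (std::size_t n = 0; n < count; ++n) {
        o.re.push_back(x);
        o.im.push_back(y);
        double t = x * x - y * y + cre;
        y = 2.0 * x * y + cim;
        x = t;
    }
    return o;
}

// Inner loop templated on the number type T used for the small deltas.
// Returns the iteration at which the pixel escaped, or the orbit length.
template <typename T>
std::size_t perturbed_iterations(const Orbit& ref, T dcre, T dcim) {
    assert(ref.re.size() == ref.im.size());
    T dx(0.0), dy(0.0);
    for (std::size_t n = 0; n < ref.re.size(); ++n) {
        T X(ref.re[n]), Y(ref.im[n]);
        T zx = X + dx, zy = Y + dy;                    // full orbit value z = Z + delta
        if (!(zx * zx + zy * zy < T(4.0))) return n;   // escaped (|z|^2 >= 4)
        // delta' = 2 Z delta + delta^2 + dc, split into real and imaginary parts
        T ndx = T(2.0) * (X * dx - Y * dy) + (dx * dx - dy * dy) + dcre;
        T ndy = T(2.0) * (X * dy + Y * dx) + T(2.0) * dx * dy + dcim;
        dx = ndx;
        dy = ndy;
    }
    return ref.re.size();
}
```

Instantiating the same template with float, double, long double or a software type is what makes a like-for-like timing comparison possible: only the arithmetic changes, not the algorithm.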

	Ryzen 2700x desktop		Core 2 Duo laptop	AArch64 tablet		AArch64 raspi3		ArmHF raspi4
	clang-11	gcc-10	clang-11	clang-11	gcc-10	clang-11	gcc-10	clang-11	gcc-10
float	35.0	33.1	25.5	73.8	73.1	96.7	88.0	42.5	57.2
double	42.46	33.6	26.7	82.4	78.0	96.7	87.8	42.1	67.5
long double	88.8	42.1	49.9	1324	1343	1464	1464	n/a	n/a
floatexp	216	209	453	936	1138	1081	1332	655	1131
softfloat	172	120	265	634	670	762	770	365	606

Relative time vs float for each column:

	Ryzen 2700x desktop		Core 2 Duo laptop	AArch64 tablet		AArch64 raspi3		ArmHF raspi4
	clang-11	gcc-10	clang-11	clang-11	gcc-10	clang-11	gcc-10	clang-11	gcc-10
float	1	1	1	1	1	1	1	1	1
double	1.21	1.02	1.05	1.12	1.07	1	1	0.99	1.18
long double	2.54	1.27	1.96	17.9	18.4	15.1	16.6	n/a	n/a
floatexp	6.17	6.31	17.8	12.7	15.6	11.2	15.1	15.4	19.8
softfloat	4.91	3.63	10.4	8.59	9.17	7.88	8.75	8.59	10.6
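
Each relative figure is the measured time divided by the float time in the same column. A small sketch of that arithmetic, with a hypothetical function name:

```cpp
#include <cassert>
#include <vector>

// Divide every time in one column by that column's first entry (the float time).
inline std::vector<double> relative_to_first(const std::vector<double>& seconds) {
    assert(!seconds.empty() && seconds[0] > 0.0);
    std::vector<double> ratios;
    for (double s : seconds) ratios.push_back(s / seconds[0]);
    return ratios;
}
```

For the Ryzen gcc-10 column (33.1, 33.6, 42.1, 209, 120) this gives roughly 1, 1.02, 1.27, 6.31 and 3.63.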

Conclusions:

Use the fastest type whose range is big enough.
On x86_64, the best types are float, double, long double, softfloat.
On aarch64, the best types are float, double, softfloat. Here long double is very slow, slower than both software types.
On armhf, the best types are float, double, softfloat. Here long double appears to be an alias for double.
On no architecture is floatexp useful: softfloat was faster in every column measured.
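Code compiled with gcc is faster than clang on the Ryzen, and for float and double on aarch64. Code compiled with clang is faster for floatexp and softfloat on aarch64, and for every type on armhf.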
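
The first rule above can be sketched as a small selector. This is only an illustration under stated assumptions: pick_type and both of its parameters are hypothetical, "range" is taken to mean the smallest normal power of two the type can represent, and the long_double_is_fast flag stands in for the per-architecture findings (true on x86_64, false on aarch64). A real renderer would also want some safety margin.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical selector: needed_exp2 is the most negative power of two that
// must stay representable (for example -5000 means values near 2^-5000).
inline std::string pick_type(int needed_exp2, bool long_double_is_fast) {
    assert(needed_exp2 <= 0);
    // numeric_limits<T>::min_exponent is one more than the smallest normal binary exponent.
    if (needed_exp2 >= std::numeric_limits<float>::min_exponent - 1) return "float";
    if (needed_exp2 >= std::numeric_limits<double>::min_exponent - 1) return "double";
    // Skip long double where it is slow, or where it has no more range than double.
    const bool wider = std::numeric_limits<long double>::min_exponent
                     < std::numeric_limits<double>::min_exponent;
    if (long_double_is_fast && wider &&
        needed_exp2 >= std::numeric_limits<long double>::min_exponent - 1) {
        return "long double";
    }
    return "softfloat";
}
```

The types are tried in order of measured speed, so the first one with enough range wins; floatexp never appears because softfloat beat it in every measurement.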

Still to be investigated is the relative performance when using OpenCL (on both CPU and GPU devices), and when compiled to WebAssembly using Emscripten.