# CarrotTuba
Various number types for OpenCL.
git clone https://code.mathr.co.uk/carrottuba.git
# 1 Real Types
fXXeYY
has XX
mantissa bits and YY
exponent bits.
XX
can be in 24, 32, 49, 53, 74, 99, 107, 161, 215,
where higher precisions are made by
“double-double” / “compensated” arithmetic techniques
(several floats or doubles in a trenchcoat)
or (for XX
32) software floating point based on uints.
YY
can be in 8, 11, 16, 31, 32,
where higher ranges are made by
storing a separate exponent and
keeping the mantissa in a small range.
Need to use canonicalize()
regularly with extended range types.
# 2 Complex Types
cXXeYY
is like fXXeYY
but complex
(a real part and an imaginary part).
By default extended range types have one extended exponent shared between both parts, which is much faster than making a complex number out of two extended range reals.
# 3 Dual Types
Dual number types have two parts,
the .x
part operates as normal,
the .dx
part computes derivatives
They can have different types, and to avoid the combinatorial explosion one needs to instantiate them on demand.
Typically
one uses a high precision, low range type for .x
and a low precision, high range type for .dx
.
# 4 Arbitrary Precision
Natural numbers implemented with uint limbs. Multiplication uses Karatsuba algorithm for greater efficiency at very high precisions. Tricks are needed to unroll recursion (forbidden in OpenCL) to a statically-known maximum depth.
Signed fixed point numbers implemented with uint limbs (2s complement). Multiplication goes via natural as it is faster even though it needs to compute insignificant parts that will be truncated.
To be implemented: signed floating point numbers.
Memory for arbitrary values (including temporary space) is strided by thread count for efficiency on GPUs.
Probably it’s better to use algorithms with aymptotic efficiency improvements like perturbation techniques and bilinear approximation, rather than high precision calculations throughout. But maybe it’s useful for verifying correctness of those algorithms.
# 5 Examples
# 5.1 Mandelbrot
As OpenCL C / CPP macro hell for generic templating over types:
#define MANDELBROT(XR,XC,DXR,DXC,XDX) \
INLINE f24e8 f24e8_mandelbrot_##XDX(XDX c, int n) \
{ \
XR R2 = XR##_ldexp_##XR##_int(XR##_one(), 8); \
XDX z = XDX##_zero(); \
for (int i = 0; i < n; ++i) \
{ \
XR z2 = XR##_norm2_##XDX(z); \
if (bool_gt_##XR##_##XR(z2, R2)) \
{ \
DXR d2 = DXR##_norm2_##DXC(z.dx); \
f24e8 fz2 = f24e8_set_##XR(z2); \
f24e8 fd2 = f24e8_set_##DXR(d2); \
return sqrt(fz2 / fd2) * log(fz2); \
} \
z = XDX##_add_##XDX##_##XDX(XDX##_sqr_##XDX(z), c); \
z = XDX##_canonicalize_##XDX(z); \
} \
return 0; \
} \
Corresponding to C++ with operator overloading (untested pseudo-code):
template <typename T>
f24e8 mandelbrot(const T &c, int n)
{
auto R2 (ldexp(T::norm_type(1), 8));
T z (0);
for (int i = 0; i < n; ++i)
{
auto z2 = norm2(z);
if (z2 > R2)
{
auto d2 = norm2(z.dx);
f24e8 fz2 (z2);
f24e8 fd2 (d2);
return sqrt(fz2 / fd2) * log(fz2);
}
z = sqr(z) + c;
z = canonicalize(z);
}
return 0;
}
where T = dual<complex<compensated<4, f24e8>, extended<complex<f24e8>, i32>>
or so.
# 5.2 Burning Ship
Burning Ship can be implemented with
T = complex<dual<compensated<4, f24e8>, vector<2, extended<f24e8, i32>>>>
or so.
Need two dual parts for each each part of the complex,
to handle non-conformal skewing
(custom view skew matrix not yet implemented).
As OpenCL C / CPP macro hell for generic templating over types:
#define BURNINGSHIP(XR,XDXV,DXR,DXV,DXM,XDX) \
INLINE f24e8 f24e8_burningship_##XDX(XDX c, int n) \
{ \
int i; \
XR R2 = XR##_ldexp_##XR##_int(XR##_one(), 8); \
XDX z = XDX##_zero(); \
for (i = 0; i < n; ++i) \
{ \
XR z2 = XR##_norm2_##XDX(z); \
if (bool_gt_##XR##_##XR(z2, R2)) \
{ \
DXV v = {{ DXR##_set_##XR(z.x.x), DXR##_set_##XR(z.y.x) }}; \
DXM j = {{ z.x.dx, z.y.dx }}; \
DXV u = DXV##_mmul_##DXM##_##DXV(DXM##_inverse_##DXM(DXM##_transpose_##DXM(j)), v); \
DXR u2 = DXR##_norm2_##DXV(u); \
f24e8 fz2 = f24e8_set_##XR(z2); \
f24e8 fu2 = f24e8_set_##DXR(u2); \
return sqrt(fu2) * log(fz2); \
} \
z.x = XDXV##_abs_##XDXV(z.x); \
z.y = XDXV##_abs_##XDXV(z.y); \
z = XDX##_add_##XDX##_##XDX(XDX##_sqr_##XDX(z), c); \
z = XDX##_canonicalize_##XDX(z); \
} \
return 0; \
} \
Corresponding to C++ with operator overloading (untested pseudo-code):
template <typename T>
f24e8 burningship(const T &c, int n)
{
auto R2 (ldexp(T::norm_type(1), 8));
T z (0);
for (int i = 0; i < n; ++i)
{
auto z2 = norm2(z);
if (z2 > R2)
{
T::part_type::derivative_type v{ z.x.x, z.y.x };
vector<2, T::part_type::derivative_type> j{ z.x.dx, z.y.dy };
auto u = inverse(transpose(j)) * v;
auto u2 = norm2(u);
f24e8 fz2 (z2);
f24e8 fu2 (u2);
return sqrt(fu2) * log(fz2);
}
z.x = abs(z.x);
z.y = abs(z.y);
z = sqr(z) + c;
z = canonicalize(z);
}
return 0;
}