# CarrotTuba

Various number types for OpenCL.

code.mathr.co.uk/carrottuba

git clone https://code.mathr.co.uk/carrottuba.git

# 1 Real Types

fXXeYY has XX mantissa bits and YY exponent bits.

XX can be one of 24, 32, 49, 53, 74, 99, 107, 161, 215, where the higher precisions are implemented by “double-double” / “compensated” arithmetic techniques (several floats or doubles in a trenchcoat), or (for XX = 32) by software floating point based on uints.
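
The “trenchcoat” idea can be sketched with an error-free transformation: Knuth's TwoSum recovers the rounding error of a floating point addition exactly, so a (hi, lo) pair behaves like a number with roughly doubled precision. This is a minimal illustration under round-to-nearest (it breaks under fast-math), not CarrotTuba's actual code; the `dd` type and function names are made up:

```cpp
#include <cassert>

// Hypothetical double-double value: hi + lo, with |lo| much smaller than |hi|.
struct dd { double hi, lo; };

// Knuth's TwoSum: s + e == a + b exactly, where s = fl(a + b).
static dd two_sum(double a, double b)
{
  double s = a + b;
  double bp = s - a;
  double e = (a - (s - bp)) + (b - bp);
  return { s, e };
}

// Double-double addition: add the high parts exactly, fold the
// low parts into the error term, then renormalize.
static dd dd_add(dd a, dd b)
{
  dd s = two_sum(a.hi, b.hi);
  double lo = s.lo + a.lo + b.lo;
  return two_sum(s.hi, lo); // keep |lo| small relative to |hi|
}
```

Adding 1e-20 to 1.0 would be lost entirely in a single double; here it survives in the low part.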

YY can be one of 8, 11, 16, 31, 32, where the higher ranges are implemented by storing a separate exponent and keeping the mantissa in a small range.

canonicalize() needs to be called regularly with extended range types, to keep the mantissa within its small range.
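
A plausible sketch of what canonicalize() does for an extended range value: move powers of two out of the mantissa into the separate exponent, so repeated multiplications cannot overflow or underflow the float. The representation and names here are illustrative assumptions, not the library's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical extended-range real: represents m * 2^e,
// with m kept in a small range by canonicalize().
struct fe { float m; int32_t e; };

// Renormalize so that 0.5 <= |m| < 1 (or m == 0 exactly).
static fe canonicalize(fe a)
{
  if (a.m == 0.0f) return { 0.0f, 0 };
  int shift;
  a.m = std::frexp(a.m, &shift); // m in [0.5, 1), exponent in shift
  a.e += shift;
  return a;
}
```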

# 2 Complex Types

cXXeYY is like fXXeYY but complex (a real part and an imaginary part).

By default, extended range complex types have one extended exponent shared between both parts, which is much faster than building a complex number out of two extended range reals.
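
The shared-exponent layout might look like the following sketch, where a single canonicalize pass scales both parts by the exponent of the larger component (representation and names are assumptions for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical extended-range complex: (re + im*i) * 2^e,
// one exponent shared between both parts.
struct cfe { float re, im; int32_t e; };

// Renormalize by the exponent of the larger component, so the
// larger part ends up in [0.5, 1) and the smaller is scaled along.
static cfe canonicalize(cfe a)
{
  float m = std::fmax(std::fabs(a.re), std::fabs(a.im));
  if (m == 0.0f) return { 0.0f, 0.0f, 0 };
  int shift;
  std::frexp(m, &shift);          // exponent of the larger component
  a.re = std::ldexp(a.re, -shift);
  a.im = std::ldexp(a.im, -shift);
  a.e += shift;
  return a;
}
```

Only one exponent adjustment is needed per operation, instead of one per real part, which is where the speed advantage comes from.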

# 3 Dual Types

Dual number types have two parts: the .x part behaves as a normal number, while the .dx part computes derivatives via the chain rule.

The two parts can have different types, and to avoid a combinatorial explosion one needs to instantiate the combinations on demand.

Typically one uses a high precision, low range type for .x and a low precision, high range type for .dx.
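A minimal sketch of the idea in C++, with the two parts as independent template parameters (the names are hypothetical, not CarrotTuba's API): squaring propagates the derivative as d(z²) = 2·z·dz, and the .x and .dx parts can be, say, a double and a float.

```cpp
#include <cassert>

// Hypothetical dual number: .x carries the value, .dx the derivative.
// X and DX may be different number types, as described above.
template <typename X, typename DX>
struct dual { X x; DX dx; };

// Square with chain rule: (x, dx) -> (x^2, 2 x dx).
template <typename X, typename DX>
static dual<X, DX> sqr(dual<X, DX> a)
{
  return { a.x * a.x, DX(2) * DX(a.x) * a.dx };
}
```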

# 4 Arbitrary Precision

Natural numbers are implemented with uint limbs. Multiplication uses the Karatsuba algorithm for greater efficiency at very high precisions; tricks are needed to unroll the recursion (forbidden in OpenCL) to a statically-known maximum depth.
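
For illustration, here is the schoolbook limb multiplication that a Karatsuba recursion typically bottoms out on once it reaches its maximum depth. This is a sketch in plain C++ with 32-bit limbs (little-endian limb order), not the library's actual kernel code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical limb type, matching OpenCL uint.
using limb = uint32_t;

// Schoolbook base case: r (2*n limbs) = a * b (n limbs each).
// Each partial product and carry fits in 64 bits:
// (2^32-1)^2 + 2*(2^32-1) < 2^64.
static void mul_school(const limb *a, const limb *b, limb *r, int n)
{
  for (int i = 0; i < 2 * n; ++i) r[i] = 0;
  for (int i = 0; i < n; ++i)
  {
    uint64_t carry = 0;
    for (int j = 0; j < n; ++j)
    {
      uint64_t t = (uint64_t) a[i] * b[j] + r[i + j] + carry;
      r[i + j] = (limb) t;
      carry = t >> 32;
    }
    r[i + n] = (limb) carry;
  }
}
```

A Karatsuba layer on top would split each operand in half and combine three half-size products; in C++ the fixed maximum depth could be a template parameter, while in OpenCL C it has to be unrolled with preprocessor tricks as the text describes.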

Signed fixed point numbers are implemented with uint limbs (2's complement). Multiplication goes via the natural number type, as that is faster even though it computes insignificant parts that will be truncated.

To be implemented: signed floating point numbers.

Memory for arbitrary precision values (including temporary space) is strided by thread count for efficiency on GPUs.
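
The strided layout comes down to a one-line index calculation (a sketch; the function name is made up): limb i of thread tid's value lives at `i * nthreads + tid`, so when all threads step to the same limb together, adjacent threads touch adjacent words and the accesses coalesce.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index helper: limb i of the value owned by thread
// `tid` (out of `nthreads` threads) in a shared buffer.  Consecutive
// tids hit consecutive addresses for any fixed i (coalesced on GPUs).
static size_t limb_index(size_t i, size_t tid, size_t nthreads)
{
  return i * nthreads + tid;
}
```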

Probably it’s better to use algorithms with asymptotic efficiency improvements, like perturbation techniques and bilinear approximation, rather than high precision calculations throughout. But high precision throughout may be useful for verifying the correctness of those algorithms.

# 5 Examples

# 5.1 Mandelbrot

As OpenCL C / CPP macro hell for generic templating over types:

```c
#define MANDELBROT(XR,XC,DXR,DXC,XDX) \
INLINE f24e8 f24e8_mandelbrot_##XDX(XDX c, int n) \
{ \
  XR R2 = XR##_ldexp_##XR##_int(XR##_one(), 8); \
  XDX z = XDX##_zero(); \
  for (int i = 0; i < n; ++i) \
  { \
    XR z2 = XR##_norm2_##XDX(z); \
    if (bool_gt_##XR##_##XR(z2, R2)) \
    { \
      DXR d2 = DXR##_norm2_##DXC(z.dx); \
      f24e8 fz2 = f24e8_set_##XR(z2); \
      f24e8 fd2 = f24e8_set_##DXR(d2); \
      return sqrt(fz2 / fd2) * log(fz2); \
    } \
    z = XDX##_add_##XDX##_##XDX(XDX##_sqr_##XDX(z), c); \
    z = XDX##_canonicalize_##XDX(z); \
  } \
  return 0; \
}
```

Corresponding to C++ with operator overloading (untested pseudo-code):

```cpp
template <typename T>
f24e8 mandelbrot(const T &c, int n)
{
  auto R2 (ldexp(T::norm_type(1), 8));
  T z (0);
  for (int i = 0; i < n; ++i)
  {
    auto z2 = norm2(z);
    if (z2 > R2)
    {
      auto d2 = norm2(z.dx);
      f24e8 fz2 (z2);
      f24e8 fd2 (d2);
      return sqrt(fz2 / fd2) * log(fz2);
    }
    z = sqr(z) + c;
    z = canonicalize(z);
  }
  return 0;
}
```

where T = dual<complex<compensated<4, f24e8>>, extended<complex<f24e8>, i32>> or so.

# 5.2 Burning Ship

Burning Ship can be implemented with T = complex<dual<compensated<4, f24e8>, vector<2, extended<f24e8, i32>>>> or so. Two dual parts are needed for each part of the complex number, to handle non-conformal skewing (a custom view skew matrix is not yet implemented).

As OpenCL C / CPP macro hell for generic templating over types:

```c
#define BURNINGSHIP(XR,XDXV,DXR,DXV,DXM,XDX) \
INLINE f24e8 f24e8_burningship_##XDX(XDX c, int n) \
{ \
  int i; \
  XR R2 = XR##_ldexp_##XR##_int(XR##_one(), 8); \
  XDX z = XDX##_zero(); \
  for (i = 0; i < n; ++i) \
  { \
    XR z2 = XR##_norm2_##XDX(z); \
    if (bool_gt_##XR##_##XR(z2, R2)) \
    { \
      DXV v = {{ DXR##_set_##XR(z.x.x), DXR##_set_##XR(z.y.x) }}; \
      DXM j = {{ z.x.dx, z.y.dx }}; \
      DXV u = DXV##_mmul_##DXM##_##DXV(DXM##_inverse_##DXM(DXM##_transpose_##DXM(j)), v); \
      DXR u2 = DXR##_norm2_##DXV(u); \
      f24e8 fz2 = f24e8_set_##XR(z2); \
      f24e8 fu2 = f24e8_set_##DXR(u2); \
      return sqrt(fu2) * log(fz2); \
    } \
    z.x = XDXV##_abs_##XDXV(z.x); \
    z.y = XDXV##_abs_##XDXV(z.y); \
    z = XDX##_add_##XDX##_##XDX(XDX##_sqr_##XDX(z), c); \
    z = XDX##_canonicalize_##XDX(z); \
  } \
  return 0; \
}
```

Corresponding to C++ with operator overloading (untested pseudo-code):

```cpp
template <typename T>
f24e8 burningship(const T &c, int n)
{
  auto R2 (ldexp(T::norm_type(1), 8));
  T z (0);
  for (int i = 0; i < n; ++i)
  {
    auto z2 = norm2(z);
    if (z2 > R2)
    {
      typename T::part_type::derivative_type v{ z.x.x, z.y.x };
      vector<2, typename T::part_type::derivative_type> j{ z.x.dx, z.y.dx };
      auto u = inverse(transpose(j)) * v;
      auto u2 = norm2(u);
      f24e8 fz2 (z2);
      f24e8 fu2 (u2);
      return sqrt(fu2) * log(fz2);
    }
    z.x = abs(z.x);
    z.y = abs(z.y);
    z = sqr(z) + c;
    z = canonicalize(z);
  }
  return 0;
}
```