A couple of years ago I wrote about fixed-point numerics. The multiplication presented there uses \(O(n^2)\) individual limb-by-limb multiplications to multiply numbers with \(n\) limbs, which isn't very efficient.

Multiplying two \(n\)-limb numbers gives a number with \(2n\) limbs. For simplicity, assume \(n\) is a power of \(2\), say \(n = 2^k\). For \(x\) with \(n\) limbs, write \(\operatorname{hi}(x)\) to mean the top half of \(x\) with \(n/2\) limbs, and \(\operatorname{lo}(x)\) to mean the bottom half of \(x\) with \(n/2\) limbs. Suppose the numbers represent fractions in \([0..1)\), that is, there are no limbs before the point and \(n\) limbs after. Full multiplication would give a fraction with \(2n\) limbs after the point, but maybe we don't care for the extra precision and truncating to the first \(n\) limbs would be perfectly fine. Maybe we don't even care about errors in the last few bits.

Writing \(x \times y\) for the full multiplication that gives \(2n\) limbs, we want to compute \(\operatorname{hi}(x \times y)\) as efficiently as possible. To be precise, we want to minimize the number of limb-by-limb multiplications, because they are relatively expensive compared to additions.

As a baseline, here's a simple recursive implementation of full multiplication:

\[\begin{aligned} x \times y = {} & \operatorname{hi}(x)\times\operatorname{hi}(y) && \text{in limbs } 1 \ldots n \\ {} + {} & \operatorname{hi}(x)\times\operatorname{lo}(y) && \text{in limbs } n/2+1 \ldots 3n/2 \\ {} + {} & \operatorname{lo}(x)\times\operatorname{hi}(y) && \text{in limbs } n/2+1 \ldots 3n/2 \\ {} + {} & \operatorname{lo}(x)\times\operatorname{lo}(y) && \text{in limbs } n+1 \ldots 2n \end{aligned}\]

You can see that the cost of an \(n\)-limb multiply is \(4\) times the cost of an \(n/2\)-limb multiply, which means that the overall cost works out at \(n^2\) limb-by-limb multiplies.

Now assuming truncation and errors in the last few bits are fine, the recursive implementation becomes:

\[\begin{aligned} \operatorname{hi}(x \times y) = {} & \operatorname{hi}(x)\times\operatorname{hi}(y) && \text{in limbs } 1 \ldots n \\ {} + {} & \operatorname{hi}(\operatorname{hi}(x)\times\operatorname{lo}(y)) && \text{in limbs } n/2+1 \ldots n \\ {} + {} & \operatorname{hi}(\operatorname{lo}(x)\times\operatorname{hi}(y)) && \text{in limbs } n/2+1 \ldots n \end{aligned}\]

The cost of an \(n\)-limb truncated multiply is \(1\) times the cost of an \(n/2\)-limb full multiply plus \(2\) times the cost of an \(n/2\)-limb truncated multiply. Working out the first few terms \[1,3,10,36,136,528,\ldots\] shows that it has about half the cost of the full multiply's \[1,4,16,64,256,1024,\ldots\] and in fact the overall cost works out at \(n(n+1)/2\) limb-by-limb multiplies, a significant improvement! This truly is a clever trick!

However, for full multiplication there is a way to reduce multiplications even more dramatically. The trick is called Karatsuba multiplication and is based on the algebraic identity \[ (a + b)\times(u + v) = a \times u + b \times u + a \times v + b \times v \] Now set \(a = \operatorname{hi}(x), b = \operatorname{lo}(x), u = \operatorname{hi}(y), v = \operatorname{lo}(y) \) and notice that the four terms on the right are what we are computing with the first recursive implementation above.

Now rewrite the algebra to move terms around: \[ b \times u + a \times v = (a + b)\times(u + v) - (a \times u + b \times v) \] so we can replace two multiplications on the left by one multiplication on the right, with the two other multiplications on the right being things we need to compute anyway.

(There are two extra \(n/2\)-limb additions for \(a+b\) and \(u+v\), and one extra \(n\)-limb subtraction, which is guaranteed to give a non-negative result if all of \(a,b,u,v\) are non-negative, but the cost of limb-by-limb addition and subtraction are usually much less than limb-by-limb multiplication, and \(n\)-limb addition needs \(n\) limb-by-limb additions, which is small.)

This means the cost of \(n\)-limb full multiplication is \(3\) times the cost of \(n/2\)-limb full multiplication, the first terms are \[ 1,3,9,27,81,243,\ldots \] and the cost works out to \(n^{\log_2{3}}\), which is approximately \(n^{1.585}\). This reduced power (vs \(n^2\)) is an asymptotic improvement, while the truncation trick only improved constant factors. For example, with \(n = 256\) the original needs \(65536\) limb-by-limb multiplications giving \(512\) limbs, truncated needs \(32896\) limb-by-limb multiplications giving \(256\) limbs, and Karatsuba needs \(6561\) limb-by-limb multiplications giving \(512\) limbs. So an order of magnitude improvement!

But! This assumes the truncated multiplication is using the original full multiplication. Replacing the full multiplication with the Karatsuba multiplication, we get the cost of \(n\)-limb truncated multiplication is \(1\) times the cost of \(n/2\)-limb Karatsuba multiplication, plus \(2\) times the cost of \(n/2\)-limb truncated multiplication. This works out as... exactly the same cost as \(n\)-limb Karatsuba multiplication, only with a less precise answer!
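The cost recurrences described so far can be checked mechanically. The following is a minimal sketch that only counts limb-by-limb multiplications for \(n = 2^k\) limbs; it does not multiply anything, and the function names are made up for this illustration.

```cpp
#include <cstdint>

// Costs are counted in limb-by-limb multiplications for n = 2^k limbs.

// Full multiply: four half-size full multiplies.
std::uint64_t full_cost(unsigned k) {
  return k == 0 ? 1 : 4 * full_cost(k - 1);
}

// Karatsuba full multiply: three half-size Karatsuba multiplies.
std::uint64_t karatsuba_cost(unsigned k) {
  return k == 0 ? 1 : 3 * karatsuba_cost(k - 1);
}

// Truncated multiply built on the plain full multiply:
// one half-size full multiply plus two half-size truncated multiplies.
std::uint64_t truncated_cost(unsigned k) {
  return k == 0 ? 1 : full_cost(k - 1) + 2 * truncated_cost(k - 1);
}

// Truncated multiply built on Karatsuba instead.
std::uint64_t truncated_karatsuba_cost(unsigned k) {
  return k == 0 ? 1
                : karatsuba_cost(k - 1) + 2 * truncated_karatsuba_cost(k - 1);
}
```

With `k = 8` (that is, \(n = 256\)) these give 65536, 6561, 32896 and 6561 respectively, matching the figures in the text.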

So the Karatsuba trick is very much cleverer than truncated multiply. A truncated multiply using Karatsuba might still be worth it: it needs half the space for output, and the number of limb-by-limb additions will be smaller. There are even more complicated multiplication algorithms out there than Karatsuba that get even better asymptotic efficiencies, go see how the GMP library does it.

Addendum: with 16-bit limbs, the cost of limb-by-limb multiply on a Motorola 68000 CPU (as present in Amiga A500(+)) is 54 cycles average case (worst case 70 cycles; add 12 cycles to load inputs into registers from memory), and the cost of limb-by-limb addition with carry is 4 cycles (18 cycles if operating on memory directly, 30 cycles to operate on pairs of limbs (32 bits) in memory at a time). Reference: 68000 instruction timing.

The Mandelbrot set has two prominent solid regions. There is a cardioid, which is associated with fixed point (period 1) attractors, and a circle to the left, which is associated with period 2 attractors. The rest of the cardioid- and circle-like components in the Mandelbrot set are distorted.

These shapes can be described as implicit functions. For example, the circle is centered on \(-1+0i\) and has radius \(\frac{1}{4}\), and the function \[C_2(x, y) = (x - (-1))^2 + (y - 0)^2 - \left(\frac{1}{4}\right)^2\] is negative inside, zero on the boundary, and positive outside, and the same applies to the more complicated function for the cardioid: \[C_1(x, y) = \left( \left(x - \frac{1}{4}\right)^2 + y^2 \right)^2 + \left(x - \frac{1}{4}\right) \left(\left(x - \frac{1}{4}\right)^2 + y^2\right) - \frac{1}{4} y^2 \]

These implicit functions can be used to accelerate Mandelbrot set rendering. You can test if each \(c=x+iy\) is in the cardioid or circle quickly and easily, saving iterating the pixel all the way to the maximum iteration count (being interior to the Mandelbrot set means iterations of \(z \to z^2 + c\) will never escape to infinity).
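As a minimal sketch of the per-pixel test, the two implicit functions can be transcribed directly into C++. The function names are chosen for this example only, and plain `double` is assumed to be precise enough (which it is not for deep zooms).

```cpp
// Implicit function of the period 2 circle:
// negative inside, zero on the boundary, positive outside.
double circle_C2(double x, double y) {
  return (x + 1.0) * (x + 1.0) + y * y - 1.0 / 16.0;
}

// Implicit function of the period 1 cardioid, with the same sign convention.
double cardioid_C1(double x, double y) {
  const double u = x - 0.25;
  const double q = u * u + y * y;
  return q * q + u * q - 0.25 * y * y;
}

// True if c = x + iy is interior to the cardioid or the circle,
// in which case the pixel need not be iterated at all.
bool in_cardioid_or_circle(double x, double y) {
  return cardioid_C1(x, y) < 0.0 || circle_C2(x, y) < 0.0;
}
```

A renderer would call `in_cardioid_or_circle` before starting the iteration loop and colour the pixel as interior straight away when it returns true.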

But we can accelerate further. If the whole viewing rectangle of a zoomed in view is far away from the circle and cardioid, then these per-pixel cardioid and circle tests are a waste of time, as they will never say they are inside. By analysing the coordinates of an axis-aligned bounding box (AABB) it's possible to decide when it's worth doing per pixel tests (that is, when the boundary of the shapes passes through the box - otherwise it's 100% interior or (more likely) 100% exterior to the shapes). This can save \(O(W H)\) work.

For example for the circle, if the lower edge of the box is above \(\frac{1}{4}\), or the upper edge of the box is below \(-\frac{1}{4}\), or the right edge of the box is left of \(-\frac{5}{4}\), or the left edge of the box is right of \(-\frac{3}{4}\), clearly it cannot overlap the circle. And if all corners are inside the circle, the whole box must be inside the circle. If some corners are inside and some are outside, then the boundary passes through.

But if all corners are outside it gets complicated: the box could be surrounding the whole circle, or a bulge of the circle could pop into an edge of the box. So the next step is to consider the vertices of the circle (the points most left/right/top/bottom): axis alignment means that if a bulge pops into the box, the vertex must lead the way. All in all there are many cases to consider, but it's not insurmountable.
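For the circle, the case analysis can be collapsed into two distance checks. This sketch swaps in a different technique from the corner-and-vertex cases described above: it compares the nearest and farthest points of the box with the radius. All names are hypothetical, and the box is assumed to satisfy `x0 <= x1` and `y0 <= y1`.

```cpp
#include <algorithm>
#include <cmath>

enum class Overlap { Outside, Inside, Crossing };

// Classify an axis-aligned box [x0,x1] x [y0,y1] against a circle with
// centre (cx,cy) and radius r. Per-pixel tests are only worthwhile for
// Crossing, when the circle boundary passes through the box.
Overlap classify_box_vs_circle(double x0, double y0, double x1, double y1,
                               double cx, double cy, double r) {
  // Nearest point of the box to the centre: clamp the centre into the box.
  const double nx = std::max(x0, std::min(cx, x1));
  const double ny = std::max(y0, std::min(cy, y1));
  const double near2 = (nx - cx) * (nx - cx) + (ny - cy) * (ny - cy);
  // The farthest point of the box from the centre is always a corner.
  const double fx = std::max(std::abs(x0 - cx), std::abs(x1 - cx));
  const double fy = std::max(std::abs(y0 - cy), std::abs(y1 - cy));
  const double far2 = fx * fx + fy * fy;
  if (near2 > r * r) return Overlap::Outside;  // whole box is exterior
  if (far2 < r * r) return Overlap::Inside;    // whole box is interior
  return Overlap::Crossing;
}
```

This shortcut relies on the shape being a disc, so it does not carry over to the cardioid, where the case analysis on vertices is still needed.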

Similarly for the cardioid, with the added complication that the vertices are not rational. However, squaring the coordinates does give dyadic rationals, so comparing \(y^2\) with \(\frac{3}{64}\) and \(\frac{27}{64}\) can do the trick.

For deep zooms, coordinates need high precision (lots of digits, most of which are the same for nearby pixels). Perturbation techniques mean using a high precision reference, with low precision differences to nearby points. This can also be applied to the implicit functions for the cardioid and circle: symbolically expand and cancel the large terms \(X, Y\) leaving only small terms of the scale of \(x, y\) in: \[c(X, Y, x, y) = C(X + x, Y + y) - C(X, Y)\] then evaluate \(C(X, Y)\) in high precision, round it to low precision, and add \(c(X, Y, x, y)\) evaluated in low precision. For accuracy, some coefficients in \(c\) will need to be calculated at high precision before rounding to low precision.
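As a small worked instance of this expansion, the circle is simple enough to do by hand. This follows directly from the definition of \(C_2\) given earlier; the name \(c_2\) is introduced here by analogy with \(c\):

\[\begin{aligned} c_2(X, Y, x, y) &= C_2(X + x, Y + y) - C_2(X, Y) \\ &= (X + 1 + x)^2 - (X + 1)^2 + (Y + y)^2 - Y^2 \\ &= 2 (X + 1) x + 2 Y y + x^2 + y^2 \end{aligned}\]

The constant \(\left(\frac{1}{4}\right)^2\) cancels, and the only coefficients that need to be calculated at high precision before rounding are \(X + 1\) and \(Y\).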

For example, the cardioid: \[c_1(X, Y, x, y) = a_x x + a_y y + a_{x^2} x^2 + a_{xy} xy + a_{y^2} y^2 + a_{x^3} x^3 + a_{x^2y} x^2y + a_{xy^2} xy^2 + a_{y^3} y^3 + x^4 + 2x^2y^2 + y^4 \] where \[a_x = (32XY^2+32X^3-6X+1)/8; a_y = (32Y^3+(32X^2-6)Y)/8; a_{x^2} = (16Y^2+48X^2-3)/8; \ldots \] I used wxMaxima to find all of the \(a\) coefficients, and calculate them one time per view using the reference, along with \(C_1(X, Y)\). For accuracy, with fixed point calculations you need about 4 times the number of fractional bits for intermediate calculations, and the values at low precision need to be relatively high accuracy (in my tests 24 bits was not enough to achieve good images, 53 seemed ok, and I use 64 bits just to be safe).

The previous discussion about rejecting interior checks for the whole view can be applied with perturbation too, but some magic numbers need to be calculated at high precision before rounding to low precision, namely the special points (vertices and cusps): \[ X + \frac{5}{4}, X + 1, X + \frac{3}{4}, X + \frac{1}{8}, X - \frac{1}{4}, X - \frac{3}{8} \] and \[ Y + \frac{1}{4}, Y - \frac{1}{4}, Y^2 - \frac{3}{64}, Y^2 - \frac{27}{64} \] the addends are all dyadic rationals so can be represented exactly in binary fixed point or floating point.

The circle and cardioid also have parametric forms. Here's the cardioid: \[C_1(t) = \left(\frac{\sin(t)^2-(\cos(t)-1)^2+1}{4}, \frac{(\cos(t)-1)\sin(t)}{2}\right) \] If you could work out the distance to the nearest point of the curve, then all views with a smaller circumradius and same center would be 100% exterior (or 100% interior). For the cardioid, considering that the dot product of the tangent of the curve at the nearest point and the vector from the point to the nearest point of the curve must be zero (perpendicular), it means solving \[(4\cos(t)-4\cos(2t))y+(4\sin(2t)-4\sin(t))x-\sin(t)=0\] which can be rearranged to a 9 or 10 degree polynomial in \(\tan(t)\) using trigonometric identities. This is altogether a hard problem to solve; the most practical approach is bisection of the trigonometric form on segments bounded by \(t = k\frac{\pi}{3}\). Linear distance estimate using Taylor expansion gives a closed form \(d(x,y)\), but it's not accurate, especially near the cusp. Quadratic Taylor expansion gives a high degree polynomial to solve. On the other hand, exact distance to a circle is easy.

On math.stackexchange.com, user Vincent asked a question that caught my attention:

I wanted to test different Networks with the same number of parameters but with different depths and widths.

A good introduction is in
Neural Networks and Deep Learning,
but in short: a neural network is essentially a chain of matrix
multiplies with non-linear "activation functions" in between (which
don't change the length of the vector of data passing through).
Matrices are 2D grids of numbers. Matrices can only be multiplied
together if the number of columns of the first matrix is equal to the
number of rows of the second matrix, so you can express the "shape" of
the neural network by a vector of positive integers, where each pair of
neighbouring values corresponds to the dimensions of one of the
matrices. The total number of parameters of the neural network is the
sum of the products of each such pair, for example the shape [3,7,5,1]
has size 3×7 + 7×5 + 5×1 = 61, which in the programming language Haskell
can be written `size = sum (zipWith (*) shape (tail shape))`.

The question is essentially asking how, given the size, to construct some different shapes with that size, in particular shapes of different length (corresponding to different depths of network). But to see how the problem looks, I decided to generate all possible shapes for a given size, starting with some hopelessly naive Haskell code:

```haskell
import Control.Monad (replicateM)

shapes p =
  [ s
  | m <- [0..p]
  , s <- replicateM (m + 2) [1..p]
  , p == sum (zipWith (*) s (tail s))
  ]

main = mapM_ (print . length . shapes) [1..]
```

This code does "work", but as soon as the size p gets large, it takes forever and runs out of memory. On my desktop which has 32GB of RAM, I can only print 8 terms before OOM, which takes about 6m37s. These terms are:

1, 3, 5, 10, 14, 27, 37, 65

So a smarter solution is needed. I decided to implement it in C, because it's easier to do mutation there than Haskell. The core of the algorithm is the same as the Haskell above, but with one important addition: pruning. If the sum exceeds p before the end of the shape vector is reached, it doesn't make any difference what the suffix is: because all the dimensions are positive, the sum can never get smaller again.

I loop through depths (length of shape) until the maximum depth, as in the Haskell, and for each depth I start with a shape of all 1s, with last element 0 as an exception (it will be incremented before use). Each iteration of a loop, add 1 to the last element of the shape, if it gets bigger than the target p I set it back to 1 and propagate a carry 1 to the previous element (and so on). If the carry propagates beyond the first element, that means we've searched the whole shape space and we exit the loop.

Pruning is implemented by accumulating the sum from the left. If the sum of the first k products exceeds the target p, then set the whole shape vector starting from the (k+1)th index to the target, so at the next iteration of the loop, the last one is incremented and they all wrap around to 1, with the kth item eventually incremented by 1. If the sum of all the products is equal to p, increment a counter (which is output at the end). I verified that the first 8 terms output with pruning matched the 8 terms output by my Haskell (without pruning), which is not a rigorous proof that the pruning is valid, but does increase confidence somewhat.
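The effect of pruning can be sketched with a recursive search, which is not the odometer-style loop with carries described above but prunes on the same observation: once a prefix sum exceeds the target it can never come back down. The function names are invented for this sketch, and it uses 64-bit counters without overflow checks.

```cpp
#include <cstdint>

// Count the ways to extend a shape whose current last element is `last`
// so that the remaining products sum to exactly `remaining` (> 0).
std::uint64_t extend(std::uint64_t last, std::uint64_t remaining) {
  std::uint64_t count = 0;
  // Pruning: stop as soon as the next product alone exceeds what is left.
  for (std::uint64_t x = 1; last * x <= remaining; ++x) {
    const std::uint64_t rest = remaining - last * x;
    // rest == 0 means the shape is complete; any further element would
    // add a positive product and overshoot the target.
    count += (rest == 0) ? 1 : extend(x, rest);
  }
  return count;
}

// Number of shapes (vectors of positive integers, length >= 2) of size p.
std::uint64_t count_shapes(std::uint64_t p) {
  std::uint64_t count = 0;
  for (std::uint64_t first = 1; first <= p; ++first) {
    count += extend(first, p);
  }
  return count;
}
```

Like the C program, this still takes time at least proportional to the count it returns, because the count is only ever incremented by 1.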

Because the C algorithm uses much less space than the Haskell (I do not know why that is so bad), and is much more efficient (due to pruning), it's possible to calculate many more terms. So much so that issues of numeric overflow come into play. Using unsigned 8 bit types for numbers allows only 11 terms to be calculated, because the counter overflows (term 12 is 384 > 2^8-1 = 255). The terms increase rapidly, so I decided to use 64 bits unsigned for the counter, which should be enough for the foreseeable future (and just in case, I do check for overflow (the counter would wrap above 2^64-1) and report the error).

For the other values like shape dimensions I used the C preprocessor
with a macro passed in at compile time to choose the number of bits used,
and check each numeric operation for overflow. For example, with 8 bits
trying to calculate the number with p = 128 fails almost immediately,
because the product 128 * 2 should be 256 > 2^8-1. Overflow checking is
coming soon to the C23 standard library, but for older compilers there
are `__builtin_add_overflow(a,b,resultp)` and
`__builtin_mul_overflow(a,b,resultp)` that do the job in the
version of gcc that I have.
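A minimal sketch of how these builtins behave with an 8 bit type follows. The wrapper names are hypothetical; the builtins themselves are provided by gcc and clang, and return true when the exact result does not fit in the result type.

```cpp
#include <cstdint>

// Multiply two 8 bit values, storing the (possibly wrapped) product in
// *result. Returns true if the exact product does not fit in 8 bits.
bool mul_overflows_u8(std::uint8_t a, std::uint8_t b, std::uint8_t *result) {
  return __builtin_mul_overflow(a, b, result);
}

// Same for addition.
bool add_overflows_u8(std::uint8_t a, std::uint8_t b, std::uint8_t *result) {
  return __builtin_add_overflow(a, b, result);
}
```

For example, `mul_overflows_u8(128, 2, &r)` reports overflow because 256 does not fit in 8 bits, while `mul_overflows_u8(15, 17, &r)` succeeds and leaves 255 in `r`.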

However, even with all these optimisations it's still really slow,
because it takes time at least the order of the output count (because it
is only incremented by 1 each time), and the output count grows rapidly.
It took around 2 hours to calculate the first 45 terms. Just by
counting lines as the number of digits increases, I could see that
increasing the size by 5 multiplies the count by about 10, so the
asymptotics are about O(10^{p/5}). Here's a plot:

Two hours to calculate 45 terms is terrible, and I don't really need to calculate the actual shapes if I'm only interested in how many there are. So I started from scratch: how does the count change when you combine shapes? To do this I scribbled some diagrams, at first trying to combine two arbitrary shapes end-to-end, but that ended in failure. Success came when I considered the basic shape of length 2 (a single matrix) and considered what happens when appending an item. Then I made this work in reverse. Switching back to Haskell because it has good support for (unbounded) Integer, and good memoization libraries, I came up with this:

```haskell
import Data.MemoTrie -- package MemoTrie on Hackage

-- count shapes of length n with p parameters that start with a and end with y
count = mup memo3 $ \n p a y -> case () of
  _ | n <= 1 || p <= 0 || a <= 0 || y <= 0 -> 0 -- too small
    | n == 2 && p /= a * y -> 0 -- singleton matrix mismatch
    | n == 2 && p == a * y -> 1 -- singleton matrix matches
    | otherwise -> -- take element y off the end leaving new end x
        sum [ count (n - 1) q a x
            | x <- [1 .. p]
            , let q = p - y * x
            , q > 0
            ]

total p = sum [ count n p a y | n <- [2 .. p + 2], a <- [1 .. p], y <- [1 .. p] ]
```

This works so much faster that 45 terms takes about 10 seconds using about 0.85 GB of RAM (and the results output are the same). Calculating just the 100th term (which is 28457095794860418935) took about 220 seconds using about 8.8 GB of RAM, but if you calculate terms in sequence the memoization means values calculated earlier can be reused, speeding the whole thing up: calculating the 101st term (which is 44259654087259419852) as well as the 100th term in one run of the program took about 260 seconds using about 16.4 GB of RAM. Calculating the first 100 terms in one run took about 390 seconds using about 20 GB of RAM.
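For comparison, counting without enumerating can also be sketched as a small table in C++. This is not a translation of the Haskell above: it uses a different state (the remaining size and the current last element, building shapes left to right), and the names are hypothetical. It uses 64-bit counters without overflow checks, so it is only meaningful while the count fits; the 100th term quoted above does not.

```cpp
#include <cstdint>
#include <vector>

// Number of shapes of size p, computed with a table instead of enumeration.
// ways[rem][last] = number of ways to extend a shape whose current last
// element is `last` so that the remaining products sum to exactly `rem`.
std::uint64_t count_shapes_table(int p) {
  std::vector<std::vector<std::uint64_t>> ways(
      p + 1, std::vector<std::uint64_t>(p + 1, 0));
  for (int rem = 1; rem <= p; ++rem) {
    for (int last = 1; last <= rem; ++last) {
      for (int x = 1; last * x <= rem; ++x) {
        const int rest = rem - last * x;
        // rest == 0 completes the shape; otherwise continue from x,
        // using entries for smaller `rem` that are already filled in.
        ways[rem][last] += (rest == 0) ? 1 : ways[rest][x];
      }
    }
  }
  std::uint64_t total = 0;
  for (int first = 1; first <= p; ++first) {
    total += ways[p][first];
  }
  return total;
}
```

The table has \((p+1)^2\) entries, so memory use stays small; replacing `std::uint64_t` with a big integer type would be needed to go beyond 64 bits.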

A long-winded digression about fitting and residuals without the images that would make it comprehensible - gnuplot crashed before I could save them, losing its history in the process...

Using gnuplot I fit a straight line to the (natural) logarithm of the data points, which matched up pretty well, provided I skip the first few numbers (I'm only really interested in the asymptotics for large x so I think that's a perfectly reasonable thing to do):

```
gnuplot> fit [20:100] a*x+b "data.txt" using 0:(log($1)) via a, b
...
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.441677         +/- 1.781e-06    (0.0004033%)
b               = 1.06893          +/- 0.0001137    (0.01064%)
...
gnuplot> print a
0.441677054485047
gnuplot> print b
1.0689289118356
```

Investigating the residuals showed an interesting pattern, they oscillate around zero getting smaller rapidly, until about x=35, after which they're all smaller than 0.000045. But then they are positive for a while and gradually decreasing to a minimum at x=73 or so, then the sign changes and they start increasing again. I thought with a better fit curve the oscillations would continue getting smaller, but I'm not sure how much more data I need. With the upper fit range limit set to 100, the asymptotic standard errors decrease by about a factor of 10 when I raise the lower fit range limit by 10. With the lower fit range limit set to 90, the earlier residuals are similar to those with the wider fit range limit, and the later residuals continue to oscillate getting smaller exponentially (magnitudes form a straight line on a graph with logarithmic scale).

```
gnuplot> fit [90:100] a*x+b "data.txt" using 0:(log($1)) via a, b
...
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.441676         +/- 4.801e-12    (1.087e-09%)
b               = 1.06901          +/- 4.539e-10    (4.246e-08%)
...
gnuplot> print a
0.441675976841257
gnuplot> print b
1.0690075089073
```

Fitting a curve to the residuals seemed to improve things a bit, the final form of the function I came up with was something like:

\[ \exp(a p + b + (-1)^p \exp(c p + d)) \]

with a and b as above and

```
c = -0.246716071616843
d = -1.18363693943899
```

The residuals of that with the data show no simple pattern - there is some low frequency oscillation at the left end? Not sure. I still don't really know enough statistics to analyze this kind of thing.

Moving back to the original problem, neural networks often have a bias term. This means an extra element that is always 1 is appended to the input vector of each stage, but not the output. This means that for example the matrices in the shape [a,b,c] would have dimensions (a+1)×b and (b+1)×c, instead of a×b and b×c without bias.
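The size of a shape with or without bias can be sketched in C++ as follows. The function name and the `bias` parameter (0 or 1) are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Total number of parameters of a network with the given shape.
// With bias = 1 each matrix gets one extra input row, so the pair (a, b)
// contributes (a + 1) * b instead of a * b.
long shape_size(const std::vector<long> &shape, long bias) {
  long total = 0;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
    total += (shape[i] + bias) * shape[i + 1];
  }
  return total;
}
```

For the shape [3,7,5,1] this gives 61 without bias, as computed earlier, and 4×7 + 8×5 + 6×1 = 74 with bias.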

In Haskell one would calculate
`sum (zipWith (*) (map (+ 1) s) (tail s))`. Even though the
naive Haskell is slow, I used it to get the first few terms for validation
purposes. I added the bias to the C version, and got some more terms, and
finally updated the fast Haskell version. Here are the first few terms
as calculated by naive Haskell before it ran out of memory:

0, 1, 1, 3, 2, 6, 5, 9

which match those calculated by C which isn't terribly slow here, at
least to start with, because the growth of the output is much less
rapid, more like O(10^{p/9}):

The fast (memoizing) Haskell took about 420 seconds to calculate the first 100 terms using about 19.7 GB of RAM. It took about 17 seconds to calculate the first 50 terms using about 0.82 GB of RAM. The C took about 15 seconds (slightly faster) to calculate the first 50 terms, using less than about 1.4 MB of RAM (almost nothing in comparison). The 50th term is 538759 (both programs match), the 100th term is 240740950572. The C slows down by at least the size of the output, which is exponential in the input, so I didn't wait to see how long it would take to calculate 100 terms.

Here you can download the code for these experiments, and the output data tables:

- Makefile
- exhaustive.hs slow Haskell that exhausts RAM
- pruning.c slow C that needs almost no RAM
- memoizing.hs fast Haskell that needs lots of RAM
- unbiased.txt 100 terms starting 1, 3, 5, 10, ... (bias = 0)
- biased.txt 100 terms starting 0, 1, 1, 3, ... (bias = 1)
- plots.gnuplot source code for plots
- unbiased.png logarithmic plot (100 terms, bias = 0)
- biased.png logarithmic plot (100 terms, bias = 1)

Since last year's article on deep zoom theory and practice, two new developments have been proposed by Zhuoran on fractalforums.org: Another solution to perturbation glitches.

The first technique ("**rebasing**"), explained in the
first post of the forum thread, means resetting the reference iteration to the
start when the pixel orbit (i.e. \(Z+z\), the reference plus delta) gets
near a critical point (like \(0+0i\) for the Mandelbrot set). If there is
more than one critical point, you need reference orbits starting at each
of them, and this test can switch to a different reference orbit. For
this case, pick the orbit \(o\) that minimizes \(|(Z-Z_o)+z|\), among the
current reference orbit at iteration whatever, and the critical point
orbits at iteration number \(0\). Rebasing means you only need as many
reference orbits as critical points (which for simple formulas like the
Mandelbrot set and Burning Ship means only one), and glitches are avoided
in the first place rather than detected and corrected later. This is a big
boost to efficiency (which is nice) and correctness (which is much more
important).
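A minimal sketch of rebasing for the Mandelbrot set (single critical point at \(0\)) follows. The specific test used here, rebasing when \(|Z+z| < |z|\) or when the reference orbit runs out, is an assumption about what "near a critical point" means in practice; the function names are hypothetical, and plain `double` stands in for both the high precision reference and the low precision deltas.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Reference orbit Z[0] = 0, Z[m+1] = Z[m]^2 + C, stopped after escape.
std::vector<cplx> reference_orbit(cplx C, int max_iter) {
  std::vector<cplx> Z;
  cplx z(0.0, 0.0);
  Z.push_back(z);
  for (int i = 0; i < max_iter; ++i) {
    z = z * z + C;
    Z.push_back(z);
    if (std::norm(z) > 4.0) break;
  }
  return Z;
}

// Escape iteration of the pixel at C + c, where c is the pixel's offset
// from the reference. Returns max_iter if the pixel does not escape.
int perturbed_iterations(const std::vector<cplx> &Z, cplx c, int max_iter) {
  cplx z(0.0, 0.0);   // delta from the reference orbit
  std::size_t m = 0;  // current reference iteration
  for (int n = 0; n < max_iter; ++n) {
    z = 2.0 * Z[m] * z + z * z + c;  // perturbed iteration
    ++m;
    const cplx full = Z[m] + z;      // actual pixel orbit value
    if (std::norm(full) > 4.0) return n + 1;
    // Rebase: restart the reference from iteration 0 (where Z[0] = 0),
    // so the delta becomes the full pixel orbit value.
    if (std::norm(full) < std::norm(z) || m + 1 == Z.size()) {
      z = full;
      m = 0;
    }
  }
  return max_iter;
}

// Unperturbed iteration, for comparison.
int direct_iterations(cplx c, int max_iter) {
  cplx z(0.0, 0.0);
  for (int n = 0; n < max_iter; ++n) {
    z = z * z + c;
    if (std::norm(z) > 4.0) return n + 1;
  }
  return max_iter;
}
```

Because the rebase also triggers when the reference orbit ends, a single reference is enough here even if the reference itself escapes early.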

Rebasing also works for hybrids, though you need more reference orbits, because the reference iteration can be reset at any phase in the hybrid loop. For example, if you have a hybrid loop of "(M,BS,M,M)", you need reference orbits for each of "(M,BS,M,M)", "(BS,M,M,M)", "(M,M,M,BS)" and "(M,M,BS,M)". Similarly if there is a pre-periodic part, you need references for each iteration (though for a zoomed in view, the minimum escaping iteration in the image determines whether they will be used in practice): "M,M,(BS,BS,M,BS)" needs reference orbits "M,M,(BS,BS,M,BS)", "M,(BS,BS,M,BS)" and the four rotations of "(BS,BS,M,BS)". Each of these phases needs as many reference orbits as the starting formula has critical points. As each reference orbit calculation is intrinsically serial work, and modern computers typically have many cores, the extra wall-clock time taken by the additional references is minimal because they can be computed in parallel.

The second technique ("**bilinear approximation**") is
only hinted at in the thread. If you have a deep zoom, the region of
\(z\) values starts very small, and bounces around the plane typically
staying small and close together, in a mostly linear way, except for
when the region gets close to a critical point (e.g. \(x=0\) and \(y=0\) for the
Mandelbrot set) or line (e.g. either \(x=0\) or \(y=0\) for the Burning Ship),
when non-linear stuff happens (like complex squaring, or absolute
folding). For example for the Mandelbrot set, the perturbed iteration

\[ z \to 2 Z z + z^2 + c \]

when \(Z\) is not small and \(z\) is small, can be approximated by

\[ z \to 2 Z z + c \]

which is linear in \(z\) and \(c\) (two variables, hence "bilinear"). In particular, this approximation is valid when \( |z^2| \ll |2 Z z + c| \), which can be rearranged with some handwaving (for critical point at \(0\)) to

\[ |z| < r = \max\left(0, \epsilon \frac{\left|Z\right| - \max_{\text{image}} \left\{|c|\right\}}{\left|J_f(Z)\right| + 1}\right) \]

where \(\epsilon\) is the hardware precision (e.g. \(2^{-24}\)), and \(J_f(Z) = 2Z\) for the Mandelbrot set. For Burning Ship replace \(|Z|\) with \(\min(|X|,|Y|)\) where \(Z=X+iY\). In practice I divide \(|Z|\) by \(2\) just to be extra safe. For non-complex-analytic functions I use the operator norm for the Jacobian matrix, implemented in C++ by:

```cpp
// Squared operator norm of a 2x2 matrix: the largest eigenvalue of
// transpose(a) * a, computed from its trace T and determinant D.
template <typename real>
inline constexpr real norm(const mat2<real> &a)
{
  using std::max;
  using std::sqrt, ::sqrt;
  const mat2<real> aTa = transpose(a) * a;
  const real T = trace(aTa);
  const real D = determinant(aTa);
  return (T + sqrt(max(real(0), sqr(T) - 4 * D))) / 2;
}

// Operator norm: the square root of the above.
template <typename real>
inline constexpr real abs(const mat2<real> &a)
{
  using std::sqrt, ::sqrt;
  return sqrt(norm(a));
}
```

This gives a bilinear approximation for one iteration, which is not so useful. The acceleration comes from combining neighbouring BLAs into a BLA that skips many iterations at once. For neighbouring BLAs \(x\) and \(y\), where \(x\) happens first in iteration order, skipping \(l\) iterations via \(z \to A z + B c\), one gets:

\[\begin{aligned} l_{y \circ x} &= l_y + l_x \\ A_{y \circ x} &= A_y A_x \\ B_{y \circ x} &= A_y B_x + B_y \\ r_{y \circ x} &= \min\left(r_x, \max\left(0, \frac{r_y - |B_x| \max_{\text{image}}\left\{|c|\right\}}{|A_x|}\right) \right) \end{aligned}\]

This is a bit handwavy again, higher order terms of Taylor expansion are probably necessary to get a bulletproof radius calculation, but it seems to work ok in practice.
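The single-step BLA and the merge rule can be sketched for the complex-analytic Mandelbrot case as below. The struct layout and function names are invented for this example, `max_c` stands for \(\max_{\text{image}}\{|c|\}\), the extra safety factor on \(|Z|\) is left out, and \(A_x\) is assumed to be non-zero (which holds when tables start from iteration \(1\) rather than from the critical point).

```cpp
#include <algorithm>
#include <complex>

using cplx = std::complex<double>;

// One bilinear approximation: z -> A z + B c, skipping l iterations,
// valid while |z| < r.
struct Bla {
  cplx A;
  cplx B;
  double r;
  int l;
};

// Single-step BLA at reference value Z: z -> 2 Z z + c.
Bla bla_single_step(cplx Z, double eps, double max_c) {
  const double r = std::max(
      0.0, eps * (std::abs(Z) - max_c) / (std::abs(2.0 * Z) + 1.0));
  return Bla{2.0 * Z, cplx(1.0, 0.0), r, 1};
}

// Merge neighbouring BLAs, where x happens first in iteration order.
Bla bla_merge(const Bla &x, const Bla &y, double max_c) {
  const double r = std::min(
      x.r, std::max(0.0, (y.r - std::abs(x.B) * max_c) / std::abs(x.A)));
  return Bla{y.A * x.A, y.A * x.B + y.B, r, x.l + y.l};
}

// Apply a BLA to a delta z for a pixel offset c.
cplx bla_apply(const Bla &b, cplx z, cplx c) {
  return b.A * z + b.B * c;
}
```

Applying the merged BLA gives the same result as applying `x` then `y`, since \(A_y (A_x z + B_x c) + B_y c = A_y A_x z + (A_y B_x + B_y) c\).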

For a reference orbit iterated to \(M\) iterations, one can construct a BLA table with \(2M\) entries. The first level has \(M\) 1-step BLAs for each iteration, the next level has \(M/2\) combining neighbours (without overlap), the next \(M/4\), etc. It's best for each level to start from iteration \(1\), because iteration \(0\) is always starting from a critical point (which makes the radius of BLA validity \(0\)). Now when iterating, pick the BLA that skips the most iterations, among those starting at the current reference iteration that satisfy \(|z| < |r|\). In between, if no BLA is valid, do regular perturbation iterations, rebasing as required. You need one BLA table for each reference orbit, which can be computed in parallel (and each level of reduction can be done in parallel too, perhaps using OpenCL on GPU).

BLA is an alternative to series approximation for the Mandelbrot set, but it's conceptually simpler, easier to implement, easier to parallelize, has better understood stopping conditions, is more general (applies to other formulas like Burning Ship, hybrids, ...) - need to do benchmarks to see how it compares speed-wise before declaring an overall winner.

It remains to research the BLA initialisation for critical points not at \(0\), and check rebasing with multiple critical points: so far I've only actually implemented it for formulas with a single critical point at \(0\), so there may be bugs or subtleties lurking in the corners.

Fractal rendering using perturbation techniques for deep zooming needs floating point number types with large exponent range. The number types available are typically float (8 bit exponent), double (11 bit exponent), and long double (15 bit exponent; usually either x87 80 bit or 128 bit quadruple precision, depending on CPU architecture). Even 15 bits is not enough for very deep zooms, so software implementations are useful. Two are analyzed here: floatexp (which is a float with a separate 32 bit exponent), and softfloat (two unsigned 32 bit ints, with one containing sign and 31 bit exponent).
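To make the floatexp idea concrete, here is a minimal sketch of a float mantissa paired with a separate 32 bit exponent. The struct layout, the normalisation convention (mantissa magnitude in \([0.5, 1)\)) and the function names are assumptions for illustration, not the benchmarked implementation, and exponent overflow is not handled.

```cpp
#include <cmath>
#include <cstdint>

// Represents mantissa * 2^exponent, with |mantissa| in [0.5, 1) or zero.
struct FloatExp {
  float mantissa;
  std::int32_t exponent;
};

// Build a normalised value from value * 2^exponent.
FloatExp floatexp_make(float value, std::int32_t exponent) {
  int shift = 0;
  const float m = std::frexp(value, &shift);  // value = m * 2^shift
  return FloatExp{m, exponent + shift};
}

// Multiply: mantissas multiply, exponents add, then renormalise.
FloatExp floatexp_mul(FloatExp a, FloatExp b) {
  return floatexp_make(a.mantissa * b.mantissa, a.exponent + b.exponent);
}
```

For example, multiplying \(1.5 \times 2^{100000}\) by \(3 \times 2^{200000}\) gives mantissa 0.5625 and exponent 300003, a value far outside the range of long double.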

I measured the CPU time in seconds to render 1920x1080 unzoomed Mandelbrot set with 100 iterations and 16 subframes, using a C++ perturbation technique inner loop templated on number type, on a selection of CPU architectures:

Number type | Ryzen 2700x desktop (clang-11) | Ryzen 2700x desktop (gcc-10) | Core 2 Duo laptop (clang-11) | AArch64 tablet (clang-11) | AArch64 tablet (gcc-10) | AArch64 raspi3 (clang-11) | AArch64 raspi3 (gcc-10) | ArmHF raspi4 (clang-11) | ArmHF raspi4 (gcc-10) |
---|---|---|---|---|---|---|---|---|---|
float | 35.0 | 33.1 | 25.5 | 73.8 | 73.1 | 96.7 | 88.0 | 42.5 | 57.2 |
double | 42.46 | 33.6 | 26.7 | 82.4 | 78.0 | 96.7 | 87.8 | 42.1 | 67.5 |
long double | 88.8 | 42.1 | 49.9 | 1324 | 1343 | 1464 | 1464 | n/a | n/a |
floatexp | 216 | 209 | 453 | 936 | 1138 | 1081 | 1332 | 655 | 1131 |
softfloat | 172 | 120 | 265 | 634 | 670 | 762 | 770 | 365 | 606 |

Relative time vs float for each column:

Number type | Ryzen 2700x desktop (clang-11) | Ryzen 2700x desktop (gcc-10) | Core 2 Duo laptop (clang-11) | AArch64 tablet (clang-11) | AArch64 tablet (gcc-10) | AArch64 raspi3 (clang-11) | AArch64 raspi3 (gcc-10) | ArmHF raspi4 (clang-11) | ArmHF raspi4 (gcc-10) |
---|---|---|---|---|---|---|---|---|---|
float | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
double | 1.21 | 1.02 | 1.05 | 1.12 | 1.07 | 1 | 1 | 0.99 | 1.18 |
long double | 2.54 | 1.27 | 1.96 | 17.9 | 18.4 | 15.1 | 16.6 | n/a | n/a |
floatexp | 6.17 | 6.31 | 17.8 | 12.7 | 15.6 | 11.2 | 15.1 | 15.4 | 16.8 |
softfloat | 4.91 | 3.63 | 10.4 | 8.59 | 9.17 | 7.88 | 8.75 | 8.59 | 10.6 |

Conclusions:

- Use the fastest whose range is big enough.
- On x86_64, the best types are float, double, long double, softfloat.
- On aarch64, the best types are float, double, softfloat. Here long double is very slow.
- On armhf, the best types are float, double, softfloat. Here long double appears to be an alias for double.
- On no architecture is floatexp useful.
- Code compiled with gcc is generally faster than clang, except for armhf.

Still to be investigated is the relative performance when using OpenCL (including both CPU and GPU devices), and when compiled for web assembly using Emscripten.

The Mandelbrot set \(M\) is formed by iterations of the function \(z \to z^2 + c\) starting from \(z = 0\). If this remains bounded for a given \(c\), then \(c\) is in \(M\), otherwise (if it escapes to infinity) then \(c\) is not in \(M\). The interior of \(M\) is characterized by collections of cardioid-like and disk-like shapes, these are hyperbolic components each associated with an integer, its period. For the \(c\) at the center of each component, if the period is \(p\) then after \(p\) iterations, \(z\) returns to \(0\), and the iterations repeat (hence the name period). For points in the complex plane (either in \(M\) or not) sufficiently near to a hyperbolic component of period \(p\), \(|z|\) reaches a new minimum (discounting the initial \(z=0\)) at iteration \(p\). The region for which this is true is called the atom domain associated with the hyperbolic component.

To find the center (sometimes called nucleus) of a hyperbolic component, one can use Newton's root-finding method in one complex variable. Iterate the derivative with respect to \(c\) along with \(z\) (using \(\frac{\partial z}{\partial c} \to 2 \frac{\partial z}{\partial c} z + 1\)) for \(p\) iterations, then update \(c \to c - \frac{z}{\frac{\partial z}{\partial c}}\) until it converges. However, Newton's method requires a good initial guess for \(c\). As there are multiple roots, and if you are near to a root Newton's method brings you nearer to it, there must be a boundary where which root is reached depends sensitively on the initial guess. It turns out (if there are more than 2 roots) that the boundary is fractal, and for any point on the boundary, an arbitrarily small neighbourhood will be spread to all the roots. See 3blue1brown's videos on YouTube about the Newton fractal for further information. Which brings me to my conjecture:

Conjecture: points in the complement of the Mandelbrot set in an atom domain of period \(p\) are good initial guesses for Newton's method to find the root of period \(p\) at the center of that atom domain.

It turns out that **this conjecture is false**. The proof
is by counter-example. The counter-example is the period \(18\) island
with angled internal address \(1 \to_{1/2} 2 \to_{1/8} 16 \to_{1/2} 18\),
whose upper external angle is \(.(010101010101100101)\) when
expressed in binary. I found this counter-example by brute force search:
for every period increasing from \(1\), trace every ray pair of that
period until the endpoints reach the atom domain. Then from each endpoint use
Newton's method to find the center of the hyperbolic component. Compare
the two centers reached: if they aren't the same then we have found a
counter-example. Here is a picture:

The Mandelbrot set is shown in black, using distance estimation to make its filaments visible. The fractal boundary of the Newton basins of period \(18\) is shown in white. The atom domain is shown in red. The complement of the Mandelbrot set is shown with a binary decomposition grid that follows the external rays and equipotentials. You can see that the path of the ray that goes from the cusp of the mini Mandelbrot island will intersect the Newton basin boundary at the corner of the atom domain, so that the eventual point of convergence of Newton's method is unpredictable. In my experiment it converged to the child bulb with angled internal address \(1 \to_{1/2} 2 \to_{1/9} 18\).

The above image was using regular Newton's method, without factoring out the roots of lower period that divide the target period. With the reduced polynomials, the basins are typically a little bigger, but in this case it made no difference and the problem persists with this counter-example:

I uploaded a short video showing the counter-example with both variants of Newton's method:

1280x360p60 MP4 (42MB), 0m30s, no sound

You can download the FragM source code for the visualisation.

This counter-example shows that the strategy of tracing rays until the atom domain is reached, before switching to Newton's method to find the root, is unsafe. A guaranteed safe strategy remains to be investigated.
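For reference, the plain Newton iteration described earlier (iterate \(\frac{\partial z}{\partial c}\) along with \(z\) for \(p\) iterations, then update \(c\)) can be sketched in Python as follows. This is only a double precision sketch without the reduced polynomials; the name `nucleus`, the step limit and the tolerance are assumptions made for the example.

```python
def nucleus(c, period, max_steps=64, tol=1e-15):
    """Newton's method for a root of z_p(c) = 0, starting from the guess c.

    Hypothetical helper: which root it converges to depends on the
    initial guess, which is exactly the issue discussed in this post.
    """
    for _ in range(max_steps):
        z = 0j
        dz = 0j  # derivative of z with respect to c
        for _ in range(period):
            dz = 2 * z * dz + 1
            z = z * z + c
        if dz == 0:
            break  # Newton step undefined, give up
        c_next = c - z / dz
        if abs(c_next - c) <= tol:
            return c_next
        c = c_next
    return c
```

Starting from a guess close to the period \(2\) center, `nucleus(-0.9 + 0.05j, 2)` converges to \(-1\) up to rounding. The sketch makes no attempt to check that the root found has the intended period rather than a divisor of it.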

The Mandelbrot set fractal is formed by iterations of \(z \to z^2 + c\) where \(c\) is determined by the coordinates of the pixel and \(z\) starts at the critical point \(0\). The critical point is where the derivative w.r.t. \(z\) is \(0\). The image is usually coloured by how quickly \(z\) escapes to infinity; regions that remain bounded forever are typically coloured black. It has a distinctive shape, with a cardioid adjoined by many disks decreasing in size, each with further disks attached. Looking closely there are smaller "island" copies of the whole, but they are actually attached to the main "continent" by thin filaments.

The Newton fractal is formed by applying Newton's method to the cube roots of unity, iterating \(z \to z - \frac{z^3 - 1}{3z^2}\) where the initial \(z\) is determined by the coordinates of the pixel. The image is usually coloured by which of the 3 complex roots of unity is reached, with brightness and saturation showing how quickly it converged. It has its own distinctive shape, with three rays extending from the origin towards infinity separating the immediate basins of attraction, each with an intricate shape: at each point on the Julia set, different points in an arbitrarily small neighbourhood will converge to all three of the roots.
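A minimal Python sketch of the per-pixel computation for the Newton fractal follows. The names `ROOTS` and `newton_basin`, the iteration limit and the convergence tolerance are assumptions for the example; colouring is left out.

```python
import cmath

# The three cube roots of unity: 1, exp(2 pi i / 3), exp(4 pi i / 3).
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def newton_basin(z, max_iter=100, tol=1e-12):
    """Return (root index, iterations used), or (None, n) if no root is reached."""
    for n in range(max_iter):
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k, n
        if z == 0:
            return None, n  # the Newton step divides by z^2
        z = z - (z**3 - 1) / (3 * z**2)
    return None, max_iter
```

The root index would pick the hue, and the iteration count the brightness or saturation.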

The Nova fractal mashes these two fractals together: the Newton fractal is perturbed with a \(+ c\) term that is determined by the pixel, and the iteration starts from a critical point (any of the cube roots of unity). The image is coloured by how quickly the iteration converges to a fixed point (a different point for each pixel) and points that don't converge (or converge to a cycle of length greater than 1) are usually coloured black. The fractal appearance combines features of the Newton fractal and the Mandelbrot set, with mini-Mandelbrot set islands appearing in the filaments.

Deep zooms of the Mandelbrot set can be rendered efficiently using perturbation techniques: consider \[z = (Z+z) - Z \to ((Z + z)^2 + (C + c)) - (Z^2 + C) = (2 Z + z) z + c \] where \(Z, C\) is a "large" high precision reference and \(z, c\) is a "small" low precision delta for nearby pixels. Going deeper one can notice "glitches" around mini-Mandelbrot sets when the reference is not suitable, but these can be detected with Pauldelbrot's criterion \(|Z+z| \ll |Z|\), at which point you can use a different reference that might be more appropriate for the glitched pixels.

Trying to do the same thing for the Nova fractal works at first, but going deeper (to about \(10^{30}\) zoom factor) it breaks down and glitches occur that are not fixed by using a nearer reference. These glitches are due to the non-zero critical point recurring in the periodic mini-Mandelbrot sets: precision loss occurs when mixing tiny values with large values. They also occur when affine-conjugating the quadratic Mandelbrot set to have a critical point away from zero (e.g. \(z \to z^2 - 2 z + c\) has a critical point at \(z = 1\)). Affine-conjugation means using an affine function \(m(z) = a z + b\) to conjugate two functions \(f, F\) like \(m(f(z)) = F(m(z))\).
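As a worked instance of the definition (a sketch, writing \(c'\) for the shifted parameter, a name not used elsewhere in this post): take \(f(z) = z^2 + c\) and \(m(z) = z + 1\). Then

\[ F(w) = m(f(m^{-1}(w))) = (w - 1)^2 + c + 1 = w^2 - 2 w + (c + 2), \]

so \(F(w) = w^2 - 2 w + c'\) with \(c' = c + 2\), and \(F'(w) = 2 w - 2\) vanishes at \(w = 1 = m(0)\): the critical point has moved from \(0\) to \(1\).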

The solution is to affine-conjugate the Nova fractal formula, to move the starting critical point from \(1\) to \(0\). One way of doing it gives the modified Nova formula \[ z \to \frac{ \frac{2}{3} z^3 - 2 z - 1 }{ (z + 1)^2 } + c + 1 \] which seems to work fine when going beyond \(10^{30}\) at the same locations where the variant with critical point \(z=1\) fails. For example, see the ends of the following two short videos:
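The conjugacy can be checked numerically. In this Python sketch the function names `nova` and `nova_shifted` are mine, and I assume the conjugating map is the shift \(m(z) = z - 1\) with the same \(c\) in both formulas, which is what the algebra gives for the formula quoted above.

```python
def nova(z, c):
    """Nova step for z^3 - 1: Newton step plus c (critical point at z = 1)."""
    return z - (z**3 - 1) / (3 * z**2) + c

def nova_shifted(z, c):
    """Modified Nova step quoted above (critical point at z = 0)."""
    return (2 / 3 * z**3 - 2 * z - 1) / (z + 1) ** 2 + c + 1

def conjugacy_error(w, c):
    """|m(nova(m^-1(w))) - nova_shifted(w)| for the shift m(z) = z - 1."""
    return abs((nova(w + 1, c) - 1) - nova_shifted(w, c))
```

Both variants send their critical point to \(c\) after the shift: `nova(1, c) - 1` and `nova_shifted(0, c)` are both equal to `c`.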

**Nova zoom (without glitches)**

1920x1080p60 MP4 (327MB), 2m00s, no sound

**Nova zoom (feat. glitches)**

640x360p60 MP4 (76MB), 2m06s, no sound

In a previous post I presented a neat form for the series approximation coefficient updates for the quadratic Mandelbrot set. Note that the series for the derivative is redundant, as you can just take the derivative of the main series term by term. Recently superheal on fractalforums.org asked about series approximation for other powers. I explored a bit with SageMath (which is based on Python) and came up with this code (empirically derived formula highlighted):

    def a(k): return var("a_" + str(k))

    m = 10

    for p in range(2,5):
        f(c, z) = z^p + c
        print(f)
        g(C, Z, c, z) = (f(C + c, Z + z) - f(C, Z)).expand().simplify()
        print(g)
        h0(C, Z, c) = sum( a(k) * c^k for k in range(1, m) ).series(c, m)
        print(h0)
        h1(C, Z, c) = g(C, Z, c, h0(C, Z, c)).expand().series(c, m)
        print(h1)
        def x(C, Z, c, k): return h1(C, Z, c).coefficient(c^k)
        # the empirically derived formula:
        def y(C, Z, c, k): return ((1 if k == 1 else 0) + sum([binomial(p, t) * Z^(p-t) * sum([Permutations(qs).cardinality() * product([a(q) for q in qs]) for qs in Partitions(k, length=t).list()]) for t in range(1, k + 1)]))
        for k in range(1, m):
            print((x(C, Z, c, k) - y(C, Z, c, k)).full_simplify(), " : ", a(k)," := ", x(C, Z, c, k))
        print()

Example output:

(c, z) |--> z^2 + c (C, Z, c, z) |--> 2*Z*z + z^2 + c (C, Z, c) |--> (a_1)*c + (a_2)*c^2 + (a_3)*c^3 + (a_4)*c^4 + (a_5)*c^5 + (a_6)*c^6 + (a_7)*c^7 + (a_8)*c^8 + (a_9)*c^9 + Order(c^10) (C, Z, c) |--> (2*Z*a_1 + 1)*c + (a_1^2 + 2*Z*a_2)*c^2 + (2*a_1*a_2 + 2*Z*a_3)*c^3 + (a_2^2 + 2*a_1*a_3 + 2*Z*a_4)*c^4 + (2*a_2*a_3 + 2*a_1*a_4 + 2*Z*a_5)*c^5 + (a_3^2 + 2*a_2*a_4 + 2*a_1*a_5 + 2*Z*a_6)*c^6 + (2*a_3*a_4 + 2*a_2*a_5 + 2*a_1*a_6 + 2*Z*a_7)*c^7 + (a_4^2 + 2*a_3*a_5 + 2*a_2*a_6 + 2*a_1*a_7 + 2*Z*a_8)*c^8 + (2*a_4*a_5 + 2*a_3*a_6 + 2*a_2*a_7 + 2*a_1*a_8 + 2*Z*a_9)*c^9 + Order(c^10) 0 : a_1 := 2*Z*a_1 + 1 0 : a_2 := a_1^2 + 2*Z*a_2 0 : a_3 := 2*a_1*a_2 + 2*Z*a_3 0 : a_4 := a_2^2 + 2*a_1*a_3 + 2*Z*a_4 0 : a_5 := 2*a_2*a_3 + 2*a_1*a_4 + 2*Z*a_5 0 : a_6 := a_3^2 + 2*a_2*a_4 + 2*a_1*a_5 + 2*Z*a_6 0 : a_7 := 2*a_3*a_4 + 2*a_2*a_5 + 2*a_1*a_6 + 2*Z*a_7 0 : a_8 := a_4^2 + 2*a_3*a_5 + 2*a_2*a_6 + 2*a_1*a_7 + 2*Z*a_8 0 : a_9 := 2*a_4*a_5 + 2*a_3*a_6 + 2*a_2*a_7 + 2*a_1*a_8 + 2*Z*a_9 (c, z) |--> z^3 + c (C, Z, c, z) |--> 3*Z^2*z + 3*Z*z^2 + z^3 + c (C, Z, c) |--> (a_1)*c + (a_2)*c^2 + (a_3)*c^3 + (a_4)*c^4 + (a_5)*c^5 + (a_6)*c^6 + (a_7)*c^7 + (a_8)*c^8 + (a_9)*c^9 + Order(c^10) (C, Z, c) |--> (3*Z^2*a_1 + 1)*c + (3*Z*a_1^2 + 3*Z^2*a_2)*c^2 + (a_1^3 + 6*Z*a_1*a_2 + 3*Z^2*a_3)*c^3 + (3*a_1^2*a_2 + 3*Z*a_2^2 + 6*Z*a_1*a_3 + 3*Z^2*a_4)*c^4 + (3*a_1*a_2^2 + 3*a_1^2*a_3 + 6*Z*a_2*a_3 + 6*Z*a_1*a_4 + 3*Z^2*a_5)*c^5 + (a_2^3 + 6*a_1*a_2*a_3 + 3*Z*a_3^2 + 3*a_1^2*a_4 + 6*Z*a_2*a_4 + 6*Z*a_1*a_5 + 3*Z^2*a_6)*c^6 + (3*a_2^2*a_3 + 3*a_1*a_3^2 + 6*a_1*a_2*a_4 + 6*Z*a_3*a_4 + 3*a_1^2*a_5 + 6*Z*a_2*a_5 + 6*Z*a_1*a_6 + 3*Z^2*a_7)*c^7 + (3*a_2*a_3^2 + 3*a_2^2*a_4 + 6*a_1*a_3*a_4 + 3*Z*a_4^2 + 6*a_1*a_2*a_5 + 6*Z*a_3*a_5 + 3*a_1^2*a_6 + 6*Z*a_2*a_6 + 6*Z*a_1*a_7 + 3*Z^2*a_8)*c^8 + (a_3^3 + 6*a_2*a_3*a_4 + 3*a_1*a_4^2 + 3*a_2^2*a_5 + 6*a_1*a_3*a_5 + 6*Z*a_4*a_5 + 6*a_1*a_2*a_6 + 6*Z*a_3*a_6 + 3*a_1^2*a_7 + 6*Z*a_2*a_7 + 6*Z*a_1*a_8 + 3*Z^2*a_9)*c^9 + Order(c^10) 0 : a_1 := 3*Z^2*a_1 + 1 0 : a_2 := 
3*Z*a_1^2 + 3*Z^2*a_2 0 : a_3 := a_1^3 + 6*Z*a_1*a_2 + 3*Z^2*a_3 0 : a_4 := 3*a_1^2*a_2 + 3*Z*a_2^2 + 6*Z*a_1*a_3 + 3*Z^2*a_4 0 : a_5 := 3*a_1*a_2^2 + 3*a_1^2*a_3 + 6*Z*a_2*a_3 + 6*Z*a_1*a_4 + 3*Z^2*a_5 0 : a_6 := a_2^3 + 6*a_1*a_2*a_3 + 3*Z*a_3^2 + 3*a_1^2*a_4 + 6*Z*a_2*a_4 + 6*Z*a_1*a_5 + 3*Z^2*a_6 0 : a_7 := 3*a_2^2*a_3 + 3*a_1*a_3^2 + 6*a_1*a_2*a_4 + 6*Z*a_3*a_4 + 3*a_1^2*a_5 + 6*Z*a_2*a_5 + 6*Z*a_1*a_6 + 3*Z^2*a_7 0 : a_8 := 3*a_2*a_3^2 + 3*a_2^2*a_4 + 6*a_1*a_3*a_4 + 3*Z*a_4^2 + 6*a_1*a_2*a_5 + 6*Z*a_3*a_5 + 3*a_1^2*a_6 + 6*Z*a_2*a_6 + 6*Z*a_1*a_7 + 3*Z^2*a_8 0 : a_9 := a_3^3 + 6*a_2*a_3*a_4 + 3*a_1*a_4^2 + 3*a_2^2*a_5 + 6*a_1*a_3*a_5 + 6*Z*a_4*a_5 + 6*a_1*a_2*a_6 + 6*Z*a_3*a_6 + 3*a_1^2*a_7 + 6*Z*a_2*a_7 + 6*Z*a_1*a_8 + 3*Z^2*a_9 (c, z) |--> z^4 + c (C, Z, c, z) |--> 4*Z^3*z + 6*Z^2*z^2 + 4*Z*z^3 + z^4 + c (C, Z, c) |--> (a_1)*c + (a_2)*c^2 + (a_3)*c^3 + (a_4)*c^4 + (a_5)*c^5 + (a_6)*c^6 + (a_7)*c^7 + (a_8)*c^8 + (a_9)*c^9 + Order(c^10) (C, Z, c) |--> (4*Z^3*a_1 + 1)*c + (6*Z^2*a_1^2 + 4*Z^3*a_2)*c^2 + (4*Z*a_1^3 + 12*Z^2*a_1*a_2 + 4*Z^3*a_3)*c^3 + (a_1^4 + 12*Z*a_1^2*a_2 + 6*Z^2*a_2^2 + 12*Z^2*a_1*a_3 + 4*Z^3*a_4)*c^4 + (4*a_1^3*a_2 + 12*Z*a_1*a_2^2 + 12*Z*a_1^2*a_3 + 12*Z^2*a_2*a_3 + 12*Z^2*a_1*a_4 + 4*Z^3*a_5)*c^5 + (6*a_1^2*a_2^2 + 4*Z*a_2^3 + 4*a_1^3*a_3 + 24*Z*a_1*a_2*a_3 + 6*Z^2*a_3^2 + 12*Z*a_1^2*a_4 + 12*Z^2*a_2*a_4 + 12*Z^2*a_1*a_5 + 4*Z^3*a_6)*c^6 + (4*a_1*a_2^3 + 12*a_1^2*a_2*a_3 + 12*Z*a_2^2*a_3 + 12*Z*a_1*a_3^2 + 4*a_1^3*a_4 + 24*Z*a_1*a_2*a_4 + 12*Z^2*a_3*a_4 + 12*Z*a_1^2*a_5 + 12*Z^2*a_2*a_5 + 12*Z^2*a_1*a_6 + 4*Z^3*a_7)*c^7 + (a_2^4 + 12*a_1*a_2^2*a_3 + 6*a_1^2*a_3^2 + 12*Z*a_2*a_3^2 + 12*a_1^2*a_2*a_4 + 12*Z*a_2^2*a_4 + 24*Z*a_1*a_3*a_4 + 6*Z^2*a_4^2 + 4*a_1^3*a_5 + 24*Z*a_1*a_2*a_5 + 12*Z^2*a_3*a_5 + 12*Z*a_1^2*a_6 + 12*Z^2*a_2*a_6 + 12*Z^2*a_1*a_7 + 4*Z^3*a_8)*c^8 + (4*a_2^3*a_3 + 12*a_1*a_2*a_3^2 + 4*Z*a_3^3 + 12*a_1*a_2^2*a_4 + 12*a_1^2*a_3*a_4 + 24*Z*a_2*a_3*a_4 + 12*Z*a_1*a_4^2 + 12*a_1^2*a_2*a_5 + 12*Z*a_2^2*a_5 + 24*Z*a_1*a_3*a_5 + 
12*Z^2*a_4*a_5 + 4*a_1^3*a_6 + 24*Z*a_1*a_2*a_6 + 12*Z^2*a_3*a_6 + 12*Z*a_1^2*a_7 + 12*Z^2*a_2*a_7 + 12*Z^2*a_1*a_8 + 4*Z^3*a_9)*c^9 + Order(c^10) 0 : a_1 := 4*Z^3*a_1 + 1 0 : a_2 := 6*Z^2*a_1^2 + 4*Z^3*a_2 0 : a_3 := 4*Z*a_1^3 + 12*Z^2*a_1*a_2 + 4*Z^3*a_3 0 : a_4 := a_1^4 + 12*Z*a_1^2*a_2 + 6*Z^2*a_2^2 + 12*Z^2*a_1*a_3 + 4*Z^3*a_4 0 : a_5 := 4*a_1^3*a_2 + 12*Z*a_1*a_2^2 + 12*Z*a_1^2*a_3 + 12*Z^2*a_2*a_3 + 12*Z^2*a_1*a_4 + 4*Z^3*a_5 0 : a_6 := 6*a_1^2*a_2^2 + 4*Z*a_2^3 + 4*a_1^3*a_3 + 24*Z*a_1*a_2*a_3 + 6*Z^2*a_3^2 + 12*Z*a_1^2*a_4 + 12*Z^2*a_2*a_4 + 12*Z^2*a_1*a_5 + 4*Z^3*a_6 0 : a_7 := 4*a_1*a_2^3 + 12*a_1^2*a_2*a_3 + 12*Z*a_2^2*a_3 + 12*Z*a_1*a_3^2 + 4*a_1^3*a_4 + 24*Z*a_1*a_2*a_4 + 12*Z^2*a_3*a_4 + 12*Z*a_1^2*a_5 + 12*Z^2*a_2*a_5 + 12*Z^2*a_1*a_6 + 4*Z^3*a_7 0 : a_8 := a_2^4 + 12*a_1*a_2^2*a_3 + 6*a_1^2*a_3^2 + 12*Z*a_2*a_3^2 + 12*a_1^2*a_2*a_4 + 12*Z*a_2^2*a_4 + 24*Z*a_1*a_3*a_4 + 6*Z^2*a_4^2 + 4*a_1^3*a_5 + 24*Z*a_1*a_2*a_5 + 12*Z^2*a_3*a_5 + 12*Z*a_1^2*a_6 + 12*Z^2*a_2*a_6 + 12*Z^2*a_1*a_7 + 4*Z^3*a_8 0 : a_9 := 4*a_2^3*a_3 + 12*a_1*a_2*a_3^2 + 4*Z*a_3^3 + 12*a_1*a_2^2*a_4 + 12*a_1^2*a_3*a_4 + 24*Z*a_2*a_3*a_4 + 12*Z*a_1*a_4^2 + 12*a_1^2*a_2*a_5 + 12*Z*a_2^2*a_5 + 24*Z*a_1*a_3*a_5 + 12*Z^2*a_4*a_5 + 4*a_1^3*a_6 + 24*Z*a_1*a_2*a_6 + 12*Z^2*a_3*a_6 + 12*Z*a_1^2*a_7 + 12*Z^2*a_2*a_7 + 12*Z^2*a_1*a_8 + 4*Z^3*a_9

You can try it online. The first column should be all 0 if the formula is correct, which it seems to be for all the cases I've tried. But it remains to be proven rigorously that it is correct for all terms of all powers.

Efficiently implementing the general series coefficient update formula would probably need a multi-stage process: first (one-time cost given power and number of terms) calculate tables of constants (binomials, partitions as (index, multiplicity) pairs, partition permutation cardinalities). Second stage (once per iteration) calculate tables of powers of series coefficient variables (only going as far as the highest needed multiplicity for each index), and powers of Z. Third stage (once per series coefficient variable) combine all the powers and constants. Final stage, add 1 to the first variable.
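Before optimizing, a direct (slow) numerical version of the update formula is useful as a reference to test against. This Python sketch recomputes the partitions on every call instead of using precomputed tables; the function names and the convention that `a[0]` is unused are assumptions for the example.

```python
from collections import Counter
from math import comb, factorial

def partitions(k, t, largest=None):
    """Yield partitions of k into exactly t positive parts (non-increasing tuples)."""
    if largest is None:
        largest = k
    if t == 0:
        if k == 0:
            yield ()
        return
    for first in range(min(k - (t - 1), largest), 0, -1):
        for rest in partitions(k - first, t - 1, first):
            yield (first,) + rest

def arrangements(parts):
    """Number of distinct orderings of the multiset `parts`."""
    count = factorial(len(parts))
    for multiplicity in Counter(parts).values():
        count //= factorial(multiplicity)
    return count

def update(a, Z, p):
    """Apply one iteration of z -> z^p + c to coefficients a[1], ..., a[m-1]."""
    new = [0] * len(a)
    for k in range(1, len(a)):
        total = 1 if k == 1 else 0
        for t in range(1, min(k, p) + 1):  # binomial(p, t) is 0 for t > p
            inner = 0
            for parts in partitions(k, t):
                term = arrangements(parts)
                for q in parts:
                    term *= a[q]
                inner += term
            total += comb(p, t) * Z ** (p - t) * inner
        new[k] = total
    return new
```

With integer inputs the arithmetic is exact, so the result can be compared term by term with the SageMath output: for example `update([0, 2, 3, 5], 7, 2)[3]` is \(2 a_1 a_2 + 2 Z a_3 = 82\).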

It should be possible to generate OpenCL code for this at runtime. The expressions for each variable are of very different sizes but bundling a_k with a_(m-k) might give a more uniform load per execution unit.

Newton's method can be used to trace external rays in the Mandelbrot set. See:

An algorithm to draw external rays of the Mandelbrot set

Tomoki Kawahira

April 23, 2009

Abstract: In this note I explain an algorithm to draw the external rays of the Mandelbrot set with an error estimate. Newton’s method is the main tool. (I learned its principle by M. Shishikura, but this idea of using Newton’s method is probably well-known for many other people working on complex dynamics.)

The algorithm uses \(S\) points in each dwell band; this number is called the "sharpness". Increasing the sharpness presumably makes the algorithm more robust when using the previous ray point \(c_n\) as the initial guess for Newton's method to find the next ray point \(c_{n+1}\), as the points are closer together.

I hypothesized that it might be better (faster) to use a different method for choosing the initial guess for the next ray point. I devised 3 new methods in addition to the existing one:

- nearest: \( c_{n+1} := c_n \)
- linear: \( c_{n+1} := c_n + (c_n - c_{n-1}) \)
- hybrid: \( c_{n+1} := c_n + (c_n - c_{n-1}) \left| \frac{c_n - c_{n-1}}{c_{n-1} - c_{n-2}} \right| \)
- geometric: \( c_{n+1} := c_n + \frac{(c_n - c_{n-1})^2}{c_{n-1} - c_{n-2}} \)
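Written out in Python, the four initial guesses look like this. This is a sketch separate from the actual branch; the argument names `c0`, `c1`, `c2` stand for \(c_{n-2}\), \(c_{n-1}\), \(c_n\) and are my own.

```python
def nearest(c0, c1, c2):
    """Reuse the previous ray point."""
    return c2

def linear(c0, c1, c2):
    """Extrapolate with the last step unchanged."""
    return c2 + (c2 - c1)

def hybrid(c0, c1, c2):
    """Keep the direction of the last step, scale its length by the step ratio."""
    return c2 + (c2 - c1) * abs((c2 - c1) / (c1 - c0))

def geometric(c0, c1, c2):
    """Assume successive steps form a (complex) geometric sequence."""
    return c2 + (c2 - c1) ** 2 / (c1 - c0)
```

For points \(4, 2, 1\) on the real axis, whose steps halve each time, the geometric and hybrid guesses are both \(0.5\), the linear guess is \(0\) and the nearest guess is \(1\).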

I implemented the methods in a branch of my mandelbrot-numerics repository:

    git clone https://code.mathr.co.uk/mandelbrot-numerics.git
    cd mandelbrot-numerics
    git checkout exray-methods
    git diff HEAD~1

I wrote a test program for real-world use of ray tracing, namely tracing rays of preperiod + period ~= 500 to dwell ~1000, with all 4 methods and varying sharpness. I tested for correctness by comparing with the previous method, which was known to work well with sharpness around 4 through 8.

Results were disappointing. The hybrid and geometric methods failed in all cases, no matter the sharpness. The linear method failed for sharpness below 7, but when it worked (sharpness 7 or 8) it took about 68% of the time of the nearest method. However, the nearest method at sharpness 4 took 62% of the time of nearest at sharpness 8, so this is not so impressive.

The nearest method seemed to work all the way down to sharpness 2, which was surprising, and warrants further investigation: nearest at sharpness 2 took only 41% of the time of nearest at sharpness 8. If it turns out to be reliable, this would be a good speed boost.

You can download my raw data.

Reporting this failed experiment in the interests of science.

*Old Wood Dish* (2010) by James W. Morris is a fractal artwork,
a zoomed-in view of part of the Mandelbrot set. The magnification factor
of \(10^{152}\) is quite shallow by today's standards, but in 2010 the
perturbation and series approximation techniques for speeding up image
generation had not yet been developed: this is a deep zoom for that era.
Thankfully JWM's (now defunct) gallery included the parameter files; the
image linked above is a high resolution re-creation in Kalle's Fraktaler,
thanks to a parameter file conversion script I wrote. You can find out
more about JWM's software MDZ and see more of his images on my
mirror of part of his old website.

*Old Wood Dish* is an example of what would now be called
"Julia morphing", using the property that zooming in towards baby Mandelbrot
set islands doubles-up (and then quadruples, octuples, ...) the features
you pass. This allows you to sculpt patterns, here the pattern has a tree
structure.

Each baby Mandelbrot set island has a positive integer associated to
it: its period. Iteration of the center of its cardioid repeats with that
period, returning to 0. Atom periods are "near miss" periods, where the
iteration gets nearer to 0 than it ever did before. They indicate a nearby
baby Mandelbrot set island (or child bulb) of that period.
The atom periods of the center of *Old Wood Dish* are:

1, 2, 34, 70, 142, 286, **574**, **862**, 1438, 2878, 5758

One can see a pattern: 2 * 34 + 2 = 70; 2 * 70 + 2 = 142; 2 * 142 + 2 = 286. But this pattern is broken at the numbers highlighted: 2 * 574 + 2 = 1150 != 862.
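The rule can be checked mechanically with a few lines of Python (the name `pattern_breaks` and the choice to start checking at period 34 are mine):

```python
periods = [1, 2, 34, 70, 142, 286, 574, 862, 1438, 2878, 5758]

def pattern_breaks(periods, start=2):
    """Return the consecutive pairs (p, q), from index `start` on, with q != 2 p + 2."""
    tail = periods[start:]
    return [(p, q) for p, q in zip(tail, tail[1:]) if q != 2 * p + 2]

breaks = pattern_breaks(periods)
```

This flags the step from 574 to 862 and also the step from 862 to 1438 (2 * 862 + 2 = 1726); from 1438 onwards the rule holds again.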

Using Newton's root-finding method in one complex variable,
one can find the nearby baby Mandelbrot sets with those periods. When zooming
out, these eventually each become the lowest period island in the view in turn
(higher periods are closer to the starting point), and the zoom level at which
this happens is usually significant in terms of the decisions made when performing
Julia morphing. These zoom levels (log base 10) for *Old Wood Dish* are:

0.114, 0.591, 4.69, 8.44, 14.0, 22.4, 30.8, 43.4, 66.6, 101, 152

and the successive ratios of these numbers are

5.15, 7.94, 1.79, 1.66, 1.59, **1.37**, **1.40**, 1.53, 1.52, 1.50

Repeated Julia morphing leads to these ratios tending to a constant (often 1.5), but the two numbers highlighted are clearly outside the curve: one can see that these correspond to the two mismatching periods. I'll have to ask him to see if this was intentional or an accident.
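Recomputing the ratios from the rounded zoom levels is a one-liner in Python. Because the inputs above are rounded to three significant figures, the results differ slightly from the ratios quoted (which presumably came from unrounded values); the cut-off of 1.45 used to pick out the outliers is an arbitrary choice of mine.

```python
levels = [0.114, 0.591, 4.69, 8.44, 14.0, 22.4, 30.8, 43.4, 66.6, 101, 152]

# Ratio of each zoom level (log base 10) to the previous one.
ratios = [b / a for a, b in zip(levels, levels[1:])]

# Ignore the first two ratios (before the morphing pattern starts) and
# flag the ones that fall clearly below the limiting value of about 1.5.
outliers = [i for i, r in enumerate(ratios) if i >= 2 and r < 1.45]
```

The two flagged entries are the ratios for arriving at periods 574 and 862.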

A list of atom domain periods is related to a concept called an
internal address, which is an ascending list of the lowest periods of
the hyperbolic components (cardioid-like or disk-like shapes) that you
pass through along the filaments on the way to the target from the origin.
An extension, angled internal addresses, removes the ambiguity of which
way to turn (for example, there are two period 3 bulbs attached to the
period 1 cardioid, they have internal angles 1/3 and 2/3). One can find
angled internal addresses by converting from external angles, and one
can find external angles by tracing rays outwards from a point towards
infinity. The angled internal address of *Old Wood Dish* starts:

1 1/2 2 16/17 33 1/2 34 1/3 69 1/2 70 1/3 141 1/2 142 1/3 285 1/2 286 ...

and the pattern can be extended indefinitely by

... 1/3 (p-1) 1/2 p 1/3 (2p+1) 1/2 (2p+2) ...
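The extension rule is easy to mechanize. In this Python sketch the function name `angled_address` is hypothetical, the output is just the space-separated text form used above, and the numerator of the 16/17 angle is left as a parameter for convenience.

```python
def angled_address(last_period, numerator=16):
    """Build the address text up to (and including) `last_period`.

    Follows the rule: after period p comes angle 1/3, period 2p+1,
    angle 1/2, period 2p+2.
    """
    tokens = ["1", "1/2", "2", f"{numerator}/17", "33", "1/2", "34"]
    p = 34
    while p < last_period:
        tokens += ["1/3", str(2 * p + 1), "1/2", str(2 * p + 2)]
        p = 2 * p + 2
    return " ".join(tokens)
```

Calling `angled_address(286)` reproduces the start of the address given above, and continuing the same rule for five more rounds reaches period 9214.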

The numerators of the angles in an angled internal address can be varied
freely, so one can create a whole family of variations. Varying the 1/3 to
2/3 only changes the alignments of the decorations outside the tree structure,
but varying 16/17 changes the shapes that tree is built from. Here are
*Old Wood Dish* variations 1-16, with the irregular zoom pattern
adjusted to a fully-regular zoom ending up with period 9214:

I found the center coordinates for these images by tracing external rays towards each period 9214 inner island. This took almost 5 hours wall-clock time with 16 threads in parallel (one for each ray). I then found the approximate view radius by atom domain size raised to the power of 1.125, multiplied by 10. These magic numbers were found by exploring shallower versions graphically. Using this radius I used Newton's method again, to find the pair of period 13820 minibrots at the first junctions near the center. I found these periods using KF-2.15.3's newly improved Newton zooming dialog. I used their locations to rotate and scale all the images for precise alignment. Animated it looks quite hypnotic I think:

Software used:

- mandelbrot-numerics m-describe program and script to get rough idea of period structure;
- mandelbrot-perturbator GTK program to explore the shallow layers and trace external rays to find external angles;
- mandelbrot-symbolics Haskell library in GHCI REPL to convert (both directions) between external angles and angled internal addresses;
- mandelbrot-numerics m-exray-in program to trace rays inwards given external angles;
- mandelbrot-numerics m-nucleus program to find periodic root from ray end point;
- mandelbrot-numerics m-domain-size program to find approximate view size;
- kf-2.15.3 interactive Newton zooming dialog to find period of the first junction nodes;
- custom code in C to align views, using mandelbrot-numerics library;
- custom code in bash shell to combine everything into KFS+KFR files;
- kf-2.15.3 command line mode to render each KFS+KFR to very large TIFF files;
- ImageMagick convert program to downscale for anti-aliasing (PNG for web, and smaller GIFs);
- gifsicle program to combine the 16 frames into 1 animated GIF.

The complex beauty of the world's most famous fractal, the Mandelbrot set, emerges from the repeated iteration of a simple formula:

\[z \to z^2 + c\]

Zooming into the intricate boundary of the shape reveals ever more detail, but one needs higher precision numbers and higher iteration counts as you go deeper. The computational cost rises quickly with the classical rendering algorithms which use high precision numbers for each pixel.

In 2013, K.I. Martin's SuperFractalThing and accompanying white paper sft_maths.pdf popularized a pair of new acceleration techniques. First one notes that the formula \(z \to z^2 + c\) is continuous, so nearby points remain nearby under iteration. This means you can iterate one point at high precision (the reference orbit) and compute differences from the reference orbit for each pixel in low precision (the perturbed orbits). Secondly, iterating the perturbed formula one ends up with a polynomial series in the initial perturbation in \(c\), which depends only on the reference. The degree rises rapidly but you can truncate it to get an approximation. This means you can compute the series approximation coefficients once, and substitute in the perturbed \(c\) values for each pixel, allowing you to initialize the perturbed orbits at a later iteration, skipping potentially lots of per-pixel work.

The perturbation technique has since been extended to the Burning Ship fractal and other "abs variations", and it also works for hybrid fractals combining iterations of several formulas.

Prerequisites for the rest of this article: a familiarity with complex numbers and algebraic manipulation; knowing how to draw the unzoomed Mandelbrot set; understanding the limitations of computer implementation of numbers (see for example Hardware Floating Point Types).

In the remainder of this post, lower case and upper case variables with the same letter mean different things. Upper case means unperturbed or reference, usually high precision or high range. Lower case means perturbed per pixel delta, low precision and low range.

In perturbation, one starts with the iteration formula [1]:

\[Z \to Z^2 + C\]

Perturb the variables with unevaluated sums [2]:

\[(Z + z) \to (Z + z)^2 + (C + c)\]

Do symbolic algebra to avoid the catastrophic absorption when adding tiny values \(z\) to large values \(Z\) (e.g. 1 million plus 1 is still 1 million if you only have 3 significant digits to work with) [3]:

\[z \to 2 Z z + z^2 + c\]

\(C, Z\) is the "reference" orbit, computed in high precision using [1] and rounded to machine double precision, which works fine most of the time. \(c, z\) is the "pixel" orbit; you can compute many of these near each reference (e.g. an entire image).
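To see [3] at work, here is a small Python experiment. It uses exact rational arithmetic from the standard library as a stand-in for high precision; the parameter values, the helper names and the number of steps are arbitrary choices for the example. The pixel offset is \(10^{-20}\), which simply vanishes when added to \(C\) in double precision, yet the perturbed orbit in doubles tracks the exact difference of the two orbits.

```python
from fractions import Fraction

def exact_step(z, c):
    """One exact step of z -> z^2 + c on (re, im) pairs of Fractions."""
    x, y = z
    return (x * x - y * y + c[0], 2 * x * y + c[1])

def to_complex(z):
    """Round an exact (re, im) pair to a double precision complex number."""
    return complex(float(z[0]), float(z[1]))

def compare(steps=8):
    """Return (perturbed delta in doubles, exact delta rounded to double)."""
    C = (Fraction(-3, 4), Fraction(1, 10))           # reference parameter
    c = (Fraction(1, 10**20), Fraction(1, 10**20))   # tiny pixel offset
    Cc = (C[0] + c[0], C[1] + c[1])
    Z = (Fraction(0), Fraction(0))   # exact reference orbit, formula [1]
    W = (Fraction(0), Fraction(0))   # exact pixel orbit, formula [2]
    z = 0j                           # low precision delta, formula [3]
    c_lo = to_complex(c)
    for _ in range(steps):
        Z_lo = to_complex(Z)                 # reference rounded to double
        z = 2 * Z_lo * z + z * z + c_lo
        Z = exact_step(Z, C)
        W = exact_step(W, Cc)
    return z, to_complex((W[0] - Z[0], W[1] - Z[1]))

# The naive approach loses the offset immediately:
absorbed = complex(-0.75, 0.1) + complex(1e-20, 1e-20) == complex(-0.75, 0.1)
```

Here `absorbed` is `True`, so iterating \(C + c\) directly in doubles would give exactly the reference orbit and a difference of zero.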

There is a problem that can be noticed when you zoom deeper near certain features in the fractal. There are parts that can have a "noisy" appearance, or there may be weird flat blobs that look out of place. These are the infamous perturbation glitches. It was observed that adding references in the glitches and recomputing the pixels could fix them, but there was no reliable way to detect them programmatically until Pauldelbrot discovered/invented a method: Perturbation Theory Glitches Improvement.

The criterion: if [4]:

\[|Z+z| \ll |Z|\]

at any iteration, then glitches can occur. The solution: retry with a new reference, or (for well-behaved formulas like the Mandelbrot set) rebase to a new reference and carry on.

Perturbation assumes exact maths, but some images have glitches when naively using perturbation in low precision. Pauldelbrot found his glitch criterion by perturbing the perturbation iterations: one has perturbed iteration as in [3] (recap: \(z \to 2 Z z + z^2 + c\)). Then one perturbs this with \(z \to z + e, c \to c + f\) [5]:

\[e \to (2 (Z + z) + e) e + f\]

We are interested in what happens to the ratio \(e/z\) under iteration, so rewrite [3] as [6]:

\[z \to (2 Z + z) z + c\]

Pattern matching, the interesting part (assuming \(c\) and \(f\) are small) of \(e/z\) is \(2(Z + z) / 2 Z\). When \(e/z\) is small, the nearby pixels "stick together" and there is not enough precision in the number type to distinguish them, which makes a glitch. So a glitch can be detected when [7]:

\[|Z + z|^2 < G |Z|^2\]

where \(G\) is a threshold (somewhere between 1e-2 and 1e-8, depending how strict you want to be). This does not add much cost, as \(|Z+z|^2\) already needs to be computed for the escape test, and \(G|Z|^2\) can be computed once for each iteration of the reference orbit and stored.
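A per-pixel loop with the glitch test [7] might look like the following Python sketch. For simplicity the reference orbit is computed directly in double precision here (in a real renderer it would come from a high precision computation), and the function names, return values and default threshold are assumptions for the example.

```python
def reference_orbit(C, n):
    """Return [Z_0, ..., Z_n] for Z -> Z^2 + C starting from Z_0 = 0."""
    Z = [0j]
    for _ in range(n):
        Z.append(Z[-1] * Z[-1] + C)
    return Z

def norm2(v):
    """Squared magnitude of a complex number."""
    return v.real * v.real + v.imag * v.imag

def iterate_pixel(Z, c, G=1e-3, escape_radius2=4.0):
    """Iterate [6] against the reference orbit Z; report what happened and when."""
    z = 0j
    for n, Zn in enumerate(Z):
        w2 = norm2(Zn + z)
        if w2 > escape_radius2:
            return "escaped", n
        if w2 < G * norm2(Zn):  # criterion [7]
            return "glitched", n
        z = (2 * Zn + z) * z + c
    return "unescaped", len(Z)
```

A pixel reported as glitched would then be queued for recalculation with another reference.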

The problem now is: How to choose G? Too big and it takes forever as glitches are detected all over, too small and some glitches can be missed leading to bad images.

The glitched pixels can be recalculated with a more appropriate reference point: more glitches may result and adding more references may be necessary until the image is finished.

Double precision floating point (with 53 bits of mantissa) is more than enough for computing perturbed orbits: even single precision (with 24 bits) can be used successfully. But when zooming deeper another problem occurs: double precision has a limited range, once values get smaller than about 1e-308 then they underflow to 0. This means perturbation with double precision can only zoom so far, as eventually the perturbed deltas are smaller than can be represented.

An early technique for extending range is to store the mantissa as a double precision value, but normalized to be near 1 in magnitude, with a separate integer to store the exponent. This floatexp technique works for arbitrarily deep zooms, but the performance is terrible because it needs to handle every arithmetic operation in software (instead of them being a single CPU instruction).
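To make the idea concrete, here is a toy floatexp in Python for real numbers only. The class layout and method names are mine, not those of any existing implementation, and Python's `math.frexp` normalizes the mantissa to \([0.5, 1)\) rather than exactly 1.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FloatExp:
    mantissa: float  # 0.0, or 0.5 <= |mantissa| < 1
    exponent: int    # value = mantissa * 2**exponent

    @staticmethod
    def make(m, e=0):
        """Normalize m * 2**e."""
        if m == 0.0:
            return FloatExp(0.0, 0)
        f, k = math.frexp(m)
        return FloatExp(f, e + k)

    def __mul__(self, other):
        return FloatExp.make(self.mantissa * other.mantissa,
                             self.exponent + other.exponent)

    def __add__(self, other):
        if self.mantissa == 0.0:
            return other
        if other.mantissa == 0.0:
            return self
        e = max(self.exponent, other.exponent)
        # Align both mantissas to the larger exponent; a much smaller
        # operand simply underflows to 0 here, as it should.
        m = (math.ldexp(self.mantissa, self.exponent - e)
             + math.ldexp(other.mantissa, other.exponent - e))
        return FloatExp.make(m, e)

    def log10(self):
        return math.log10(abs(self.mantissa)) + self.exponent * math.log10(2.0)
```

Cubing \(10^{-200}\) underflows to \(0\) in plain doubles, but as a `FloatExp` it keeps its value, with `log10()` close to \(-600\). Every operation goes through several branches and library calls, which hints at why this is slow.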

The solution for efficient performance turned out to be using an unevaluated product (compare with the unevaluated sum of perturbation) to rescale the double precision iterations to be nearer 1 and avoid underflow: substitute \(z = S w\) and \(c = S d\) to get [8]:

\[S w \to 2 Z S w + S^2 w^2 + S d\]

and now cancel out one scale factor \(S\) throughout [9]:

\[w \to 2 Z w + S w^2 + d\]

Choose \(S\) so that \(|w|\) is around \(1\). When \(|w|\) is at risk of overflow (or underflow) after some iterations, redo the scaling; this is typically a few hundred iterations as \(|Z|\) is bounded by \(2\) except at final escape.
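The rescaled iteration [9] can be sketched in Python as below. I assume the scale is a power of two, \(S = 2^e\), with \(e\) kept as an integer so that it cannot underflow; the function names and the rescaling threshold are my own, only growth of \(|w|\) is handled, and the case of very small \(|Z|\) is ignored.

```python
import math

def scale_complex(v, k):
    """Multiply a complex number by 2**k exactly (barring overflow or underflow)."""
    return complex(math.ldexp(v.real, k), math.ldexp(v.imag, k))

def rescaled_orbit(Z, d, e, limit=2.0**100):
    """Iterate w -> 2 Z w + S w^2 + d with S = 2**e, so that z = S w and c = S d.

    Z is the list of reference orbit values; returns the final (w, e).
    """
    w = 0j
    s = math.ldexp(1.0, e)  # S as a double; may underflow to 0.0
    for Zn in Z:
        w = 2 * Zn * w + s * w * w + d
        if abs(w) > limit:  # redo the scaling before w can overflow
            k = math.frexp(abs(w))[1]
            w = scale_complex(w, -k)
            d = scale_complex(d, -k)
            e += k
            s = math.ldexp(1.0, e)
    return w, e
```

At moderate depth \(S w\) agrees with the unscaled iteration [3], and with \(e = -2000\) the scale factor is \(0\) as a double while \(w\) still carries the orbit.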

Optimization: if \(S\) underflowed to \(0\) in double precision, you don't need to calculate the \(+ S w^2\) term at all when \(Z\) is not small. Similarly you can skip the \(+ d\) if it underflowed. For higher powers there will be terms involving \(S^2 w^3\) (for example), which might not need to be calculated either due to underflow. Ideally these tests would be performed once at rescaling time, instead of in every inner loop iteration (though they would be highly predictable I suppose).

There is a problem: if \(|Z|\) is very small, it can underflow to \(0\) in unscaled double in [9]. One needs to store the full range \(Z\) and do a full range (e.g. floatexp) iteration at those points, because \(|w|\) can change dramatically. Rescaling is necessary afterwards. This was described by Pauldelbrot: Rescaled Iterations in Nanoscope.

To do the full iteration, compute \(z = S w\) in floatexp (using a floatexp for \(S\) so that there is no underflow), do the perturbed iteration [3] with all variables in floatexp. To rescale afterwards, compute \(S = |z|\) and \(w = z/S, d = c/S\) (computed in floatexp with \(w\) and \(d\) rounded to double precision afterwards). Then a double precision \(s\) can be computed for use in [9].

The Burning Ship fractal modifies the Mandelbrot set formula by taking absolute values of the real and imaginary parts before the complex squaring [10]:

\[X + i Y \to (|X| + i |Y|)^2 + C\]

When perturbing the Burning Ship and other "abs variations", one ends up with things like [11]:

\[|XY + Xy + xY + xy| - |XY|\]

which naively gives \(0\) by catastrophic absorption and cancellation. laser blaster made a case analysis Perturbation Formula for Burning Ship which can be written as [12]:

    diffabs(c, d) := |c+d| - |c|
      = c >= 0 ? c + d >= 0 ? d : -(2*c+d)
               : c + d > 0 ? 2*c+d : -d

When \(d\) is small the \(\pm d\) cases are much more likely. With rescaling in the mix [11] works out as [13]:

\[\operatorname{diffabs}(XY/s, Xy + xY + sxy)\]

which has the risk of overflow when \(s\) is small, but the signs work out ok even for infinite \(c\) as \(d\) is known to be finite. Moreover, if \(s = 0\) due to underflow, the \(\pm d\) branches will always be taken (except when \(XY\) is small, when a full floatexp iteration will be performed instead), and as \(s \ge 0\) by construction, [13] reduces to [14]:

\[\operatorname{sign}(X) * \operatorname{sign}(Y) * (X y + x Y)\]

(Note: this formulation helps avoid underflow in \(\operatorname{sign}(XY)\) when \(X\) and \(Y\) are small.)
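The case analysis [12] translates directly into Python (a sketch for real-valued arguments):

```python
def diffabs(c, d):
    """Compute |c + d| - |c| without cancellation when d is tiny compared to c."""
    if c >= 0:
        if c + d >= 0:
            return d
        return -(2 * c + d)
    if c + d > 0:
        return 2 * c + d
    return -d
```

With `c = 1.0` and `d = 1e-30` the naive `abs(c + d) - abs(c)` evaluates to `0.0` in double precision, while `diffabs(c, d)` returns `1e-30`.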

For well-behaved functions like the Mandelbrot set iterations, one needs to do full iterations when \(Z\) gets small. For the Burning Ship and other abs variations, this is not sufficient: problems occur if either \(X\) or \(Y\) is small, not only when both are small at the same time. Full iterations need to be done when either variable is small. This makes rescaled iterations for locations near the needle slower than just doing full floatexp iterations all the time (because of the extra wasted work handling the rescaling). This is because near the needle all the iterations have \(Y\) near \(0\), which means floatexp iterations will be done anyway. Using floatexp from the get go avoids many branches and rescaling in the inner loop, so it's significantly faster. The problem is worse in single precision because it has much less range: it underflows below 1e-38 or so, rather than 1e-308 for double precision.

The problem of automatically detecting these "deep needle" locations (which may be in the needles of miniships) and switching implementations to avoid the extra slowdown remains unresolved in KF.

The Mandelbrot set has lovely logarithmic spirals all over, and the Burning Ship has interesting "rigging" on the miniships on its needle. Hybridization provides a way to get both these features in a single fractal image. The basic idea is to interleave the iteration formulas, for example alternating between [1] and [10], but more complicated interleavings are possible (eg [1][10][1][1] in a loop, etc).

Hybrid fractals in KF are built from stanzas, each has some lines, each line has two operators, and each operator has controls for absolute x, absolute y, negate x, negate y, integer power \(p\), complex multiplier \(a\). The two operators in a line can be combined by addition, subtraction or multiplication, and currently the number of lines in a stanza can be either 1 or 2 and there can be 1, 2, 3 or 4 stanzas. The output of each line is fed into the next, and at the end of each stanza the +c part of the formula happens. There are controls to choose how many times to repeat each stanza, and which stanza to continue from after reaching the end.

Implementing perturbation for this is quite methodical. Start from an operator, with inputs \(Z\) and \(z\). Set mutable variables:

z := input
W := Z + z
B := Z

If absolute x enabled in formula, then update

re(z) := diffabs(re(Z), re(z))
re(W) := abs(re(W))
re(B) := abs(re(B))

Similarly for the imaginary part. If negate x enabled in formula, then update

re(z) := -re(z)
re(W) := -re(W)
re(B) := -re(B)

Similarly for the imaginary part. Now compute

\[S = \sum_{i=0}^{p-1} W^i B^{p-1 - i}\]

and return \(a z S\). Combining operators into lines may be done by Perturbation Algebra. Combining lines into stanzas can be done by iterating unperturbed \(Z\) alongside perturbed \(z\); only the \(+C\) needs high precision, and that is not done within a stanza.
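The operator recipe above can be sketched in Python as follows. This is a minimal illustration, not KF's code: the function name `perturbed_operator` and its keyword arguments are made up for this example, and the absolute value and negation are applied to the real and imaginary parts separately. It relies on the identity \(W^p - B^p = (W - B) S\), where \(W - B\) is the transformed \(z\).

```python
def diffabs(c, d):
    """Return |c + d| - |c| by case analysis, avoiding cancellation."""
    if c >= 0:
        return d if c + d >= 0 else -(2 * c + d)
    return 2 * c + d if c + d > 0 else -d


def perturbed_operator(Z, z, abs_x=False, abs_y=False,
                       neg_x=False, neg_y=False, p=2, a=1 + 0j):
    """Perturbed output of one operator: a * (T(Z + z)**p - T(Z)**p),
    where T applies the optional abs / negate to each component."""
    W = Z + z
    zr, zi = z.real, z.imag
    Wr, Wi = W.real, W.imag
    Br, Bi = Z.real, Z.imag          # B starts as a copy of Z
    if abs_x:
        zr, Wr, Br = diffabs(Br, zr), abs(Wr), abs(Br)
    if abs_y:
        zi, Wi, Bi = diffabs(Bi, zi), abs(Wi), abs(Bi)
    if neg_x:
        zr, Wr, Br = -zr, -Wr, -Br
    if neg_y:
        zi, Wi, Bi = -zi, -Wi, -Bi
    z, W, B = complex(zr, zi), complex(Wr, Wi), complex(Br, Bi)
    # S = sum_{i=0}^{p-1} W^i B^(p-1-i), so that W^p - B^p = (W - B) * S
    S = sum(W ** i * B ** (p - 1 - i) for i in range(p))
    return a * z * S
```

For moderately sized inputs the result should agree with directly subtracting the two unperturbed operator values; the point of the perturbed form is that it stays accurate when \(z\) is far smaller than \(Z\).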

Rescaling hybrid iterations seems like a big challenge, but it's not that hard: if either or both the real and imaginary parts of the reference orbit \(Z\) are small, one needs to do a full range iteration with floatexp and recalculate the scale factor afterwards, as with formulas like Burning Ship. Otherwise, thread \(s\) through from the top level down to the operators. Initialize with

W := Z + z*s

and modify the absolute cases to divide the reference by \(s\):

re(z) := diffabs(re(Z/s), re(z))

Similarly for imaginary part. When combining operators (this subterm only occurs with multiplication) replace \(f(o_1, Z + z)\) with \(f(o_1, Z + z s)\).

And that's almost all the changes that need to be made!
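As a rough Python sketch of those changes, here is a single operator with the scale factor threaded through. The name `rescaled_operator` and its arguments are hypothetical, and the sketch assumes \(s > 0\) has not underflowed to zero (that case needs the full floatexp iteration described above). Here `z` is the scaled perturbation, so the true perturbation is `z * s`, and the returned value is likewise scaled.

```python
def diffabs(c, d):
    """Return |c + d| - |c| by case analysis, avoiding cancellation."""
    if c >= 0:
        return d if c + d >= 0 else -(2 * c + d)
    return 2 * c + d if c + d > 0 else -d


def rescaled_operator(Z, z, s, abs_x=False, abs_y=False,
                      neg_x=False, neg_y=False, p=2, a=1 + 0j):
    """Scaled perturbed output of one operator; multiply by s to unscale."""
    W = Z + z * s                    # W is unscaled: reference plus true perturbation
    zr, zi = z.real, z.imag
    Wr, Wi = W.real, W.imag
    Br, Bi = Z.real, Z.imag
    if abs_x:
        # divide the reference by s so both diffabs arguments share one scale
        zr, Wr, Br = diffabs(Br / s, zr), abs(Wr), abs(Br)
    if abs_y:
        zi, Wi, Bi = diffabs(Bi / s, zi), abs(Wi), abs(Bi)
    if neg_x:
        zr, Wr, Br = -zr, -Wr, -Br
    if neg_y:
        zi, Wi, Bi = -zi, -Wi, -Bi
    z, W, B = complex(zr, zi), complex(Wr, Wi), complex(Br, Bi)
    S = sum(W ** i * B ** (p - 1 - i) for i in range(p))
    return a * z * S
```

With `s = 1` this reduces to the unscaled operator, which makes for a convenient sanity check.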

For distance estimation of hybrid formulas I use dual numbers for automatic differentiation. One small adjustment was needed for it to work with rescaled iterations: instead of initializing the dual parts (before iteration) with 1 and scaling by the pixel spacing at the end for screen-space colouring, initialize the dual parts with the pixel spacing and don't scale at the end. This avoids overflow of the derivative, and the same rescaling factor can be used for regular and dual parts.

Naive implementations of parametric hybrids are very slow due to all the branches in the inner loops (checking if absolute x enabled at every iteration for every pixel, etc). Using for example OpenCL, these branches can be done once when generating source code for a formula, instead of every iteration for every pixel. This runs much faster, even when compiled to run on the same OpenCL device that is interpreting the parametric code.

The other idea that K I Martin's SuperFractalThing popularized is that iteration of [3] gives a polynomial series in \(c\) [15]:

\[z_n = \sum A_{n,k} c^k\]

(with 0 constant term). This can be used to "skip" a whole bunch of iterations, assuming that truncating the series and/or low precision doesn't cause too much trouble. Substituting [15] into [3] gives [16]:

\[\sum A_{n+1,k} c^k = 2 Z \sum A_{n,k} c^k + (\sum A_{n,k} c^k)^2 + c\]

Equating coefficients of \(c^k\) gives recurrence relations for the series coefficients \(A_{n,k}\). See Simpler Series Approximation.
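Written out, equating coefficients in [16] gives \(A_{n+1,k} = 2 Z A_{n,k} + \sum_{i+j=k} A_{n,i} A_{n,j} + [k = 1]\). The Python sketch below iterates that recurrence for a truncated series and compares it with ordinary perturbed iteration. It assumes [3] is the perturbed iteration \(z \to 2 Z z + z^2 + c\), as the substitution in [16] suggests; the function names, the truncation order of 8, the reference point and the pixel offset are arbitrary choices for illustration.

```python
def series_step(A, Z):
    """A[k-1] holds A_{n,k} for k = 1..m; return A_{n+1,k}, truncated at order m."""
    m = len(A)
    new = []
    for k in range(1, m + 1):
        # coefficient of c^k in (sum_i A_{n,i} c^i)^2, with i and k - i both >= 1
        square = sum(A[i - 1] * A[k - i - 1] for i in range(1, k))
        new.append(2 * Z * A[k - 1] + square + (1 if k == 1 else 0))
    return new


def series_eval(A, c):
    """Evaluate sum_k A[k-1] * c^k by Horner's rule (zero constant term)."""
    total = 0j
    for coeff in reversed(A):
        total = (total + coeff) * c
    return total


C = complex(-0.75, 0.1)      # reference point (arbitrary)
c = complex(1e-6, 1e-6)      # pixel offset from the reference (arbitrary)
Z, z, A = 0j, 0j, [0j] * 8
for _ in range(10):
    A = series_step(A, Z)    # series coefficients for iteration n + 1
    z = 2 * Z * z + z * z + c
    Z = Z * Z + C
print(abs(series_eval(A, c) - z))
```

In a real renderer the coefficients would be computed once per reference and then evaluated per pixel, which is where the skipped iterations come from.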

The traditional way to decide whether it's ok to use the series approximation at an iteration is to check that it doesn't deviate too far from regular iterations (or perturbed iterations) at a collection of "probe" points. When it starts to deviate, roll back an iteration and initialize all the image pixels with [15] at that iteration.

Later, knighty extended the series approximation to two complex variables. If the reference \(C\) is a periodic point (for example the center of a minibrot), the biseries in \(z, c\) allows skipping a whole period of iterations. Then multiple periods can be skipped by repeating the biseries step. This gives a further big speedup beyond regular series approximation near minibrots. An escape radius is needed for \(z\), based on properties of the reference, so as not to perform too many biseries iterations. After that, regular perturbed iterations are performed until final escape. This is available in KF as NanoMB1.

Current research by knighty and others involves a chain of minibrots at successively deeper zoom levels. One starts with the deepest minibrot, performing biseries iterations until it escapes its \(z\) radius. Then rebase the iterates to the next outer minibrot, and perform biseries iterations with that. Repeat until final escape. This is available in KF as NanoMB2, but it's highly experimental and fails for many locations. Perhaps it needs to be combined with more perturbation or higher precision: sometimes the iterates may still be too close to each other when they escape a deep minibrot, such that catastrophic absorption occurs. In progress...

For Burning Ship and other abs variations (and presumably hybrids too), series approximation can take the form of two bivariate real series in \(\Re(c)\) and \(\Im(c)\) for the real and imaginary parts of \(z\). But these are only good so long as the region is not folded by an absolute value, so typically only a few iterations can be skipped. Maybe the series can be split into two (or more) parts with the other side(s) shifted when this occurs? In progress...

Perturbation techniques that greatly reduce the quantity of high precision iterations needed, as well as (for well-behaved formulas) series approximation techniques that reduce the quantity of low precision iterations needed still further, provide a vast speedup over classical algorithms that use high precision for every pixel. Rescaling can provide an additional constant factor speedup over using full range floatexp number types for most (not "deep needle") locations. Chained biseries approximation ("NanoMB2") and series approximation for abs variations and hybrids are still topics of research.

It remains open how to choose the \(G\) for Pauldelbrot's glitch detection criterion, and how to robustly compute series approximation skipping: there is still no complete mathematical proof of correctness with rigorous error bounds, although the images do most often look plausible and different implementations do tend to agree.

Almost a decade ago I wrote about optimizing zoom animations by reusing the center portion of key frame images. In that post I used a fixed scale factor of 2, because I didn't think about generalizing it. This post here is to rectify that oversight.

So, fix the output video image size to \(W \times H\) with a pixel density (for anti-aliasing) of \(\rho^2\). For example, with \(5 \times 5\) supersampling (25 samples per output pixel), \(\rho = 5\).

Now we want to calculate the size of the key frame images \(W_K \times H_K\) and the scale factor \(R\). Suppose we have calculated an inner / deeper key frame in the zoom animation; then the next outer key frame only needs its outer border calculated, because we can reuse the center. This means only \( H_K W_K \left(1 - \frac{1}{R^2}\right) \) pixels need to be calculated per key frame. The number of key frames decreases as \(R\) increases: it turns out to be proportional to \(\frac{1}{\log R}\) for a fixed total animation zoom depth.

Some simple algebra shows that \(H_K = R \rho H\) and \(W_K = R \rho W\). Putting this all together means we want to minimize the total number of pixels that need calculating, which is proportional to

\[ \frac{R^2 - 1}{\log R} \]

which decreases to a limit of \(2\) as \(R\) decreases to \(1\). But \(R = 1\) is not possible, as this wouldn't zoom at all; as-close-to-1-as-possible means a ring of pixels 1 pixel thick, at which point you are essentially computing an exponential map.

So if an exponential map is the most efficient way to proceed, how much worse is the key frame interpolation approach? Define the efficiency by \(2 / \frac{R^2 - 1}{\log R}\), then the traditional \(R = 2\) has an efficiency of only 46%, \(R = \frac{4}{3}\) 74%, \(R = \frac{8}{7}\) 87%.
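The efficiency figures are easy to reproduce. Here is a small Python check, using the natural logarithm (which is what makes the limit at \(R = 1\) equal to \(2\)); the function name is just for this example.

```python
import math


def keyframe_efficiency(R):
    """Efficiency of key frames with zoom factor R, relative to the
    exponential map limit: 2 / ((R^2 - 1) / log R)."""
    return 2 * math.log(R) / (R * R - 1)


for R in (2, 4 / 3, 8 / 7):
    print(f"R = {R:.4f}: efficiency {100 * keyframe_efficiency(R):.0f}%")
```

Values of \(R\) closer to \(1\) push the efficiency towards 100%, at the cost of more (thinner) key frames.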

These results are pushing me towards adding an exponential map rendering mode to all of my fractal zoom software, because the efficiency savings are significant. Expect it in (at least the command line mode of) the next release of KF which is most likely to be in early September, and if time allows I'll try to make a cross-platform zoom video assembler that can make use of the exponential map output files.
