Zoom videos are a genre of 2D fractal animation. The rendering of the final video can be accelerated by computing exponentially spaced rings around the zoom center, before reprojecting to a sequence of flat images.

Some fractal software supports rendering EXR keyframes in exponential
map form, which *zoomasm* can assemble into a zoom video.
*zoomasm* works from EXR, including raw iteration data, and
colouring algorithms can be written in OpenGL shader source code fragments.

*zoomasm* **1.1**
brings two new features and plenty of bug fixes. Full change log:

- New: `F9` key toggles user interface transparency.
- New: `.ppm` output file saves an image sequence without FFmpeg.
- Fix: no more garbage audio output for an instant on device open.
- Fix: fix build with recent upstream `imgui` changes.
- Fix: when loading a session, don't reset the last timeline waypoint zoom depth to the default for the input keyframe count.
- Fix: reduce timeline Z slider range to avoid glitches at extremes.
- Fix: add an example preset that was missing from the distribution.
- Doc: add demo/tutorial videos to documentation.
- Doc: round recommended keyframe image sizes to multiples of 16 so that `exrsubsample` can be used without worrying about edge effects.
- Win: include `presets/` folder in Windows distribution.
- Win: include PDF manual in Windows distribution.
- Win: update `imgui`, `miniaudio`, `tomlplusplus` to latest versions. `miniaudio` needed a small patch to build as C++ (`void*` casts must be explicit, unlike in C); this will hopefully be fixed upstream soon.

Get it at mathr.co.uk/zoomasm.

I had a report that *zoomasm*/*FFmpeg* deadlocks when
recording on Microsoft Windows (it works fine in Wine on Linux, and the
native Linux version has no issues). Make sure you save your session
before recording on Windows, and if you have problems you can try the
PPM export as a last resort (it doesn't need *FFmpeg*, but the
output file size is huge). You can then encode with *FFmpeg* from the
command line using `-i foo-%08d.ppm` (assuming you saved the output as
`foo.ppm`). If anyone can fix the bidirectional
process communication code to work on Microsoft Windows reliably, git
patches would be very welcome via email - I don't have any Microsoft
Windows here to test with.

I actually released kf-2.15 a couple of weeks ago, but I didn't get around to posting here about it. The highlights include OpenCL support (if you have a good GPU, this can make rendering faster, especially for hybrid formulas), hybrid formula designer (lots of options to make custom fractals), and exponential map transformation for optimizing zoom animations.

kf-2.15.1 (2020-10-28)

- OpenCL support for perturbation iterations (requires double precision support on device: CPUs should work, some GPUs might not)
- hybrid formula editor (design your own fractal formula)
- exponential map coordinate transformation (useful for export to zoomasm for efficient video assembly)
- rotation and skew transformations are rewritten to be more flexible (but old skewed/rotated KFR locations will not load correctly)
- kf-tile.exe tool supports the new rotation and skew transformations
- the bitrotten skew animation feature is removed
- a few speed changes in built-in formulas (for example, RedShiftRider 4 with derivatives is almost 2x faster due to using complex analytic derivatives instead of 2x2 Jacobian matrix derivatives)
- flip imaginary part of Buffalo power 2 (to match other powers; the derivative was flipped already)
- slope implementation rewritten (appearance is different, but it is now independent of zoom level and iteration count)
- smooth (log) iteration count is offset so dwell bands match up with the phase channel

swiftly followed by

kf-2.15.1.1 (2020-10-28)

- fix OpenCL support for NVIDIA GPUs (reported by bezo97)
- fix crash in aligned memory (de)allocation (reported by gerrit)
- documentation improvements (thanks to FractalAlex)

and today by

kf-2.15.1.2 (2020-11-08)

- refactor OpenCL error handling to display errors in the GUI without exiting
- OpenCL hybrids: fix breaking typo in neg x (reported by Microfractal)

I recorded a screencast video to show how to make fractal zoom videos with kf-2.15 and zoomasm-1.0, and there are a couple of zoom videos I made with them too:

- Making fractal zoom videos with kf-2.15 and zoomasm-1.0
- https://archive.org/details/making-fractal-zoom-videos-with-kf-2.15-and-zoomasm-1.0
- https://www.youtube.com/watch?v=72IIn7C3UeI
- Charred Bard
- https://archive.org/details/charred-bard
- https://www.youtube.com/watch?v=NMKBBk-yf_4
- Special Branch
- https://archive.org/details/special-branch
- https://www.youtube.com/watch?v=uQDV87vVIxk

Yesterday I released zoomasm version 1.0 "felicitats".

It's a tool for assembling fractal zoom animation video from exponential strip keyframes with RGB and/or raw iteration data in EXR format - the colouring is controlled by an OpenGL GLSL fragment shader snippet with a few presets included (or write your own).

Source code is available for Linux (and maybe macOS, I haven't tried), with a compiled EXE for Windows, which needs 7-Zip to extract (Debian package p7zip). To render output, you need an FFmpeg program binary. More complete instructions are on the website.

However, the only software I know of that can export keyframes for input to zoomasm is the 2.15 branch of Kalles Fraktaler 2 +, which is not yet released... so getting that done is my current focus.

If you want to add support to your own fractal zoom software, get in touch and I can explain the expected conventions for the EXR keyframes (the coordinate transform, and semantics of the image channels). I need to write it up eventually for the zoomasm manual, but it won't be a high priority unless you ask.

Almost a decade ago I wrote about optimizing zoom animations by reusing the center portion of key frame images. In that post I used a fixed scale factor of 2, because I didn't think about generalizing it. This post rectifies that oversight.

So, fix the output video image size to \(W \times H\) with a pixel density (for anti-aliasing) of \(\rho^2\). For example, with \(5 \times 5\) supersampling (25 samples per output pixel), \(\rho = 5\).

Now we want to calculate the size of the key frame images \(H_K \times W_K\) and the scale factor \(R\). Suppose we have calculated an inner / deeper key frame in the zoom animation; then the next outer keyframe only needs its outer border calculated, because we can reuse the center. This means only \( H_K W_K \left(1 - \frac{1}{R^2}\right) \) pixels need to be calculated per keyframe. The number of keyframes decreases as \(R\) increases: it turns out to be proportional to \(\frac{1}{\log R}\) for a fixed total animation zoom depth.

Some simple algebra shows that \(H_K = R \rho H\) and \(W_K = R \rho W\). Putting this all together, we want to minimize the total number of pixels that need calculating, which is proportional to

\[ \frac{R^2 - 1}{\log R} \]

which decreases to a limit of \(2\) as \(R\) decreases to \(1\). But \(R = 1\) is not possible, as this wouldn't zoom at all; as-close-to-1-as-possible means a ring of pixels 1 pixel thick, at which point you are essentially computing an exponential map.

So if an exponential map is the most efficient way to proceed, how much worse is the key frame interpolation approach? Define the efficiency by \(2 / \frac{R^2 - 1}{\log R}\), then the traditional \(R = 2\) has an efficiency of only 46%, \(R = \frac{4}{3}\) 74%, \(R = \frac{8}{7}\) 87%.

These results are pushing me towards adding an exponential map rendering mode to all of my fractal zoom software, because the efficiency savings are significant. Expect it in (at least the command line mode of) the next release of KF which is most likely to be in early September, and if time allows I'll try to make a cross-platform zoom video assembler that can make use of the exponential map output files.

Already happening now!

Fringe Arts Bath (FaB) is a test-bed for early-career curators, and those who prefer to operate outside of the gallery-based arts scene. FaB aims to raise the profile of contemporary visual arts in Bath and beyond, providing opportunities for early-career and emerging artists.

...

Co.Lab Sound is an experimental testbed for artists to develop new work and collaborate live. Informed by a long line of experimental music, sound art and performance, these events embrace the processes of improvisation and experimentation to showcase artists expanding the field of sound art and what it means to experience art live.

...

I have 3 works in the online exhibition (Dynamo, disco/designer and Puzzle), and my live slot is scheduled for this Wednesday evening (assuming I can resolve the issues with Zoom...).

Today I released a new version of KF. Kalles Fraktaler 2 + is my fork of Kalle's Fraktaler 2, with many enhancements. KF is a fast deep zoom fractal renderer for Mandelbrot, Burning Ship, and many other formulas. It uses perturbation techniques and series approximation to allow fast low precision deltas per pixel to be used relative to a slow high precision reference orbit per image. The changelog entry is large: over 50 commits since the previous release, affecting 47 files and over 3000 lines. You can download 64bit Windows binaries from the homepage: mathr.co.uk/kf/kf.html (they work in Wine on Linux, and are cross-compiled with MINGW64). I won't replicate the full list of changes, but here are some highlights.

The most visible change is that the old Iterations dialog is gone. In its place are three new dialogs: Formula, Bailout and Information. Some of the controls have moved to the Advanced menu as they shouldn't usually need to be adjusted. The Formula dialog has the fractal type and power and a few other things, the Bailout dialog has things that affect the exit from the inner loop (maximum iteration count, escape radius, and so on).

The main new feature in these dialogs is the ability to control the bailout test with 4 variables: custom escape radius, real and imaginary weights (which can now be any number, including fractions or negative values, instead of being limited to 0 or 1), and norm power. Together these allow the shape of the iteration bands to be changed in many different ways. The glitch test is now always done with the Euclidean norm (unweighted), and reference calculations are simpler because they don't need to calculate their own pixel any more (I prevent the infinite loop of the reference being detected as a glitch in a different way now).

Colouring-wise, there is the return of the Texture option which had been broken for many years, there is a fourth-root transfer function, and a new phase channel is computed (saved in EXR as T channel in [0..1)). So far it is exposed to colouring only as a "phase strength" setting. It works best with Linear bailout smooth method (Log gives seams as it is independent of escape radius). There is also a new Flat toggle, if you can't stand palette interpolation.

More code is parallelized, as well as significant speedups for Mandelbrot power 3, and faster Newton-zooming for analytic formulas (arbitrary power Mandelbrot and the Redshifter formulas). Upgrading to GMP 6.2 should improve performance especially on AMD Ryzen CPUs.

Lots of bugfixes, including directional DE for NanoMB1+2 and an off-by-one in NanoMB1 that was making colours different vs normal rendering. EXR load and save uses much less memory, and there are new options to select which channels to export for smaller files if you don't need the data.

*Vertical slices (constant x) have the same number of samples per pixel;
the number of samples doubles each slice, from 1 at the left to 16384 at the right.
Horizontal slices (constant y) have density estimation on or off:
the top slice has density estimation on, the bottom slice has it off.
Below the bottom slice are numbers indicating samples per pixel.
Quality is low at the left and high at the right.
With density estimation, the left image is blurry but about the same brightness as the right-hand side.
Without density estimation, the left image is sparse and noisy, and much darker due to the black background.
Towards the right-hand side both methods converge to the same image.*

Fractal flames are generated by applying the chaos game technique to a weighted directed graph iterated function system with non-linear functions. Essentially what this means is: you start from a random point and repeatedly pick a random transformation from the system and apply it to the point, plotting all the points into a histogram accumulation buffer, which ends up as a high dynamic range image that is compressed for display using a saturation-preserving logarithm technique.

The main problem with the chaos game is that you need to plot a lot of points, otherwise the image is noisy. So to get acceptable images from fewer points, an adaptive blur algorithm is used to smooth out the sparse areas, keeping the dense areas sharp:

Adaptive filtering for progressive monte carlo image rendering

by Frank Suykens, Yves D. Willems

The 8th International Conference in Central Europe on Computer Graphics, Visualization and Interactive Digital Media 2000 (WSCG '2000), February 2000. Held in Plzen, Czech Republic

Abstract

Image filtering is often applied as a post-process to Monte Carlo generated pictures, in order to reduce noise. In this paper we present an algorithm based on density estimation techniques that applies an energy preserving adaptive kernel filter to individual samples during image rendering. The used kernel widths diminish as the number of samples goes up, ensuring a reasonable noise versus bias trade-off at any time. This results in a progressive algorithm, that still converges asymptotically to a correct solution. Results show that general noise as well as spike noise can effectively be reduced. Many interesting extensions are possible, making this a very promising technique for Monte Carlo image synthesis.

The problem with this density estimation algorithm as presented in the paper is that it is very expensive: asymptotically it's about \(O(d^4)\) where \(d\) is the side length of the image in pixels. It's totally infeasible for all but the tiniest of images. But I did some thinking and figured out how to make it \(O(d^2)\), which, at a constant cost per pixel, is the best it is possible to do.

So, starting with the high dynamic range accumulation buffer, I
compute a histogram of the alpha channel with log-spaced buckets. As
I am using ulong (uint64_t) for each channel, a cheap way to do this
is the `clz()` function, which counts leading zeros. The result will
be between 0 and 64 inclusive. Now we have a count of how many pixels
are in each of the 65 buckets.
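A sketch of the bucketing in C, assuming the GCC/Clang `__builtin_clzll` builtin (the names are mine, not from the actual implementation):

```c
#include <stdint.h>

/* histogram bucket index for one accumulator value: the count of leading
   zeros, with 0 mapped to 64, giving 65 buckets indexed 0..64 */
int clz_bucket(uint64_t v)
{
  return v ? __builtin_clzll(v) : 64;
}

/* histogram of the alpha channel over n pixels */
void alpha_histogram(const uint64_t *alpha, int n, int histogram[65])
{
  for (int b = 0; b <= 64; ++b) histogram[b] = 0;
  for (int i = 0; i < n; ++i) histogram[clz_bucket(alpha[i])] += 1;
}
```

Note `__builtin_clzll(0)` is undefined in C, hence the explicit special case for zero.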

For each non-empty bucket (that doesn't correspond to 0), threshold the HDR accumulation buffer to keep only the pixels that fall within the bucket, setting all the rest to 0. Then blur this thresholded image with radius inversely proportional to \(\sqrt{\text{bucket value}}\) (so low density is blurred a lot, highest density is not blurred at all). And then sum all the blurred copies into a new high dynamic range image, before doing the log scaling, auto-white-balance, gamma correction, etc as normal.

The key to making it fast is making the blur fast. The first observation is that Gaussian blur is separable: you can do a 2D blur by blurring first horizontally and then vertically (or vice versa), independently. The second observation is that repeated box filters tend to Gaussian via the central limit theorem. And the third important observation is that wide box filters can be computed just as cheaply as narrow box filters by using a recursive formulation:

```c
for (uint x = 0; x < width; ++x)
{
  acc -= src[x - kernel_width];
  acc += src[x];
  dst[x] = acc / kernel_width;
}
```

There is some subtlety around initializing the boundary conditions (whether to clamp or wrap or do something else), and to avoid shifting the image I alternate between left-to-right and right-to-left passes. A total of 8 passes (4 in each dimension, of which 2 in each direction) does the trick. There are at most 64 copies to blur, but in practice the number of non-empty buckets is much less (around 12-16 in my tests).
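One way to realize a single clamped pass, sketched in C (the clamp-to-edge boundary handling and the names are my choices, not necessarily those of the actual implementation):

```c
#include <math.h>

/* one recursive box-filter pass, left to right, clamping reads at the
   left edge; a right-to-left twin (mirror the loop) alternates with it
   to avoid shifting the image */
void box_pass_lr(const double *src, double *dst, int n, int k)
{
  double acc = k * src[0]; /* window primed as if src were extended by src[0] */
  for (int x = 0; x < n; ++x)
  {
    acc -= (x - k >= 0) ? src[x - k] : src[0];
    acc += src[x];
    dst[x] = acc / k;
  }
}
```

Four such passes per dimension (two in each direction) approximate a Gaussian via the central limit theorem, as described above, at constant cost per pixel regardless of kernel width.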

As a bonus, here's the page from my notebook where I first developed this fast technique:

Thanks to lycium for pointing out to me the second and third observations about box filters.

**EDIT:** I benchmarked it a bit more. At large image
sizes (~8k^2) with extra band density (needed to avoid posterization
artifacts from filter size quantization), it's about 40x slower (2 mins)
than Fractorium (5 secs). If the sample-per-pixel count is not ridiculously
small they are indistinguishable (maybe mine is a very tiny bit smoother
in less dense areas); for ridiculously small sample counts mine is very
blurry and Fractorium's is very noisy, but 2 mins iteration and 5 secs
filter definitely beats 5 secs iteration and 2 mins filter in terms of
visual quality any way you do it. So I guess it's "nice idea in theory,
but in practice...".

Deep zoom is a subgenre of fractal video art. Shapes in the Mandelbrot set (and other fractals) stack up in nested period-doubling crises as you pass by mini-sets and the embedded Julia sets between them.

Mandelbrot set deep zooms are computationally expensive, requiring lots of high precision numerical calculations. Computational cost has a corresponding ecological cost. The climate is the real crisis.

This is not a Mandelbrot set deep zoom, but it might resemble one at first glance with its nested period-doubling crises. But instead it is designed to have a similar appearance with much cheaper equations.

Rendering of v1 (duration one hour) at 1920x1080p60@50Mbps took a little over four hours wall-clock time. Two-pass video encoding to 960x540p60@4Mbps took a little over half an hour wall-clock time. I estimate around 1kWh of electricity was used.

Summary of the key parts of the algorithm:

```
for (l = L; l >= 0; --l)
  t = -log2(1 - t)
  n += floor(t)
  t -= floor(t)
  t = (1 - exp2(-t)) * 2
  // store n and t for level l
```

Drone soundtrack based on the same code used for the graphics.

In an appendix to the paper from which I implemented the slow mating algorithm in my previous post, there is a brief description of another algorithm:

The Thurston Algorithm for quadratic matings

**Initialization A.2 (Spider algorithm with a path).** Suppose \(\theta = \theta_1 \in \mathbb{Q} \backslash \mathbb{Z}\) has prepreperiod \(k\) and period \(p\). Define \((x_1(t), \ldots, x_{k+p}(t))\) for \(0 \le t \le 1\) as

\[ x_1(t) = t e^{i 2 \pi \theta_1} \\ x_p(t) = (1 - t) e^{i 2 \pi \theta_p}, \text{ if } k = 0 \\ x_j(t) = e^{i 2 \pi \theta_j}, \text{ otherwise.} \]

Pull this path back continuously with \(x_i(t + 1) = \pm \sqrt{x_{i+1}(t)-x_1(t)}\). Then it converges to the marked points of \(f_c\) with appropriate collisions.

In short, given a rational \(\theta\) measured in turns, this provides a way to calculate \(c\) in the Mandelbrot set that has corresponding dynamics. Here \(\theta_j = 2^{j - 1} \theta \mod 1\), and the desired \(c = x_1(\infty)\).

This week I implemented it in my mandelbrot-numerics library, in the hope that it might be faster than my previous method of tracing external rays. Alas, it wasn't to be: both algorithms are \(O(n^2)\) when ignoring the way cost varies with numerical precision, and the spider path algorithm has higher constant factors and requires \(O(n)\) space vs ray tracing \(O(1)\) space. This meant spider path was about 6x slower than ray tracing when using a single-threaded implementation, in one test at period 469, and I imagine it would be slower still at higher periods and precisions.

This isn't entirely surprising, spider path does \(s n\) complex square roots to extend the paths by \(t \to t + 1\), while ray trace does \(s t\) arithmetical operations to extend the ray from depth \(t \to t + 1\). The \(O(n^2)\) comes from \(t\) empirically needing to be about \(2 n\) to be close enough to switch to the faster Newton's root finding method.

Moreover spider path needs very high precision all the way through, the initial points on the unit circle need at least \(n\) bits (I used about \(2 n\) to be sure) to resolve the small differences in external angles, even though the final root can usually be distinguished from other roots of the same period using much less precision. In fact I measured spider path time to be around \(O(n^{2.9})\), presumably because of the precision. Ray tracing was very close to \(O(n^2)\).

Ray tracing has a natural stopping condition: when the ray enters the atom domain with period \(p\), Newton's method is very likely to converge to the nucleus at its center. I imagine something similar will apply to preperiodic Misiurewicz domains, but I have not checked yet. I tried it with spider path but in one instance I got a false positive and ended up at a different minibrot to the one I wanted.

The only possible advantages that remain for the spider path algorithm are that it can be parallelized more effectively than ray tracing, and that the numbers are all in the range \([-2,2]\), which means fixed point could be used. Perhaps a GPU implementation of spider path would be competitive with ray tracing on an elapsed wall-clock time metric, though it would probably still lose on power consumption.

I plotted a couple of graphs of the spider paths, the path points end up log-spiraling around their final resting places. I think this means it converges linearly. Ray tracing is also linear when you are far from the landing point (before the period-doubling cascade starts in earnest). Newton's method converges quadratically, which means the number of accurate digits doubles each time, but you need to start from somewhere accurate enough.

]]>I recently came across Arnaud Chéritat's polynomial mating movies and just had to try to recreate them.

If \(p\) and \(q\) are in the Mandelbrot set, they have connected Julia sets for the quadratic polynomial functions \(z^2+p\) and \(z^2+q\). If they are not in conjugate limbs (a limb is everything beyond the main period 1 cardioid attached at the root of a given immediate child bulb, conjugation here is reflection in the real axis, the 1/2 limb is self-conjugate) then the Julia sets can be mated: glue the Julia sets together respecting external angles so that the result fills the complex plane (which is conveniently represented as the Riemann sphere). It turns out that this mating is related to the Julia set of a rational function of the form \(\frac{z^2+a}{z^2+b}\).

One algorithm to compute \(a\) and \(b\) is called "slow mating". Wolf Jung has a pre-print which explains how to do it in chapter 5: The Thurston Algorithm for quadratic matings.

My first attempts just used Wolf Jung's code and later my own code, to compute the rational function and visualize it in Fragmentarium (FragM fork). This only worked for displaying the final limit set, while Chéritat's videos had intermediate forms. I found a paper which had this to say about it:

On The Notions of Mating

Carsten Lunde Petersen & Daniel Meyer

**5.4. Cheritat movies.** It is easy to see that \(R_\lambda\) converges uniformly to the monomial \(z^d\) as \(\lambda \to \infty\). Cheritat has used this to visualize the path of Milnor intermediate matings \(R_\lambda\), \(\lambda \in ]1,\infty[\) of quadratic polynomials through films. Cheritat starts from \(\lambda\) very large so that \(K_w^\lambda\) and \(K_b^\lambda\) are essentially just two down-scaled copies of \(K_w\) and \(K_b\), the first near \(0\), the second near \(\infty\). From the chosen normalization and the position of the critical values in \(K_w^\lambda \cup K_b^\lambda\) he computes \(R_{\sqrt{\lambda}}\). From this \(K_w^{\sqrt{\lambda}} \cup K_b^{\sqrt{\lambda}}\) can be computed by pull back of \(K_w^\lambda \cup K_b^\lambda\) under \(R_{\sqrt{\lambda}}\). Essentially applying this procedure iteratively one obtains a sequence of rational maps \(R_{\lambda_n}\) and sets \(K_w^{\lambda_n} \cup K_b^{\lambda_n}\), where \(\lambda_n \to 1+\) and \(\lambda_n^2 = \lambda_{n-1}\). For more details see the paper by Cheritat in this volume.

What seems to be the paper referred to contains this comment:

Tan Lei and Shishikura’s example of non-mateable degree 3 polynomials without a Levy cycle

Arnaud Chéritat

**Figure 4.** The Riemann surface \(S_R\) conformally mapped to the Euclidean sphere, painted with the drawings of Figure 2. The method for producing such a picture is interesting and will be explained in a forthcoming article; it does not work by computing the conformal map, but instead by pulling-back Julia sets by a series of rational maps. It has connections with Thurston's algorithm.

I could not find that "forthcoming" article despite the volume having been published in 2012 following the 2011 workshop, so I emailed Arnaud Chéritat and got a reply to the effect that it had been cancelled by the author.

My first attempts at coding the slow mating algorithm worked by pulling back the critical orbits as described in Wolf Jung's preprint. The curves look something like this:

A little magic formula for finding the parameters \((a, b)\) for the function \(\frac{z^2+a}{z^2+b}\):

\[a = \frac{C(D-1)}{D^3(1-C)}\] \[b = \frac{D-1}{D^2(1-C)}\]

where \((C,D)\) are the pulled-back first iterates. This was reverse-engineered from Wolf Jung's code, which worked with separate real and imaginary components, with no comments and heavy reuse of the same variable names. I'm not sure if it is correct, but it seems to give usable results when plugged into FragM for visualization:

I struggled to implement the intermediate images at first: I tried pulling back from the coordinates of a filled-in Julia set, but that needed huge amounts of memory and the resolution was very poor:

Eventually I figured out that I could invert each pullback function into something of the form \(\frac{az^2+b}{cz^2+d}\) and push forward from pixel coordinates to colour according to which hemisphere it reached:

I struggled further, until I found the two bugs that were almost cancelling each other out. The coordinates in each respective hemisphere can be rescaled, and thereafter regular iterations of \(z^2+c\) until escape or maximum iterations could be used to colour the filled in Julia sets expanding within each hemisphere:

After that it was quite simple to bolt on dual-complex-numbers for automatic differentiation, to compute the derivatives for distance estimation to make the filaments of some Julia sets visible:

I also added an adaptive super-sampling scheme: if the standard deviation of the current per-pixel sample population divided by the number of samples is less than a threshold, I assume that the next sample will make negligible changes to the appearance, and so I stop. This speeds up interior regions (which need to be computed to the maximum iteration count) because the standard deviation will be 0 and it will stop after only the minimum sample count. I also have a maximum sample count to avoid taking excessive amounts of time. I do blending of samples in linear colour space, with sRGB conversion only for the final output.

Get the code:

git clone https://code.mathr.co.uk/mating.git

Currently about 600 lines of C99 with GNU getopt for argument parsing, but I may port the image generation part to OpenCL because my GPU is about 8x faster than my CPU for some double-precision numerical algorithms, which will help when rendering animations.

Melinda Green's webpage
The 4D Mandel/Juli/Buddhabrot Hologram
has a nice video at the bottom, titled
*ZrZi to ZrCr - only points Inside the m-set*.
I recalled my 2013 blog post about the
Ultimate Anti-Buddhabrot
where I used Newton's method to find the limit Z cycle of each C value
inside the Mandelbrot set and plotted them. The (anti-)Buddhagram is
just like the (anti-)Buddhabrot, but the Z points are plotted in 4D space
augmented with their C values. Then the 4D object can be rotated in
various ways before projection down to 2D screen, possibly via a 3D step.

My first attempt was based on my ultimate anti-Buddhabrot code, computing all the points in a fine grid over the C plane. I collected all the points in a large array, then transformed (4D rotation, perspective projection to 3D, perspective projection to 2D) them to 2D and accumulated with additive blending to give an image. This worked well for videos at moderate image resolutions, achieving around 6 frames per second (after porting the point cloud rasterization to OpenGL) at the highest grid density I could fit into RAM, but at larger sizes the grid of dots became visible in areas where the z→z²+c transformation magnified it.

Then I had a flash of inspiration while trying to find the surface normals for lighting. Looking at the formulas on Wikipedia I realized that each "pringle" is an implicit surface \(F_p(c, z) = 0\), with \(F_p(c, z) = f_c^p(z) - z\) and the usual \(f_c(z) = z^2 + c\). Here \(p\) is the period of the hyperbolic component. Rendering implicit surfaces can be done via sphere-marching through signed distance fields, so I tried to construct a distance estimate. As a first try I used \(|F_p(c, z)| - t\), where \(t\) is a small thickness to make the shapes solid, but that extended beyond the edges of each pringle and looked very wrong. The interior of the pringle has \(\left|\frac{\partial F_p}{\partial z}\right| \le 1\), so I added that to the distance estimate (using max() for intersection) to give:

```glsl
float DE(vec2 c, vec2 z0)
{
  vec2 z = z0;
  vec2 dz = vec2(1.0, 0.0);
  float de = 1.0 / 0.0;
  for (int p = 1; p <= MaxPeriod; ++p)
  {
    dz = 2.0 * cMul(dz, z);
    z = cSqr(z) + c;
    de = min(de, max(length(z - z0), length(dz) - 1.0));
  }
  return 0.25 * de - Thickness;
}
```

Note that this has complexity linear in MaxPeriod; my first attempt was quadratic, which was way too slow for comfort once MaxPeriod got bigger than about 10. The 0.25 at the end is empirically chosen to avoid graphical glitches.

I have not yet implemented a 4D raytracer in FragM, though it's on my to-do list. It's quite straightforward: most of the maths is the same as the 3D case when expressed in vectors, but the cross-product has 3 inputs instead of 2. Check S. R. Hollasch's 1991 master's thesis Four-Space Visualization of 4D Objects for details. Instead I rendered 3D slices (with the 4th dimension constant) with 3D lighting, animating the slice coordinate over time, and eventually accumulating all the 3D slices into one image to create a holographic feel similar to Melinda Green's original concept.
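For reference, the 3-input cross product mentioned above can be written out via cofactor expansion of a formal determinant (a C sketch following Hollasch's construction; not code from my renderer):

```c
#include <math.h>

/* vector orthogonal to three 4D vectors: expand the formal determinant
   | e; u; v; w | along its first row of basis vectors, using 2x2
   sub-determinants of the (v, w) rows */
void cross4(const double u[4], const double v[4], const double w[4],
            double out[4])
{
  double a = v[0] * w[1] - v[1] * w[0];
  double b = v[0] * w[2] - v[2] * w[0];
  double c = v[0] * w[3] - v[3] * w[0];
  double d = v[1] * w[2] - v[2] * w[1];
  double e = v[1] * w[3] - v[3] * w[1];
  double f = v[2] * w[3] - v[3] * w[2];
  out[0] =  u[1] * f - u[2] * e + u[3] * d;
  out[1] = -u[0] * f + u[2] * c - u[3] * b;
  out[2] =  u[0] * e - u[1] * c + u[3] * a;
  out[3] = -u[0] * d + u[1] * b - u[2] * a;
}
```

The result is orthogonal to all three inputs (a dot product against any input reproduces a determinant with a repeated row, which is zero).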

Source code is in my fractal-bits repository:

git clone https://code.mathr.co.uk/fractal-bits.git

Back in 2017 I forked the Windows fractal explorer software Kalles Fraktaler 2. I've been working on it steadily since, adding plenty of new features (and bugs). My fork's website is here with binary downloads for Windows (including Wine on Linux).

I had been maintaining 3 branches of various ages, purely because the 2.12 branch was faster than the 2.13 and 2.14 branches and I couldn't figure out why, until recently. Hence this blog post. It turns out to be quite obscure. This is the patch that fixed it:

```diff
diff --git a/formula/formula.xsl b/formula/formula.xsl
index b47f763..d07c002 100644
--- a/formula/formula.xsl
+++ b/formula/formula.xsl
@@ -370,6 +370,7 @@ bool FORMULA(perturbation,<xsl:value-of select="../@type" />,<xsl:value-of selec
       (void) Ai; // -Wunused-variable
       (void) A; // -Wunused-variable
       (void) c; // -Wunused-variable
+      bool no_g = g_real == 1.0 && g_imag == 1.0;
       int antal = antal0;
       double test1 = test10;
       double test2 = test20;
@@ -385,7 +386,14 @@ bool FORMULA(perturbation,<xsl:value-of select="../@type" />,<xsl:value-of selec
         Xxr = Xr + xr;
         Xxi = Xi + xi;
         test2 = test1;
-        test1 = double(g_real * Xxr * Xxr + g_imag * Xxi * Xxi);
+        if (no_g)
+        {
+          test1 = double(Xxr * Xxr + Xxi * Xxi);
+        }
+        else
+        {
+          test1 = double(g_real * Xxr * Xxr + g_imag * Xxi * Xxi);
+        }
         if (test1 < Xz)
         {
           bGlitch = true;
```

In short, it adds a branch inside the inner loop, to avoid two
multiplications by 1.0 (which would leave the value unchanged). Normally
branches inside inner loops are harmful for optimization, but because the
condition is static and unchanging over the iterations, the compiler can
actually reverse the order of the loop and branch, generating code for
two loops, one of which has the two multiplications completely gone. In
real-world usage, the values *are* almost always both 1.0 - they
determine which parts of the value to use for the escape test (and glitch
test, but this is probably a bug).
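A minimal standalone illustration of why this helps (a hypothetical function, not KF's actual code): the flag is loop-invariant, so the compiler can unswitch the loop into two specialized copies, one with the multiplications removed entirely.

```c
/* sum of weighted escape-test values; no_g is computed once outside the
   loop, so loop unswitching can generate a multiply-free loop body for
   the common case g_real == g_imag == 1.0 */
double bailout_sum(const double *xr, const double *xi, int n,
                   double g_real, double g_imag)
{
  int no_g = g_real == 1.0 && g_imag == 1.0;
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
  {
    if (no_g)
      sum += xr[i] * xr[i] + xi[i] * xi[i];
    else
      sum += g_real * xr[i] * xr[i] + g_imag * xi[i] * xi[i];
  }
  return sum;
}
```

Both branches compute the same value when the weights are 1.0; the point is purely to let the optimizer drop the multiplications in that case.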

The performance boost from this patch was about **20%**
(CPU time), which is huge in the grand scheme of things, so I was quite
happy, because it brought performance of kf-2.14.7.1 back to the level
of the 2.12 branch, so I don't have to support it any more (by backporting
bugfixes).

But when you get a taste for speed, you want more. So far KF has not
taken advantage of CPUs to their fullest. Until now, KF has been
resolutely scalar, computing one pixel at a time in each thread. Last
night I started work on upgrading KF to use vectorization
(aka SIMD).
Now when I
compile KF for my CPU (which is not portable, so I won't ship binaries with
these flags enabled), I get an **80%** (CPU time) speed boost,
which is absolutely ginormous, and when compiling for more conservative CPU
settings (Intel Haswell / AMD Excavator) the speed boost is **61%**
which is still a very nice thing to have. With no CPU specific flags
(baseline x86_64) the speed boost is **55%** which is great
too.

The vectorization work is not finished yet; so far it is only added
for "type R" formulae in `double` precision (which allows zoom
depths to 1e300 or so). Unfortunately `long double` (used after
`double` until 1e4900 or so) has no SIMD support at the hardware
level, but I will try to add it for the `floatexp` type used
for even deeper zooms (who knows, maybe `floatexp`+SIMD will
be competitive with `long double`, but I doubt it...). I will
also add support for "type C" formulae before the release, which is a little
complicated by the hoops you have to jump through to get gcc to broadcast
a scalar to a vector in initialization.

Here's a table of differently optimized KF versions:

version | vector size | wall-clock time | CPU time | speed boost (wall) | speed boost (CPU)
---|---|---|---|---|---
2.14.7.1 | 1 | 3m47.959s | 23m30.676s | 1.00 | 1.00
git/64 | 1 | 3m46.703s | 23m26.290s | |
git/64 | 2 | 3m22.280s | 15m11.022s | 1.13 | 1.55
git/64 | 4 | 3m55.158s | 25m26.638s | |
git/64+ | 1 | 3m46.977s | 23m26.065s | |
git/64+ | 2 | 3m13.442s | 14m34.363s | 1.18 | 1.61
git/64+ | 4 | 3m26.012s | 14m54.546s | |
git/native | 1 | 3m42.554s | 21m51.381s | |
git/native | 2 | 3m10.440s | 13m26.100s | |
git/native | 4 | 3m08.784s | 13m06.386s | 1.21 | 1.80
git/native | 8 | 3m50.812s | 24m01.230s | |

All these benchmarks are with Dinkydau's "Evolution of Trees" location, quadratic Mandelbrot set at zoom depth 5e227, with maximum iteration count 1200000. Image size was 3840x2160. My CPU is an AMD Ryzen 7 2700X Eight-Core Processor (with 16 threads that appear as distinct CPUs to Linux). Wall-clock performance doesn't scale up as much as CPU time because some parts (computing reference orbits) are sequential; only the perturbed per-pixel orbits are embarrassingly parallel.
