Sonics Immersive Media Lab (SIML), Room G05, Hatcham Building, Goldsmiths, 25 St James's, SE14 6AD, London, UK (the old church)

Feb 15th 2019, doors 6:15pm, start 6:30pm, end 10:30pm

mathr / Deerful / Luuma / Cassiel / Lil Data / BITPRINT / rumblesan / lnfiniteMonkeys

Part of an international multi-day streaming celebration of 15 years of live-coding.

A recent post by matty686 on fractalforums about Photoshop IFS fractals got me interested. I couldn't manage it in GIMP 2.8.18, but succeeded with Inkscape 0.92.4. The process needs two PNG images; I didn't succeed with only one. Once you have set up the transformed linked images that reference "bitmap.png", repeatedly export the page to "bitmap2.png" (pressing return in the filename box does this quickly), then run "mv bitmap2.png bitmap.png" in a terminal when each export has finished. Here are some explanatory screenshots:

This could probably be scripted within Inkscape so you don't have to do so much manual repetitive work at the end: this is just a proof of concept.
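The loop could also be scripted outside Inkscape; here's a hedged sketch using Inkscape 0.92's command-line PNG export (the filenames match the ones above, but the function name, the step count, and the injectable `run` parameter are mine):

```python
import subprocess

def iterate_ifs(svg, bitmap="bitmap.png", out="bitmap2.png", steps=10,
                run=subprocess.run):
    """Repeat: export the SVG (whose linked images reference bitmap.png)
    to bitmap2.png, then move the result over bitmap.png."""
    for _ in range(steps):
        run(["inkscape", "--export-png=" + out, svg], check=True)
        run(["mv", out, bitmap], check=True)

# iterate_ifs("ifs.svg", steps=20)
```

Passing `run` explicitly makes the command construction easy to test without actually invoking Inkscape.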

In the #supercollider channel on talk.lurk.org, there was recently
discussion about a "DJ filter" that transitions smoothly between low-pass
and high-pass filters. This made me curious to see if I could make one.
I found the section on Butterworth filters in Miller Puckette's book,
but figure 8.18 is not quite what I need: the normalization
is off for "shelf 2" (it would be better if the Nyquist gain were 1,
instead of the DC gain). The figure has 3 poles and 3 zeroes, but for
simplicity of implementation with 2 cascaded biquad sections I went with a
**4-pole** filter design.

After fixing the order, the next variable is the **center frequency**
\(\beta = 2 \pi f_c / SR\), which determines \(r = \tan(\beta/2)\).
Using the formula from the above link gives the pole locations:

\[ \frac{(1 - r^2) \pm \mathbf{i} (2 r \sin(\alpha)) }{ 1 + r^2 + 2 r \cos(\alpha) } \]

For a 4-pole filter, \(\alpha \in \{ \frac{\pi}{8}, \frac{3 \pi}{8} \} \).
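As a sketch (Python; the formula above, with function and parameter names of my own choosing), the pole locations for the 4-pole case:

```python
import math

def butterworth_poles(fc, sr, order=4):
    # beta = 2 pi fc / SR, r = tan(beta / 2), then the formula above with
    # alpha = pi/8, 3pi/8 for order 4; poles come in conjugate pairs
    beta = 2 * math.pi * fc / sr
    r = math.tan(beta / 2)
    poles = []
    for k in range(order // 2):
        alpha = math.pi * (2 * k + 1) / (2 * order)
        den = 1 + r * r + 2 * r * math.cos(alpha)
        re = (1 - r * r) / den
        im = 2 * r * math.sin(alpha) / den
        poles += [complex(re, im), complex(re, -im)]
    return poles
```

For any center frequency strictly between DC and Nyquist these poles land strictly inside the unit circle.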

The **hi/lo** control \(o\) is conveniently expressed in octaves relative
to the center frequency. It controls the stop-band gain, which levels
off after \(o\)-many octaves (so these are really shelving filters).
The \(o\) control fixes the location of the zeroes of the filter; the
formula is the same as above but with \(r\) replaced by
\(r_Z = \frac{r_P}{2^o}\).

The filter is normalized so that the pass-band gain (at DC for low-pass and Nyquist for high-pass) is unity. Then the gain in the stop band is \(-24 o\) dB, the transition slope is fixed by the order, and the center frequency gain is about \(-3\) dB when \(o\) is away from \(0\). Normalization can be done by computing the gain of the unnormalized filter at \(z = \pm 1\) (\(+1\) for DC, \(-1\) for Nyquist, chosen as appropriate). Computing the gain of a filter specified by poles and zeroes is simple: multiply by the distance from \(z\) to each zero and divide by the distance to each pole (phase response is left as an exercise).
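A sketch of that normalization (Python, names mine), using the distance-product rule for the gain at a point on the unit circle:

```python
def gain(z, zeros, poles):
    # |H| at z: product of distances to the zeros over product of
    # distances to the poles (z = 1 is DC, z = -1 is Nyquist)
    g = 1.0
    for q in zeros:
        g *= abs(z - q)
    for p in poles:
        g /= abs(z - p)
    return g

def normalize(zeros, poles, lowpass=True):
    # overall gain factor making the pass band unity
    return 1.0 / gain(1.0 if lowpass else -1.0, zeros, poles)
```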

The poles and zeroes come in conjugate pairs, which are easy to transform to biquad coefficients (see my previous post about biquad conversions). I put the gain normalization in the first biquad of the chain, not sure if this is optimal. The filters should be stable as long as the center frequency is not DC or Nyquist, as the poles are inside the unit circle. But modulating the filter may cause blowups - to be investigated.
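The conjugate-pair-to-biquad conversion is just expanding \((z - p)(z - \bar p)\); a minimal sketch (Python, my naming, with the \(a_0\) coefficient normalized to 1):

```python
def biquad(zero, pole):
    # (z - q)(z - conj(q)) = z^2 - 2 Re(q) z + |q|^2,
    # likewise for the pole pair; returns (b0, b1, b2, a1, a2)
    b0, b1, b2 = 1.0, -2.0 * zero.real, abs(zero) ** 2
    a1, a2 = -2.0 * pole.real, abs(pole) ** 2
    return b0, b1, b2, a1, a2
```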

You can browse my implementation.

Tomorrow, an afternoon of live coding at New River Studios:

Right on the heels of the International Conference on Live Coding and an Algorave at Access Space in Sheffield, livecodenyc in exile presents an afternoon of live coding at New River Studios.

Come join us to jam and hang out.

Featuring performances from:

- Codie (Sarah Groff Hennigh-Palermo and Melody Loveless, nyc) - @hi_codie
- Ulysses Popple (nyc) - @ulysses_le_sees
- Deerful - @deer_ful
- BITLIP (Evan Raskob, London) - pixelist.info
- Visor (Jack Purvis, New Zealand) - jackvpurvis.com
- mathr - mathr.co.uk

See you there!

New River Studios is at Ground Floor Unit E, 199 Eade Road, N4 1DN London.

I modified Pure-data and libpd just enough to compile it with Emscripten. See microsite here:

mathr.co.uk/empd

Not user friendly yet, maybe someone else will contribute that stuff...

meshwalk-3.0 is a 360° animation of Voronoi cells moving on the surface of a sphere. Rapid acceleration is highlighted visually and sonified spatially. The movement of the dynamic mesh in this version is partly based on the idea of potential wells for crystal formation, and is realized efficiently by updating a collection of nearest neighbours to each node using the neighbours of neighbours (and hoping that nothing moves so fast that the algorithm breaks).

At each time step, after updating the positions of the nodes (which depend on attraction and repulsion between neighbouring cells), the neighbour list of each cell is updated by choosing the 8 nearest cells among its neighbours and neighbours of neighbours. As the nodes move, a neighbour of a neighbour might now be nearer than a previous neighbour, so the mesh changes dynamically. The number 8 can be changed at compile time: too high and it slows down, too low and it breaks (the mesh splits apart and can overlap itself without rejoining).

This mesh algorithm is O(N B^2), where N is the number of nodes and B is the number of neighbours to keep track of. As B is typically a small constant, this reduces to O(N). A naive algorithm is O(N^2), considering all nodes for each node, which I only do in the one-time initialization of the neighbour data. A spatial data structure could probably reduce that to O(N log N), but for a few hundred nodes on a sphere it isn't too slow. The Voronoi cell rendering is O(N W H); my GPU is fast enough that it isn't an issue.
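A minimal sketch of the neighbour update (Python; plain floats stand in for points on a sphere, and the dict-based mesh layout is mine, not the actual data structures):

```python
def update_neighbours(pos, nbrs, k=8):
    # each node keeps the k nearest among its neighbours and
    # neighbours-of-neighbours; O(N B^2) with B the list length
    new = {}
    for i in nbrs:
        cand = set(nbrs[i])
        for j in nbrs[i]:
            cand.update(nbrs[j])
        cand.discard(i)
        new[i] = sorted(cand, key=lambda j: abs(pos[i] - pos[j]))[:k]
    return new
```

Calling this once per time step after the position update keeps the mesh stitched together, as long as nothing jumps past its two-hop horizon in a single step.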

In yesterday's post I showed how dividing by unwanted roots leads to better stability when finding periodic nuclei \(c\) that satisfy \(f_c^p(0) = 0\) where \(f_c(z) = z^2 + c\). Today I'll show how two techniques can bring this gain to finding periodic cycles \(z_0\) that satisfy \(f_c^p(z_0) = z_0\) for a given \(c\).

The first attempt just does the Newton iterations without any wrong-root division; unsurprisingly it isn't very successful. The second attempt divides by wrong-period roots, and is a bit better. The third algorithm is much more involved, thus slower, but much more stable (in terms of larger Newton basins around the desired roots).

Here are some images, each row corresponds to an algorithm as introduced. The colouring is based on lifted domain colouring of the derivative of the limit cycle: \(\left|\frac{\partial}{\partial z}f_c^p(z_0)\right| \le 1\) in the interior of hyperbolic components, and acts as conformal interior coordinates which do extend a bit into the exterior.

The third algorithm works by first finding a \(c_0\) that is a periodic nucleus, for which we know that a good \(z_0\) is simply \(0\). Now move \(c_0\) a little bit in the direction of the target \(c\) that we wish to calculate, and use Newton's method with the previous \(z_0\) as the initial guess to find a good \(z_0\) for the moved \(c_0\). Repeat until \(c_0\) reaches \(c\), and hopefully the resulting \(z_0\) will be in the desired periodic cycle for \(c\).
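A sketch of this third algorithm in Python (derivatives by hand via the chain rule rather than dual numbers; the step counts and names are mine):

```python
def cycle_newton(c, period, z0, steps=50):
    # Newton's method on g(z) = f_c^p(z) - z, with g'(z) = (f_c^p)'(z) - 1
    z = z0
    for _ in range(steps):
        w, dw = z, complex(1)
        for _ in range(period):
            w, dw = w * w + c, 2 * w * dw  # value and derivative together
        z = z - (w - z) / (dw - 1)
    return z

def cycle_traced(c, period, nucleus=0j, n=32):
    # start at a periodic nucleus (where z0 = 0), walk c0 toward the
    # target c, re-converging z0 by Newton at each intermediate c0
    z = complex(0)
    for k in range(1, n + 1):
        ck = nucleus + (c - nucleus) * k / n
        z = cycle_newton(ck, period, z)
    return z
```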

Source code for Fragmentarium: 2018-11-18_newtons_method_for_periodic_cycles.frag.

Previously on mathr: Newton's method for Misiurewicz points (2015). This week I applied the same "divide by undesired roots" technique to the periodic nucleus Newton iterations. I implemented it in GLSL in Fragmentarium, which has a Complex.frag with dual numbers for automatic differentiation (this part of the frag is mostly my work, though I largely copy/pasted from C99 standard library manual pages for the transcendental functions, and from Wikipedia for basic properties of differentiation: product rule, quotient rule, chain rule...).

Here's the improved Newton's method, with the newly added lines marked `// added`:

```glsl
vec2 nucleus(vec2 c0, int period, int steps)
{
  vec2 c = c0;
  for (int j = 0; j < steps; ++j)
  {
    vec4 G = cConst(1.0);
    vec4 z = cConst(0.0);
    for (int l = 1; l <= period; ++l)
    {
      z = cSqr(z) + cVar(c);
      if (l < period && period % l == 0) // added
        G = cMul(z, G);                  // added
    }
    G = cDiv(z, G);
    c -= cDiv(G.xy, G.zw);
  }
  return c;
}
```

And some results: the top half of each image is without the added lines, the bottom half is with them; from left to right the target periods are 2, 3, 4, 9, 12:

You can download the FragM source code for the images in this article: 2018-11-17_newtons_method_for_periodic_points.frag.

I'm playing at ArtFutura in Stour Space next month! Screenings, talks, performances.

## ArtFutura 2018 London

Stour Space

7 Roach Road, Hackney Wick, London E3 2PA

+44 (0) 2089857827

## Thursday 22nd November 2018

- 19:30 Doors
- 20:00 Premiere (1h) + Behind the Scenes (30min)
- 21:30 Artist talk - Paul Friedlander (1h)
- 22:30 Live performance - Claude Heiland-Allen

## Friday 23rd November 2018

- 19:30 Doors
- 20:00 Estela Oliva presents CLON (30 min)
- 20:30 3D Futura show (1h)
- 21:30 Futura graphics (1h)
- 22:30 Live AV performance - Mowgli and the slate pipe banjo draggers

## Saturday 24th November 2018

- 19:30 Doors
- 20:00 Artworks (40min)
- 21:00 Schools (1h)
- 22:00 Live AV performance - Christian Duka & Marco Maldarella

While evaluating whether to dive in and get a Bela single-board computer for low-latency processing of external inputs (which I would want to livecode in the C programming language using my clive system), I found that my initial attempts at cross-compiling into a network-shared folder were doomed: my host system was running Debian Buster (current Testing, next Stable) while my device (a Raspberry Pi Model 3 B) was running Raspbian Jessie (current OldStable). Jessie has an ancient glibc, the basic library that lets C programs and libraries work, and my cross-compiled code needed a newer version.

So I installed Debian Buster arm64 from
unofficial unsupported sources.
After upgrading the system to my liking, I tried the audio through HDMI.
The `aplay` command could make some noise, but it was using a
Pulseaudio backend. Try as I might I couldn't get a raw ALSA backend
to make any sound, especially not with JACK (which output errors about
not supported hardware sample types). So I resorted to radical means.
I set up a JACK server using the dummy backend, and hooked Pulseaudio
to it with `pulseaudio-module-jack`. Then adding a loopback
Pulseaudio device to route the JACK sources/sinks to the hardware:

```
$ /usr/bin/jackd -ddummy &
$ tail /etc/pulse/default.pa
load-module module-jack-source
load-module module-jack-sink
set-default-sink jack_out
set-default-source jack_in
load-module module-loopback source=jack_in sink=alsa_output.platform-3f902000.hdmi.iec958-stereo
$ mplayer -ao jack somefile.ogg -loop 0 &
$ jack_connect "MPlayer [15406]:out_0" "PulseAudio JACK Source:front-left"
$ jack_connect "MPlayer [15406]:out_1" "PulseAudio JACK Source:front-right"
```

Note: this is significantly different to the usual software -> Pulse -> JACK -> ALSA -> ears route, it's more like software -> JACK -> Pulse -> ALSA -> ears, though I don't really know what happens beyond Pulse in this setup... All I know is I hear sounds from JACK on the HDMI output, which is all that matters.

Now the last thing is figuring out how to disable the screensaver timeout on the lightdm greeter login prompt: when that kicks in, all sound stops. I can disable the screensaver if I log in, but an automatically logged-in session is a bit annoying. There is a greeter screensaver timeout variable, but that applies only to the lock screen, not the login screen. I want to make sounds through HDMI from a remote ssh session.

And another shiny thing would be automatically marking the PulseAudio JACK ports as physical and the dummy backend's `system:` ports as not, so that automatic connections work as expected (currently I hear nothing until I manually connect the ports as above).

Start with the iteration formula for the Burning Ship escape time fractal:

\[ X := X^2 - Y^2 + A \quad\quad Y := \left|2 X Y\right| + B \]

Perturb this iteration, replacing \(X\) by \(X+x\) (etc) where \(x\) is a small deviation:

\[ x := (2 X + x) x - (2 Y + y) y + a \quad\quad y := 2 \operatorname{diffabs}(X Y, X y + x Y + x y) + b \]

Here \(\operatorname{diffabs}(c,d)\) is laser blaster's formula for evaluating \(|c + d| - |c|\) without catastrophic cancellation; see my post about perturbation algebra for more details.

Now replace \(x\) and \(y\) by bivariate power series in \(a\) and \(b\):

\[ x = \sum_{i,j} s_{i,j} a^i b^j \quad\quad y = \sum_{i,j} t_{i,j} a^i b^j \]

To implement this practically (without lazy evaluation) I pick an order \(o\) and limit the sum to \(i + j \le o\). Substituting these series into the perturbation iterations, and collecting terms, gives iteration formulae for the series coefficients \(s_{i,j}\) and \(t_{i,j}\):

\[ s_{i,j} := 2 X s_{i,j} - 2 Y t_{i,j} + \sum_{k=0,l=0}^{k=i,l=j} \left( s_{k,l} s_{i-k,j-l} - t_{k,l} t_{i-k,j-l} \right) + \mathbb{1}_{i=1,j=0} \]

The formula for \(t\) requires knowing which branch of \(\operatorname{diffabs}\) was taken, which (for deviations small enough not to cross the axis) turns out to have a nice reduction to \(\operatorname{sgn}(XY)\):

\[ t_{i,j} := 2 \operatorname{sgn}(X Y) \left( X t_{i,j} + s_{i,j} Y + \sum_{k=0,l=0}^{k=i,l=j} \left( s_{k,l} t_{i-k,j-l} \right) \right) + \mathbb{1}_{i=0,j=1} \]

\(\mathbb{1}_F\) is the indicator function, \(1\) when \(F\) is true, \(0\) otherwise. For distance estimation, the series of the derivatives are just the derivatives of the series, which can be computed very easily.
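To check I had the recurrences right, here's a straightforward Python sketch (truncated dict-based series are my data layout, not production code) that advances \(s\) and \(t\) by one iteration using the formulas above:

```python
def series_step(X, Y, s, t, order):
    # one perturbation iteration on truncated bivariate series in (a, b);
    # s and t map (i, j) -> coefficient, keeping only i + j <= order
    sgn = 1.0 if X * Y >= 0.0 else -1.0
    def mul(u, v):
        w = {}
        for (k, l), p in u.items():
            for (m, n), q in v.items():
                if k + m + l + n <= order:
                    w[(k + m, l + n)] = w.get((k + m, l + n), 0.0) + p * q
        return w
    ss, tt, st = mul(s, s), mul(t, t), mul(s, t)
    keys = set(s) | set(t) | set(ss) | set(tt) | set(st) | {(1, 0), (0, 1)}
    ns, nt = {}, {}
    for ij in keys:
        ns[ij] = (2 * X * s.get(ij, 0.0) - 2 * Y * t.get(ij, 0.0)
                  + ss.get(ij, 0.0) - tt.get(ij, 0.0)
                  + (1.0 if ij == (1, 0) else 0.0))
        nt[ij] = (2 * sgn * (X * t.get(ij, 0.0) + Y * s.get(ij, 0.0)
                             + st.get(ij, 0.0))
                  + (1.0 if ij == (0, 1) else 0.0))
    return ns, nt
```

Comparing the series evaluated at a tiny deviation against a directly perturbed orbit agrees up to the truncation order, which is essentially the probe-point test described below.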

The series is only valid in a region that doesn't intersect an axis, at which point the next iteration will fold the region in a way that a series can't represent. Moreover, the series loses accuracy the further from the reference point, so there needs to be a way to check that the series is ok to use. One approach is to iterate points on the boundary of the region using perturbation, and compare the relative error against the same points calculated with the series. Only if all these probe points are accurate, is it safe to try the next iteration.

This usually means that the series will skip at most \(P\) iterations near a central miniship reference of period \(P\). But one can do better, by subdividing the region into parts that are folded in different ways. The parts that are folded the same way as the reference can continue with series approximation, with probe points at the boundary of each part, with the remainder switching to regular perturbation initialized by the series.

It may even be possible to use knighty's techniques like Taylor shift, which is a way to rebase a series to a new reference point (for example, one on the "other side" of the fold), to split the region into two or more separate parts each with their own series approximation. The Horner shift algorithm is not too complicated, and I think it can be extended to bivariate series by shifting along each variable in succession:

```c
// Horner shift code from FLINT2
for (i = n - 2; i >= 0; i--)
    for (j = i; j < n - 1; j++)
        poly[j] += poly[j + 1] * c;
```
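The same loop in Python, with a small worked check: with coefficients stored low-to-high, shifting \(1 + 2x + 3x^2\) by \(c = 1\) gives \(p(x+1) = 6 + 8x + 3x^2\).

```python
def taylor_shift(poly, c):
    # in-place Horner shift: coefficients of p(x) -> coefficients of p(x + c)
    n = len(poly)
    for i in range(n - 2, -1, -1):
        for j in range(i, n - 1):
            poly[j] += poly[j + 1] * c
    return poly
```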

Untested Haskell idea for bivariate shift:

```haskell
import Data.Map (Map)
import qualified Data.Map as M

shift :: Num v => v -> Map Int v -> Map Int v
shift = undefined  -- left as an exercise

shift2 :: Num v => v -> v -> Map (Int, Int) v -> Map (Int, Int) v
shift2 x y = twiddle . pull . fmap (shift x) . push
           . twiddle . pull . fmap (shift y) . push

push :: (Ord a, Ord b) => Map (a, b) v -> Map a (Map b v)
push m = M.fromListWith M.union
  [ (i, M.singleton j e) | ((i, j), e) <- M.assocs m ]

pull :: (Ord a, Ord b) => Map a (Map b v) -> Map (a, b) v
pull m = M.fromList
  [ ((i, j), e) | (i, mj) <- M.assocs m, (j, e) <- M.assocs mj ]

twiddle :: (Ord a, Ord b) => Map (a, b) v -> Map (b, a) v
twiddle = M.mapKeys (\(i, j) -> (j, i))
```

Exciting developments! I hope to release a new version of Kalles Fraktaler 2 + containing at least some of these algorithms soon...

The current work-in-progress version of GraphGrow has three components that communicate via OSC over network. The visuals are rendered in OpenGL using texture array feedback; this process is graphgrow-video. The transformations are controlled by graphgrow-iface, with the familiar nodes-and-links graphical user interface. The interface runs on Linux using GLFW, and I'm working on an Android port for my tablet using GLFM. The component I'll be talking about in this post is the graphgrow-audio engine, which makes sounds using an audio feedback delay network with the same topology as the visual feedback network. Specifically, I'll be writing up my notes on what I did to make it about twice as CPU-efficient, while still making nice sounds.

First up, I tried gprof, but after following the instructions I only got an empty profile. My guess is that it doesn't like JACK doing the audio processing in a separate realtime thread. So I switched to perf:

```
perf record ./graphgrow-audio
perf report
```

Here's the first few lines of the first report, consider it a baseline:

```
Overhead  Command          Shared Object    Symbol
  34.41%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
  18.11%  graphgrow-audio  libm-2.24.so     [.] expm1f
  14.27%  graphgrow-audio  graphgrow-audio  [.] audiocb
   8.69%  graphgrow-audio  libm-2.24.so     [.] sincos
   8.34%  graphgrow-audio  libm-2.24.so     [.] __logf_finite
   4.47%  graphgrow-audio  libm-2.24.so     [.] tanhf
```

That was after I already made some algorithmic improvements: I had 32 delay lines, all of the same length, with 64 delay line readers in two groups of 32, each group reading at the same offset every sample. This meant there was a lot of duplicated work calculating the delay line interpolation coefficients. I factored out the computation of the delay coefficients into another struct, which could be calculated 1x per sample instead of 32x per sample. Then the delay readers are passed the coefficients, instead of computing them themselves.
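The shape of that refactoring, sketched in Python (linear interpolation stands in for whatever interpolation the real code uses, and the names are mine):

```python
class DelayCoeffs:
    # computed once per sample, then shared by every delay line reader
    def __init__(self, delay):
        self.index = int(delay)
        self.frac = delay - self.index

def delay_read(buf, write_pos, co):
    # interpolate between the two samples around the fractional delay tap
    j = (write_pos - co.index) % len(buf)
    k = (j - 1) % len(buf)
    return (1.0 - co.frac) * buf[j] + co.frac * buf[k]
```

With 32 delay lines all reading at the same two offsets, this turns 64 coefficient computations per sample into 2.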

Looking at what to optimize, the calls to expm1f() seem to be a big target. Looking through the code I saw that I had 32 dynamic range compressors, each doing RMS to dB (and back) conversions every sample, which means a lot of log and exp. My compressor had a ratio of 1/8, so I replaced the gain logic by a version that worked in RMS with 3x sqrt instead of 1x log + 1x exp per sample:

```diff
index a6ba512..588d098 100644
--- a/graphgrow3/audio/graphgrow-audio.cc
+++ b/graphgrow3/audio/graphgrow-audio.cc
@@ -549,18 +549,25 @@ struct compress
   sample factor;
   hip hi;
   lop lo1, lo2;
+  sample thresrms;
   compress(sample db)
   : threshold(db)
   , factor(0.25f / dbtorms((100.0f - db) * 0.125f + db))
   , hi(5.0f), lo1(10.0f), lo2(15.0f)
+  , thresrms(dbtorms(threshold))
   { };
   signal operator()(const signal &audio)
   {
     signal rms = lo2(0.01f + sqrt(lo1(sqr(hi(audio)))));
+#if 0
     signal db = rmstodb(rms);
     db = db > threshold ? threshold + (db - threshold) * 0.125f : threshold;
     signal gain = factor * dbtorms(db);
+#else
+    signal rms2 = rms > thresrms ? thresrms * root8(rms / thresrms) : thresrms;
+    signal gain = factor * rms2;
+#endif
     return tanh(audio / rms * gain);
   };
 };
```
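The two gain computations are algebraically equivalent for a 1/8 ratio, since \(\operatorname{dbtorms}(T + (D - T)/8) = t \cdot (r/t)^{1/8}\) and the eighth root is three square roots. A numeric check (Python; `rmstodb`/`dbtorms` are my Pd-style reimplementations without Pd's clamping, not the originals):

```python
import math

def rmstodb(r):
    return 100.0 + 20.0 * math.log10(r)

def dbtorms(d):
    return 10.0 ** ((d - 100.0) / 20.0)

def gain_db(rms, thres_db):
    # original: convert to dB, apply the 1/8 ratio above threshold, convert back
    db = rmstodb(rms)
    db = thres_db + (db - thres_db) * 0.125 if db > thres_db else thres_db
    return dbtorms(db)

def gain_rms(rms, thres_db):
    # replacement: stay in the RMS domain, eighth root = 3x sqrt
    t = dbtorms(thres_db)
    root8 = math.sqrt(math.sqrt(math.sqrt(rms / t)))
    return t * root8 if rms > t else t
```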

This seemed to work, the new perf output was:

```
Overhead  Command          Shared Object    Symbol
  38.89%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
  22.11%  graphgrow-audio  libm-2.24.so     [.] expm1f
  10.78%  graphgrow-audio  libm-2.24.so     [.] sincos
   5.76%  graphgrow-audio  libm-2.24.so     [.] tanhf
```

The numbers are higher, but this is actually an improvement, because if graphgrow::operator() goes from 34% to 39%, everything else has gone from 66% to 61%, and I didn't touch graphgrow::operator(). Now, there are still some large amounts of expm1f(), but none of my code calls that, so I made a guess: perhaps tanhf() calls expm1f() internally? My compressor used tanh() for soft-clipping, so I tried simply removing the tanh() call and seeing if the audio would explode or not. In my short test, the audio was stable, and CPU usage was greatly reduced:

```
Overhead  Command          Shared Object    Symbol
  60.53%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
  17.62%  graphgrow-audio  libm-2.24.so     [.] sincos
  11.51%  graphgrow-audio  graphgrow-audio  [.] audiocb
```

The next big target is that sincos() using 18% of the CPU. The lack of 'f' suffix tells me this is being computed in double precision, and the only place in the code that was doing double precision maths was the resonant filter biquad implementation. The calculation of the coefficients used sin() and cos(), at double precision, so I swapped them out for single precision polynomial approximations (9th order, I blogged about them before). The approximation is roughly accurate (only a bit or two out) for float (24bits) which should be enough: it's only to control the angle of the poles, and a few cents (or more, I didn't check) error isn't so much to worry about in my context. Another big speed improvement:

```
Overhead  Command          Shared Object    Symbol
  85.48%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
  11.45%  graphgrow-audio  graphgrow-audio  [.] audiocb
   1.22%  graphgrow-audio  libc-2.24.so     [.] __isinff
   0.64%  graphgrow-audio  libc-2.24.so     [.] __isnanf
   0.41%  graphgrow-audio  graphgrow-audio  [.] graphgrow::graphgrow
```
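For flavour, here's the shape of such a polynomial approximation (Python; plain 9th-order Taylor coefficients as an illustration, not the tuned coefficients from my earlier post):

```python
import math

def sin9(x):
    # odd 9th-order polynomial in Horner form, valid on [-pi/2, pi/2];
    # Taylor remainder there is about x^11/11! < 4e-6
    x2 = x * x
    return x * (1.0 + x2 * (-1.0 / 6 + x2 * (1.0 / 120
             + x2 * (-1.0 / 5040 + x2 * (1.0 / 362880)))))
```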

perf has a mode that annotates the assembly and source with hot instructions; looking at that let me see that the resonator was using double precision sqrt() when calculating the gain, when single precision sqrtf() would have been enough:

```
Overhead  Command          Shared Object    Symbol
  88.70%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
   8.49%  graphgrow-audio  graphgrow-audio  [.] audiocb
   1.24%  graphgrow-audio  libc-2.24.so     [.] __isinff
   0.65%  graphgrow-audio  libc-2.24.so     [.] __isnanf
```

Replacing costly double precision calculations with cheaper single precision calculations was fun, so I thought about how to refactor the resonator coefficient calculations some more. One part that definitely needed high precision was the calculation of 'r = 1 - t' with t near 0. But I saw some other code was effectively calculating '1 - r', which I could replace with 't', and make it single precision. Again, some code was doing '1 - c * c' with c the cosine of a value near 0 (so 'c' is near 1 and there is catastrophic cancellation), using basic trigonometry this can be replaced by 's * s' with s the sine of the value. However, I kept the final recursive filter in double precision, because I had bad experiences with single precision recursive filters in Pure-data (vcf~ had strongly frequency-dependent ring time, porting to double precision fixed it).

```
Overhead  Command          Shared Object    Symbol
  87.86%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
   9.18%  graphgrow-audio  graphgrow-audio  [.] audiocb
   1.37%  graphgrow-audio  libc-2.24.so     [.] __isinff
   0.68%  graphgrow-audio  libc-2.24.so     [.] __isnanf
```

The audiocb responds to OSC from the user interface process; it took so much time because it was busy-looping waiting for the JACK processing to be idle, which was rare because at this point I still hadn't got the CPU load down to something that could run XRUN-free in realtime. I made it stop that, at the cost of increasing the likelihood of a race condition when storing the data from OSC:

```
Overhead  Command          Shared Object    Symbol
  96.87%  graphgrow-audio  graphgrow-audio  [.] graphgrow::operator()
   1.49%  graphgrow-audio  libc-2.24.so     [.] __isinff
   0.80%  graphgrow-audio  libc-2.24.so     [.] __isnanf
```

Still not running in realtime, I took drastic action: computing the resonator filter coefficients only once every 64 samples instead of every sample, and linearly interpolating the 3 values (1x float for gain, 2x double for feedback coefficients). This is not a really nice way to do it from a theoretical standpoint, but it's way more efficient. I also check for NaN or Infinity only at the end of each block of 64 samples (if either occurs I replace the whole block with zeroes/silence), which is also a bit of a hack - exploding filters sound bad whatever you do to mitigate them, but I haven't managed to make it explode very often.
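The block-rate update amounts to ramping each coefficient linearly across the 64-sample block; a minimal sketch (Python, my naming):

```python
def coeff_ramp(prev, new, n=64):
    # per-sample coefficient values interpolating from the previous
    # control-rate value to the freshly computed one over an n-sample block
    return [prev + (new - prev) * (k + 1) / n for k in range(n)]
```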

Success: now it was using 60% of the CPU, comfortably running in real time with no XRUNs. So I added in a (not very accurate, but C2 continuous) rational approximation of tanh() to the compressor that I found on musicdsp (via Pd-list):

```cpp
signal tanh(const signal &x)
{
  signal x2 = x * x;
  return x < -3.0f ? -1.0f
       : x >  3.0f ?  1.0f
       : x * (27.0f + x2) / (27.0f + 9.0f * x2);
}
```

CPU usage increased to 72% (I have been doing all these tests with the CPU frequency scaling governor set to performance mode so that figures are comparable). I tried g++-8 instead of g++-6, CPU usage reduced to 68%. I tried clang++-3.8 and clang++-6.0, which involved some invasive changes to replace 'c?a:b' (over vectors) with a 'select(c,a,b)' template function, but CPU usage was over 100% with both versions. So I stuck with g++-8.

The last thing I did was an algorithmic improvement: I was doing 4x the necessary amount of work in one place. Each of 4 "rules" gets fed through 4 "edges" per rule, and each edge pitchshifted its rule down an octave. By shifting (ahem) the pitchshifting from being internal to the edge to being internal to the rule, I saved 15% CPU (relative), 10% CPU (absolute); now there are only 8 pitchshifting delay lines instead of 32.

In conclusion, today I brought the CPU usage of graphgrow-audio down from "way too high to run in realtime" to "60% of 1 core", benchmarking on an Intel Atom N455 1.66GHz netbook running Linux in 32bit (i686) mode. A side channel may be lurking, as the CPU usage (in htop) of graphgrow-audio goes up to 80% temporarily while I regenerate my blog static site...
