While rendering some GPU-intensive OpenGL stuff I got scared when my graphics card hit 90°C, so I paused the process until it had cooled down. I got fed up with pausing and restarting it by hand, so I wrote this small script:

    #!/bin/bash
    kill -s SIGSTOP "${@}"
    running=0
    stop_threshold=85
    cont_threshold=75
    while true
    do
      temperature="$(nvidia-smi -q -d TEMPERATURE | grep 'GPU Current Temp' | sed 's/^.*: \(.*\) C$/\1/')"
      if (( running ))
      then
        if (( temperature > stop_threshold ))
        then
          echo "STOP ${temperature} > ${stop_threshold}"
          kill -s SIGSTOP "${@}"
          running=0
        fi
      else
        if (( temperature < cont_threshold ))
        then
          echo "CONT ${temperature} < ${cont_threshold}"
          kill -s SIGCONT "${@}"
          running=1
        fi
      fi
      sleep 1
    done | ts

If you want to run it yourself, I advise checking the output from nvidia-smi on your system, because its manual page says the format isn't stable. I also suggest monitoring the temperature, at least until you're sure it's working OK for you. Usage is simple: pass the PIDs of the processes you want to throttle by GPU temperature on the command line. Typically these would be OpenGL applications (or Vulkan / OpenCL / CUDA / whatever else they come up with next). The output is piped through `ts` (from moreutils) to timestamp each line, so you'll need that installed too.
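The two thresholds form a hysteresis band: the processes are stopped above 85 and only continued below 75, so they don't flap on and off around a single value. Here is a minimal sketch of the same state machine in Python, replayed over a made-up list of readings instead of live nvidia-smi output (the function name `throttle_decisions` and the readings are my own illustration, not part of the script):

```python
def throttle_decisions(temperatures, stop_threshold=85, cont_threshold=75):
    """Replay the script's state machine over a list of temperature readings.

    Returns a list of (index, action) pairs, where action is "STOP" or "CONT".
    The processes start out stopped, as in the script.
    """
    running = False
    actions = []
    for i, temperature in enumerate(temperatures):
        if running:
            if temperature > stop_threshold:
                actions.append((i, "STOP"))
                running = False
        else:
            if temperature < cont_threshold:
                actions.append((i, "CONT"))
                running = True
    return actions


# Made-up readings: cool, heating up, cooling through the band, cool again.
readings = [70, 80, 86, 84, 80, 76, 74, 80]
print(throttle_decisions(readings))
# prints [(0, 'CONT'), (2, 'STOP'), (6, 'CONT')]
```

Readings between the two thresholds (84, 80, 76 above) never change the state, which is the whole point of having two thresholds.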

A while ago I read this paper and finally got around to implementing it this week:

Real-Time Hatching

Emil Praun, Hugues Hoppe, Matthew Webb, Adam Finkelstein

Appears in SIGGRAPH 2001

The key concept in the paper is the "tonal art map", in which strokes are added to a texture array's mipmap levels to preserve coherence between levels and across tones - each stroke in an image is also present in all images above and to the right:

My possibly-novel contribution is to use the inverse (fast) Fourier transform (IFFT) to generate blue noise for the tonal art map generation. This takes a fraction of a second, compared to the many hours for void-and-cluster methods at large image sizes. The quality may be lower, but that's something to investigate another time: it's good enough for this hatching experiment. Here's a comparison of white and blue noise; the blue noise is perceptually much smoother, as it lacks low-frequency components:
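To illustrate the idea (this is not the code from the repository), here is a rough Python sketch: fill a spectrum with random phases, shape the amplitudes so they are zero at DC and grow with frequency, then invert it. The amplitude shape (proportional to radial frequency) is my assumption for the sketch, and a naive inverse DFT stands in for a real IFFT so that it only needs the standard library:

```python
import cmath
import math
import random


def inverse_dft_1d(values):
    """Naive O(n^2) inverse DFT; a real implementation would use an FFT."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(2j * math.pi * k * x / n) for k in range(n)) / n
        for x in range(n)
    ]


def blue_noise(size, seed=0):
    """Return a size x size grid of floats in [0, 1] with little low-frequency energy."""
    rng = random.Random(seed)
    spectrum = []
    for v in range(size):
        row = []
        for u in range(size):
            # Signed frequencies, so that "low" means close to zero on both axes.
            fu = u if u <= size // 2 else u - size
            fv = v if v <= size // 2 else v - size
            # Assumed shape: zero at DC, rising linearly with radial frequency.
            amplitude = math.hypot(fu, fv)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            row.append(amplitude * cmath.exp(1j * phase))
        spectrum.append(row)
    # The 2D inverse transform is separable: transform rows, then columns.
    rows = [inverse_dft_1d(row) for row in spectrum]
    columns = [inverse_dft_1d([rows[v][x] for v in range(size)]) for x in range(size)]
    # Keep the real part and rescale it to [0, 1].
    field = [[columns[x][y].real for x in range(size)] for y in range(size)]
    lo = min(min(r) for r in field)
    hi = max(max(r) for r in field)
    return [[(value - lo) / (hi - lo) for value in r] for r in field]


noise = blue_noise(16)
```

The naive transform is only practical for tiny images like this 16x16 one; the speed claim above relies on using a proper FFT.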

The other parts of the paper I haven't implemented yet, namely adjusting the hatching to match the principal curvature directions of the surface. This is more a mesh parameterization problem - I'm being simple and generating UVs for the bunny by spherical projection, instead of something complicated and good-looking.

My code is here:

git clone https://code.mathr.co.uk/hatching.git

Note that there are horrible hacks in the shaders for the specific scene geometry at the moment; hopefully I'll find time to clean it up and make it more general soon. You'll need to download `bunny.obj` from cs5721f07.

I implemented a little widget in HTML5, JavaScript and WebGL:

/clusters/

It's inspired by Clusters by Jeffrey Ventrella, but its source seems to be obfuscated so I couldn't see how it worked. Instead I worked backwards from the referenced ideas of Lynn Margulis. I modelled a symbiotic system as a bunch of particles, each craving or disgusted by the emissions of the others. There is a settable number of different substances, and (currently hardcoded) 24 different species with their own tastes, represented by different colours. The particle count is settable too, but due to a bug in my code you have to manually refresh the page after changing it (and don't go too high: the slowdown is \(O(n^2)\)).

Some seeds give really interesting large-scale structures that chase each other around, with bits peeling off and joining other groupings. If A is attracted to B but B is repulsed by A, then a pursuit ensues. If the generated rule weights (576 numbers with the default settings) align just right you can get a chain or even a ring that becomes stable and spins of its own accord. Other structures include concentric shells in near-spherical blobs.
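The pursuit behaviour is easy to reproduce in a toy version. This Python sketch is not the widget's code: the direct species-to-species weight table, the inverse-distance falloff, the friction term and the integration scheme are all assumptions I made to keep it short. Every particle looks at every other particle, which is where the \(O(n^2)\) cost comes from:

```python
import math


def simulate(positions, species, weights, steps=100, dt=0.05, friction=0.1):
    """Advance 2D particles under pairwise, species-dependent attraction or repulsion.

    weights[a][b] > 0 means species a is attracted to species b; < 0 means repelled.
    """
    positions = [list(p) for p in positions]
    velocities = [[0.0, 0.0] for _ in positions]
    for _ in range(steps):
        forces = [[0.0, 0.0] for _ in positions]
        for i, (xi, yi) in enumerate(positions):
            for j, (xj, yj) in enumerate(positions):
                if i == j:
                    continue
                dx, dy = xj - xi, yj - yi
                distance = math.hypot(dx, dy) + 1e-9
                # Assumed falloff: strength decays with the inverse of distance.
                strength = weights[species[i]][species[j]] / distance
                forces[i][0] += strength * dx / distance
                forces[i][1] += strength * dy / distance
        for p, v, f in zip(positions, velocities, forces):
            for axis in range(2):
                v[axis] = (1.0 - friction) * v[axis] + dt * f[axis]
                p[axis] += dt * v[axis]
    return positions


# Species 0 likes species 1, but species 1 dislikes species 0: a chase along +x.
weights = [[0.0, 1.0], [-1.0, 0.0]]
start = [(0.0, 0.0), (1.0, 0.0)]
end = simulate(start, species=[0, 1], weights=weights)
```

With these two particles the pursuer never catches up: both feel the same push in the same direction, so the pair drifts along the x axis together.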

One thing I'm not happy with is the friction - I had to add it to make the larger clusters stable, but it makes smaller clusters less mobile. There's probably something my naive model misses from Ventrella's original, maybe some kind of satiation and transfer of actual materials between particles, rather than a per-species (dis)like tendency. If more satiated particles were to move less quickly than hungry particles, that might fix it. I'll try it another day!

Late last year I implemented some coupled continuous cellular automata, inspired by Softology's experiments. Now I'm finally getting around to blogging about it. I used OpenGL shaders; here's some of the fragment shader source of the main algorithm:

    void main() {
      vec4 s1 = texture(state, coord, 1.0);
      vec4 s100 = texture(state, coord, 100.0);
      vec4 s;
      for (int k = 0; k < 4; ++k)
        s[k] = texture(state, coord, blur[k])[k];
      vec4 h = texture(history, coord);
      s = coupling * (s - s100) + h;
      s = speed * s;
      s = mix(s1, vec4(0.5) + 0.5 * cos(s), 0.125);
      h = mix(s, h, decay);
      state_out = s;
      history_out = h;
    }

The non-linearity of the `cos()` on the coupled input acts like a "reaction", while the blurring (looking up reduced mipmap levels from the texture) acts like "diffusion".
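To make the data flow of the update easier to follow, here is a per-pixel port to Python. It is only a sketch: I assume `coupling` is a 4x4 matrix (given here as a list of rows) and `speed` and `decay` are scalars, and the blurred texture lookups are passed in as plain lists (`s_blur` for the per-channel blur, `s100` for the most blurred level) rather than being fetched from mipmaps:

```python
import math


def mix(a, b, t):
    """GLSL-style linear interpolation: a when t is 0, b when t is 1."""
    return a * (1.0 - t) + b * t


def update_pixel(s1, s100, s_blur, h, coupling, speed, decay):
    """One update of the four coupled channels of a single pixel.

    s1, s100, s_blur and h are 4-element lists; coupling is a 4x4 list of rows.
    Returns the new (state, history) pair.
    """
    # "Diffusion": blurred state relative to the most blurred level.
    diff = [blurred - average for blurred, average in zip(s_blur, s100)]
    # Couple the channels together and add the history.
    s = [sum(coupling[r][c] * diff[c] for c in range(4)) + h[r] for r in range(4)]
    s = [speed * x for x in s]
    # "Reaction": the cosine keeps its output in [0, 1]; blend it into the old state.
    s = [mix(old, 0.5 + 0.5 * math.cos(x), 0.125) for old, x in zip(s1, s)]
    h = [mix(new, old, decay) for new, old in zip(s, h)]
    return s, h


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
state, history = update_pixel(
    s1=[0.5] * 4, s100=[0.5] * 4, s_blur=[0.5] * 4, h=[0.0] * 4,
    coupling=identity, speed=1.0, decay=0.5,
)
```

In this flat test case nothing diffuses, so the cosine term evaluates to 1 and each channel moves one eighth of the way from 0.5 towards it.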

Colouring is done with another affine matrix transform, the output from which is thresholded and clamped, before an edge-detection filter is applied. The edge-detection uses dFdx and dFdy, so the results are coarse (these derivatives are typically computed for blocks of 2x2 pixels, rendered together in parallel) - for better results the edge detection could be done in another pass, or the whole thing could run at double the resolution and be resized down to screen size afterwards.

Here's a video of it in action:

Here are some static images:

Here is another video, from January when it was still in colour:

Here's where you can get the code:

git clone https://code.mathr.co.uk/cca.git

Future work might be to do proper Gaussian blurs (it's separable, so even large radius might be feasible in real-time) instead of the cheap (but yielding squarish grid artifacts) mipmap reduction.

**EDIT** I worked on it some more, now in colour and with a
high quality mode that does Gaussian blur (on my system frame rate drops from
~60fps to between ~5fps and ~30fps depending on blur radius). Pictures:

I also added a mutation mode, which randomizes the parameters one by one at random. Here's a final example video showing off the new features:


After instrumenting Monotone with OpenGL timer queries I could see where the major bottleneck lay:

    IFS( 7011.936000 ) FLAT( 544.672000 ) PBO( 2921.728000 ) SORT( 6797.760000 ) LUP( 71136.064000 ) TEX( 284.224000 ) DISP( 272.480000 )

LUP is the per-pixel binary search lookup for histogram equalisation (to compress the dynamic range of the HDR fractal to something suitable for display), the previous SORT generates the histogram from a 4x4 downscaled image. A quick calculation shows that this LUP is taking 80% of the GPU time, so is a good focus for optimisation efforts.

The 4x4 downscaled image for the histogram is still a lot of pixels: 129600. LUP involves finding an index into this array, which gives a value with around 17 bits of precision. However, typical computer displays are only 8-bit (256 values), so the extra 9 random-access texture lookups per pixel to get a more accurate value are a waste of time and effort. Combined with downscaling by 8x8 instead of 4x4, computing a less accurate (but visually indistinguishable) histogram equalisation allows Monotone to now run at 30fps at 1920x1080 full HD resolution. Here are the post-optimisation detailed timing metrics:
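The trick can be sketched in Python (Monotone does this in a shader; the function below and its stand-in data are hypothetical illustrations, not the real code). Each step of the binary search halves the range of candidate indices, so stopping after 8 steps pins the rank down to roughly 1 part in 256, which is all an 8-bit display can show:

```python
import random


def equalise(value, sorted_samples, steps=None):
    """Map value to its approximate rank in [0, 1) by binary search.

    Stopping after `steps` halvings leaves the rank known to about
    1 part in 2**steps; steps=None runs the search to completion.
    """
    lo, hi = 0, len(sorted_samples)
    taken = 0
    while hi - lo > 1 and (steps is None or taken < steps):
        mid = (lo + hi) // 2
        if sorted_samples[mid] <= value:  # one texture lookup per step in a shader
            lo = mid
        else:
            hi = mid
        taken += 1
    return lo / len(sorted_samples)


rng = random.Random(1)
# Stand-in for the sorted downscaled HDR image: 129600 made-up values.
samples = sorted(rng.expovariate(1.0) for _ in range(129600))
full = equalise(1.0, samples)              # about 17 lookups
approx = equalise(1.0, samples, steps=8)   # 8 lookups
```

The truncated search always returns the lower end of the range that the full search would have continued into, so `approx` underestimates `full` by less than 0.004 here.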

    IFS( 7087.104000 ) FLAT( 509.888000 ) PBO( 2744.864000 ) SORT( 1409.440000 ) LUP( 15696.352000 ) TEX( 281.472000 ) DISP( 290.848000 )

A productive day!

I blogged about this before with an animation (Rolling Torus). The only thing missing was the Fragmentarium source code, so here it is:

    #define providesColor
    #include "Soft-Raytracer.frag"

    uniform float time;

    const float pi = 3.141592653;
    const float s = 2.0;
    const float ri = s / (sqrt(s * s + 1.0) + 1.0);
    const float ro = s / (sqrt(s * s + 1.0) - 1.0);
    const float rc = 0.5 * (ro + ri);
    const float rt = 0.5 * (ro - ri);

    const vec3 rgb[5] = vec3[](
      vec3(1.0, 0.7, 0.0),
      vec3(0.7, 1.0, 0.0),
      vec3(0.0, 0.7, 1.0),
      vec3(0.7, 0.0, 1.0),
      vec3(1.0, 0.0, 0.7));

    float torus(vec3 z) {
      return length(z - rc * normalize(vec3(z.xy, 0.0))) - rt;
    }

    float plane(vec3 z) {
      return z.z;
    }

    vec3 spin(vec3 z) {
      float a = 2.0 * pi * time / 3.0;
      mat2 m = mat2(cos(a), sin(a), -sin(a), cos(a));
      z.xy = m * z.xy;
      return z.xzy - vec3(0.0,ro,0.0);
    }

    vec3 baseColor(vec3 q, vec3 n) {
      vec2 uv = vec2(0.0);
      if (q.z < 0.01) {
        uv = q.xy * sqrt(5.0);
      } else {
        vec3 p = spin(q);
        float k = -1.0;
        float l = 1.0;
        if (p.z > 0.0) { l = -l; }
        if (length(p.xy) > rc) { k = -k; }
        float a = (p.z * p.z * sqrt(s * s + 1.0) + k * sqrt(max(1.0 - p.z * p.z * s * s, 0.0))) / (p.z * p.z + 1.0);
        float y = l * acos(clamp(a, -1.0, 1.0)) / (2.0 * pi);
        float x = s * atan(p.x, p.y) / (2.0 * pi);
        if (x < 0.0) { x = s + x; }
        if (y < 0.0) { y = 1.0 + y; }
        x = 5.0 * x;
        y = 5.0 * y;
        uv = vec2(y, x);
        float b = atan(1.0, 2.0);
        mat2 m = mat2(cos(b), sin(b), -sin(b), cos(b)) * sqrt(5.0);
        uv = m * uv;
      }
      vec3 t;
      if (mod(uv.x, 1.0) < 0.25 || mod(uv.y, 1.0) < 0.25) {
        t = vec3(0.0);
      } else {
        int k = clamp(int(mod(floor(uv.x) + 2.0 * floor(uv.y), 5.0)), 0, 4);
        t = rgb[k];
      }
      return vec3(t);
    }

    float DE(vec3 z) {
      return min(torus(spin(z)), plane(z));
    }

    #preset default
    FOV = 0.3
    Eye = -5,-5,2.25
    Target = -0.593772,1.17202,1.5097
    Up = 0,0,1
    EquiRectangular = false
    FocalPlane = 5
    Aperture = 0.01062
    Gamma = 1
    ToneMapping = 1
    Exposure = 1
    Brightness = 1
    Contrast = 1
    Saturation = 1
    GaussianWeight = 5.2308
    AntiAliasScale = 1.7333
    Detail = -5.47582
    DetailAO = -3.26669
    FudgeFactor = 1
    MaxRaySteps = 317
    BoundingSphere = 12
    Dither = 1
    NormalBackStep = 1
    AO = 0,0,0,0.32456
    Specular = 0.2
    SpecularExp = 24.051
    SpotLight = 0.737255,0.686275,0.533333,1.8841
    SpotLightPos = 10,-5.0684,8.0822
    SpotLightSize = 2.3
    CamLight = 0.792157,0.556863,0.682353,0.6579
    CamLightMin = 3e-05
    Glow = 0,0.917647,1,0
    GlowMax = 16
    Fog = 0
    Shadow = 0.82906 NotLocked
    Sun = -1.29354,1.01588
    SunSize = 0.001
    Reflection = 0 NotLocked
    BaseColor = 1,1,1
    OrbitStrength = 1
    X = 0.5,0.6,0.6,0.7
    Y = 1,0.6,0,0.4
    Z = 0.8,0.78,1,0.5
    R = 0.4,0.7,1,0.12
    BackgroundColor = 0.501961,0.501961,0.501961
    GradientBackground = 1.66665
    CycleColors = false
    Cycles = 0.1
    EnableFloor = false NotLocked
    FloorNormal = 0,0,0
    FloorHeight = 0
    FloorColor = 1,1,1
    #endpreset

My video piece Monotone has been accepted to the Mozilla Festival art exhibition. Mozilla Festival 2016 takes place October 28-30, at Ravensbourne College, London.

Since submitting the pre-rendered video loop I've been working on improving the real-time rendering mode of the Monotone software. The main bottleneck at this time is the histogram equalisation to take the high dynamic range calculations down to a low dynamic range image for display. I did manage to get a large speed boost by calculating the histogram on a \(\frac{1}{4} \times \frac{1}{4}\) downscaled image, but on my hardware it only achieves \(\frac{1}{2} \times \frac{1}{2}\) of the desired resolution (HD 1920x1080). On my NVIDIA GTX 550 Ti with 192 CUDA cores I get 960x540 at 30 frames per second. Recent hardware has 1000s of cores, so perhaps it's just a matter of throwing more grunt at the problem.

If you want to try it (and have OpenGL 4 capable hardware, and development headers installed for GLFW and JACK, among other things; only tested on Debian):

    git clone https://code.mathr.co.uk/monotone.git
    git clone https://code.mathr.co.uk/clive.git
    cd monotone/src
    make
    ./monotone

You can also browse the monotone source code repository. If you do have a significantly more powerful GPU than me, you can try to edit the source code monotone.cpp to change "#define SCALE 2" to "#define SCALE 1", which will make it target 1920x1080 instead of half that in each dimension. I'd love to hear back if you get it working (or if you have trouble getting it running, maybe I can help).

**UPDATE** here are some photos, the projection was really
impressive, so I'm satisfied even though the sound aspect was absent:

The festival was pretty interesting, many many things all going on at once. Highlight was the Sonic Pi workshop (though I spent most of the time dist-upgrade-ing to Debian Stretch so I could install it), and the Nature In Code workshop was also interesting (though it was packed full and uncomfortable so I didn't attend the full session).

Back at the end of April last year I think I was futzing about on math.stackexchange.com answering a question about rendering negative multibrot sets, for example one produced by iterations of \(z \to z^{-2} + c\). I tried applying the atom domain colouring from the regular Mandelbrot set, but found it looked better if I accumulated all the partials with additive blending, not just the final domain. Here's a zoomed in view:

I implemented it as a GLSL fragment shader in Fragmentarium, here's the source code (which you can download too):

    // Mandelbrot set for \( z \to z^{-n} + c \) coloured by Lyapunov atom domains
    // Created: Thu Apr 30 15:10:00 2015
    #include "Progressive2D.frag"
    #include "Complex.frag"

    const float pi = 3.141592653589793;
    const float phi = (sqrt(5.0) + 1.0) / 2.0;

    #group Lyapunov atom domains
    uniform int Iterations; slider[10,200,5000]
    uniform int Power; slider[-16,-2,16]

    vec3 color(vec2 c) {
      // critical point is \( 0 \) for positive Power, and \( 0^Power + c = c \)
      // critical point is \( \infty \) for negative Power, and \( \infty^Power + c = c \)
      // so start iterating from \( c \)
      vec2 z = c;
      // Lyapunov exponent accumulator
      float le = 0.0;
      // atom domain accumulator
      float minle = 0.0;
      int mini = 1;
      // accumulated colour
      vec4 rgba = vec4(0.0);
      for (int i = 0; i < Iterations; ++i) {
        // \( zn1 \gets z^{Power - 1} \)
        vec2 zn1 = vec2(1.0, 0.0);
        for (int j = 0; j < abs(Power - 1); ++j) {
          zn1 = cMul(zn1, z);
        }
        if (Power < 0) {
          zn1 = cInverse(zn1);
        }
        // \( dz \gets Power z^{Power - 1} \)
        vec2 dz = float(Power) * zn1;
        // \( z \gets z^{Power} + c \)
        z = cMul(zn1,z) + c;
        // \( le \gets le + 2 log |dz| \)
        float dle = log(dot(dz, dz));
        le += dle;
        // if the delta is smaller than any previous, accumulate the atom domain
        if (dle < minle) {
          minle = dle;
          mini = i + 1;
          float hue = 2.0 * pi / (36.0 + 1.0/(phi*phi)) * float(mini);
          vec3 rainbow = 2.0 * pi / 3.0 * vec3(0.0, 1.0, 2.0);
          vec3 domain = clamp(vec3(0.5) + 0.5 * sin(vec3(hue) + rainbow), 0.0, 1.0);
          rgba += vec4(domain, 1.0);
        }
      }
      // accumulated 'iterations' logs of squared magnitudes
      // so divide by 2 iterations
      le /= 2.0 * float(Iterations);
      // scale accumulated colour and blacken interior
      return mix(rgba.rgb / rgba.a, vec3(le < 0.001 ? 0.0 : tanh(exp(le))), 0.5);
    }

More negative powers than -2 don't look very good, though.

The classic Sayagata tiling is a Euclidean (flat) plane tessellation based on a square grid. For May's calendar image I thought it would be fun to make a hyperbolic variation, with 5 squares meeting at each vertex instead of 4. I also wrapped the hyperbolic plane into a repeating ring shape, as Bulatov did (warning: that page is slow to load, but worth it in the end).

I implemented it as a GLSL fragment shader in Fragmentarium (download my source code) with a few constants for changing the tiling parameters (the hyperbolic symmetry group, the orientation across the band, number of repetitions, whether to draw a pretty pattern or just triangles). Here are a few variations:

The code has a few comments; the gist of the overall operation is to transform the coordinates from the ring back into the Poincaré disc model of hyperbolic geometry, then repeatedly apply hyperbolic symmetries towards a central fundamental region, which finally gets the Sayagata texture. Note that tweaking some of the values leads to ugly seams with the parts not lining up - still some bugs I guess! (WONTFIX DONTDOTHAT)

**UPDATE** my GL 3.3 version may have been faster, but modifications to the original program are faster still (a misapplied patch had been slowing things down drastically, and fixing it made the program easier to improve). Check this rrv issue thread for more details.

Radiosity is a method for computing diffuse lighting. Unlike ray tracing, radiosity is viewpoint independent, which means the lighting calculations can be performed once for a given scene, then visualized multiple times with different virtual camera positions. But ray tracing also supports specular reflections and transparency, so eventually a more complete renderer could combine both.

While searching for radiosity implementations I found rrv, which uses OpenGL to render the view from each patch (triangle) in the scene. My first experiences were disappointing: it was very slow, but that turned out to be due to missing optimisation flags. I then found some further ways to optimise it: first, using vertex buffers instead of glBegin() so that the scene geometry is uploaded to the GPU only once instead of per-patch; second, using a large flat array instead of a C++ std::map to accumulate the results of rendering. These optimisations gave a speed boost of around 7x, but I wasn't satisfied.

I ended up porting the radiosity renderer core to run almost entirely on the GPU, using OpenGL 3.3 (though it could probably be ported to OpenGL 2.1 with a few extensions like floating point textures and framebuffers). It is around 30x faster than the original OpenGL 1 implementation. Here's rrv's room4 demo scene with lighting calculated by my port in a little over 12 minutes (the visualizer code remains unchanged):

Another change I made involved the form factor calculations for hemicube projection. Radiosity ideally projects to a hemisphere and from there to a circle, but rectangular grids are more convenient for computers. So radiosity implementations tend to project to a hemi-cube, rasterizing the scene 5 times, once for the top and each of the four sides. Then each pixel in the result is scaled by a delta form factor, so that it corresponds more closely to the hemisphere circle projection.

rrv's form factor calculations used a product of cosines, which looked suspect to me as it didn't take into account the edges and corners of the hemicube, so I did some searching and found a paper which gave some different formulas:

The Hemi-cube: a Radiosity Solution for Complex Environments

Michael F. Cohen and Donald P. Greenberg

ACM SIGGRAPH Volume 19, Number 3, 1985
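For reference, here is a small Python sketch of the delta form factors as I understand them from that paper, for a hemicube of half-width 1 centred on the patch: \(\Delta F = \Delta A / (\pi (x^2 + y^2 + 1)^2)\) on the top face and \(\Delta F = z \, \Delta A / (\pi (y^2 + z^2 + 1)^2)\) on each side face. The function name and grid layout are my own for this illustration; this is not rrv's code or my test program:

```python
import math


def hemicube_delta_form_factors(resolution):
    """Return (top, side) grids of delta form factors for a unit hemicube.

    The top face spans x, y in [-1, 1] at height 1; each side face spans
    y in [-1, 1] and z in [0, 1] at distance 1. Pixels are sampled at their centres.
    """
    step = 2.0 / resolution   # pixel edge length
    area = step * step        # pixel area
    centres = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    top = [[area / (math.pi * (x * x + y * y + 1.0) ** 2) for x in centres]
           for y in centres]
    heights = [(j + 0.5) * step for j in range(resolution // 2)]
    side = [[z * area / (math.pi * (y * y + z * z + 1.0) ** 2) for y in centres]
            for z in heights]
    return top, side


top, side = hemicube_delta_form_factors(64)
# One top face plus four identical side faces cover the whole hemisphere.
total = sum(map(sum, top)) + 4 * sum(map(sum, side))
```

A useful sanity check is that `total` comes out very close to 1: all the light leaving the patch has to pass through some pixel of the hemicube.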

I implemented them in a test program and plotted a comparison between RRV2007 (magenta) and Cohen1985 (green):

The difference is small, but visible in the visualizer as slight shape differences between quantized bands on the gradients. Here's a comparison between the output after 1 step with the two different form factors, amplified 64 times (mid-grey is equal output):

When time allows, I have some ideas for some additional features, like lighting groups (compute the radiosity for each light separately, then combine them at visualization time - hopefully I'm correct in assuming linearity - allowing the brightness (but not colour) of each light to be adjusted separately) and scene symmetry (like an infinite corridor repeating every 5m or so, the symmetry means the radiosity for translationally equivalent parts of the scene must be identical).

I put my changes in a branch at code.mathr.co.uk/rrv, which may end up merged to the upstream at github.com/kdudka/rrv.

I've already blogged a bit about this calendar image for April (yep, late again...). I made a diagram using Fragmentarium to show the principle behind it:

Five points forming a regular pentagon rotate, and their vertical coordinates trace five sine waves. Each wave is split into a number of rectangles, whose width depends on the height of the wave. Then all five sets of rectangles are overlaid. Because the centroid of the pentagon is fixed no matter the rotation angle, the sum of the waves remains constant. This property is used in windowed overlap-add resynthesis of FFT for frequency-domain processing in audio.
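The constant-sum property is easy to check numerically. This little Python sketch (names are mine, not from gradient.frag) sums the vertical coordinates of the five vertices at a few arbitrary rotation angles:

```python
import math


def pentagon_heights(angle, points=5):
    """Vertical coordinates of a regular polygon's vertices, rotated by angle."""
    return [math.sin(angle + 2.0 * math.pi * k / points) for k in range(points)]


# The sum is the same (zero, up to rounding error) whatever the rotation.
sums = [sum(pentagon_heights(angle)) for angle in (0.0, 0.3, 1.0, 2.5)]
```

Shifting each wave up so it stays non-negative, as in \(\frac{1}{2} + \frac{1}{2} \sin\), just moves the constant from 0 to \(\frac{5}{2}\), which is the form the overlapping windows take in overlap-add.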

But it's not the same as the original, which you can tell by zooming in:

The original (above) has an equal amount of white space between each coloured strip, while in the recreation (below) the amount of white space changes. And the order of the colours in the recreation is flipped from the original, and the coloured strips in the recreation are a bit narrower than the original too.

You can download the source code for the recreation: gradient.frag. For extra fun switch to animation mode in Fragmentarium to see the pentagon rotate with the waves streaming out from it. I did render a video but it looks really horrible compressed, so unless you have Fragmentarium or you wait until I get around to porting this code to WebGL (not very likely..) you'll just have to imagine it.

mightymandel now has a home page, where you can find documentation, downloads, links, and an image gallery. Highlights from the changes since v15 include:

- **sliced rendering**: Splits iterate computations into smaller batches, allowing larger images to be rendered in the same amount of video memory (each pixel needs 12-16 bytes, but each iterate needs 128-160 bytes).
- **progressive rendering**: When sliced rendering is enabled, the slices are ordered in such a way that a lofi pixelated image appears first, which gradually refines into the final hifi image.
- **improved glitch correction**: Finding reference points now uses a two-pass blob extraction algorithm, first glitched blobs are extracted, then the most-glitched sub-blobs are extracted. Small blobs are ignored (default 1 pixel) for faster completion (can be disabled for previous behaviour).
- **improved no-de colouring**: Now closer to the algorithm described in my blog post faking distance estimate colouring which should make it smoother.
- **zoom motion blur**: The zoom assembler adds motion blur to reduce unpleasant strobing for fast zooms at standard frame rates. The shutter speed is variable, so you can adjust the amount of blurring to suit your tastes.
- **downloads**: No longer do you need git to get mightymandel, there are source code tarballs and Windows binaries (cross-compiled from Linux).

A final note: my future blog posts about mightymandel will have fewer tags, so subscribe to the mightymandel tag feed if you want to stay updated.
