Harry
A recent toot inspired me:
Computermusiker @hecanjog@social.radio.af
Are there any sox-like tools that draw waveforms or spectrographs directly into the console's framebuffer?
It would be a really handy thing to quickly look over audio files on a remote server.
I wasn't happy with my gnuplot hack, so I created harry, a text-mode audio viewer:
It loads an audio file into memory, with 4 subchannels per input channel containing mean, min, max, and rms. Then it generates a mipmap chain, reducing the length progressively by factors of two. This reduction enables fast, high-quality zooming, at the cost of doubling the data size again. Total memory usage is about 32 bytes per sample per channel, which means a 6 minute CD track explodes to about 1 GB (from about 64 MB).
But! Literally just this minute I remembered that at the base (longest) level, mean, min and max are all identical (and rms is just their absolute value), so one buffer can serve all four. That reduces the total memory to 5/8 of the current bloat, at the cost of more complicated logic when cleaning up the buffers (double-free is not allowed). I may implement that later tonight.
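The memory figures can be checked with a little arithmetic. Per sample per channel, the current layout costs 16 bytes at the base level plus about 16 more for the coarser levels, 32 in total. Storing a single float per sample at the base level brings that down to 4 + 16 = 20 bytes, and 20/32 = 5/8. The sketch below makes the following assumptions:

- The input is 16-bit stereo PCM at 44.1 kHz.
- Each summary is a 4-byte float.
- The chain keeps halving (rounding down) until one cell is left.

`chain_cells`, `budget` and `track_budget` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define CELL_BYTES (4 * sizeof(float)) /* mean, min, max, rms */

/* Total cells in a chain whose base has n cells: n + n/2 + n/4 + ... < 2n. */
static uint64_t chain_cells(uint64_t n) {
  uint64_t total = 0;
  for (; n > 0; n /= 2) total += n;
  return total;
}

typedef struct {
  uint64_t pcm16;       /* 16-bit PCM as stored on a CD */
  uint64_t current;     /* four summaries at every level */
  uint64_t shared_base; /* one float per sample at the base level */
} budget;

static budget track_budget(uint64_t frames, unsigned channels) {
  assert(frames > 0 && channels > 0);
  uint64_t cells = chain_cells(frames);
  budget b;
  b.pcm16 = frames * channels * 2;
  b.current = cells * CELL_BYTES * channels;
  b.shared_base =
      (frames * sizeof(float) + (cells - frames) * CELL_BYTES) * channels;
  return b;
}
```

For a 6 minute stereo track, `track_budget(360 * 44100, 2)` gives 63.5 MB of PCM. It gives about 1.016 GB for the current layout and about 635 MB for the shared-base layout.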
When plotting a waveform the naive way, one might simply plot all the points. But audio files have thousands, often millions, of samples, and unless you are zoomed all the way in it's just too much data to deal with. One way of reducing it might be to sample the data at a subset of points, but that leads to aliasing. OctaMED SoundStudio's sample editor does this; it is very bad and very confusing, because the visual appearance of the waveform bears little resemblance to what is really happening. The way to anti-alias is to filter, and I'm using a simple box filter kernel (non-overlapping pairs of neighbouring samples) in the reduction stages.
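To see why point sampling misleads, consider the worst case: a tone at the Nyquist frequency, alternating +1 and -1. Keeping every other sample shows a constant +1, a full-scale DC offset that isn't in the audio at all. Averaging non-overlapping pairs (the box filter) gives 0, which is what a band-limited version at half the rate should contain. The box filter is only a crude low-pass, since content above the new Nyquist limit is attenuated rather than removed, but it is cheap and cancels this worst case exactly. The function names `halve_pick` and `halve_box` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reduction: keep every other sample (this aliases). Returns n / 2. */
static size_t halve_pick(const float *x, size_t n, float *out) {
  assert(x != NULL && out != NULL);
  size_t m = n / 2;
  for (size_t i = 0; i < m; ++i) out[i] = x[2 * i];
  return m;
}

/* Box-filtered reduction: average non-overlapping pairs. Returns n / 2. */
static size_t halve_box(const float *x, size_t n, float *out) {
  assert(x != NULL && out != NULL);
  size_t m = n / 2;
  for (size_t i = 0; i < m; ++i) out[i] = 0.5f * (x[2 * i] + x[2 * i + 1]);
  return m;
}
```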
So the variables that are plotted are:
- mean: a simple low-pass filter that approximates what a band-limited version of the signal would be. As well as being plotted, these buffers are used for variable (higher) speed audio playback with cubic interpolation, then linear interpolation between neighbouring mipmap levels.
- rms: raw RMS is not so useful on its own, but it can be cooked into a standard deviation via sqrt(rms * rms - mean * mean), which gives a meaningful bound on the spread of the signal values in each cell.
- min, max: peaks are useful to monitor, because exceeding the value range (typically -1 .. 1) leads to ugly clipping distortion.
- zero: silence is the baseline for everything.
At high zooms the actual waveform is visible:
[ASCII screen grab at high zoom; status line: file 0:00:05:53.851 mathr_-_lockaccum.wav view 0:00:00:00.058 gain 1 | speed 1 at 0:00:00:00.319]
At medium zooms the mean and standard deviation start to give a thick core while the outer min/max fringes show the peaks:
[ASCII screen grab at medium zoom; status line: file 0:00:05:53.860 mathr_-_lockaccum.wav view 0:00:00:00.928 gain 1 | speed 1 at 0:00:00:10.303]
At low zooms the long-time-scale dynamics are visible:
[ASCII screen grab at low zoom; status line: file 0:00:01:30.650 01.Yawning_Zeitgeist_Intro_(Freestyle).flac view 0:00:00:59.443 gain 1 | speed 1 at 0:00:00:44.582]
These ASCII grabs don't do it justice, because the real code uses ANSI control sequences via ncurses to make parts more or less bold. It also optionally plays audio using SDL2 (which has a PulseAudio backend that can work remotely over a local network), with a synchronized play head (in two modes) and whole-file or A-B subregion looping (same key controls as MPV).
Another cool thing: the build system uses the linker to embed the sources into the executable, so it's easy to fulfil your AGPLv3+ license obligations, and a command-line flag writes the files out again. Check the Makefile for details, and the README (duplicated to the harry website via Pandoc and some hand editing) for the documentation.