Divergent Protocol

Downloads:

FLAC (83 MB), VBR MP3 (21 MB), Ogg Vorbis (10 MB),
PNG image (403 kB), shell source code (2 kB), transcript (21 kB)

A feedback process involving speech synthesis and automatic transcription: text is converted to speech with the Flite engine (alternating between the slt and awb voices), the audio is reversed with SoX, and the reversed audio is converted back to text with PocketSphinx. That text is then fed back into the start of the process. If an encode/decode loop is detected, meaning a voice is about to speak a line it has already spoken, the seed text "Divergent Protocol" is appended to break the cycle.
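The cycle-breaking step can be looked at in isolation. This is a minimal sketch, not the script's own code: `break_cycle` is a hypothetical helper, and the history file is passed by name, whereas the script does the same check inline against dp.txt.

```shell
# Hypothetical helper (not part of the original script).
# Prints the line to speak and records it in the history file; if the exact
# line is already in the history, the seed phrase is appended first.
break_cycle() {
  line="$1"
  seed="$2"
  history="$3"
  # -x matches whole lines only, -F treats the line as a fixed string.
  # A missing history file simply counts as "not seen yet".
  if grep -qxF "${line}" "${history}" 2>/dev/null
  then
    line="${line} ${seed}"
  fi
  printf '%s\n' "${line}" | tee -a "${history}"
}
```

Calling it twice with the same line and the same history file returns the line unchanged the first time and with the seed phrase appended the second time.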

POSIX shell source code:

#!/bin/sh
# Divergent Protocol (c) 2017 Claude Heiland-Allen
# mkdir dp && cd dp && ../dp.sh 'divergent protocol' 9
if [ "$1" ]
then
  if [ "$2" ]
  then
    utterance="$1"
    target="$2"
    n=1
    count=1
    touch dp.txt
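    # Build a small Haskell filter that collapses runs of repeated words.
    # The delimiter is quoted so the shell leaves the "$" in the Haskell alone.
    cat > dp.hs <<'EOF'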
import Data.List
main = interact $ unwords . map head . group . words
EOF
    ghc -O2 dp.hs
    while true
    do
      for voice in slt awb
      do
        utterance="$(echo "${utterance}" | ./dp)"
        speak="${voice}: ${utterance}"
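        # If this voice has already spoken this exact line, we are in a loop.
        # -x matches whole lines, -F avoids treating the text as a regex.
        if grep -qxF "${speak}" dp.txt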
        then
          speak="${speak} ${1}"
          utterance="${utterance} ${1}"
          count=$((count + 1))
        fi
        echo "${speak}" | tee -a dp.txt
        echo "${utterance}" | flite -voice "${voice}" -o "${voice}.wav"
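        # Reverse the speech; the reversed audio is what gets transcribed.
        sox "${voice}.wav" utterance.wav reverse
        # No-op unless the mplayer line below is enabled.
        wait
        # Merge two copies of the forward speech into a stereo file, the
        # second channel at 0.7 volume; swapping channels for slt makes each
        # voice louder on a different side.
        if [ "${voice}" = "slt" ]
        then
          sox -M "${voice}.wav" -v 0.7 "${voice}.wav" stereo.wav swap
        else
          sox -M "${voice}.wav" -v 0.7 "${voice}.wav" stereo.wav
        fi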
        cp -af stereo.wav "${n}.wav"
        n=$((n + 1))
        # mplayer -quiet -really-quiet stereo.wav &
        if [ "${count}" -ge "${target}" ]
        then
          wait
          rm -f slt.wav awb.wav stereo.wav utterance.wav
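          # Zero-pad the numbered files to six digits (Perl rename syntax)
          # so that the ??????.wav glob lists them in order.
          rename s/^/0/ ?.wav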
          rename s/^/0/ ??.wav
          rename s/^/0/ ???.wav
          rename s/^/0/ ????.wav
          rename s/^/0/ ?????.wav
          sox ??????.wav -e float dp.wav rate 44100 pad 0 6 reverb \
            compand 0.01,0.1 6:-1000,-999,-10 -12 -1000 0.01
          exit 0
        fi
        utterance="$(pocketsphinx_continuous -infile utterance.wav 2>/dev/null)"
      done
    done
  else
    echo "usage: $0 '$1' 9"
    exit 1
  fi
else
  echo "usage: $0 'divergent protocol' 9"
  exit 1
fi
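
The chain of rename calls pads the numbered files to six digits so that the ??????.wav glob hands them to SoX in order. Here is an alternative sketch for systems without the Perl rename utility. `pad_name` and `pad_all` are hypothetical helpers, not part of the script, and they assume the existing names are plain numbers without leading zeros.

```shell
# Hypothetical helper: turn a segment number into a six-digit file name.
# Assumes the number has no leading zeros (printf would read "08" as octal).
pad_name() {
  printf '%06d.wav\n' "$1"
}

# Hypothetical helper: rename 1.wav, 2.wav, ... in the current directory.
pad_all() {
  for f in [0-9]*.wav
  do
    [ -e "$f" ] || continue          # glob matched nothing
    new="$(pad_name "${f%.wav}")"
    [ "$f" = "$new" ] || mv -- "$f" "$new"
  done
}
```

Another option would be to write each segment under its padded name in the first place, which avoids the renaming pass entirely.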

The embedded Haskell program elides consecutive repeated words; without it, I found the process would get stuck in "... she she she she she ...", which didn't sound too good. On my 3 GHz AMD64 system it runs faster than real time, so it could conceivably be modified to run continuously as an internet radio station.
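
The same word-eliding filter can be sketched in awk, for anyone who would rather not install GHC. `dedupe_words` is a hypothetical stand-in for the compiled ./dp filter, not what the script uses. It works line by line, whereas the Haskell version treats its whole input as one stream of words; for the single-line utterances here the result should be the same.

```shell
# Hypothetical stand-in for ./dp: collapse runs of identical adjacent words.
# Reads stdin and writes one space-separated line per input line.
dedupe_words() {
  awk '{
    out = ""
    prev = ""
    for (i = 1; i <= NF; i++) {
      # Appending "" forces a string comparison, even for numeric-looking words.
      if (i == 1 || ($i "") != (prev "")) {
        out = out (out == "" ? "" : " ") $i
      }
      prev = $i
    }
    print out
  }'
}
```

For example, `echo 'she she she said said she' | dedupe_words` prints `she said she`: only adjacent repeats are removed, so a word can still come back later in the line.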