# Phase Vocoder

A spectral algorithm for manipulating time and pitch independently of each other.

It works on overlapping blocks using the discrete Fourier transform (DFT), typically computed with an FFT.

## 1 Time Stretch

### 1.1 Parameters

These can be tuned to taste.

$$M$$: input audio block size in samples (integer, e.g. 8192).

$$\Delta I$$: input audio hop size in samples (integer, e.g. 19).

$$N$$: zero-padded block size used for the DFT (integer, $$N \ge M$$, e.g. 262144).

$$\Delta O$$: output audio hop size in samples (integer, e.g. 432).

The time dilation factor is $$\frac{\Delta O}{\Delta I}$$.
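As a quick check of the example values above, the factor can be computed exactly. This is only a small sketch; the function name `time_dilation` is made up for illustration.

```python
from fractions import Fraction


def time_dilation(hop_in: int, hop_out: int) -> Fraction:
    """Exact time dilation factor hop_out / hop_in (hypothetical helper)."""
    if hop_in <= 0 or hop_out <= 0:
        raise ValueError("hop sizes must be positive integers")
    return Fraction(hop_out, hop_in)


# Example hop sizes from the parameter list: input hop 19, output hop 432.
factor = time_dilation(19, 432)
print(factor, float(factor))
```

With these example values the output is a little under 23 times longer than the input, so this is a fairly extreme stretch.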

### 1.2 Algorithm

Take $$M$$ samples of input audio every $$\Delta I$$ samples.

Multiply by raised cosine window of length $$M$$ (peak amplitude $$2$$, mean $$1$$).
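One window that matches this description is $$w(k) = 1 - \cos(2 \pi k / M)$$ for $$k = 0, \ldots, M - 1$$. The exact formula is an assumption (the text only fixes the peak and the mean), so treat this as a sketch:

```python
import math


def raised_cosine_window(m: int) -> list[float]:
    """Raised cosine window w(k) = 1 - cos(2*pi*k/m).

    Assumed form: peak 2 (at k = m/2 for even m) and mean exactly 1,
    because the cosine sums to zero over one full period.
    """
    if m < 2:
        raise ValueError("window length must be at least 2")
    return [1.0 - math.cos(2.0 * math.pi * k / m) for k in range(m)]


w = raised_cosine_window(8)
print(max(w), sum(w) / len(w))
```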

Zero-pad the windowed block to length $$N$$ and call it $$x(n)$$, where $$n = 0, 1, 2, \ldots$$ is the block index.

Take the discrete Fourier transform of $$x(n)$$, call it $$X(n)$$.

For each bin, normalize the (complex-valued) ratio $$\frac{X(n)}{X(n - 1)}$$ to magnitude $$1$$, and raise it to the power $$\frac{\Delta O}{\Delta I}$$ (which need not be an integer). Call the result $$\delta \theta(n)$$. In case of division by zero or other badness, set $$\delta \theta(n) = 1$$.
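For a single bin, this step might look like the following sketch. The name `phase_advance` is hypothetical, and the non-integer power is taken through the principal argument of the ratio, which is one reasonable reading of the step rather than the only one.

```python
import cmath
import math


def phase_advance(cur: complex, prev: complex, stretch: float) -> complex:
    """Unit-magnitude phase step for one bin.

    cur and prev are this bin's values in X(n) and X(n - 1);
    stretch is the ratio of output hop to input hop.
    Falls back to 1 when the ratio is undefined, zero, or not finite.
    """
    if prev == 0:
        return 1 + 0j
    ratio = cur / prev
    mag = abs(ratio)
    if mag == 0 or not math.isfinite(mag):
        return 1 + 0j
    # Normalizing to magnitude 1 keeps only the angle; raising to the
    # power `stretch` then multiplies that (principal) angle by stretch.
    return cmath.rect(1.0, cmath.phase(ratio) * stretch)


# A quarter-turn between blocks becomes a half-turn when stretching by 2.
print(phase_advance(1j, 1 + 0j, 2.0))
```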

Accumulate the phase of each bin: $$\theta(n) = \theta(n - 1) \times \delta \theta(n)$$, then normalize to magnitude $$1$$ (just to be safe in case of rounding errors). The initial phase $$\theta(-1)$$ is probably arbitrary but should have magnitude $$1$$.

Then the output Fourier transform has the input’s magnitude with the accumulated phase: $$Y(n) = |X(n)| \times \theta(n)$$.
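The accumulation and the output spectrum can be sketched per block as below. Spectra are plain lists of complex numbers here, and the function names are made up for illustration.

```python
import math


def accumulate_phase(theta_prev: list[complex],
                     delta_theta: list[complex]) -> list[complex]:
    """theta(n) = theta(n - 1) * delta_theta(n), renormalized per bin."""
    theta = []
    for t, d in zip(theta_prev, delta_theta):
        p = t * d
        mag = abs(p)
        # Renormalize to guard against drift; fall back to 1 if degenerate.
        theta.append(p / mag if mag > 0 and math.isfinite(mag) else 1 + 0j)
    return theta


def output_spectrum(spectrum: list[complex],
                    theta: list[complex]) -> list[complex]:
    """Y(n) = |X(n)| * theta(n): input magnitude, accumulated phase."""
    return [abs(x) * t for x, t in zip(spectrum, theta)]


theta = accumulate_phase([1 + 0j, 1j], [1j, 1j])
print(output_spectrum([3 + 4j, 2 + 0j], theta))
```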

Take the inverse Fourier transform of $$Y(n)$$, call it $$y(n)$$.

Multiply by raised cosine window of length $$M$$ (peak amplitude $$2$$, mean $$1$$).

Multiply by gain factor: $$G = \frac{\Delta O}{M \times N}$$. This assumes that the gain of the DFT/FFT is not normalized.

Add each block of $$M$$ samples into the output audio stream at offsets spaced $$\Delta O$$ samples apart (overlap-add).
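Putting the steps together, a minimal end-to-end sketch might look like this. Several details are assumptions rather than part of the description above: the window formula $$1 - \cos(2 \pi k / M)$$, zero-padding appended after the block, using the first $$M$$ samples of $$y(n)$$, keeping only the real part of the inverse transform, and starting with $$\theta(-1) = 1$$. The naive $$O(N^2)$$ DFT is only there to keep the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive unnormalized DFT (or inverse DFT); for small demos only."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    return [sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k in range(n))
            for j in range(n)]


def time_stretch(audio, m, hop_in, n, hop_out):
    """Phase vocoder time stretch by hop_out / hop_in (illustrative sketch)."""
    window = [1.0 - math.cos(2.0 * math.pi * k / m) for k in range(m)]
    stretch = hop_out / hop_in
    gain = hop_out / (m * n)          # assumes unnormalized forward/inverse DFT
    blocks = max(0, (len(audio) - m) // hop_in + 1)
    out = [0.0] * ((blocks - 1) * hop_out + m if blocks else 0)
    prev = [0j] * n                   # no X(-1): zeros force a phase step of 1
    theta = [1 + 0j] * n              # theta(-1), arbitrary unit phase
    for b in range(blocks):
        start = b * hop_in
        # Window the block, then zero-pad to length n (padding at the end).
        x = [audio[start + k] * window[k] for k in range(m)] + [0.0] * (n - m)
        spec = dft(x)
        for j in range(n):
            delta = 1 + 0j
            if prev[j] != 0:
                ratio = spec[j] / prev[j]
                if ratio != 0 and cmath.isfinite(ratio):
                    delta = cmath.rect(1.0, cmath.phase(ratio) * stretch)
            t = theta[j] * delta
            theta[j] = t / abs(t)     # renormalize against rounding drift
        prev = spec
        y = dft([abs(s) * t for s, t in zip(spec, theta)], inverse=True)
        # Window again, apply gain, and overlap-add at the output hop.
        for k in range(m):
            out[b * hop_out + k] += y[k].real * window[k] * gain
    return out


# Tiny demo sizes (far smaller than the example parameters) so it runs fast.
signal = [math.sin(2.0 * math.pi * 5 * k / 64) for k in range(64)]
stretched = time_stretch(signal, m=16, hop_in=2, n=32, hop_out=4)
print(len(signal), len(stretched))
```

The demo uses much smaller sizes than the example parameters, since the naive transform would be impractically slow at $$N = 262144$$.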

### 1.3 References

• Pure Data documentation, 3.audio.examples/I07.phase.vocoder.pd (Pd version 0.53).