
ppmtoy4m debottlenecking

I've been doing some video rendering lately, and I've noticed that a major bottleneck in my encoding pipelines is the ppmtoy4m utility from mjpegtools, which converts RGB to YUV. I set about removing this bottleneck by writing my own tool mostly from scratch (only the colour space lookup table generation was copied from colorspace.c in the original program).

The main things I optimized were file reading and writing, plus a tight loop that converts chunky (interleaved) RGB to planar YUV via lookup tables. The main misfeatures of my quick and dirty version are improper PPM header parsing (the width and height must be given as command line arguments, and the PPM header must be in the "usual" format without any comments) and hardcoded values for the output Y4M stream parameters. I don't support any chroma subsampling either, because y4mscaler gives much better results in any case.
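For illustration, here is a minimal sketch of what a lookup-table conversion loop of this kind can look like. It is not the code from my tool: the BT.601 studio-range coefficients, the 16-bit fixed-point precision, and the names init_tables and rgb_to_yuv444 are all assumptions made for the example, and the tables generated by colorspace.c in mjpegtools may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_BITS 16  /* assumed fixed-point precision */

/* lut[output channel][input channel][input value];
   outputs are Y, Cb, Cr and inputs are R, G, B. */
static int32_t lut[3][3][256];
static int tables_ready = 0;

/* Assumed BT.601 coefficients, scaled to studio range below. */
static const double coeff[3][3] = {
    {  0.299,     0.587,     0.114    },
    { -0.168736, -0.331264,  0.5      },
    {  0.5,      -0.418688, -0.081312 },
};
static const double scale[3]  = { 219.0, 224.0, 224.0 };
static const double offset[3] = {  16.0, 128.0, 128.0 };

/* Round a real value to the nearest fixed-point integer. */
static int32_t to_fixed(double x)
{
    return (int32_t)(x * (double)(1 << FP_BITS) + (x < 0 ? -0.5 : 0.5));
}

void init_tables(void)
{
    for (int out = 0; out < 3; ++out)
        for (int in = 0; in < 3; ++in)
            for (int v = 0; v < 256; ++v) {
                double x = coeff[out][in] * scale[out] * v / 255.0;
                /* Fold the channel offset and the +0.5 rounding term into
                   the last table, so the inner loop needs no extra adds. */
                if (in == 2)
                    x += offset[out] + 0.5;
                lut[out][in][v] = to_fixed(x);
            }
    tables_ready = 1;
}

/* Convert interleaved RGB (3 bytes per pixel) to three separate planes. */
void rgb_to_yuv444(const uint8_t *rgb, size_t pixels,
                   uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    assert(tables_ready);
    for (size_t i = 0; i < pixels; ++i) {
        const uint8_t r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
        y[i]  = (uint8_t)((lut[0][0][r] + lut[0][1][g] + lut[0][2][b]) >> FP_BITS);
        cb[i] = (uint8_t)((lut[1][0][r] + lut[1][1][g] + lut[1][2][b]) >> FP_BITS);
        cr[i] = (uint8_t)((lut[2][0][r] + lut[2][1][g] + lut[2][2][b]) >> FP_BITS);
    }
}
```

Each output sample costs three table reads, two additions and a shift, with no multiplications or clamping in the inner loop. Reading a whole frame into one buffer and writing each plane with a single call keeps the I/O out of the per-pixel path as well.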

With benchmark.ppm containing 250 frames at 1024x576:

ppmtoy4m -v 0 -S 444 -F 25:1     < benchmark.ppm > /dev/null
3.92user 0.21system 0:04.15elapsed  99%CPU (0avgtext+0avgdata  9888maxresident)k 0inputs+0outputs (0major+ 678minor)pagefaults 0swaps
3.93user 0.16system 0:04.09elapsed  99%CPU (0avgtext+0avgdata  9904maxresident)k 0inputs+0outputs (0major+ 679minor)pagefaults 0swaps
3.90user 0.21system 0:04.11elapsed  99%CPU (0avgtext+0avgdata  9888maxresident)k 0inputs+0outputs (0major+ 678minor)pagefaults 0swaps
./ppmtoy4m_quickndirty 1024 576  < benchmark.ppm > /dev/null
0.90user 0.28system 0:01.17elapsed 100%CPU (0avgtext+0avgdata 16240maxresident)k 0inputs+0outputs (0major+1058minor)pagefaults 0swaps
0.91user 0.25system 0:01.17elapsed  98%CPU (0avgtext+0avgdata 16240maxresident)k 0inputs+0outputs (0major+1059minor)pagefaults 0swaps
0.90user 0.27system 0:01.17elapsed 100%CPU (0avgtext+0avgdata 16240maxresident)k 0inputs+0outputs (0major+1058minor)pagefaults 0swaps

In summary, being a little bit dirty makes the conversion about 3.5 times as fast (roughly 4.1 seconds down to 1.17 seconds elapsed). Woohoo! The visual output seems the same too, but I should double check that the video bits are identical.
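The bit-identical check could be done with a standard tool such as cmp, or with a few lines of C. The sketch below is only an illustration and the function name first_difference is made up for it. It reports the offset of the first differing byte between two streams.

```c
#include <assert.h>
#include <stdio.h>

/* Returns -1 if both streams hold exactly the same bytes, otherwise the
   offset of the first byte that differs. One stream ending before the
   other counts as a difference at that offset. */
long first_difference(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);
    long offset = 0;
    for (;;) {
        const int ca = getc(a);
        const int cb = getc(b);
        if (ca != cb)
            return offset;
        if (ca == EOF)
            return -1;
        ++offset;
    }
}
```

If the two tools write different stream headers, the reported offset will fall inside the header rather than in the frame data, so the headers would need to be skipped or made to match first.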