Fixing replicateM's space leak
In my previous post I had a problem with some simple Haskell code that exploded in memory. This morning I worked out why, and how to fix it, using the technique of difference lists. The problematic code was:
import Control.Monad (replicateM) shapes bias p = [ s | m <- [0 .. p] , s <- replicateM (fromInteger m + 2) [1 .. p] , p == sum (zipWith (*) (map (bias +) s) (tail s)) ] main = mapM_ (print . length . shapes 0) [1..10]
To see how memory explodes, here's my session:
$ ghc -O2 replicateM.hs -rtsopts ... $ timeout 60 ./replicateM +RTS -h 1 3 5 10 14 27 37 $ hp2pretty replicateM.hp
and here is the SVG graph output by hp2pretty:
The memory is going up dramatically, and while the program ran for a whole minute wall-clock, subtracting the garbage collection time gives less than 17 seconds of useful work.
I had a hunch that the problem is replicateM
(the rest
of the code is so simple that it almost obviously can't be anything
else). I'm using for lists, where it does Cartesian product (output all
possible combinations of one item from each of cnt0 copies of a list f).
Let's look at the definition in the source code:
replicateM cnt0 f = loop cnt0 where loop cnt | cnt <= 0 = pure [] | otherwise = liftA2 (:) f (loop (cnt - 1))-- base-4.16.1.0:Control.Monad.replicateM
To intuitively understand liftA2, one can desugar it to do-notation,
which changes the type unless
{-# LANGUAGE ApplicativeDo #-}
is enabled:
replicateM cnt0 f = loop cnt0 where loop cnt | cnt <= 0 = pure [] | otherwise = do x <- f xs <- loop (cnt - 1) pure (x : xs)
but to see where the leak is coming from we can evaluate the critical
part of the original version, specifically for the [String]
type:
> result = words "hello world how are you" > mapM_ putStrLn $ liftA2 (:) "¿¡" result ¿hello ¿world ¿how ¿are ¿you ¡hello ¡world ¡how ¡are ¡you
Immediately one can see that it needs all of result
in
order to print it once with a prefix, but then it needs it all again to
print it with the second prefix. So because loop (cnt - 1)
is passed as an argument to liftA2
, its value will be
shared in the same way as the result
list of strings was.
And sharing in this way forces the value to be kept in memory, and can
only start to be freed when the last prefix has started being emitted.
The problem in replicateM
is compounded, because the same
happens in each recursive call to loop
, although the
retained values get smaller deeper down so its not such a huge deal.
The problem is at the top of the recursion, where it has to store a list
of length ((length f)^(n-1)), which can get pretty large.
So that's the issue, how to fix it? Difference lists traditionally
work by turning appends like ((x ++ y) ++ z) ++ w
, which is
expensive because (++) cost is the length of the left hand side, and left
nesting means the left hand side gets longer and longer, into something
like
(((prepend x . prepend y) . prepend z) . prepend w) empty
which is much more efficient because the cost of function composition
(.) is constant and prepend costs the length of its argument. In this
example the cost of the first is X + X + Y + X + Y + Z = 3 X + 2 Y + Z,
while the second is X + Y + Z + W (and the + W could be avoided by using
it at the end instead of prepending w to empty).
Here's what I came up with, I'm not 100% sure why it works better, and I'm not sure if it is even correct for anything apart from m = [], and my first attempt output items reversed, and maybe it has similarities to foldl' vs foldr in terms of accumulator as well as using difference lists, but anyway:
{-# LANGUAGE ApplicativeDo #-} replicateM' :: Applicative m => Int -> m a -> m [a] replicateM' n ls = go n (pure id) where go n fs | n <= 0 = do f <- fs pure (f []) | otherwise = go (n - 1) gs where gs = do f <- fs l <- ls pure (f . (l:))
Here's the heap profile graph for the same program, just using this replicateM' to replace replicateM:
What a difference! Tiny constant memory, as it should be, and almost none of the time is spent garbage collecting.