This post is going to be a bit heavier on the theory. If you're not familiar with monads in Haskell, I recommend Learn You a Haskell. If you haven't used the continuation monad, I recommend this article from A Neighborhood of Infinity. That's about all you should need for this post. We'll mention LLVM a bit for the sake of example, but no in-depth knowledge is involved there.
Today, I found myself writing this chunk of Haskell code for a bit of llvm-hs interaction.
handleAndOutput :: FilePath -> FilePath -> IO ()
handleAndOutput inp outp =
  handleFile inp >>= \case
    Nothing -> pure ()
    Just res -> withContext $ \ctx -> do
      withHostTargetMachine $ \tm -> do
        withModuleFromAST ctx res $ \res' -> do
          writeObjectToFile tm (File outp) res'
The important part for our discussion is the bottom four lines. I'm taking a complicated data structure from llvm-hs-pure and compiling it down to an object file. In the process, I need to borrow several "non-pure" data structures, in the sense that these are data structures controlled by a C++ library and therefore are not managed by the Haskell runtime. We see this sort of pattern a lot in Haskell when interacting with foreign code. The relevant type signatures are
withContext :: (Context -> IO a) -> IO a
withHostTargetMachine :: (TargetMachine -> IO a) -> IO a
withModuleFromAST :: Context -> Module -> (Module -> IO a) -> IO a
So, once we've applied the first couple of arguments to withModuleFromAST, the pattern seems to be
withSomething :: (thing -> IO a) -> IO a
Each of these has the same behavior. They open or allocate some resource, run the function argument, then close the resource. Like I said, we see this pattern a lot when interfacing with things outside the Haskell runtime. System.IO provides withFile, which opens a file, runs some code, and closes the file at the end. Foreign.C.String provides withCString, which marshals a Haskell string into a C string and frees the memory at the end.
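To make the open/run/close shape concrete, here is a minimal sketch of a function in this style. Everything in it is hypothetical: Resource, withResource, and demo are made-up names, and building the function on bracket from Control.Exception is my assumption about how such a function might be written, not a claim about how llvm-hs implements its own.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical resource: just a name. Real code would hold a handle or pointer.
newtype Resource = Resource String

-- Hypothetical with-style function: record the acquisition, run the body,
-- and record the release even if the body throws.
withResource :: IORef [String] -> String -> (Resource -> IO a) -> IO a
withResource logRef name = bracket acquire release
  where
    acquire = do
      modifyIORef logRef (("open " ++ name) :)
      pure (Resource name)
    release _ = modifyIORef logRef (("close " ++ name) :)

-- Borrow one resource, use it, and return the recorded events in order.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  withResource logRef "a" $ \(Resource n) ->
    modifyIORef logRef (("use " ++ n) :)
  reverse <$> readIORef logRef
```

Running demo should record the open event, then the use, then the close, which is exactly the bracketed shape described above.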
This pattern isn't even unique to Haskell. Python's with statements follow the same pattern: run some entry code, do some stuff, then run some exit code. But both of these constructs suffer from a similar problem, which my first code snippet exhibits. If I need to borrow several resources in succession (a task not uncommon when interacting with foreign code), then I'm going to naturally end up nested several layers deep.
Now, C++ and Rust actually have a solution to this nesting problem. In C++ and Rust, we can simply make local variables which allocate resources and trust that deallocation will happen as soon as the local goes out of scope. This pattern is called Resource Acquisition Is Initialization, or RAII for short. The C++ code equivalent to my Haskell code from before might look something like this.
// This is example code; this is NOT compatible with the LLVM C++ library
void handle_and_output(string in, string out) {
  auto res = handle_file(in);
  Context ctx {};
  TargetMachine tm { get_host_target_machine() };
  Module m = module_from_ast(ctx, res);
  write_object_to_file(tm, out, m);
}
Compare that to what we started with in Haskell.
handleAndOutput :: FilePath -> FilePath -> IO ()
handleAndOutput inp outp =
  handleFile inp >>= \case
    Nothing -> pure ()
    Just res -> withContext $ \ctx -> do
      withHostTargetMachine $ \tm -> do
        withModuleFromAST ctx res $ \res' -> do
          writeObjectToFile tm (File outp) res'
Note how, in the C++ example, I simply allocate the resources and trust that they'll go out of scope at the end. There's no increase in nesting, and we're not creeping over to the right side of the screen. It sure would be nice if we could do this in Haskell.
-- Not going to compile yet, obviously.
-- The result is plain IO: block runs the inner computation and hands back an IO action.
handleAndOutputCont :: FilePath -> FilePath -> IO ()
handleAndOutputCont inp outp =
  handleFile inp >>= \case
    Nothing -> pure ()
    Just res -> block $ do
      ctx <- withContext
      tm <- withHostTargetMachine
      res' <- withModuleFromAST ctx res
      liftIO $ writeObjectToFile tm (File outp) res'
In our totally hypothetical example here, block is a function that "protects" the outer scope and frees any resources allocated within it. We wrap the writeObjectToFile call in a liftIO because, more than likely, we'll be writing our own home-baked monad to do all this.
But... will we? Is there a monad that already does what we want to do? If you've read the title of this post, you already know the answer, but let's pretend you didn't and take another look at that type signature.
withSomething :: (thing -> IO a) -> IO a
Aha! That looks like the continuation monad.
newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }
In fact, writing such a wrapper is almost laughably simple.
withSomething :: (thing -> IO a) -> IO a
withSomething = -- ... Some black magic
withSomethingCont :: ContT a IO thing
withSomethingCont = ContT withSomething
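As a concrete instance of that wrapper, here is a small sketch using withCString, one of the real functions mentioned earlier. The function name roundTrip is made up for illustration. It borrows two C strings in sequence with no extra nesting, reads them back, and runs the continuation with evalContT from the transformers package.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Foreign.C.String (peekCString, withCString)

-- Hypothetical helper: marshal two Haskell strings to C strings,
-- read them back, and free both buffers when the ContT computation is run.
roundTrip :: String -> String -> IO (String, String)
roundTrip a b = evalContT $ do
  ca <- ContT (withCString a) -- withCString a :: (CString -> IO r) -> IO r
  cb <- ContT (withCString b)
  liftIO $ (,) <$> peekCString ca <*> peekCString cb
```

Note that the only change at each call site is the ContT constructor in front of the partially applied with-function.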
Now, we need that block function from before. That, too, is surprisingly simple. When we want to free all of our resources, we just... run the continuation and unpack that layer of the monad stack.
block :: Monad m => ContT r m r -> m r
block f = runContT f pure
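For what it's worth, the transformers package already exports a function with this type and behavior, evalContT, so block can be defined as a synonym for it. The sketch below uses that definition to check the order in which resources are released. The helpers record, withLogged, and releaseOrder are hypothetical names, and the logging resource is a stand-in for a real one.

```haskell
import Control.Exception (bracket_)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical helper: prepend an event to the log.
record :: IORef [String] -> String -> IO ()
record logRef msg = modifyIORef logRef (msg :)

-- Hypothetical with-style function that logs its open and close events.
withLogged :: IORef [String] -> String -> (String -> IO a) -> IO a
withLogged logRef name body =
  bracket_
    (record logRef ("open " ++ name))
    (record logRef ("close " ++ name))
    (body name)

-- Same as the definition in the text, written via evalContT.
block :: Monad m => ContT r m r -> m r
block = evalContT

-- Borrow two resources inside one block and return the events in order.
releaseOrder :: IO [String]
releaseOrder = do
  logRef <- newIORef []
  block $ do
    a <- ContT (withLogged logRef "a")
    b <- ContT (withLogged logRef "b")
    liftIO $ record logRef ("use " ++ a ++ " and " ++ b)
  reverse <$> readIORef logRef
```

The log should show the resources being closed in the reverse of the order they were opened, the same as with the nested version and with RAII locals in C++.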
Of course, you could just use runContT directly, but if you're already returning the thing you want to be returning, then it's simpler and arguably more readable to get rid of the runContT and replace it with a shorter name (in our case, since we're returning (), the point is moot anyway).
And that's... about all there is to it. Now the code I showed above, where we borrow all of the resources in a monad and release them at the end of the block, works as intended once each with-function is wrapped in ContT. As long as we're inside a block, we can be assured that the resource will be freed at the end. You can also nest blocks (though you'll end up with multiple continuations on your monad stack, which may be annoying if for some reason you have to write an explicit type signature for something inside the block), and since each block is its own continuation monad, continuation tricks like callCC won't be allowed to escape the block, so you can't bypass the deallocation.
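Here is one way nesting might look, as a sketch rather than the only option. It assumes the inner block only needs IO, so the inner block is run to a plain IO action and lifted into the outer one with liftIO instead of stacking one ContT on top of another. As before, record, withLogged, and nested are hypothetical names.

```haskell
import Control.Exception (bracket_)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..), evalContT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical helper: prepend an event to the log.
record :: IORef [String] -> String -> IO ()
record logRef msg = modifyIORef logRef (msg :)

-- Hypothetical with-style function that logs its open and close events.
withLogged :: IORef [String] -> String -> (String -> IO a) -> IO a
withLogged logRef name body =
  bracket_
    (record logRef ("open " ++ name))
    (record logRef ("close " ++ name))
    (body name)

block :: Monad m => ContT r m r -> m r
block = evalContT

-- The inner block releases its resource before the outer block continues.
nested :: IO [String]
nested = do
  logRef <- newIORef []
  block $ do
    _ <- ContT (withLogged logRef "outer")
    -- The inner block is an ordinary IO action, lifted into the outer ContT.
    liftIO . block $ do
      _ <- ContT (withLogged logRef "inner")
      liftIO (record logRef "inside inner block")
    liftIO (record logRef "after inner block")
  reverse <$> readIORef logRef
```

The inner resource is closed as soon as the inner block finishes, while the outer resource stays open until the end of the outer block.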
So all of those withSomething functions you see in Haskell are really just values in the continuation monad. If you ever need to use several in rapid succession, consider operating inside the ContT monad and running the continuation at the end to free all the resources.