Haskell Bindings to C from Start to Finish

Out of curiosity I wanted to learn how to put together bindings from C to Haskell. The primary need for this on my radar is enabling X composition directly in Haskell, to enable 3D effects in XMonad. Haskell is a great language for doing graphics work, so there is definitely good sense in providing bindings for such things.

However, working with X can be a bit of a large project, as the build systems and workflows for bindings are already relatively complex. I decided to familiarize myself with C2HS, which seems like the future of Haskell bindings, based on the brief bit of research I did. Another important set of bindings we may need in the future are good bindings to the RPM library. The problem here is that there is no good documentation on testing bindings with C2HS from start to finish. What follows is roughly a guide to the results I got, minus the swearing, crashing out in a nervous caffeinated wreck for a few hours in between, and the tasty "Hollandse Nieuw" herring I had when I woke up.

(This year's herring is really tasty, if you get the chance to visit Holland, try one or two of them.)

The following is just bare metal work, in order to test the basic functionality of your bindings. There will hopefully be posts following this one on how to use autotools and cabal in order to build packages. (As soon as I figure it out myself.)

C2HS's workflow can be summed up like this: you write code in the C2HS format, which is passed through c2hs as a preprocessor and yields Haskell code. You run this code through ghc with extra command line flags to link the right libraries, and it yields a nice Haskell library that you can link to later. Then just import your code in your program like normal, and you're good to go.

As a sane demo, I am starting with the sample code from the RPM documentation. This can be found here under Listing 16-1.

First off, make your project directory. For me this is /rpm/. Since I want the library to be RedHat.Rpm.RpmLib, I also need a directory /rpm/RedHat/Rpm/. (I'm debating if I want to keep the RedHat prefix.) Then create a file /rpm/RedHat/Rpm/RpmLib.chs, and immediately begin it with:

{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

#include <rpmlib.h>

{# context lib="rpmlib" #}

module RedHat.Rpm.RpmLib (
    RString(..),
    rpmReadConfigFiles
) where

import C2HS
import Foreign.Ptr

rpmReadConfigFiles is the only function we need so far from RPM. The LANGUAGE pragmas at the top enable GHC extensions: ForeignFunctionInterface is for foreign functions, and the rest are used later. Next comes a standard CPP include statement for the header we are binding, followed by the C2HS context hook naming the library. Finally we declare the module and the tokens it exports, and import two modules we need from Haskell. Some of these details can be put off into the build system later, but since we are working with the tools directly, we can just put them in our code.

The function we want to bind is defined in /usr/include/rpm/rpmlib.h as such:

/** \ingroup rpmrc
 * Read macro configuration file(s) for a target.
 * @param file colon separated files to read (NULL uses default)
 * @param target target platform (NULL uses default)
 * @return 0 on success, -1 on error
 */
int rpmReadConfigFiles(const char * file,
                       const char * target);

This presents a bit of a unique problem. Normally a C style string presents two possibilities we need to account for: either it contains characters or it's an empty string. Haskell can handle both equally well with the String type. However, here we have a third possibility: the two in parameters can be null pointers, which is not the same as an empty string. Internally we can handle all three cases in Haskell as a Ptr CChar, which lines up exactly with the C function. At the outer level, though, we really need a function that can accept either a null pointer or a String. To handle this, we need a new type class, RString, such that:

class RString t where
    withRString :: t -> (Ptr CChar -> IO a) -> IO a

instance RString String where
    withRString s m = withCString s m

instance RString CString where
    withRString cs m = m cs

One of the gotchas of programming C level code in Haskell is that all C level code, pointers in particular, needs to run inside the IO monad. There simply isn't a (safe) way to convert something like a String to a Ptr CChar outside of IO. C2HS makes use of a withT function pattern to marshal pure Haskell data to pointers to objects in C.
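To make the with-style pattern concrete, here is a minimal sketch that uses only functions from base (withCString, peekCString, with, peek). The names roundTripString and roundTripInt are made up for illustration and are not part of C2HS or RPM.

```haskell
import Foreign.C.String (withCString, peekCString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (peek)

-- Marshal a String into a temporary C string, read it back, and let
-- withCString release the buffer once the callback returns.
-- (Hypothetical helper, for illustration only.)
roundTripString :: String -> IO String
roundTripString s = withCString s peekCString

-- The same pattern for a single Storable value: 'with' allocates
-- temporary storage, writes the value, and hands the pointer to the callback.
-- (Hypothetical helper, for illustration only.)
roundTripInt :: CInt -> IO CInt
roundTripInt n = with n peek
```

In both cases the pointer is only valid inside the callback, which is why the result has to be read out before the callback returns.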

In our case, if we get a String, whether containing data or empty, we need to marshal it into a CString. If we get a Ptr though, we can assume it's already been marshalled. Chances are, there are cases that can break this, but for our simple example, there's relatively little harm we can do. In any case, withCString first marshals the String into a CString and then does pretty much what the Ptr version of our code does.
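As a rough way to check that the class really distinguishes the three cases without involving RPM at all, the sketch below repeats the class and inspects the pointer that the callback receives. The describe function is a hypothetical stand-in for a C function, and the FlexibleInstances pragma is an assumption about what current GHC versions need for these instances.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Foreign.C.String (CString, withCString, peekCString)
import Foreign.C.Types (CChar)
import Foreign.Ptr (Ptr, nullPtr)

class RString t where
    withRString :: t -> (Ptr CChar -> IO a) -> IO a

-- A Haskell String is marshalled into a temporary C string.
instance RString String where
    withRString s m = withCString s m

-- A CString is assumed to be marshalled already and is passed through.
instance RString CString where
    withRString cs m = m cs

-- Hypothetical stand-in for a C function: report which of the three
-- cases (null pointer, empty string, text) the callback was given.
describe :: RString s => s -> IO String
describe x = withRString x $ \p ->
    if p == nullPtr
        then return "NULL"
        else do
            s <- peekCString p
            return (if null s then "empty" else "text: " ++ s)
```

Calling describe "macros", describe "" and describe (nullPtr :: CString) exercises the three cases in turn.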

With all this squared away, we can define our function pretty much as per the documentation for C2HS.

{#fun unsafe rpmReadConfigFiles
    `(RString s)' =>
    {withRString* `s' ,
     withRString* `s' } -> `Int'#}

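This creates a new Haskell function that accepts anything in the RString class and marshals it with our code. The unsafe keyword is a promise that the C function will not call back into Haskell code; nothing enforces this, and a callback would most likely crash the program. The function returns an IO Int. That is all there is to our binding.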
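For a feel of what such a hook amounts to, here is a hand-written sketch of the same wrapper shape. It is not the code c2hs actually emits, and it binds libc's strcmp as a stand-in so that it runs without the RPM headers; c_strcmp and compareC are hypothetical names chosen for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE FlexibleInstances #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CChar, CInt(..))
import Foreign.Ptr (Ptr)

class RString t where
    withRString :: t -> (Ptr CChar -> IO a) -> IO a

instance RString String where
    withRString s m = withCString s m

instance RString CString where
    withRString cs m = m cs

-- Raw import of a C function taking two const char * arguments.
-- strcmp stands in for rpmReadConfigFiles here; unlike the RPM
-- function, it must never be given a null pointer.
foreign import ccall unsafe "string.h strcmp"
    c_strcmp :: Ptr CChar -> Ptr CChar -> IO CInt

-- Marshal both arguments with withRString, call the C function,
-- and convert the C int result to a Haskell Int.
compareC :: (RString s) => s -> s -> IO Int
compareC a1 a2 =
    withRString a1 $ \a1' ->
    withRString a2 $ \a2' -> do
        res <- c_strcmp a1' a2'
        return (fromIntegral res)
```

Note that a signature with a single type variable s, as in this sketch, forces both arguments to have the same type, so a String cannot be mixed with a null CString in one call.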

The fun part is compiling it. The first step is to run c2hs on the .chs file. We need to include parameters to the C Preprocessor that tells c2hs where to find the C header files. We also use -l to copy C2HS.hs into the same directory, as it is needed by ghc later. The next step is to run GHC on the resulting .hs file.

/rpm/RedHat/Rpm/ $ c2hs --cppopts='-I/usr/include/rpm/' -l RpmLib.chs
/rpm/RedHat/Rpm/ $ ghc --make RpmLib.hs

Now that this is done, the next step is to build an executable that uses this binding in order to test it. This file, rpm1.hs, goes in /rpm/.

module Main where

import RedHat.Rpm.RpmLib
import Foreign
import Foreign.C.String

-- nullPtr pinned to the CString type so it matches our RString instance
nullP :: CString
nullP = nullPtr

main :: IO ()
main = do status <- rpmReadConfigFiles nullP nullP
          putStrLn (show status)

Since we defined an instance only for Ptr CChar (aka CString) and not for the general Ptr a, we use a helper that pins nullPtr to the CString type. There's probably a more idiomatic way to do this, but all we need is a kludge. The rest pulls the integer result from the function and prints it on the screen. To compile this, we use the following ghc magic. It includes another flag to link in the rpm library.

/rpm/ $ ghc -lrpm --make -debug rpm1.hs
/rpm/ $ ./rpm1
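One way to avoid the nullP helper, along the lines a commenter suggests below, is to take Maybe String and let Nothing stand for the null pointer. The sketch below uses maybeWith from Foreign.Marshal.Utils; withMaybeCString and isNullArg are hypothetical names, and this is not how the binding above is written.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.Marshal.Utils (maybeWith)
import Foreign.Ptr (nullPtr)

-- Nothing is passed to the callback as a null pointer; Just s is
-- marshalled into a temporary C string, even when s is empty.
withMaybeCString :: Maybe String -> (CString -> IO a) -> IO a
withMaybeCString = maybeWith withCString

-- Hypothetical probe: does the callback see a null pointer?
isNullArg :: Maybe String -> IO Bool
isNullArg ms = withMaybeCString ms (\p -> return (p == nullPtr))
```

With a marshaller like this, the call site could read rpmReadConfigFiles Nothing Nothing, at the cost of no longer accepting an already marshalled CString.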

That is, from start to finish, how to write your own C bindings in Haskell. Hopefully I'll figure out how to get this to work via cabal, so I don't need to run so many commands to run tests.

5 flames:

Richard Jones said

I could really do with some help on the libguestfs Haskell bindings.

These are (quite seriously) in need of some love, and they are the only bindings which are incomplete. We have complete Perl, Python, OCaml, Java and Ruby bindings, but not Haskell!

Antoine said

You can also use the Maybe type to represent a string where null and not-present are distinct.

Then your function signature could be:

readConfig :: Maybe String -> Maybe String -> IO Bool

Yankee said

It's always a tough decision whether to follow the C API more closely or the idioms of the local language. Well, I'm just starting with these bindings, so we'll see.

Martin DeMello said

Yankee: From what I've seen, the best way seems to be to make one binding that follows the C API slavishly, and then build a more idiomatic library as a higher level layer on top of that binding.

Unknown said

This is very nice, and it would have been extremely useful when I was writing some bindings with C2HS a year ago.

Where you write, "This creates a new function that cannot call back into Haskell code", I think it would be better to say that the bound function must not call back into Haskell code. There's nothing to stop it from doing so, but your program will most likely crash.

To me, this was one of the most unclear parts of the C2HS documentation.