Building a shared library in Cabal

This is the second in a series about refactoring parts of wxHaskell so that it will work as expected in GHCi. There’s nothing very specific to wxHaskell here – it’s much more about implementing a custom build system in Cabal, and how to use the Cabal API.

The work described is still somewhat in progress, so there are probably a few wrinkles in the cross-platform support, but it is basically working on my development machine, so it’s slightly more than half-baked…

What we are going to do is add a build system capable of building a shared library to Distribution.Simple. This is not a particularly clever build system – it doesn’t do any dependency tracking, for example, which makes it somewhat painful for developers. However, the main use case for Cabal is as an easy way for library users to install software on their machines, and in this case dependency tracking is not really an issue as the code is only built once.

If you want real dependency tracking, use Make (or better, something like Scons – if I had a pound for every Make-based build system I’ve worked with which doesn’t quite track dependencies correctly, I’d have enough to buy a paper copy of Learn You a Haskell for Great Good).

Extending the cabal build description

In the hope of creating something reasonably re-usable (if there is enough interest, I’m happy to work with the Cabal team to get something like this into Cabal by default), I’m going to add some custom stanzas. These use the “x-something” support already built into Cabal:

x-dll-sources: a list of all of the C or C++ source files which need to be compiled into a shared library. The code shown later splits this field with lines, so put one file on each line.

x-dll-name: the basename of the shared library we are going to create. This will expand to basename.dll on Windows, libbasename.so on most Unix systems and basename.dylib on OS X.

x-dll-extra-libraries: a list of the libraries with which the shared library must be linked at runtime. This is present to stop the library user from having to worry about this. Restrictions on some platforms mean that any libraries used here really ought to be available in a shared form on all supported platforms.
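
As a rough illustration of the naming rule given for x-dll-name, the expansion could be sketched as below. Platform and sharedLibName are hypothetical names invented for this example; they are not part of Cabal or of the Setup.hs discussed here, and the real code may well handle versions and other platforms differently.

```haskell
-- Hypothetical sketch: Platform and sharedLibName are illustrative names,
-- not part of Cabal or of the Setup.hs discussed in this post.
data Platform = Windows | OSX | OtherUnix
  deriving (Eq, Show)

-- | Expand the value of x-dll-name into a platform-specific file name,
-- following the convention listed above.
sharedLibName :: Platform -> String -> FilePath
sharedLibName Windows   base = base ++ ".dll"
sharedLibName OSX       base = base ++ ".dylib"
sharedLibName OtherUnix base = "lib" ++ base ++ ".so"
```

For a base name such as wxc, sharedLibName OtherUnix "wxc" gives "libwxc.so" and sharedLibName Windows "wxc" gives "wxc.dll".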

Updating the list of libraries to link

When installing a Cabal package, Cabal informs GHC of all of the libraries which need to be linked for the package to run. In this implementation, this means the shared library we are building and any libraries in x-dll-extra-libraries. These need to be added to the list of libraries which Cabal passes to ghc-pkg when the package is installed, and a good place to extract this information is in the configuration hook.

The configuration hook runs when you execute runhaskell Setup.hs configure (or as the first stage of a cabal install). You need something like the following in your Setup.hs:

import Distribution.Simple (defaultMainWithHooks, simpleUserHooks, UserHooks(..))

main :: IO ()
main = defaultMainWithHooks simpleUserHooks { confHook  = myConfHook,
                                              buildHook = myBuildHook }

The configure hook is, rather unadventurously, named myConfHook. The build hook, myBuildHook, is responsible for building the shared library, and we will discuss it at length later.

Much of the effort in working with both build and configuration hooks is in finding where the information you need is stored and unwrapping and rewrapping the data structures (doubtless someone with greater Haskell-fu than me would do this in a shorter and infinitely more elegant way, but this works, so… whatever). The Cabal documentation is definitely your friend, and in this case, we start from UserHooks, which is the container for all of the hooks available in Cabal.

We are interested in creating a confHook function, and the documentation says that this has type confHook :: (GenericPackageDescription, HookedBuildInfo) -> ConfigFlags -> IO LocalBuildInfo. It’s also worth noting, to understand what follows, that simpleUserHooks simply gives us the default set of hooks envisaged by the Cabal designers.

myConfHook (pkg0, pbi) flags = do
 lbi <- confHook simpleUserHooks (pkg0, pbi) flags
 let lpd        = localPkgDescr lbi
 let lib        = fromJust (library lpd)
 let libbi      = libBuildInfo lib
 let custom_bi  = customFieldsBI libbi
 -- Look up the DLLs to add to extra-libs from x-dll-name and x-dll-extra-libraries
 let all_dlls   = parseDLLs ["x-dll-name", "x-dll-extra-libraries"] custom_bi
 -- NOTE: wx is a BuildInfo holding the wxWidgets-specific settings;
 -- its definition is not shown in this excerpt
 let libbi' = libbi
      { extraLibDirs = extraLibDirs libbi ++ extraLibDirs wx
      , extraLibs    = extraLibs    libbi ++ extraLibs all_dlls ++ extraLibs wx
      , ldOptions    = ldOptions    libbi ++ ldOptions    wx
      , frameworks   = frameworks   libbi ++ frameworks   wx
      , includeDirs  = includeDirs  libbi ++ includeDirs  wx
      , ccOptions    = ccOptions    libbi ++ ccOptions    wx ++ ["-DwxcREFUSE_MEDIACTRL"]
      }
 let lib' = lib { libBuildInfo = libbi' }
 let lpd' = lpd { library = Just lib' }
 return $ lbi { localPkgDescr = lpd' }

The key lines are the ones that bind custom_bi and all_dlls, together with the extraLibs entry in the record update. The expression customFieldsBI libbi returns an association list [(key :: String, value :: String)] containing all of the entries in the cabal file which start with “x-”. The parseDLLs function extracts the contents of all of the stanzas whose key string matches an entry in the match list – “x-dll-name” and “x-dll-extra-libraries” in this case – and wraps them in a BuildInfo structure. We then append the extraLibs of that BuildInfo to the extraLibs field of the library.

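parseDLLs :: [String] -> [(String, String)] -> BuildInfo
parseDLLs x_stanzas bi = buildBI emptyBuildInfo dlls
 where
 -- Each stanza may name several libraries, one per line;
 -- fromJust fails if a requested stanza is missing from the cabal file
 dlls = concatMap (\e -> (lines . fromJust) (lookup e bi)) x_stanzas
 -- Prepend each library in turn, so the final list is in reverse order
 buildBI acc (w:ws) = buildBI (acc { extraLibs = w : extraLibs acc }) ws
 buildBI acc []     = acc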

The operation of parseDLLs is straightforward: we simply update the extraLibs field of an empty BuildInfo with a complete list of the libraries provided in the selected x_stanzas. It is worth noting that it is legal to have multiple libraries in a stanza, provided that they are separated by newlines (hence the use of lines when we are constructing the list of DLLs). Note also that, because of fromJust, configuration fails with an exception if any of the requested stanzas is absent from the cabal file.
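
To make the lookup step concrete without involving Cabal at all, here is a minimal sketch that works directly on the kind of association list customFieldsBI returns. The names sampleFields and customLibs and the library names in the sample data are assumptions made up for this example. Unlike parseDLLs, this variant treats a missing field as empty rather than failing, which is a design choice of the sketch, not the behaviour of the code above.

```haskell
-- Hypothetical sample of what customFieldsBI might return for a package
-- that sets both fields; the library names are made up for illustration.
sampleFields :: [(String, String)]
sampleFields =
  [ ("x-dll-name", "wxc")
  , ("x-dll-extra-libraries", "stdc++\nm")
  , ("x-unrelated", "ignored")
  ]

-- | Collect the newline-separated values of the requested fields.
-- A missing field contributes nothing instead of raising an error.
customLibs :: [String] -> [(String, String)] -> [String]
customLibs keys fields = concatMap valuesOf keys
  where
    valuesOf key = maybe [] lines (lookup key fields)
```

With the sample data, customLibs ["x-dll-name", "x-dll-extra-libraries"] sampleFields evaluates to ["wxc", "stdc++", "m"], and asking for a field that is not present simply yields an empty list.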

After executing myConfHook, we have all of the libraries we need to link with our Haskell package (note: not the libraries we link with our DLL!), and these will be added to the list of libraries included when the package is used.

Compiling C or C++ code from within Cabal

This requires a user hook to be executed during the Cabal build phase (i.e. when you type cabal build at the command line). As I mentioned earlier, the build hook we will be executing is myBuildHook. The example shown is from an early (but working) draft. I’ll discuss why this is not quite sufficient in a later posting, but it shows the principle.

myBuildHook pkg_descr local_bld_info user_hooks bld_flags =
 do
 -- Extract the custom x-dll-* fields (via customFieldsBI) and the standard build settings
 let lib       = fromJust (library pkg_descr)
     lib_bi    = libBuildInfo lib
     custom_bi = customFieldsBI lib_bi
     dll_name  = fromJust (lookup "x-dll-name" custom_bi)
     dll_srcs  = (lines . fromJust) (lookup "x-dll-sources" custom_bi)
     dll_libs  = (lines . fromJust) (lookup "x-dll-extra-libraries" custom_bi)
     cc_opts   = ccOptions lib_bi
     ld_opts   = ldOptions lib_bi
     inc_dirs  = includeDirs lib_bi
     lib_dirs  = extraLibDirs lib_bi
     libs      = extraLibs lib_bi
     bld_dir   = buildDir local_bld_info
     progs     = withPrograms local_bld_info
     gcc       = fromJust (lookupProgram (simpleProgram "gcc") progs)
     ver       = (pkgVersion . package) pkg_descr
 -- Compile C/C++ sources - output directory is dist/build/src/cpp
 putStrLn "Building wxc"
 objs <- mapM (compileCxx gcc cc_opts inc_dirs bld_dir) dll_srcs
 -- Link C/C++ sources as a DLL - output directory is dist/build
 putStrLn "Linking wxc"
 linkSharedLib gcc ld_opts lib_dirs (libs ++ dll_libs) objs ver bld_dir dll_name
 -- Hand the rest of the build over to the default build hook
 putStrLn "Invoke default build hook"
 buildHook simpleUserHooks pkg_descr local_bld_info user_hooks bld_flags

As with the configuration hook, much of the code is simply extracting the required information from the various structures passed into the build hook. Most of the information is pretty standard for compiling any C or C++ shared library: a set of source files, the paths to search for include files, libraries to link and so on. Cabal already knows where to find compilers, linkers, archivers and the like (on Windows, GHC includes a copy of the MinGW development system, which contains a port of gcc and GNU binutils), and you can look up the full path to any of these tools using lookupProgram.

The most important lines in myBuildHook are the mapM call that runs compileCxx over each source file and the call to linkSharedLib. These invoke the compiler and link the shared library, respectively.

compileCxx :: ConfiguredProgram  -- ^ C/C++ compiler (gcc)
           -> [String]           -- ^ Compile options from Cabal and wxConfig
           -> [String]           -- ^ Include paths from Cabal and wxConfig
           -> FilePath           -- ^ Base output directory
           -> FilePath           -- ^ Path to source file
           -> IO FilePath        -- ^ Path to generated object code
compileCxx gcc opts incls out_path cxx_src =
 do
 let includes  = map ("-I" ++) incls
     out_path' = normalisePath out_path
     cxx_src'  = normalisePath cxx_src
     out_file  = out_path' </> dropFileName cxx_src </>
                 replaceExtension (takeFileName cxx_src) ".o"
     out       = ["-c", cxx_src', "-o", out_file]
     -- osCompileOpts is not shown in this excerpt
     opts'     = opts ++ osCompileOpts
 -- Always recompile; the commented-out call is the timestamp check
 do_it <- return True -- needsCompiling cxx_src out_file
 when do_it $ createDirectoryIfMissing True (dropFileName out_file) >>
              runProgram verbose gcc (includes ++ opts' ++ out)
 return out_file
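
The path and argument handling in compileCxx can be tried out in isolation. The sketch below pulls it into two pure functions, objectPath and compileArgs, which are hypothetical names introduced for this example. It leaves out normalisePath and osCompileOpts, and it imports System.FilePath.Posix so that the separators are the same whichever platform it is run on.

```haskell
import System.FilePath.Posix
  (dropFileName, replaceExtension, takeFileName, (</>))

-- | Where the object file for a source file ends up under the build
-- directory. Follows the out_file expression in compileCxx, minus the
-- call to normalisePath.
objectPath :: FilePath -> FilePath -> FilePath
objectPath out_path cxx_src =
  out_path </> dropFileName cxx_src </>
    replaceExtension (takeFileName cxx_src) ".o"

-- | The gcc argument list: include paths first, then the compile options,
-- then the -c and -o arguments, in the same order as in compileCxx.
compileArgs :: [String] -> [FilePath] -> FilePath -> FilePath -> [String]
compileArgs opts incls out_path cxx_src =
  map ("-I" ++) incls ++ opts
    ++ ["-c", cxx_src, "-o", objectPath out_path cxx_src]
```

For a made-up source file src/cpp/wxc.cpp and a build directory of dist/build, objectPath "dist/build" "src/cpp/wxc.cpp" is "dist/build/src/cpp/wxc.o", which is why the object files land in a directory tree that mirrors the source tree.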

The compileCxx function compiles a single C or C++ source file to object code. In the version shown there is no dependency management whatsoever, which means that every file is compiled every time cabal build is invoked. This is very inefficient for development, since there are quite a number of files to compile, and most are recompiled needlessly. A proper C++ build system would check whether the modification time of the source file or any of the files it includes is newer than the modification time of the corresponding object file. This is a lot of work to get right (most large scale C++ build systems I have encountered don’t quite get it correct!), and I have taken the judgement that since most Cabal users only compile the library the one time they install it, it is an unnecessary effort.

That said, if you look in the code above, you will notice a commented out call to a function needsCompiling, which checks only the modification time of the source file to compile. This is fragile (it will do the wrong thing if an include file has been changed, but not the source), but it speeds things up for development. It will not (and should not) go into released code.
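
The needsCompiling function itself is not shown in this post, so the following is only a guess at its shape, assuming it takes the source and object paths and returns an IO Bool. It uses doesFileExist and getModificationTime from System.Directory and has exactly the weakness described above: changes to included headers go unnoticed.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- | Hypothetical reconstruction of a needsCompiling-style check; the real
-- function is not shown in the post. Recompile when the object file is
-- missing or is older than the source file. Header files are deliberately
-- ignored, so an edited include file will not trigger a rebuild.
needsCompiling :: FilePath -> FilePath -> IO Bool
needsCompiling src obj = do
  have_obj <- doesFileExist obj
  if not have_obj
    then return True
    else do
      src_time <- getModificationTime src
      obj_time <- getModificationTime obj
      return (src_time > obj_time)
```

With a function of this shape, the do_it line in compileCxx would bind the result of needsCompiling cxx_src out_file instead of a constant.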

I’d also like to mention the normalisePath function. This is present to fix some behaviour in System.FilePath which turns out to be, shall we say, infelicitous. The problem is an age-old one. Way back in the days of MS-DOS 1.0, which didn’t support directories, Microsoft chose to use the forward slash ‘/’ character for command line switches, which meant that it was unavailable for use as a path separator when, in about 1983, MS-DOS 2.0 came along. At that time, Microsoft chose the backslash ‘\’ character as a path separator, and ever since, software engineers across the world have been dealing with the fact that Unix had chosen the forward slash some 10 years earlier.

The infelicitous behaviour comes about because of the way that MinGW (and Cygwin) try to get around this irritation. Because software originally written for Unix often hard-codes the forward slash as the directory separator, both MinGW and Cygwin treat the ‘/’ character as though it is a ‘\’ on Windows. This is OK as far as it goes, but it is fragile. In particular, many Unix derived tools generate paths with the ‘/’ separator even on Windows. When these are concatenated with paths produced using System.FilePath (or by Windows executables), you end up with paths containing both sets of separators (e.g. c:/path/to\some\windows/location.c). It turns out that some programs are not robust in the presence of both types of separators, so the only solution is to normalize all paths so that they contain only the correct path separator.

normalisePath :: FilePath -> FilePath
normalisePath = case buildOS of
 Windows -> dosifyFilePath
 _       -> unixifyFilePath

-- | Replace a character in a String with some other character
replace :: Char   -- ^ Character to replace
        -> Char   -- ^ Character with which to replace
        -> String -- ^ String in which to replace
        -> String -- ^ Transformed string
replace old new = map replace'
 where
 replace' c = if c == old then new else c

unixifyFilePath :: FilePath -> FilePath
unixifyFilePath = replace '\\' '/'

dosifyFilePath :: FilePath -> FilePath
dosifyFilePath  = replace '/' '\\'

The code is shown above. Nothing to be especially proud of (in particular, I’m sure there must be something which does the same as replace in Data.List, but I couldn’t find it in about 30 seconds of looking!), but it works. Personally I believe that it would be better if System.FilePath did such normalization, but it’s probably a religious argument, and I’m not trying to flame anyone.
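
As a quick check of what this does to the mixed-separator path used as an example earlier, here is a small self-contained sketch. replaceChar is just the replace function under a different name, repeated so that the snippet stands on its own; mixedPath, windowsStyle and unixStyle are names made up for this example.

```haskell
-- | Same idea as the replace function above, repeated here so that the
-- example stands on its own.
replaceChar :: Char -> Char -> String -> String
replaceChar old new = map (\c -> if c == old then new else c)

-- | The mixed-separator path used as an example earlier in the post.
mixedPath :: FilePath
mixedPath = "c:/path/to\\some\\windows/location.c"

-- | The same path with only backslashes, and with only forward slashes.
windowsStyle, unixStyle :: FilePath
windowsStyle = replaceChar '/' '\\' mixedPath
unixStyle    = replaceChar '\\' '/' mixedPath
```

Here windowsStyle is c:\path\to\some\windows\location.c and unixStyle is c:/path/to/some/windows/location.c, so whichever branch normalisePath takes, the result contains a single kind of separator.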

I think that’s about enough for this posting. I’ll be continuing shortly with a look at linking the shared library, and a digression into C++ static destructors. Apologies to those who value the beauty of Haskell and are offended by the ugliness of static destructors, but lifting of stones is sometimes necessary, even if what we find under them is not particularly palatable. Linking libraries isn’t much more attractive either 😉

2 thoughts on “Building a shared library in Cabal”

  1. See: http://hackage.haskell.org/packages/archive/filepath/1.1.0.2/doc/html/System-FilePath-Posix.html#v:normalise – FilePath already has the function you want. There are certainly religious arguments over whether you always/never/sometimes normalise, but having an explicit function lets everyone get the exact behaviour they want.

    In addition, it’s important to point out that \ and / are both legitimate Windows FilePath separators, and have nothing to do with Mingw/Cygwin – programs which don’t support both are buggy (but unfortunately there are plenty of buggy programs out there!).

    And there is no replace in List, but you could have Hoogle’d for it with http://haskell.org/hoogle/?hoogle=Eq+a+%3D%3E+a+-%3E+a+-%3E+%5Ba%5D+-%3E+%5Ba%5D – which shows that it is in the cgi library… – I’d love this to also be in list.

    1. Thanks Neil. I knew the function(s) I wanted would be somewhere (although I’d never have guessed at replace being in CGI…). Apologies for missing normalize though – I should have spotted it.
