Archive

Archive for November, 2010

Building a shared library in Cabal

November 3, 2010 2 comments

This is the second in a series about refactoring parts of wxHaskell so that it will work as expected in GHCi. There’s nothing very specific to wxHaskell here – it’s much more about implementing a custom build system in Cabal, and how to use the Cabal API.

The work described is still somewhat in progress, so there are probably a few wrinkles in the cross-platform support, but it is basically working on my development machine, so it’s slightly more than half-baked…

What we are going to do is to add a build system capable of building a shared library to the Distribution.Simple. This is not a particularly clever build system – it doesn’t do any dependency tracking, for example, which makes it somewhat painful for developers. However, the main use case for Cabal is that it is an easy way for library users to install software on their machines, and in this case dependency tracking is not really an issue as the code is only built once.

If you want real dependency tracking, use Make (or better, something like Scons – if I had a pound for every Make-based build system I’ve worked with which doesn’t quite track dependencies correctly, I’d have enough to buy a paper copy of Learn You a Haskell for Great Good).

Extending the cabal build description

In the hope of creating something reasonably re-usable (if there is enough interest, I’m happy to work with the Cabal team to get something like this into Cabal by default), I’m going to add some custom stanzas. These use the “x-something” support already built into Cabal:

x-dll-sources: a whitespace separated list of all of the C or C++ source files which need to be compiled into a shared library.

x-dll-name: the basename of the shared library we are going to create. This will expand to basename.dll on Windows, libbasename.so on most Unix systems and basename.dylib on OS X.

x-dll-extra-libraries: a list of the libraries with which the shared library must be linked at runtime. This is present to stop the library user from having to worry about this. Restrictions on some platforms mean that any libraries used here really ought to be available in a shared form on all supported platforms.

Updating the list of libraries to link

When installing a Cabal package, Cabal informs GHC of all of the libraries which need to be linked for the package to run. In this implementation, this means the shared library we are building and any libraries in x-dll-extra-libraries. These need to be added to the list of libraries which Cabal passes to ghc-pkg when the package is installed, and a good place to extract this information is in the configuration hook.

The configuration hook runs when you execute runhaskell Setup.hs configure (or as the first stage of a cabal install). You need something like the following in your Setup.hs:

main :: IO ()
main = defaultMainWithHooks simpleUserHooks { confHook  = myConfHook,
                                              buildHook = myBuildHook }

The configure hook is, rather unadventurously, named myConfHook and the build hook, which we will discuss at length later, and which is responsible for building the shared library, is myBuildHook.

Much of the effort in working with both build and configuration hooks is in finding where the information you need is stored and unwrapping and rewrapping the data structures (doubtless someone with greater Haskell-fu than me would do this in a shorter and infinitely more elegant way, but this works, so… whatever). The Cabal documentation is definitely your friend, and in this case, we start from UserHooks, which is the container for all of the hooks available in Cabal.

We are interested in creating a confHook function, and the documentation says that this has type confHook :: (GenericPackageDescription, HookedBuildInfo) -> ConfigFlags -> IO LocalBuildInfo. It’s also worth noting, to understand what follows, that simpleUserHooks simply gives us the default set of hooks envisaged by the Cabal designers.

myConfHook (pkg0, pbi) flags = do
 lbi <- confHook simpleUserHooks (pkg0, pbi) flags
 let lpd        = localPkgDescr lbi
 let lib        = fromJust (library lpd)
 let libbi      = libBuildInfo lib
 let custom_bi  = customFieldsBI libbi
 -- Lookup DLLs to add to our extra-libs from x-dll-name and x-dll-extra-libs
 let all_dlls   = parseDLLs ["x-dll-name", "x-dll-extra-libraries"] custom_bi
 let libbi' = libbi
 { extraLibDirs = extraLibDirs libbi ++ extraLibDirs wx
 , extraLibs    = extraLibs    libbi ++ extraLibs all_dlls ++ extraLibs wx
 , ldOptions    = ldOptions    libbi ++ ldOptions    wx
 , frameworks   = frameworks   libbi ++ frameworks   wx
 , includeDirs  = includeDirs  libbi ++ includeDirs  wx
 , ccOptions = ccOptions libbi ++ ccOptions wx ++ ["-DwxcREFUSE_MEDIACTRL"]
 }
 let lib' = lib { libBuildInfo = libbi' }
 let lpd' = lpd { library = Just lib' }
 return $ lbi { localPkgDescr = lpd' }

The key lines are highlighted in red. The customFieldsBI libbi stanza returns an association list [(key :: String, value :: String)] containing all of the entries in the cabal file which start “x-“. The parseDLLs function extracts the contents of all of the stanzas whose key string matches an entry in the match list – “x-dll-name” and “x-dll-extra-libraries” in this case – and wraps them in a BuildInfo structure. We add the BuildInfo thus extracted to the extraLibs stanza.

parseDLLs :: [String] -> [(String, String)] -> BuildInfo
parseDLLs x_stanzas bi = buildBI emptyBuildInfo dlls
 where
 dlls = concat $ map (\e -> (lines . fromJust) (lookup e bi)) x_stanzas
 buildBI bi (w:ws) = buildBI (bi { extraLibs = w : extraLibs bi }) ws
 buildBI bi []     = bi

The operation of parseDLLs is straightforward: We simply update the extraLibs field of an empty BuildInfo with a complete list of the libraries provided in the selected x_stanzas. It is worth noting that it is legal to have multiple libraries in a stanza, provided that they are separated by newlines (hence the use of lines when we are constructing the list of DLLs).

After executing myConfHook, we have all of the libraries we need to link with our Haskell package (note: not the libraries we link with our DLL!), and these will be added to the list of libraries included when the package is used.

Compiling C or C++ code from within Cabal

This requires a user hook to be executed during the Cabal build phase (i.e. when you type cabal build at the command line). As I mentioned earlier, the build hook we will be executing is myBuildHook. The example shown is from an early (but working) draft. I’ll discuss why this is not quite sufficient in a later posting, but it shows the principle.

myBuildHook pkg_descr local_bld_info user_hooks bld_flags =
 do
 -- Extract custom fields customFieldsPD where field name x-cpp-dll-sources
 let lib   = fromJust (library pkg_descr)
 lib_bi    = libBuildInfo lib
 custom_bi = customFieldsBI lib_bi
 dll_name  = fromJust (lookup "x-dll-name" custom_bi)
 dll_srcs  = (lines . fromJust) (lookup "x-dll-sources" custom_bi)
 dll_libs  = (lines . fromJust) (lookup "x-dll-extra-libraries" custom_bi)
 cc_opts   = ccOptions lib_bi
 ld_opts   = ldOptions lib_bi
 inc_dirs  = includeDirs lib_bi
 lib_dirs  = extraLibDirs lib_bi
 libs      = extraLibs lib_bi
 bld_dir   = buildDir local_bld_info
 progs     = withPrograms local_bld_info
 gcc       = fromJust (lookupProgram (simpleProgram "gcc") progs)
 ver       = (pkgVersion . package) pkg_descr
 -- Compile C/C++ sources - output directory is dist/build/src/cpp
 putStrLn "Building wxc"
 objs <- mapM (compileCxx gcc cc_opts inc_dirs bld_dir) dll_srcs
 -- Link C/C++ sources as a DLL - output directory is dist/build
 putStrLn "Linking wxc"
 linkSharedLib gcc ld_opts lib_dirs (libs ++ dll_libs) objs ver bld_dir dll_name
 -- Remove C/C++ source code from the hooked build (don't change libs)
 putStrLn "Invoke default build hook"
 buildHook simpleUserHooks pkg_descr local_bld_info user_hooks bld_flags

As with the configuration hook, much of the code is simply extracting the required information from the various structures passed into the build hook. Most of the information is pretty standard for compiling any C or C++ shared library: a set of source files, the paths to search for include files, libraries to link and so on. Cabal already knows where to find compilers, linkers, archivers and the like (on Windows, GHC includes a copy of the MinGW development system, which contains a port of gcc and GNU binutils), and you can look up the full path to any of these tools using lookupProgram.

The most important lines in myBuildHook invoke the compiler and link the shared library, respectively, and are marked in red.

compileCxx :: ConfiguredProgram  -- ^ C/C++ compiler (gcc)
           -> [String]           -- ^ Compile options from Cabal and wxConfig
           -> [String]           -- ^ Include paths from Cabal and wxConfig
           -> FilePath           -- ^ Base output directory
           -> FilePath           -- ^ Path to source file
           -> IO FilePath        -- ^ Path to generated object code
compileCxx gcc opts incls out_path cxx_src =
 do
 let includes  = map ("-I" ++) incls
 out_path' = normalisePath out_path
 cxx_src'  = normalisePath cxx_src
 out_file  = out_path' </> dropFileName cxx_src </>
             replaceExtension (takeFileName cxx_src) ".o"
 out       = ["-c", cxx_src', "-o", out_file]
 opts'     = opts ++ osCompileOpts
 do_it <- True -- needsCompiling cxx_src out_file
 when do_it $ createDirectoryIfMissing True (dropFileName out_file) >> 
              runProgram verbose gcc (includes ++ opts' ++ out)
 return out_file

The CompileCxx function compiles a single C or C++ source file to object code. In the version shown there is no dependency management whatsoever, which means that every file is compiled every time cabal build is invoked. This is very inefficient for development, since there are quite a number of files to compile, and most are recompiled needlessly. A proper C++ build system would check whether the modification time of the source file or any of the files it includes is newer than the modification time of the corresponding object file. This is a lot of work to get right (most large scale C++ build systems I have encountered don’t quite get it correct!), and I have taken the judgement that since most Cabal users only compile the library the one time they install it, it is an unnecessary effort. That said, if you look in the code above, you will notice a commented out call to a function needsCompiling which checks only the modification time of the source file to compile. This is fragile (it will do the wrong thing if an include file has been changed, but not the source), but it speeds things up for development. It will not (and should not) go into released code.

I’d also like to mention the normalizePath function. This is present to fix some behaviour in System.FilePath which turns out to be, shall we say, infelicitous. The problem is an age-old one. Way back in the days of MS-DOS 1.0, which didn’t support directories, Microsoft chose to use the forward slash ‘/’ character for command line switches, which meant that it was unavailable for use as a path separator when, in about 1983, MS-DOS 2.0 came along. At that time, Microsoft chose the backslash ‘\’ character as a file separator, and software engineers across the world have been dealing with the fact that Unix chose the forward slash (some 10 years earlier) ever since.

The infelicitous behaviour comes about because of the way that MinGW (and Cygwin) try to get around this irritation. Because software originally written for Unix often hard-codes the forward slash as the directory separator, both MinGW and Cygwin treat the ‘/’ character as though it is a ‘\’ on Windows. This is OK as far as it goes, but it is fragile. In particular, many Unix derived tools generate paths with the ‘/’ separator even on Windows. When these are concatenated with paths produced using System.FilePath (or by Windows executables), you end up with paths containing both sets of separators (e.g. c:/path/to\some\windows/location.c). It turns out that some programs are not robust in the presence of both types of separators, so the only solution is to normalize all paths so that they contain only the correct path separator.

normalisePath :: FilePath -> FilePath
normalisePath = case buildOS of
 Windows -> dosifyFilePath
 _       -> unixifyFilePath

-- | Replace a character in a String with some other character
replace :: Char   -- ^ Character to replace
        -> Char   -- ^ Character with which to replace
        -> String -- ^ String in which to replace
        -> String -- ^ Transformed string
replace old new = map replace'
 where
 replace' elem = if elem == old then new else elem

unixifyFilePath = replace '\\' '/'
dosifyFilePath  = replace '/' '\\'

The code is shown above. Nothing to be especially proud of (in particular, I’m sure there must be something which does the same as replace in Data.List, but I couldn’t find it in about 30 seconds of looking!), but it works. Personally I believe that it would be better if System.FilePath did such normalization, but it’s probably a religious argument, and I’m not trying to flame anyone.

I think that’s about enough for this posting. I’ll be continuing shortly with a look at linking the shared library, and a digression into C++ static destructors. Apologies to those who value the beauty of Haskell and are offended by the ugliness of static destructors, but lifting of stones is sometimes necessary, even if what we find under them is not particularly palatable. Linking libraries isn’t much more attractive either ;-)

Categories: Haskell Tags: ,

Working around the static libstdc++ restriction

November 1, 2010 Leave a comment

I’ve been spending a bit of time, on and off, over the last few months thinking about one of the most annoying bugs in wxHaskell, and how to fix it: that wxHaskell is now unusable from within GHCi.

There are actually two separate bugs: the first, which prevents things from working at all, is that on Windows, there is no DLL version of libstdc++; the second is that you cannot restart a wxWidgets session in GHCi after the application has terminated.

The first bug is a consequence of the fact that, on Windows, GHC uses the MinGW compiler suite to do C/C++ compilation and all linking (including with Haskell objects). The version of MinGW included in GHC does not support dynamic linking of libstdc++.

When wxHaskell was fully cabalized, one of the changes was to replace a complex and fragile build system for the C++ coe in wxHaskell with a much simple cabal-based build. The consequence of this is that the C++ code and wxcore Haskell 0bject code reside in the same library, which means that wxcore needs to be linked against libstdc++.

For compiled binaries, this is not so much of a problem: the static libstdc++ is linked with the other code and all works well. However, in the GHCi case, it is fatal: GHCi doesn’t know how to statically link non-Haskell object code, although it can load DLLs. When you try to load load wxcore, GHCi complains that it cannot load stdc++ (Loading package wxcore-0.12.1.6 … <interactive>: stdc++: The specified module could not be found. can’t load .so/.DLL for stdc++ (addDLL: could not load DLL))

The second problem: inability to restart a GUI, is a consequence of wxWidgets using static destructors. These are only executed when the application fully terminates, which is not the case in a GHCi.

It turns out that a good solution to both of these problems is to repackage the C++ part of wxHaskell as a DLL. The reasons are:

  • The C++ code in the DLL needs to be linked with libstdc++, but this can be done at DLL link time. If libstdc++ is statically linked with the DLL, wxcore no longer needs to depend on linking to libstdc++ (only wxc.dll).
  • If the DLL is dynamically unloaded, the C++ static destructors will be executed (as far as I can tell, this should be true for Linux .so shared libraries as well as Windows DLLs). This means that so long as the wxc shared library is loaded at application start-up and unloaded at application termination, we should be able to correctly restart a GUI.
  • GHCi can load DLLs, so wxHaskell should become usable on GHCi again (since the libstdc++ requirement would have gone away).

The problem, and it has occupied quite a bit of my rather limited spare time, is that this requires quite a bit of re-architecture.

Changes to compile wxc as a DLL

This is probably the simplest part, since wxc was compiled as a DLL in an older version of the build system. The downside is that I would like to use cabal to do the building, since asking anyone to have prerequisites much beyond Haskell Platform greatly reduces the attractiveness of a library.

Changes to use dynamically loaded libraries.

Most usage of shared libraries is pseudo-static, that’s to say that the addresses of functions exported from the shared library are fixed-up when the application is loaded by the linker-loader. In my case, I want to do things fully dynamically. The outline code is fairly straightforward (caveat: this is outline code – I haven’t run it!).

On Windows, you need something like:

HINSTANCE dllHandle = LoadLibrary("wxc.dll");
...
if (dllHandle != NULL) {
  FnPtrType fnPtr = (FnPtrType) GetProcAddress(dllHandle, "functionName");
  if (fnPtr != NULL) {
    return fnPtr(params...);
  }
}
FreeLibrary(dllHandle);

Code in Unix-land is very similar indeed:

void* lib_handle = dlopen("path/to/libwxc.so", RTLD_LAZY);
...
if (lab_handle != NULL) {
  FnPtrType fnPtr = dlsym(lab_handle, "functionName");
  if ((error = dlerror()) == NULL) {
    return fnPtr(params...);
  }
}
...
dlclose(lib_handle);

The problem is that I have about 2,000 functions which need to be exported in this way, and the only sane approach will be to automate things. I also need to find the correct locations to load and unload libraries, and to ensure that all function pointers are invalidated when I unload the DLL.

I now have most of the work done on changes to the build system, and my ideas thought through for the dynamically loaded libraries. In the next few entries (which will hopefully be more closely spaced than of late), we’ll look at these in some more detail.

One aside: I mentioned doing some work to enable Swig to be used as a wrapper generator for the wxc wrapper layer. I still want to do this, but it has been put aside in the “too much work right now” pile. Doing things as I am planning is less maintainable in the long run, but stands a fighting chance of being finished sometime this decade, and I want people to use wxHaskell more than I need to have an awesome system for automating the generation of wrappers over C++ code.

Categories: Haskell, wxHaskell
Follow

Get every new post delivered to your Inbox.