[smlnj] View of /sml/trunk/src/ml-nlffigen/README
Revision 1067
Fri Feb 15 17:08:17 2002 UTC, by blume
File size: 16286 byte(s)
ml-nlffigen: cpif mechanism and iptr repository implemented
Copyright (c) 2001, 2002, Lucent Technologies, Bell Laboratories

  author: Matthias Blume (blume@research.bell-labs.com)

This directory contains ML-NLFFI-Gen, a glue-code generator for
the new "NLFFI" foreign function interface.  The generator reads
C source code and emits ML code along with a description file for CM.

Compiling this generator requires the C-Kit ($/ckit-lib.cm) to be
installed.

January 10, 2002:  Major changes:

I reworked the glue code generator in a way that lets generated code
scale better -- at the expense of some (mostly academic) generality.

Changes involve the following:

1. The functorization is gone.

2. Every top-level C declaration results in a separate top-level
   ML equivalent (implemented by its own ML source file).

3. Incomplete pointer types are treated as new abstract types without
   possibility of later making them "concrete".  (It is here where
   we lose some generality.  Of course, one can always work around
   such problems by going through "voidptr" where necessary.)
   Alternatively, there is the possibility of the programmer
   providing definitions for such incomplete types.

4. All related C sources must be supplied to ml-nlffigen together.
   Types incomplete in one source but complete in another get
   automatically completed in a cross-file fashion.

5. The handle for the shared library to link to is now abstracted as
   a function closure.  Moreover, it must be supplied as a top-level
   variable (by the programmer).  For this purpose, ml-nlffigen has
   corresponding command-line options.

These changes mean that even very large (in number of exported definitions)
libraries such as, e.g., GTK can now be handled gracefully without
reaching the limits of the ML compiler's abilities.

[The example of GTK -- for which ml-nlffigen creates several thousands (!)
of separate ML source files -- puts an unusual burden on CM, though.
However, aside from running a bit longer than usual, CM handles loads
of this magnitude just fine.  Stabilizing the resulting library solves
the problem entirely as far as later clients are concerned.]

Sketch of translation- (and naming-) scheme:

  struct foo { ... }
      -->   structure ST_foo   in    st-foo.sml  (not exported)
               basic type info (name, size)
       &    structure S_foo    in    s-foo.sml
               abstract interface to the type
                    field accessors f_xxx  (unless -light)
                                and f_xxx' (unless -heavy)
                    field types     t_f_xxx
                    field RTTI      typ_f_xxx

  union foo { ... }
      -->   structure UT_foo   in    ut-foo.sml  (not exported)
               basic type info (name, size)
       &    structure U_foo    in    u-foo.sml
               abstract interface to the type
                    field accessors f_xxx  (unless -light)
                                and f_xxx' (unless -heavy)
                    field types     t_f_xxx
                    field RTTI      typ_f_xxx

  struct { ... }
      like struct <n> { ... }, where <n> is a fresh integer or 'bar
      if 'struct { ... }' occurs in the context of a
      'typedef struct { ... } bar'

  union { ... }
      like union <n> { ... }, where <n> is a fresh integer or 'bar
      if 'union { ... }' occurs in the context of a
      'typedef union { ... } bar'

  enum foo { ... }
      -->   structure E_foo   in     e-foo.sml
               enum constants    e_xxx

  enum { ... }
      like enum <n> { ... }, where <n> is a fresh integer or 'bar
      if 'enum { ... }' occurs in the context of a
      'typedef enum { ... } bar'

  T foo (T, ..., T)  (global function/function prototype)
      -->   structure F_foo   in     f-foo.sml
               containing three/four members:
                    typ :  RTTI
                    fptr:  thunkified fptr representing the C function
            maybe   f'  :  light-weight function wrapper around fptr
                              Turned off by -heavy (see below).
            maybe   f   :  heavy-weight function wrapper around fptr
                              Turned off by -light (see below).

  T foo;  (global variable)
      -->   structure G_foo   in     g-foo.sml
               containing three members:
                    t   :  type
                    typ :  RTTI
                    obj :  thunkified object representing the C variable

  struct foo *  (without existing definition of struct foo; incomplete type)
      -->   structure IS_foo  exported from <a>/is-foo.cm
            which consists of <d>/is-foo.sml and where <a> is a CM
            anchor that points to directory <d>.

            The structure is generated by instantiating functor

            By specifying -incomplete (see below), <a>/is-foo.cm will
            not be included into the main .cm-file (but it will still
            be generated).

  union foo *   Same as with struct foo *, but replace IS_xxx with IU_xxx
            and is-xxx with iu-xxx everywhere.

  Additional files for implementing function entry sequences are created
  and used internally.  They do not contribute exports, though.
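
  The naming scheme above can be summarized as a small ML function.
  This is only an illustrative sketch, not code taken from ml-nlffigen:
  the datatype "cdecl" and the function "mlNames" are hypothetical
  names.  The sketch covers the exported structures only (ST_foo and
  UT_foo are left out) and ignores the -prefix and -gensym options.

```sml
(* Hypothetical classification of top-level C declarations. *)
datatype cdecl
  = Struct of string            (* struct foo { ... }            *)
  | Union of string             (* union foo { ... }             *)
  | Enum of string              (* enum foo { ... }              *)
  | Function of string          (* T foo (T, ..., T)             *)
  | Global of string            (* T foo;                        *)
  | IncompleteStruct of string  (* struct foo *, no definition   *)
  | IncompleteUnion of string   (* union foo *, no definition    *)

(* Map a declaration to (exported ML structure name, ML file name),
 * following the table above. *)
fun mlNames d = let
    fun mk (sp, fp, n) = (sp ^ "_" ^ n, fp ^ "-" ^ n ^ ".sml")
in
    case d of
        Struct n => mk ("S", "s", n)
      | Union n => mk ("U", "u", n)
      | Enum n => mk ("E", "e", n)
      | Function n => mk ("F", "f", n)
      | Global n => mk ("G", "g", n)
      | IncompleteStruct n => mk ("IS", "is", n)
      | IncompleteUnion n => mk ("IU", "iu", n)
end

val structNames = mlNames (Struct "foo")
val incompleteNames = mlNames (IncompleteUnion "bar")
```

  For incomplete types the file name refers to the .sml file inside
  the -iptrs repository; the corresponding library is the .cm file of
  the same base name.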

Command-line options for ml-nlffigen:

  General syntax:   ml-nlffigen <option> ... [--] <C-file> ...

  Environment variables:

    Ml-nlffigen looks at the environment variable FFIGEN_CPP to obtain
    the template string for the cpp command line.  If FFIGEN_CPP is not
    set, the template defaults to "gcc -E -U__GNUC__ %o %s > %t".
    The actual command line is obtained by substituting occurrences of
    %s with the name of the source, %t with the name of a temporary
    file holding the pre-processed code, and %o with the list of cpp
    options (see -cppopt below).
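
    To make the substitution concrete, here is a minimal sketch of
    such a template expansion in ML.  It is an illustration only and
    not the code used by ml-nlffigen; the function name "expand", the
    record fields, and the choice to join the cpp options with single
    spaces are all assumptions.

```sml
(* Expand a cpp command-line template: %o becomes the cpp options,
 * %s the source file name, %t the temporary file name.
 * Any other character is copied unchanged. *)
fun expand { template, opts, src, tmp } = let
    (* assumption: options are separated by single spaces *)
    val optString = String.concatWith " " opts
    fun loop ([], acc) = String.concat (rev acc)
      | loop (#"%" :: #"o" :: rest, acc) = loop (rest, optString :: acc)
      | loop (#"%" :: #"s" :: rest, acc) = loop (rest, src :: acc)
      | loop (#"%" :: #"t" :: rest, acc) = loop (rest, tmp :: acc)
      | loop (c :: rest, acc) = loop (rest, String.str c :: acc)
in
    loop (String.explode template, [])
end

(* The default template with two made-up options and file names. *)
val cmd = expand { template = "gcc -E -U__GNUC__ %o %s > %t",
                   opts = ["-DDEBUG", "-I/usr/local/include"],
                   src = "foo.h",
                   tmp = "/tmp/foo.i" }
```

    With these inputs, cmd is the string
    "gcc -E -U__GNUC__ -DDEBUG -I/usr/local/include foo.h > /tmp/foo.i".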


   -dir <dir>   output directory where all generated files are placed
                default:  "NLFFI-Generated"
   -allSU       instructs ml-nlffigen to include all structs and unions,
                even those that are defined in included files (as opposed
                to files explicitly listed as arguments)
                default: off
   -width <w>   sets output line width (just a guess) to <w>
                default: 75
   -lambdasplit <x>   instructs ml-nlffigen to generate "lambdasplit"
                options for all ML files (see CM manual for what this means;
                it does not currently work anyway because cross-module
                inlining is broken).
                default: nothing
   -target <t>  Sets the target to <t> (which must be one of "sparc-unix",
                "x86-unix", or "x86-win32").
                default: current architecture
   -light       suppress "heavy" versions of function wrappers and
                field accessors; also resets any earlier -heavy to default
                default: not suppressed
   -heavy       suppress "light" versions of function wrappers and
                field accessors; also resets any earlier -light to default
                default: not suppressed
   -namedargs   instruct ml-nlffigen to generate function wrappers that
                use named arguments (ML records) instead of tuples if
                there is enough information for this in the C source;
                (this is not always very useful)
                default: off
   -iptrs <d> <a>  Use directory <d> (default is equal to <dir>) as
                a "repository" for files related to incomplete pointers.
                Each incomplete type is represented by its own CM library
                so that different runs of ml-nlffigen can share the same
                repository without problems.  The directory <d> must be
                addressable from within <dir> by the CM standard path <a>
                (which should preferably be anchored).  The default for
                <a> is nothing (since the generated sublibrary will reside
                directly within <dir>).
   -incomplete  Do not include definitions for incomplete types;  these
                will then have to be provided by the programmer
                default: include such definitions, making each incomplete
                type a new abstract type.
                By specifying the -iptrs option (see above), one can
                designate a "repository" of incomplete types that can
                be shared among the outputs of several different runs
                of ml-nlffigen.
   -libhandle <h>   Use the variable <h> to refer to the handle to the
                shared library object.  Given the constraints of CM, <h>
                must have the form of a long ML identifier, e.g.,
                default: Library.libh
   -include <f> Mention file <f> in the generated .cm file.  This option
                is necessary at least once for providing the library handle.
                It can be used arbitrarily many times, resulting in more
                than one such programmer-supplied file to be mentioned.
                If <f> is relative, then it must be relative to the directory
                specified in the -dir <dir> option.
   -cmfile <f>  Specify name of the generated .cm file, relative to
                the directory specified by the -dir <dir> option.
                default: nlffi-generated.cm
   -cppopt <o>  The string <o> gets added to the list of options to be
                passed to cpp (the C preprocessor).  The list of options
                gets substituted for %o in the cpp command line template.
   -U<x>        The string -U<x> gets added to the list of cpp options.
   -D<x>        The string -D<x> gets added to the list of cpp options.
   -I<x>        The string -I<x> gets added to the list of cpp options.
   -version     Just write the version number of ml-nlffigen to standard
                output and then quit.
   -match <r>   Normally ml-nlffigen will include ML definitions for a C
                declaration if the C declaration textually appears in
                one of the files specified at the command line.  Definitions
                in #include-d files will normally not appear (unless
                their absence would lead to inconsistencies).
                By specifying -match <r>, ml-nlffigen will also include
                definitions that occur in recursively #include-d files
                for which the AWK-style regular expression <r> matches
                their names.
   -prefix <p>  Generated ML structure names will all have prefix <p>
                (in addition to the usual "S_" or "U_" or "F_" ...)
   -gensym <g>  Names "gensym-ed" by ml-nlffigen (for anonymous struct/union/
                enums) will get an additional suffix _<g>.  (This should
                be used if output from several independent runs of
                ml-nlffigen is to coexist in the same ML program.)
   --           Terminate processing of options, remaining arguments are
                taken to be C sources.
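
  To get a feeling for which #include-d files a given -match
  expression would select, one can try the expression out with the
  AWK-syntax regular expressions of the SML/NJ library
  ($/regexp-lib.cm).  The following is a sketch under assumptions:
  the function name "matches" and the sample paths are made up, and
  this README does not say whether ml-nlffigen requires the expression
  to match the whole file name or just some part of it.  The sketch
  searches for a match anywhere in the name.

```sml
(* Requires the regexp component of the SML/NJ library. *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)

(* matches r name: does the AWK-style regular expression r match
 * somewhere within the string name? *)
fun matches r = let
    val re = RE.compileString r
in
    fn name => isSome (StringCvt.scanString (RE.find re) name)
end

(* Hypothetical expression selecting GTK headers. *)
val isGtkHeader = matches "gtk.*\\.h"

val a = isGtkHeader "/usr/include/gtk/gtkwidget.h"
val b = isGtkHeader "/usr/include/stdio.h"
```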


Sample usage:

Suppose we have a C interface defined in foo.h.

1. Running ml-nlffigen:

   It is best to let a tool such as Unix' "make" handle the invocation of
   ml-nlffigen.  The following "Makefile" can be used as a template for
   other projects:

  |FILES = foo.h
  |H = FooH.libh
  |D = FFI
  |HF = ../foo-h.sml
  |CF = foo.cm
  |$(D)/$(CF): $(FILES)
  |	ml-nlffigen -include $(HF) -libhandle $(H) -dir $(D) -cmfile $(CF) $^

   Suppose the above file is stored as "foo.make".  Running

     $ make -f foo.make

   will generate a subdirectory "FFI" full of ML files corresponding to
   the definitions in foo.h.  Access to the generated ML code is gained
   by referring to the CM library FFI/foo.cm; the .cm-file (foo.cm) is
   also produced by ml-nlffigen.

2. The ML code uses the library handle specified in the command line
   (here: FooH.libh) for dynamic linking.  The type of FooH.libh must
   be:

        FooH.libh : string -> unit -> CMemory.addr

   That is, FooH.libh takes the name of a symbol and produces that
   symbol's suspended address.

   The code that implements FooH.libh must be provided by the programmer.
   In the above example, we assume that it is stored in file foo-h.sml.
   The name of that file must appear in the generated .cm-file, hence the
   "-include" command-line argument.

   Notice that the name provided to ml-nlffigen must be relative to the
   output directory.  Therefore, in our case it is "../foo-h.sml" and not
   just foo-h.sml (because the full path would be FFI/../foo-h.sml).

3. To actually implement FooH.libh, use the "DynLinkage" module.
   Suppose the shared library's name is "/usr/lib/foo.so".  Here is
   the corresponding contents of foo-h.sml:

  |structure FooH = struct
  |    local 
  |        val lh = DynLinkage.open_lib
  |             { name = "/usr/lib/foo.so", global = true, lazy = true }
  |    in
  |        fun libh s = let
  |            val sh = DynLinkage.lib_symbol (lh, s)
  |        in
  |            fn () => DynLinkage.addr sh
  |        end
  |    end
  |end

   If all the symbols you are linking to are already available within
   the ML runtime system, then you don't need to open a new shared
   object.  As a result, your FooH implementation would look like this:

  |structure FooH = struct
  |    fun libh s = let
  |        val sh = DynLinkage.lib_symbol (DynLinkage.main_lib, s)
  |    in
  |        fn () => DynLinkage.addr sh
  |    end
  |end

   If the symbols you are accessing are strewn across several separate
   shared objects, then there are two possible solutions:

   a)  Open several shared libraries and perform a trial-and-error search
       for every symbol you are looking up.  (The DynLinkage module raises
       an exception (DynLinkError of string) if the lookup fails.  This
       could be used to daisy-chain lookup operations.)

       [Be careful:  Sometimes there are non-obvious inter-dependencies
       between shared libraries.  Consider using DynLinkage.open_lib'
       to express those.]

   b)  A simpler and more robust way of accessing several shared libraries
       is to create a new "summary" library object at the OS level.
       Suppose you are trying to access /usr/lib/foo.so and /usr/lib/bar.so.
       The solution is to make a "foobar.so" object by saying:

        $ ld -shared -o foobar.so /usr/lib/foo.so /usr/lib/bar.so

       The ML code then refers to foobar.so and the Linux dynamic loader
       does the rest.
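
   Approach a) can be sketched as follows.  This is a hedged example,
   not tested code: the structure name FooBarH and the helper names
   are made up, and the sketch assumes that a failed lookup raises
   DynLinkError at the call to DynLinkage.lib_symbol (rather than
   later, when the address is requested).

```sml
structure FooBarH = struct
    local
        fun openLib name =
            DynLinkage.open_lib { name = name, global = true, lazy = true }

        (* libraries are searched in this order *)
        val libs = [openLib "/usr/lib/foo.so", openLib "/usr/lib/bar.so"]

        (* Try each library in turn; move on to the next one when the
         * lookup fails (assumed to raise DynLinkError here). *)
        fun find ([], s) =
              raise DynLinkage.DynLinkError ("symbol not found: " ^ s)
          | find (lh :: rest, s) =
              (DynLinkage.lib_symbol (lh, s)
               handle DynLinkage.DynLinkError _ => find (rest, s))
    in
        (* same type as FooH.libh in the examples above *)
        fun libh s = let
            val sh = find (libs, s)
        in
            fn () => DynLinkage.addr sh
        end
    end
end
```

   The search order matters if a symbol is defined in more than one
   of the libraries: the first library that provides it wins.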

4. To put it all together, let's wrap it up in a .cm-file.  For example,
   if we simply want to directly make the ml-nlffigen-generated definitions
   available to the "end user", we could write this wrapper .cm-file
   (let's call it foo.cm):

  |Library
  |	library(FFI/foo.cm)
  |is
  |	$/basis.cm
  |	$/c.cm
  |	FFI/foo.cm : make (-f foo.make)

   Now, saying

     $ sml -m foo.cm

   is all one needs to do in order to compile.  (CM will automatically
   invoke "make", so you don't have to run "make" separately.)

   If the goal is not to export the "raw" ml-nlffigen-generated stuff
   but rather something more nicely "wrapped", consider writing wrapper
   ML code.  Suppose you have wrapper definitions for structure Foo_a
   and structure Foo_b with code for those in wrap-foo-a.sml and
   wrap-foo-b.sml.  In this case the corresponding .cm-file would
   look like the following:

  |Library
  |	structure Foo_a
  |	structure Foo_b
  |is
  |	$/basis.cm
  |	$/c.cm
  |	FFI/foo.cm : make (-f foo.make)
  |	wrap-foo-a.sml
  |	wrap-foo-b.sml
