Copyright (c) 2001, 2002, Lucent Technologies, Bell Laboratories

   author: Matthias Blume (blume@research.bell-labs.com)

This directory contains ML-NLFFI-Gen, a glue-code generator for
the new "NLFFI" foreign function interface. The generator reads
C source code and emits ML code along with a description file for CM.

Compiling this generator requires the C-Kit ($/ckit-lib.cm) to be
installed.

---------------------------------------------------------------------

January 10, 2002: Major changes:

I reworked the glue code generator in a way that lets generated code
scale better -- at the expense of some (mostly academic) generality.

Changes involve the following:

1. The functorization is gone.

2. Every top-level C declaration results in a separate top-level
   ML equivalent (implemented by its own ML source file).

3. Incomplete pointer types are treated as new abstract types without
   possibility of later making them "concrete". (It is here where
   we lose some generality. Of course, one can always work around
   such problems by going through "voidptr" where necessary.)
   Alternatively, there is the possibility of the programmer
   providing definitions for such incomplete types.

4. All related C sources must be supplied to ml-nlffigen together.
   Types incomplete in one source but complete in another get
   automatically completed in a cross-file fashion.

5. The handle for the shared library to link to is now abstracted as
   a function closure. Moreover, it must be supplied as a top-level
   variable (by the programmer). For this purpose, ml-nlffigen has
   corresponding command-line options.

These changes mean that even very large (in number of exported definitions)
libraries such as, e.g., GTK can now be handled gracefully without
reaching the limits of the ML compiler's abilities.

[The example of GTK -- for which ml-nlffigen creates several thousand (!)
separate ML source files -- puts an unusual burden on CM, though.
However, aside from running a bit longer than usual, CM handles loads
of this magnitude just fine. Stabilizing the resulting library solves
the problem entirely as far as later clients are concerned.]


Sketch of translation- (and naming-) scheme:

  struct foo { ... }
     --> structure ST_foo in st-foo.sml   (not exported)
             basic type info (name, size)
       & structure S_foo in s-foo.sml
             abstract interface to the type
               field accessors f_xxx  (unless -light)
                           and f_xxx' (unless -heavy)
               field types     t_f_xxx
               field RTTI      typ_f_xxx

  union foo { ... }
     --> structure UT_foo in ut-foo.sml   (not exported)
             basic type info (name, size)
       & structure U_foo in u-foo.sml
             abstract interface to the type
               field accessors f_xxx  (unless -light)
                           and f_xxx' (unless -heavy)
               field types     t_f_xxx
               field RTTI      typ_f_xxx

  struct { ... }
     like struct <n> { ... }, where <n> is a fresh integer or 'bar'
     if 'struct { ... }' occurs in the context of a
     'typedef struct { ... } bar'

  union { ... }
     like union <n> { ... }, where <n> is a fresh integer or 'bar'
     if 'union { ... }' occurs in the context of a
     'typedef union { ... } bar'

  enum foo { ... }
     --> structure E_foo in e-foo.sml
             enum constants e_xxx

  enum { ... }
     like enum <n> { ... }, where <n> is a fresh integer or 'bar'
     if 'enum { ... }' occurs in the context of a
     'typedef enum { ... } bar'

  T foo (T, ..., T)   (global function/function prototype)
     --> structure F_foo in f-foo.sml
             containing three/four members:
               typ      : RTTI
               fptr     : thunkified fptr representing the C function
               maybe f' : light-weight function wrapper around fptr
                          Turned off by -heavy (see below).
               maybe f  : heavy-weight function wrapper around fptr
                          Turned off by -light (see below).

  T foo;   (global variable)
     --> structure G_foo in g-foo.sml
             containing three members:
               t   : type
               typ : RTTI
               obj : thunkified object representing the C variable

  struct foo *   (without existing definition of struct foo; incomplete type)
     --> structure IS_foo in iptrs.sml
             The structure is generated by instantiating functor
             PointerToIncompleteType.
             This is turned off by specifying -incomplete (see below).

  union foo *   (without existing definition of union foo; incomplete type)
     --> structure IU_foo in iptrs.sml
             The structure is generated by instantiating functor
             PointerToIncompleteType.
             This is turned off by specifying -incomplete (see below).

  Additional files for implementing function entry sequences are created
  and used internally. They do not contribute exports, though.
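
The naming rules above can be summarized as a small lookup function. The
following is only an illustrative sketch, not code taken from ml-nlffigen:
the datatype "decl" and the function "outputs" are hypothetical names, and
the sketch covers only named declarations (it ignores the <n>/'bar' rule
for anonymous structs, unions and enums).

```sml
(* Kinds of named top-level C declarations covered by the scheme above. *)
datatype decl
  = Struct of string            (* struct foo { ... }                *)
  | Union of string             (* union foo { ... }                 *)
  | Enum of string              (* enum foo { ... }                  *)
  | Function of string          (* T foo (T, ..., T)                 *)
  | Global of string            (* T foo;                            *)
  | IncompleteStruct of string  (* struct foo * without a definition *)
  | IncompleteUnion of string   (* union foo * without a definition  *)

(* (structure name, file name) pairs generated for one declaration. *)
fun outputs (Struct n) =
      [("ST_" ^ n, "st-" ^ n ^ ".sml"), ("S_" ^ n, "s-" ^ n ^ ".sml")]
  | outputs (Union n) =
      [("UT_" ^ n, "ut-" ^ n ^ ".sml"), ("U_" ^ n, "u-" ^ n ^ ".sml")]
  | outputs (Enum n) = [("E_" ^ n, "e-" ^ n ^ ".sml")]
  | outputs (Function n) = [("F_" ^ n, "f-" ^ n ^ ".sml")]
  | outputs (Global n) = [("G_" ^ n, "g-" ^ n ^ ".sml")]
  | outputs (IncompleteStruct n) = [("IS_" ^ n, "iptrs.sml")]
  | outputs (IncompleteUnion n) = [("IU_" ^ n, "iptrs.sml")]
```

For example, outputs (Struct "foo") lists ST_foo in st-foo.sml and S_foo in
s-foo.sml; of these two, only S_foo is exported.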


Command-line options for ml-nlffigen:

  General syntax:   ml-nlffigen <option> ... [--] <C-file> ...

  Environment variables:

    Ml-nlffigen looks at the environment variable FFIGEN_CPP to obtain
    the template string for the cpp command line. If FFIGEN_CPP is not
    set, the template defaults to "gcc -E -U__GNUC__ %o %s > %t".
    The actual command line is obtained by substituting occurrences of
    %s with the name of the source, %t with the name of a temporary
    file holding the pre-processed code, and %o with the list of cpp
    options (see -cppopt below).
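
To make the template substitution concrete, here is a minimal sketch of how
such an expansion could be written. It is not the code ml-nlffigen itself
uses; the names "template", "expand" and "cmd" are made up for this
illustration, and joining the options with single spaces is an assumption.

```sml
(* Default template quoted in the text above. *)
val defaultTemplate = "gcc -E -U__GNUC__ %o %s > %t"

(* Template from FFIGEN_CPP if set, otherwise the default. *)
fun template () =
    Option.getOpt (OS.Process.getEnv "FFIGEN_CPP", defaultTemplate)

(* Expand %o, %s and %t; every other character is copied unchanged. *)
fun expand { options : string list, source : string, target : string } tmpl = let
    val opts = String.concatWith " " options   (* assumed separator *)
    fun loop ([], acc) = String.concat (rev acc)
      | loop (#"%" :: #"o" :: cs, acc) = loop (cs, opts :: acc)
      | loop (#"%" :: #"s" :: cs, acc) = loop (cs, source :: acc)
      | loop (#"%" :: #"t" :: cs, acc) = loop (cs, target :: acc)
      | loop (c :: cs, acc) = loop (cs, str c :: acc)
in
    loop (explode tmpl, [])
end

(* Example with hypothetical option and file names. *)
val cmd = expand { options = ["-DFOO", "-I/inc"],
                   source = "foo.h", target = "/tmp/foo.i" }
                 defaultTemplate
```

With the default template, cmd evaluates to
"gcc -E -U__GNUC__ -DFOO -I/inc foo.h > /tmp/foo.i".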

  Options:

    -dir <dir>        output directory where all generated files are placed
                      default: "NLFFI-Generated"
    -allSU            instructs ml-nlffigen to include all structs and unions,
                      even those that are defined in included files (as opposed
                      to files explicitly listed as arguments)
                      default: off
    -width <w>        sets output line width (just a guess) to <w>
                      default: 75
    -lambdasplit <x>  instructs ml-nlffigen to generate "lambdasplit"
                      options for all ML files (see CM manual for what this means;
                      it does not currently work anyway because cross-module
                      inlining is broken).
                      default: nothing
    -target <t>       Sets the target to <t> (which must be one of "sparc-unix",
                      "x86-unix", or "x86-win32").
                      default: current architecture
    -light            suppress "heavy" versions of function wrappers and
                      field accessors; also resets any earlier -heavy to default
                      default: not suppressed
    -heavy            suppress "light" versions of function wrappers and
                      field accessors; also resets any earlier -light to default
                      default: not suppressed
    -namedargs        instructs ml-nlffigen to generate function wrappers that
                      use named arguments (ML records) instead of tuples if
                      there is enough information for this in the C source;
                      (this is not always very useful)
                      default: off
    -incomplete       Do not generate definitions for incomplete types; these
                      will then have to be provided by the programmer
                      default: generate such definitions, making each incomplete
                      type a new abstract type.
                      (If -incomplete has been specified, ml-nlffigen will
                      still generate the file <dir>/iptrs.sml, but it will not
                      be mentioned in the generated .cm file. Moreover,
                      the definitions in the generated <dir>/iptrs.sml will
                      then be commented out, and the usual "credits" comments that
                      appear at the top of all other generated files will
                      be omitted.)
    -libhandle <h>    Use the variable <h> to refer to the handle to the
                      shared library object. Given the constraints of CM, <h>
                      must have the form of a long ML identifier, e.g.,
                      MyLibrary.libhandle.
                      default: Library.libh
    -include <f>      Mention file <f> in the generated .cm file. This option
                      is necessary at least once for providing the library handle.
                      It can be used arbitrarily many times, resulting in more
                      than one such programmer-supplied file being mentioned.
                      If <f> is relative, then it must be relative to the directory
                      specified in the -dir <dir> option.
    -cmfile <f>       Specify name of the generated .cm file, relative to
                      the directory specified by the -dir <dir> option.
                      default: nlffi-generated.cm
    -cppopt <o>       The string <o> gets added to the list of options to be
                      passed to cpp (the C preprocessor). The list of options
                      gets substituted for %o in the cpp command line template.
    -U<x>             The string -U<x> gets added to the list of cpp options.
    -D<x>             The string -D<x> gets added to the list of cpp options.
    -I<x>             The string -I<x> gets added to the list of cpp options.
    -version          Just write the version number of ml-nlffigen to standard
                      output and then quit.
    -match <r>        Normally ml-nlffigen will include ML definitions for a C
                      declaration if the C declaration textually appears in
                      one of the files specified at the command line. Definitions
                      in #include-d files will normally not appear (unless
                      their absence would lead to inconsistencies).
                      By specifying -match <r>, ml-nlffigen will also include
                      definitions that occur in recursively #include-d files
                      for which the AWK-style regular expression <r> matches
                      a prefix of their names.
    --                Terminate processing of options; remaining arguments are
                      taken to be C sources.
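
The interplay of -light and -heavy described above (each one suppresses the
other kind of wrapper and cancels an earlier occurrence of its counterpart)
can be read as "the last of the two options wins". The sketch below encodes
that reading; it is an interpretation of the option descriptions, not code
from ml-nlffigen, and the names "weights", "step" and "wrappers" are
hypothetical.

```sml
(* Which wrapper/accessor versions get generated. *)
type weights = { light : bool, heavy : bool }

(* Default: neither version is suppressed. *)
val default : weights = { light = true, heavy = true }

(* Process one command-line argument; all other options are ignored here. *)
fun step ("-light", _ : weights) = { light = true, heavy = false }
  | step ("-heavy", _) = { light = false, heavy = true }
  | step (_, w) = w

(* Fold over the arguments from left to right. *)
fun wrappers (args : string list) : weights = foldl step default args
```

Under this reading, wrappers ["-heavy", "-light"] yields
{ light = true, heavy = false }: the later -light undoes the earlier -heavy
and then suppresses the heavy versions.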

----------------------------------------------------------------------

Sample usage:

Suppose we have a C interface defined in foo.h.

1. Running ml-nlffigen:

   It is best to let a tool such as Unix' "make" handle the invocation of
   ml-nlffigen. The following "Makefile" can be used as a template for
   other projects:

  +----------------------------------------------------------
  |FILES = foo.h
  |H = FooH.libh
  |D = FFI
  |HF = ../foo-h.sml
  |CF = foo.cm
  |
  |$(D)/$(CF): $(FILES)
  |	ml-nlffigen -include $(HF) -libhandle $(H) -dir $(D) -cmfile $(CF) $^
  +----------------------------------------------------------

   Suppose the above file is stored as "foo.make". Running

     $ make -f foo.make

   will generate a subdirectory "FFI" full of ML files corresponding to
   the definitions in foo.h. Access to the generated ML code is gained
   by referring to the CM library FFI/foo.cm; the .cm-file (foo.cm) is
   also produced by ml-nlffigen.

2. The ML code uses the library handle specified in the command line
   (here: FooH.libh) for dynamic linking. The type of FooH.libh must
   be:

       FooH.libh : string -> unit -> CMemory.addr

   That is, FooH.libh takes the name of a symbol and produces that
   symbol's suspended address.

   The code that implements FooH.libh must be provided by the programmer.
   In the above example, we assume that it is stored in file foo-h.sml.
   The name of that file must appear in the generated .cm-file, hence the
   "-include" command-line argument.

   Notice that the name provided to ml-nlffigen must be relative to the
   output directory. Therefore, in our case it is "../foo-h.sml" and not
   just foo-h.sml (because the full path would be FFI/../foo-h.sml).

3. To actually implement FooH.libh, use the "DynLinkage" module.
   Suppose the shared library's name is "/usr/lib/foo.so". Here is
   the corresponding contents of foo-h.sml:

  +-------------------------------------------------------------
  |structure FooH = struct
  |    local
  |        val lh = DynLinkage.open_lib
  |                     { name = "/usr/lib/foo.so", global = true, lazy = true }
  |    in
  |        fun libh s = let
  |            val sh = DynLinkage.lib_symbol (lh, s)
  |        in
  |            fn () => DynLinkage.addr sh
  |        end
  |    end
  |end
  +-------------------------------------------------------------

   If all the symbols you are linking to are already available within
   the ML runtime system, then you don't need to open a new shared
   object. As a result, your FooH implementation would look like this:

  +-------------------------------------------------------------
  |structure FooH = struct
  |    fun libh s = let
  |        val sh = DynLinkage.lib_symbol (DynLinkage.main_lib, s)
  |    in
  |        fn () => DynLinkage.addr sh
  |    end
  |end
  +-------------------------------------------------------------

   If the symbols you are accessing are strewn across several separate
   shared objects, then there are two possible solutions:

   a) Open several shared libraries and perform a trial-and-error search
      for every symbol you are looking up. (The DynLinkage module raises
      an exception (DynLinkError of string) if the lookup fails. This
      could be used to daisy-chain lookup operations.)

      [Be careful: Sometimes there are non-obvious inter-dependencies
      between shared libraries. Consider using DynLinkage.open_lib'
      to express those.]
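
A daisy-chained lookup along the lines of (a) might look like the sketch
below. It is an untested illustration rather than a recommended
implementation: the structure name FooBarH, the helper "find" and the two
library paths are assumptions, and only the DynLinkage operations already
used in this README are relied upon. Because it is not stated here whether
a failed lookup is reported by lib_symbol or only later by addr, the sketch
forces both inside the handler.

```sml
structure FooBarH = struct
    local
        (* Hypothetical libraries, opened in the order they are searched. *)
        val libs =
            map (fn n => DynLinkage.open_lib
                             { name = n, global = true, lazy = true })
                ["/usr/lib/foo.so", "/usr/lib/bar.so"]

        (* Try each library in turn; fall through to the next on failure. *)
        fun find (s, []) =
              raise DynLinkage.DynLinkError ("symbol not found: " ^ s)
          | find (s, lh :: rest) =
              (let val sh = DynLinkage.lib_symbol (lh, s)
               in
                   ignore (DynLinkage.addr sh);  (* force the lookup now *)
                   sh
               end) handle DynLinkage.DynLinkError _ => find (s, rest)
    in
        (* Same type as FooH.libh: string -> unit -> CMemory.addr *)
        fun libh s = let
            val sh = find (s, libs)
        in
            fn () => DynLinkage.addr sh
        end
    end
end
```

Such a structure would be passed to ml-nlffigen with "-libhandle FooBarH.libh",
in the same way as FooH.libh in the Makefile above.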

   b) A simpler and more robust way of accessing several shared libraries
      is to create a new "summary" library object at the OS level.
      Suppose you are trying to access /usr/lib/foo.so and /usr/lib/bar.so.
      The solution is to make a "foobar.so" object by saying:

        $ ld -shared -o foobar.so /usr/lib/foo.so /usr/lib/bar.so

      The ML code then refers to foobar.so and the Linux dynamic loader
      does the rest.

4. To put it all together, let's wrap it up in a .cm-file. For example,
   if we simply want to directly make the ml-nlffigen-generated definitions
   available to the "end user", we could write this wrapper .cm-file
   (let's call it foo.cm):

  +-------------------------------------------------------------
  |library
  |    library(FFI/foo.cm)
  |is
  |    $/basis.cm
  |    $/c.cm
  |    FFI/foo.cm : make (-f foo.make)
  +-------------------------------------------------------------

   Now, saying

     $ sml -m foo.cm

   is all one needs to do in order to compile. (CM will automatically
   invoke "make", so you don't have to run "make" separately.)

   If the goal is not to export the "raw" ml-nlffigen-generated stuff
   but rather something more nicely "wrapped", consider writing wrapper
   ML code. Suppose you have wrapper definitions for structure Foo_a
   and structure Foo_b with code for those in wrap-foo-a.sml and
   wrap-foo-b.sml. In this case the corresponding .cm-file would
   look like the following:

  +-------------------------------------------------------------
  |library
  |    structure Foo_a
  |    structure Foo_b
  |is
  |    $/basis.cm
  |    $/c.cm
  |    FFI/foo.cm : make (-f foo.make)
  |    wrap-foo-a.sml
  |    wrap-foo-b.sml
  +-------------------------------------------------------------
