[smlnj] Annotation of /sml/trunk/src/ml-nlffigen/README

Revision 1190

1 : blume 1028 Copyright (c) 2001, 2002, Lucent Technologies, Bell Laboratories
2 : blume 828
3 : blume 1028 author: Matthias Blume (blume@research.bell-labs.com)
4 : blume 828
5 :     This directory contains ML-NLFFI-Gen, a glue-code generator for
6 :     the new "NLFFI" foreign function interface. The generator reads
7 :     C source code and emits ML code along with a description file for CM.
8 :    
9 :     Compiling this generator requires the C-Kit ($/ckit-lib.cm) to be
10 :     installed.
11 : blume 1011
12 :     ---------------------------------------------------------------------
13 :    
14 : blume 1096 February 21, 2002: Major changes:
15 : blume 1011
16 :     I reworked the glue code generator in a way that lets generated code
17 :     scale better -- at the expense of some (mostly academic) generality.
18 :    
19 :     Changes involve the following:
20 :    
21 :     1. The functorization is gone.
22 :    
23 :     2. Every top-level C declaration results in a separate top-level
24 :     ML equivalent (implemented by its own ML source file).
25 :    
26 : blume 1096 3. Incomplete pointer types are treated just like their complete
27 :     versions -- the only difference being that no RTTI will be
28 :     available for them. In the "light" interface, this rules out
29 :     precisely those operations over them that C would disallow.
30 : blume 1011
31 :     4. All related C sources must be supplied to ml-nlffigen together.
32 :     Types incomplete in one source but complete in another get
33 :     automatically completed in a cross-file fashion.
34 :    
35 :     5. The handle for the shared library to link to is now abstracted as
36 :     a function closure. Moreover, it must be supplied as a top-level
37 :     variable (by the programmer). For this purpose, ml-nlffigen has
38 :     corresponding command-line options.
39 :    
40 :     These changes mean that even very large (in number of exported definitions)
41 :     libraries such as, e.g., GTK can now be handled gracefully without
42 :     reaching the limits of the ML compiler's abilities.
43 :    
44 :     [The example of GTK -- for which ml-nlffigen creates several thousands (!)
45 :     of separate ML source files -- puts an unusual burden on CM, though.
46 :     However, aside from running a bit longer than usual, CM handles loads
47 :     of this magnitude just fine. Stabilizing the resulting library solves
48 :     the problem entirely as far as later clients are concerned.]
49 :    
50 :    
51 :     Sketch of translation- (and naming-) scheme:
52 :    
53 :     struct foo { ... }
54 :     --> structure ST_foo in st-foo.sml (not exported)
55 :     basic type info (name, size)
56 :     & structure S_foo in s-foo.sml
57 :     abstract interface to the type
58 :     field accessors f_xxx (unless -light)
59 :     and f_xxx' (unless -heavy)
60 :     field types t_f_xxx
61 :     field RTTI typ_f_xxx
62 :    
63 : blume 1096 & (unless "-nosucvt" was set)
64 :     structures IS_foo in <a>/is-foo.sml
65 :     (see discussion of struct *foo below)
66 :    
67 : blume 1011 union foo { ... }
68 :     --> structure UT_foo in ut-foo.sml (not exported)
69 :     basic type info (name, size)
70 :     & structure U_foo in u-foo.sml
71 :     abstract interface to the type
72 :     field accessors f_xxx (unless -light)
73 :     and f_xxx' (unless -heavy)
74 :     field types t_f_xxx
75 :     field RTTI typ_f_xxx
76 :    
77 : blume 1096 & (unless "-nosucvt" was set)
78 :     structures IU_foo in <a>/iu-foo.sml
79 :     (see discussion of union *foo below)
80 :    
81 : blume 1011 struct { ... }
82 : blume 1039 like struct <n> { ... }, where <n> is a fresh integer or 'bar
83 : blume 1036 if 'struct { ... }' occurs in the context of a
84 :     'typedef struct { ... } bar'
85 : blume 1011
86 :     union { ... }
87 : blume 1039 like union <n> { ... }, where <n> is a fresh integer or 'bar
88 : blume 1036 if 'union { ... }' occurs in the context of a
89 :     'typedef union { ... } bar'
90 : blume 1011
91 : blume 1036
92 : blume 1011 enum foo { ... }
93 :     --> structure E_foo in e-foo.sml
94 : blume 1096 external type mlrep with
95 :     enum constants e_xxx
96 :     conversion functions between tag enum and mlrep
97 :     between mlrep and sint
98 :     access functions (get/set) that operate on mlrep
99 :     (as an alternative to C.Get.enum/C.Set.enum which
100 :     operate on sint)
101 : blume 1011
102 : blume 1096 If the command-line option "-ec" ("-enum-constructors") was set
103 :     and the values of all enum constants are different from each
104 :     other, then mlrep will be a datatype (thus making it possible
105 :     to pattern-match).
106 :    
107 : blume 1011 enum { ... }
108 : blume 1096 If this construct appears in the context of a surrounding
109 :     (non-anonymous) struct or union or typedef, the enumeration gets
110 :     assigned an artificial tag (just like similar structs and unions,
111 :     see above).
112 : blume 1011
113 : blume 1096 Unless the command-line option "-nocollect" was specified,
114 :     all constants in other (truly) unnamed enumerations will be
115 :     collected into a single enumeration represented by structure E_'.
116 :     This single enumeration is then treated like a regular enumeration
117 :     (including handling of "-ec" -- see above).
118 : blume 1036
119 : blume 1096 With "-nocollect", each such enumeration is instead assigned a
120 :     fresh integer tag (again, just like in the struct/union case).
121 :    
122 : blume 1011 T foo (T, ..., T) (global function/function prototype)
123 :     --> structure F_foo in f-foo.sml
124 :     containing three/four members:
125 :     typ : RTTI
126 :     fptr: thunkified fptr representing the C function
127 :     maybe f' : light-weight function wrapper around fptr
128 :     Turned off by -heavy (see below).
129 :     maybe f : heavy-weight function wrapper around fptr
130 :     Turned off by -light (see below).
131 :    
132 :     T foo; (global variable)
133 :     --> structure G_foo in g-foo.sml
134 :     containing three members:
135 :     t : type
136 :     typ : RTTI
137 :     obj : thunkified object representing the C variable
138 :    
139 :     struct foo * (without existing definition of struct foo; incomplete type)
140 : blume 1096 --> an internal structure ST_foo with a type "tag" (just like in
141 :     the struct foo { ... } case)
142 :     The difference is that no structure S_foo will be generated,
143 :     so there is no field-access interface and no RTTI (size or typ)
144 :     for this. All "light-weight" functions referring to this
145 :     pointer type will be generated; heavy-weight functions will
146 :     be generated only if they do not require access to RTTI.
147 : blume 1011
148 : blume 1096 If "-heavy" was specified but a heavy interface function
149 :     cannot be generated because of incomplete types, then its
150 :     light counterpart will be generated anyway.
151 : blume 1011
152 : blume 1096 union foo * Same as with struct foo *, but replace S_foo with U_foo
153 :     and ST_foo with UT_foo.
154 : blume 1067
155 : blume 1011 Additional files for implementing function entry sequences are created
156 :     and used internally. They do not contribute exports, though.
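
To illustrate the effect of "-ec" described in the enum entry above, here is a hand-written stand-in (not actual ml-nlffigen output) for a hypothetical C declaration "enum color { RED, GREEN = 5, BLUE }". Only the naming (structure E_color, type mlrep, constants e_xxx) follows the scheme above. The conversion names toInt and fromInt are made up for this sketch, and plain int stands in for MLRep.Signed.int so that the code loads without the NLFFI library.

```sml
(* Stand-in for what "-ec" makes possible; NOT generated code. *)
structure E_color = struct
    (* All three C values are distinct, so a datatype is possible. *)
    datatype mlrep = e_RED | e_GREEN | e_BLUE

    (* Hypothetical conversion names; C values: RED = 0, GREEN = 5, BLUE = 6. *)
    fun toInt e_RED = 0
      | toInt e_GREEN = 5
      | toInt e_BLUE = 6

    fun fromInt 0 = e_RED
      | fromInt 5 = e_GREEN
      | fromInt 6 = e_BLUE
      | fromInt _ = raise Domain
end

(* With a datatype, clients can pattern-match on enum constants. *)
fun describe E_color.e_RED = "red"
  | describe E_color.e_GREEN = "green"
  | describe E_color.e_BLUE = "blue"

val s = describe (E_color.fromInt 5)   (* s = "green" *)
```

Without "-ec", or when two constants share a value, mlrep falls back to an integer type, and a function like describe would have to compare its argument against the e_xxx values instead of matching on constructors.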
157 :    
158 :    
159 :     Command-line options for ml-nlffigen:
160 :    
161 :     General syntax: ml-nlffigen <option> ... [--] <C-file> ...
162 :    
163 : blume 1036 Environment variables:
164 :    
165 :     Ml-nlffigen looks at the environment variable FFIGEN_CPP to obtain
166 :     the template string for the cpp command line. If FFIGEN_CPP is not
167 :     set, the template defaults to "gcc -E -U__GNUC__ %o %s > %t".
168 :     The actual command line is obtained by substituting occurrences of
169 :     %s with the name of the source, %t with the name of a temporary file
170 :     holding the pre-processed code, and %o with the cpp options (see -cppopt).
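
The template expansion can be pictured with the following sketch. It is not ml-nlffigen's own implementation: the function name expandTemplate, the record field names, the sample option strings, and the temporary file name are all made up for illustration, and joining the cpp options with single spaces is an assumption.

```sml
(* Expand a cpp command-line template:
 *   %o -> list of cpp options, %s -> source file, %t -> temporary file.
 * Illustrative only; not taken from the ml-nlffigen sources. *)
fun expandTemplate { template, options, source, target } = let
    val optString = String.concatWith " " options
    fun loop ([], acc) = String.concat (rev acc)
      | loop (#"%" :: #"o" :: cs, acc) = loop (cs, optString :: acc)
      | loop (#"%" :: #"s" :: cs, acc) = loop (cs, source :: acc)
      | loop (#"%" :: #"t" :: cs, acc) = loop (cs, target :: acc)
      | loop (c :: cs, acc) = loop (cs, String.str c :: acc)
in
    loop (String.explode template, [])
end

(* The default template with two hypothetical cpp options. *)
val cmd = expandTemplate
    { template = "gcc -E -U__GNUC__ %o %s > %t",
      options = ["-DDEBUG", "-I/usr/include/foo"],
      source = "foo.h",
      target = "/tmp/foo.i" }
(* cmd = "gcc -E -U__GNUC__ -DDEBUG -I/usr/include/foo foo.h > /tmp/foo.i" *)
```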
171 :    
172 : blume 1011 Options:
173 :    
174 :     -dir <dir> output directory where all generated files are placed
175 : blume 1096 -d <dir> default: "NLFFI-Generated"
176 :    
177 : blume 1011 -allSU instructs ml-nlffigen to include all structs and unions,
178 :     even those that are defined in included files (as opposed
179 :     to files explicitly listed as arguments)
180 :     default: off
181 : blume 1096
182 : blume 1011 -width <w> sets output line width (just a guess) to <w>
183 : blume 1096 -w <w> default: 75
184 :    
185 : blume 1137 -smloption <x> instructs ml-nlffigen to include <x> into the list
186 :     of options to annotate .sml entries in the generated .cm
187 :     file with. By default, the list consists just of "noguid".
188 :     -guid Removes the default "noguid" from the list of sml options.
189 :     (This re-enables strict handling of type- and object-identity
190 :     but can have negative impact on CM cutoff recompilation
191 :     performance if the programmer routinely removes the entire
192 :     tree of ml-nlffigen-generated files during development.)
193 : blume 1190
194 :     (*
195 :     -lambdasplit <x> instructs ml-nlffigen to generate "lambdasplit"
196 : blume 1096 -ls <x> options for all ML files (see CM manual for what this means;
197 : blume 1011 it does not currently work anyway because cross-module
198 :     inlining is broken).
199 :     default: nothing
200 : blume 1190 *)
201 : blume 1096
202 : blume 1011 -target <t> Sets the target to <t> (which must be one of "sparc-unix",
203 : blume 1096 -t <t> "x86-unix", or "x86-win32").
204 : blume 1011 default: current architecture
205 : blume 1096
206 : blume 1011 -light suppress "heavy" versions of function wrappers and
207 : blume 1096 -l field accessors; also resets any earlier -heavy to default
208 : blume 1011 default: not suppressed
209 : blume 1096
210 : blume 1011 -heavy suppress "light" versions of function wrappers and
211 : blume 1096 -h field accessors; also resets any earlier -light to default
212 : blume 1011 default: not suppressed
213 : blume 1096
214 : blume 1011 -namedargs instruct ml-nlffigen to generate function wrappers that
215 : blume 1096 -na use named arguments (ML records) instead of tuples if
216 : blume 1011 there is enough information for this in the C source;
217 :     (this is not always very useful)
218 :     default: off
219 : blume 1096
220 :     -nocollect Do not do the following:
221 :     Collect enum constants from truly unnamed enumerations
222 :     (those without tags that occur at toplevel or in an
223 :     unnamed context, i.e., not in a typedef or another
224 :     named struct or union) into a single artificial
225 :     enumeration tagged by ' (single apostrophe). The corresponding
226 :     ML-side representative will be a structure named E_'.
227 :    
228 :     -enum-constructors
229 :     -ec When possible (i.e., if all values of a given enumeration
230 :     are different from each other), make the ML representation
231 :     type of the enumeration a datatype. The default (and
232 :     fallback) is to make that type the same as MLRep.Signed.int.
233 :    
234 : blume 1011 -libhandle <h> Use the variable <h> to refer to the handle to the
235 : blume 1096 -lh <h> shared library object. Given the constraints of CM, <h>
236 : blume 1011 must have the form of a long ML identifier, e.g.,
237 :     MyLibrary.libhandle.
238 :     default: Library.libh
239 : blume 1096
240 : blume 1011 -include <f> Mention file <f> in the generated .cm file. This option
241 : blume 1096 -add <f> is necessary at least once for providing the library handle.
242 : blume 1011 It can be used arbitrarily many times, resulting in more
243 :     than one such programmer-supplied file being mentioned.
244 :     If <f> is relative, then it must be relative to the directory
245 :     specified in the -dir <dir> option.
246 : blume 1096
247 : blume 1011 -cmfile <f> Specify name of the generated .cm file, relative to
248 : blume 1096 -cm <f> the directory specified by the -dir <dir> option.
249 : blume 1011 default: nlffi-generated.cm
250 : blume 1096
251 : blume 1036 -cppopt <o> The string <o> gets added to the list of options to be
252 :     passed to cpp (the C preprocessor). The list of options
253 :     gets substituted for %o in the cpp command line template.
254 : blume 1096
255 : blume 1036 -U<x> The string -U<x> gets added to the list of cpp options.
256 : blume 1096
257 : blume 1036 -D<x> The string -D<x> gets added to the list of cpp options.
258 : blume 1096
259 : blume 1036 -I<x> The string -I<x> gets added to the list of cpp options.
260 : blume 1096
261 : blume 1036 -version Just write the version number of ml-nlffigen to standard
262 :     output and then quit.
263 : blume 1096
264 : blume 1036 -match <r> Normally ml-nlffigen will include ML definitions for a C
265 : blume 1096 -m <r> declaration if the C declaration textually appears in
266 : blume 1036 one of the files specified at the command line. Definitions
267 :     in #include-d files will normally not appear (unless
268 :     their absence would lead to inconsistencies).
269 :     By specifying -match <r>, ml-nlffigen will also include
270 :     definitions that occur in recursively #include-d files
271 :     for which the AWK-style regular expression <r> matches
272 : blume 1062 their names.
273 : blume 1096
274 : blume 1062 -prefix <p> Generated ML structure names will all have prefix <p>
275 : blume 1096 -p <p> (in addition to the usual "S_" or "U_" or "F_" ...)
276 :    
277 : blume 1062 -gensym <g> Names "gensym-ed" by ml-nlffigen (for anonymous struct/union/
278 : blume 1096 -g <g> enums) will get an additional suffix _<g>. (This should
279 : blume 1062 be used if outputs from several independent runs of
280 :     ml-nlffigen are to coexist in the same ML program.)
281 : blume 1096
282 : blume 1011 -- Terminate processing of options; remaining arguments are
283 :     taken to be C sources.
284 :    
285 : blume 1028 ----------------------------------------------------------------------
286 :    
287 :     Sample usage:
288 :    
289 :     Suppose we have a C interface defined in foo.h.
290 :    
291 :     1. Running ml-nlffigen:
292 :    
293 :     It is best to let a tool such as Unix' "make" handle the invocation of
294 :     ml-nlffigen. The following "Makefile" can be used as a template for
295 :     other projects:
296 :    
297 :     +----------------------------------------------------------
298 :     |FILES = foo.h
299 :     |H = FooH.libh
300 :     |D = FFI
301 :     |HF = ../foo-h.sml
302 :     |CF = foo.cm
303 :     |
304 :     |$(D)/$(CF): $(FILES)
305 :     | ml-nlffigen -include $(HF) -libhandle $(H) -dir $(D) -cmfile $(CF) $^
306 :     +----------------------------------------------------------
307 :    
308 :     Suppose the above file is stored as "foo.make". Running
309 :    
310 :     $ make -f foo.make
311 :    
312 :     will generate a subdirectory "FFI" full of ML files corresponding to
313 :     the definitions in foo.h. Access to the generated ML code is gained
314 :     by referring to the CM library FFI/foo.cm; the .cm-file (foo.cm) is
315 :     also produced by ml-nlffigen.
316 :    
317 :     2. The ML code uses the library handle specified in the command line
318 :     (here: FooH.libh) for dynamic linking. The type of FooH.libh must
319 :     be:
320 :    
321 :     FooH.libh : string -> unit -> CMemory.addr
322 :    
323 :     That is, FooH.libh takes the name of a symbol and produces that
324 :     symbol's suspended address.
325 :    
326 :     The code that implements FooH.libh must be provided by the programmer.
327 :     In the above example, we assume that it is stored in file foo-h.sml.
328 :     The name of that file must appear in the generated .cm-file, hence the
329 :     "-include" command-line argument.
330 :    
331 :     Notice that the name provided to ml-nlffigen must be relative to the
332 :     output directory. Therefore, in our case it is "../foo-h.sml" and not
333 :     just foo-h.sml (because the full path would be FFI/../foo-h.sml).
334 :    
335 :     3. To actually implement FooH.libh, use the "DynLinkage" module.
336 :     Suppose the shared library's name is "/usr/lib/foo.so". Here is
337 :     the corresponding contents of foo-h.sml:
338 :    
339 :     +-------------------------------------------------------------
340 :     |structure FooH = struct
341 :     | local
342 :     | val lh = DynLinkage.open_lib
343 :     | { name = "/usr/lib/foo.so", global = true, lazy = true }
344 :     | in
345 :     | fun libh s = let
346 :     | val sh = DynLinkage.lib_symbol (lh, s)
347 :     | in
348 :     | fn () => DynLinkage.addr sh
349 :     | end
350 :     | end
351 :     |end
352 :     +-------------------------------------------------------------
353 :    
354 :     If all the symbols you are linking to are already available within
355 :     the ML runtime system, then you don't need to open a new shared
356 :     object. As a result, your FooH implementation would look like this:
357 :    
358 :     +-------------------------------------------------------------
359 :     |structure FooH = struct
360 :     | fun libh s = let
361 :     | val sh = DynLinkage.lib_symbol (DynLinkage.main_lib, s)
362 :     | in
363 :     | fn () => DynLinkage.addr sh
364 :     | end
365 :     |end
366 :     +-------------------------------------------------------------
367 :    
368 :     If the symbols you are accessing are strewn across several separate
369 :     shared objects, then there are two possible solutions:
370 :    
371 :     a) Open several shared libraries and perform a trial-and-error search
372 :     for every symbol you are looking up. (The DynLinkage module raises
373 :     an exception (DynLinkError of string) if the lookup fails. This
374 :     could be used to daisy-chain lookup operations.)
375 :    
376 :     [Be careful: Sometimes there are non-obvious inter-dependencies
377 :     between shared libraries. Consider using DynLinkage.open_lib'
378 :     to express those.]
379 :    
380 :     b) A simpler and more robust way of accessing several shared libraries
381 :     is to create a new "summary" library object at the OS level.
382 :     Suppose you are trying to access /usr/lib/foo.so and /usr/lib/bar.so.
383 :     The solution is to make a "foobar.so" object by saying:
384 :    
385 :     $ ld -shared -o foobar.so /usr/lib/foo.so /usr/lib/bar.so
386 :    
387 :     The ML code then refers to foobar.so and the Linux dynamic loader
388 :     does the rest.
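
For comparison, option (a) above could be sketched roughly as follows. This is an untested outline rather than a recommended implementation: the structure name FooBarH and the two library paths are hypothetical, and it assumes that a failed lookup raises DynLinkError from DynLinkage.lib_symbol itself (rather than later, from DynLinkage.addr).

```sml
(* Daisy-chained symbol lookup over several shared libraries.
 * FooBarH and the two paths are placeholders for this sketch. *)
structure FooBarH = struct
    local
        (* Open both shared objects once, when the structure is elaborated. *)
        val libs =
            map (fn n => DynLinkage.open_lib
                             { name = n, global = true, lazy = true })
                ["/usr/lib/foo.so", "/usr/lib/bar.so"]

        (* Try each library in turn; fail only if none provides s. *)
        fun lookup (s, []) =
              raise DynLinkage.DynLinkError ("symbol not found: " ^ s)
          | lookup (s, l :: ls) =
              (DynLinkage.lib_symbol (l, s)
               handle DynLinkage.DynLinkError _ => lookup (s, ls))
    in
        fun libh s = let
            val sh = lookup (s, libs)
        in
            fn () => DynLinkage.addr sh
        end
    end
end
```

The resulting FooBarH.libh has the same type as FooH.libh above, so it can be passed to ml-nlffigen via -libhandle in the same way. The order of the list decides which library wins if both define the same symbol.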
389 :    
390 :     4. To put it all together, let's wrap it up in a .cm-file. For example,
391 :     if we simply want to directly make the ml-nlffigen-generated definitions
392 :     available to the "end user", we could write this wrapper .cm-file
393 :     (let's call it foo.cm):
394 :    
395 :     +-------------------------------------------------------------
396 :     |library
397 :     | library(FFI/foo.cm)
398 :     |is
399 :     | $/basis.cm
400 :     | $/c.cm
401 :     | FFI/foo.cm : make (-f foo.make)
402 :     +-------------------------------------------------------------
403 :    
404 :     Now, saying
405 :    
406 :     $ sml -m foo.cm
407 :    
408 :     is all one needs to do in order to compile. (CM will automatically
409 :     invoke "make", so you don't have to run "make" separately.)
410 :    
411 :     If the goal is not to export the "raw" ml-nlffigen-generated stuff
412 :     but rather something more nicely "wrapped", consider writing wrapper
413 :     ML code. Suppose you have wrapper definitions for structure Foo_a
414 :     and structure Foo_b with code for those in wrap-foo-a.sml and
415 :     wrap-foo-b.sml. In this case the corresponding .cm-file would
416 :     look like the following:
417 :    
418 :     +-------------------------------------------------------------
419 :     |library
420 :     | structure Foo_a
421 :     | structure Foo_b
422 :     |is
423 :     | $/basis.cm
424 :     | $/c.cm
425 :     | FFI/foo.cm : make (-f foo.make)
426 :     | wrap-foo-a.sml
427 :     | wrap-foo-b.sml
428 :     +-------------------------------------------------------------
