[smlnj] Annotation of /sml/trunk/src/ml-nlffi-lib/Doc/mini-tutorial.txt
Revision 836

1 :     ML-NLFFI library and ML-NLFFIGEN glue code generator
2 :     ====================================================
3 :    
4 :     A very incomplete introduction
5 :     (by Matthias Blume (blume@research.bell-labs.com))
6 :    
7 :     !!! Warning: this currently works on x86/Linux only! !!!
8 :    
9 :     The new NLFFI ("no-longer foreign function interface") is based on the
10 :     idea of data-level interoperability: ML code (a mixture of
11 :     pre-defined code imported from $/c.cm, code generated by ml-nlffigen,
12 :     and code the user writes) operates directly on C datastructures
13 :     without any marshalling/unmarshalling. There are no C stub routines
14 :     (no C glue code at all), and very little code on the ML side, just
15 :     enough to deal with "new" types (struct/union), with generating code
16 :     for C function calls, and with dynamic linking.
17 :    
18 :     There are three libraries that are part of ml-nlffi-lib, accessible
19 :     from CM as $/c.cm, $/c-int.cm, and $/memory.cm, but a user of this FFI
20 :     only needs one: $/c.cm.
21 :    
22 :     Library $/c.cm implements an encoding of the C type system in ML
23 :     types. This is exported as structure C. Moreover, there is a
24 :     structure DynLinkage that handles dynamic linking.
25 :     For details on structure C, see src/ml-nlffi-lib/c.sig.
26 :    
27 :     Thanks to ML's type inference, it is usually not necessary to spell out
28 :     many (if any) of the (rather complicated!) types exported by structure
29 :     C to be able to use this FFI.
30 :    
31 :     Conversely, at least in theory, if you are a competent ML programmer
32 :     but don't know C, then you could simply run the C code through
33 :     ml-nlffigen and read the signatures it produces...
34 :    
35 :     --------------------------------------------------------------------
36 :    
37 :     An example:
38 :    
39 :     Suppose you have a shared library nodelist.so that exports a global
40 :     function to generate lists of nodes. A C header file node.h explains
41 :     the interface:
42 :    
43 :     struct node {
44 :         int i;
45 :         struct node *next;
46 :     };
47 :    
48 :     /* produce n-element node list where first node's i is first,
49 :      * and where (x->next->i - x->i == incr) for all nodes except the
50 :      * last: */
51 :     struct node *gen (int n, int first, int incr);
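
The tutorial does not show the C code behind nodelist.so. For readers who want something concrete to link against, here is a minimal sketch of what gen might look like. It is an assumption made for illustration, not the author's implementation; in particular, the behavior for n <= 0 and for allocation failure (returning NULL) is a choice made here, since the header comment does not specify it.

```c
#include <stdlib.h>

struct node {
    int i;
    struct node *next;
};

/* Build an n-element list whose first node holds `first` and where each
 * later node holds its predecessor's value plus `incr`.
 * Assumption: returns NULL when n <= 0 or when malloc fails
 * (a partially built list is freed before returning). */
struct node *gen (int n, int first, int incr)
{
    struct node *head = NULL;
    struct node **tail = &head;   /* where the next node gets linked in */
    int k;

    for (k = 0; k < n; k++) {
        struct node *x = malloc (sizeof *x);
        if (x == NULL) {
            while (head != NULL) {
                struct node *dead = head;
                head = head->next;
                free (dead);
            }
            return NULL;
        }
        x->i = first + k * incr;
        x->next = NULL;
        *tail = x;
        tail = &x->next;
    }
    return head;
}
```

Compiled as a shared object, a file like this would provide the gen symbol that the glue code further down looks up in nodelist.so.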
52 :    
53 :     We run this header file through our FFI generator:
54 :    
55 :     $ ml-nlffigen node.h
56 :    
57 :     The result is a new CM library described by node.h.cm (which, in turn,
58 :     is implemented by node.h.sig and node.h.sml). The library exports a
59 :     structure Node which contains a functor NodeFn. We need to write some
60 :     ML glue code to instantiate this functor. In simple cases like ours,
61 :     the only argument the functor needs is a handle on the dynamic library
62 :     (nodelist.so). So we make a file node-glue.sml and write:
63 :    
64 :     structure Node =
65 :         Node.NodeFn (val library = DynLinkage.open_lib { name = "./nodelist.so",
66 :                                                          global = true,
67 :                                                          lazy = true })
68 :    
69 :     [Structure DynLinkage is an interface to dlopen/dlsym. To get access
70 :     to symbols that are already linked into the main program (i.e., SML/NJ's
71 :     runtime system), use DynLinkage.main_lib.]
72 :    
73 :     With this preparation we can now write a "client" module
74 :     (node-client.sml) that contains code to inspect results from calling
75 :     function gen. As an example, let us write two functions "len" and
76 :     "sum" which calculate the length of a list and the sum of a list's
77 :     elements, respectively, as well as a procedure "incall" which
78 :     traverses a list and increments every "i":
79 :    
80 :     structure NodeClient = struct
81 :         fun len l =
82 :             if C.Ptr.isNull l then 0
83 :             else 1 + len (C.Get.ptr (Node.S_node.f_next (C.Ptr.|*| l)))
84 :    
85 :         fun sum l =
86 :             if C.Ptr.isNull l then 0
87 :             else let val n = C.Ptr.|*| l
88 :                      val i = C.Cvt.ml_sint (C.Get.sint (Node.S_node.f_i n))
89 :                      val next = C.Get.ptr (Node.S_node.f_next n)
90 :                  in
91 :                      i + sum next
92 :                  end
93 :    
94 :         fun incall l =
95 :             if C.Ptr.isNull l then ()
96 :             else let val n = C.Ptr.|*| l
97 :                      val iobj = Node.S_node.f_i n
98 :                      val i = C.Cvt.ml_sint (C.Get.sint iobj)
99 :                      val next = C.Get.ptr (Node.S_node.f_next n)
100 :                  in
101 :                      C.Set.sint (iobj, C.Cvt.c_sint (i + 1));
102 :                      incall next
103 :                  end
104 :     end
105 :    
106 :     Notice how a combination of operators from the predefined structure C
107 :     (exported from $/c.cm) and operations from structure Node (resulting
108 :     from our instantiation of the ML-NLFFIGEN-generated functor
109 :     Node.NodeFn) was sufficient to traverse a C data structure, to inspect
110 :     its every detail, and to even modify it.
111 :    
112 :     Here is the key to this code:
113 :    
114 :     ML                              C
115 :    
116 :     C.Ptr.isNull <ptr>              <ptr> == NULL
117 :     C.Ptr.|*| <ptr>                 *<ptr>
118 :     Node.S_node.f_next <struct>     <struct>.next
119 :     C.Get.<foo> <obj>               (lvalue in an rvalue context;
120 :                                     this is a fetch from memory which in C
121 :                                     happens implicitly when an lvalue turns
122 :                                     into an rvalue)
123 :     C.Set.<foo> (<obj>, <value>)    <lvalue> = <rvalue>;
124 :     C.Cvt.ml_<foo> <value>          abstract C value -> concrete ML value
125 :     C.Cvt.c_<foo> <value>           concrete ML value -> abstract C value
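
To make the correspondence in the key concrete, here is how the three client functions might read when written directly in C. This is a sketch for comparison only; the names len, sum, and incall simply mirror the ML functions of the example.

```c
#include <stddef.h>

struct node {
    int i;
    struct node *next;
};

/* Length of the list: the C counterpart of NodeClient.len. */
int len (const struct node *l)
{
    return l == NULL ? 0 : 1 + len (l->next);
}

/* Sum of all i fields: reading l->i is the implicit fetch that
 * C.Get.sint makes explicit on the ML side. */
int sum (const struct node *l)
{
    return l == NULL ? 0 : l->i + sum (l->next);
}

/* Increment every i: the assignment corresponds to C.Set.sint. */
void incall (struct node *l)
{
    if (l == NULL)
        return;
    l->i = l->i + 1;
    incall (l->next);
}
```

The ML versions are longer mainly because dereferencing, field selection, fetching, and conversion to an ML integer are separate, explicit steps there.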
126 :    
127 :     We can wrap all this up and make it into a CM library (node.cm):
128 :    
129 :     Library
130 :         structure Node
131 :         structure NodeClient
132 :     is
133 :         $/basis.cm
134 :         $/c.cm
135 :         node.h.cm
136 :         node-glue.sml
137 :         node-client.sml
138 :    
139 :     A better way of doing this -- automating the task of invoking
140 :     ml-nlffigen -- would be:
141 :    
142 :     Library
143 :         structure Node
144 :         structure NodeClient
145 :     is
146 :         $/basis.cm
147 :         $/c.cm
148 :         node.h : shell (target:node.h.cm
149 :                         ml-nlffigen %s)
150 :         node-glue.sml
151 :         node-client.sml
152 :    
153 :     -------------------------------------------------------------------------
154 :     Despite the fact that one usually does not need to deal with types
155 :     very much (thanks to ML's type inference), I will now briefly describe
156 :     the main ideas behind the types of the C module. I will generally
157 :     omit the "C." prefix, assuming a global "open C" to be in effect.
158 :    
159 :     1. Objects:
160 :    
161 :     Objects describe locations in memory that hold values of some C
162 :     type. (This roughly corresponds to C's notion of lvalues, although
163 :     not every object can appear on the left-hand side of an assignment
164 :     operator. For example, array objects cannot.)
165 :    
166 :     1.1 Object types:
167 :    
168 :     The ML type of objects is
169 :    
170 :     type ('t, 'f, 'c) obj
171 :    
172 :     Here, 't is a "phantom type" that describes the type of the value
173 :     stored in the object, 'c is the "constness" of the object (i.e.,
174 :     "ro" or "rw" --- depending on whether there was a "const" qualifier
175 :     in the C declaration or not), and 'f is a typing artifact having to
176 :     do with the treatment of function pointers. (For objects where the
177 :     instantiation of 't is not somehow based on a function pointer
178 :     type, 'f will always be "unit". For instances of 't that contain
179 :     the type phrase F fptr, 'f is going to be instantiated to F.)
180 :    
181 :     1.2. Fetching and storing:
182 :    
183 :     For certain types 't, there are fetch and store operations for the
184 :     corresponding objects. See substructures "Get" and "Set".
185 :    
186 :     If a type T has fetch/store operations for (T, ?, ?) obj, then we
187 :     call values of type T "first-class C values". For first-class
188 :     values, the phantom type coincides with the type of the value. (For
189 :     other (second-class) values, the phantom type is a true phantom
190 :     type because there are no constructable values. Second-class C
191 :     values do not exist outside of their corresponding objects.)
192 :    
193 :     2. Base types:
194 :    
195 :     Base types to be substituted for 't and their corresponding C types
196 :     are given below:
197 :    
198 :     ML          C
199 :    
200 :     schar       signed char
201 :     uchar       unsigned char
202 :     sint        signed int
203 :     uint        unsigned int
204 :     sshort      signed short
205 :     ushort      unsigned short
206 :     slong       signed long
207 :     ulong       unsigned long
208 :     float       float
209 :     double      double
210 :     voidptr     void *
211 :    
212 :     Notice that there is no equivalent for "void" since it is not a
213 :     "true" type in C either but has many different meanings depending
214 :     on the context where it is used.
215 :    
216 :     All types given above are abstract. To convert to or from concrete
217 :     ML types, use Cvt.ml_<foo> and Cvt.c_<foo>. These routines exist
218 :     for all of the above types except voidptr. They convert to and
219 :     from certain INTEGER, WORD, and REAL types which are collectively
220 :     defined in structure MLRep. For example, the x86 version of
221 :     structure MLRep.SInt is the same as Int32 and MLRep.Float as well
222 :     as MLRep.Double are the same as Real64. (Notice that the ML
223 :     representation type for different C types can be the same, but the
224 :     C types themselves are kept distinct to enforce a typing discipline
225 :     that is equivalent to what a C compiler would do.)
226 :    
227 :     3. Pointers:
228 :    
229 :     Pointers are first-class C types. Their ML type is
230 :    
231 :     type ('t, 'f, 'c) ptr
232 :    
233 :     A pointer of type (T, F, C) ptr points to an object of type
234 :     (T, F, C) obj. One can obtain the object by applying the Ptr.|*|
235 :     operator. Ptr.|&| goes the other way around.
236 :    
237 :     Pointers permit pointer arithmetic just like in C using Ptr.|+|
238 :     (for adding an integer to a pointer) and Ptr.|-| (for subtracting
239 :     two pointers). A pointer can be injected into the voidptr domain
240 :     using Ptr.inject. (It can also be recovered (projected) from the
241 :     voidptr domain using Ptr.project, but this requires run-time type
242 :     information. See below.)
243 :    
244 :     Since they are first-class, pointers can be fetched from and stored
245 :     into pointer objects (of type (('t, 'f, 'pc) ptr, 'f, 'c) obj,
246 :     where 'pc is the constness of the object pointed to by the pointer
247 :     and 'c is the constness of the object containing the pointer).
248 :    
249 :     The Ptr.sub operation is a shorthand for a combination of Ptr.|+|
250 :     and Ptr.|*|. (Or, alternatively, Ptr.|*| is the same as
251 :     fn p => Ptr.sub (p, 0).)
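
For reference, these are the C operations that the Ptr functions above correspond to. The sketch below is plain C written for illustration; the function names are made up here and are not part of the FFI.

```c
#include <stddef.h>

/* p[i] is shorthand for *(p + i); Ptr.sub plays the same role in ML. */
int elem_at (const int *p, int i)
{
    return *(p + i);
}

/* Subtracting two pointers into the same array yields their distance
 * in elements, not in bytes (compare Ptr.|-|). */
ptrdiff_t distance (const int *hi, const int *lo)
{
    return hi - lo;
}

/* An object pointer converts to void * and back unchanged.  C simply
 * trusts the cast; the ML side asks for run-time type information
 * before projecting out of the voidptr domain. */
int *round_trip (int *p)
{
    void *v = p;        /* compare Ptr.inject  */
    return (int *) v;   /* compare Ptr.project */
}
```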
252 :    
253 :     4. Arrays:
254 :    
255 :     Arrays are second-class values. Their (phantom) type is
256 :    
257 :     type ('t, 'n) arr
258 :    
259 :     Here, 't is the type of the values stored in the array's
260 :     individual elements, and 'n is a type describing the size of the
261 :     array.
262 :    
263 :     4.1. Array dimensions:
264 :    
265 :     The Dim substructure defines an infinite family of types in such a
266 :     way that there is a 1-1 correspondence between natural numbers and
267 :     this family. In particular, if a positive natural number is
268 :     written in decimal and without leading zeros as <dn>...<d1><d0>,
269 :     where <di> are decimal digits, then the corresponding Dim type is
270 :    
271 :     dec dg<dn> ... dg<d1> dg<d0> dim
272 :    
273 :     which happens to be an abbreviation for
274 :    
275 :     (dec dg<dn> ... dg<d1> dg<d0>, nonzero) dim0
276 :    
277 :     (In case you wonder: The type corresponding to 0 is (dec, zero) dim0.)
278 :    
279 :     The connection to array types is this: An array of size N has type
280 :    
281 :     ('t, [N]) arr
282 :    
283 :     iff "[N] dim" is the type assigned to N by our Dim construction.
284 :    
285 :     Example (assume "open Dim"):
286 :    
287 :     The C type (int[312]) is encoded as
288 :    
289 :     (sint, dec dg3 dg1 dg2) arr
290 :    
291 :     In other words, if you "squint away" the "dec", the "dg"s, and the
292 :     spaces, then the array dimension gets spelled out in decimal.
293 :    
294 :     4.2. Operations over arrays:
295 :    
296 :     Since array types are second-class, there are no operations that
297 :     produce or consume values of type (?, ?) arr. Instead, we use
298 :     array objects of type ((?, ?) arr, ?, ?) obj.
299 :    
300 :     Most operations related to array objects are in substructure Arr.
301 :    
302 :     Array subscript takes an array object and an integer i and produces
303 :     the object describing the i-th element of the array. It is
304 :     implemented in such a way that it performs bounds-checking: if i<0
305 :     or i>=N where N is the array's size, then General.Subscript will be
306 :     raised.
307 :    
308 :     To get C's behavior (no bounds checks), one can use pointer
309 :     subscript instead. This requires first letting the array "decay"
310 :     into a pointer to its first element. In C this happens implicitly
311 :     in many situations, but in ML one must ask for it explicitly by
312 :     invoking Arr.decay.
313 :    
314 :     Given a value of type 'n Dim.dim one can reconstruct the array from
315 :     the pointer to its first element.
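
The difference between the two kinds of subscript can be pictured in C as follows. This is an illustrative sketch, not code from the library: the names are hypothetical, the array size 312 is borrowed from the earlier example, and returning NULL stands in for raising General.Subscript.

```c
#include <stddef.h>

#define NELEMS 312

/* Checked subscript on an array object: out-of-range indices are
 * rejected (signalled here by NULL; the ML version raises an
 * exception instead). */
int *checked_sub (int (*arr)[NELEMS], int i)
{
    if (i < 0 || i >= NELEMS)
        return NULL;
    return &(*arr)[i];
}

/* Unchecked subscript: first let the array decay into a pointer to
 * its first element, then do ordinary pointer arithmetic. */
int *unchecked_sub (int (*arr)[NELEMS], int i)
{
    int *first = *arr;   /* the decay step, implicit in C */
    return first + i;
}
```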
316 :    
317 :     5. Function pointers:
318 :    
319 :     Function pointers have type 'f fptr where 'f is always instantiated
320 :     to (A -> B) for some A and B. This instantiation for 'f propagates
321 :     through all those 'f components of obj-, ptr-, or T.typ-types whose
322 :     't component somehow involves the fptr-type.
323 :    
324 :     A function pointer of type (A -> B) fptr can be invoked with an
325 :     argument of type A and yields a result of type B by invoking the
326 :     "call" operator:
327 :    
328 :     val call: ('a -> 'b) fptr * 'a -> 'b
329 :    
330 :     Function pointers are first-class C values and can be stored in
331 :     function-pointer-objects as usual.
332 :    
333 :     The ML-NLFFIGEN program generator tool will arrange for every C
334 :     function prototype that occurs in a given piece of C code to define
335 :     a corresponding (A->B) fptr type. Here, A is derived from the
336 :     argument list of the C function and B describes the result type.
337 :     In particular, here is what happens:
338 :    
339 :       0. Vararg functions are not handled.
340 :    
341 :       1. If the argument list is (void) and the result type is not a
342 :          struct or union type, then A is unit.
343 :    
344 :       2. For the case of non-empty argument lists where the types of
345 :          the arguments are C types t1 ... tk, we form a "preliminary
346 :          ML argument list" [t1] ... [tk] as follows:
347 :          - If ti is a first-class C type, then [ti] is the
348 :            (light-weight version (see below) of the) corresponding ML
349 :            type describing ti.
350 :          - Otherwise, ti must be a struct or union type. For each
351 :            struct or union type, the ML-NLFFIGEN tool will generate a
352 :            fresh phantom type X (as described later). A function
353 :            argument of such a type will be (X, unit, ro) obj'. That
354 :            is, on the ML side the function will expect a read-only
355 :            struct or union object.
356 :            (Notice the primed type "obj'"! We pass structs in
357 :            light-weight form. For an explanation of "light-weight",
358 :            see the discussion below.)
359 :    
360 :       3. If the result is of struct or union type Y, then an additional
361 :          argument of type (Y, unit, rw) obj' is prepended to the
362 :          preliminary argument list. This means that on the ML side
363 :          functions "returning" a struct or union must be passed a
364 :          corresponding writable struct or union object.
365 :    
366 :       4. Let the final argument type list (formed in step 2. or 3.) be
367 :          x1 ... xn. Type A will be the tuple x1 * ... * xn. In
368 :          particular, if there is only one type x1, then A = x1.
369 :    
370 :     The result type B is formed as follows:
371 :    
372 :       1. If the C return type is "void", then B is "unit".
373 :       2. If the C return type is a struct or union, then B
374 :          coincides with the type of the first argument, i.e.,
375 :          it is the same as the first element of the tuple that is A.
376 :          (On the ML side, the function, when called, will return its
377 :          first argument after having stored the struct or union
378 :          that was returned by the C function into it.)
379 :       3. Otherwise the return type must be a first-class C type and
380 :          B will be that type's (light-weight) ML-side representation.
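
The convention for struct results (argument rule 3 and result rule 2) has a familiar C analogue: turning a by-value return into a store through an extra leading pointer argument. The sketch below is only an analogy written for this tutorial; struct pair and both function names are hypothetical, and this is not the code ML-NLFFIGEN emits.

```c
struct pair {
    int a;
    int b;
};

/* Ordinary C: the struct is returned by value. */
struct pair make_pair (int a, int b)
{
    struct pair p;
    p.a = a;
    p.b = b;
    return p;
}

/* The shape seen from ML: the caller supplies a writable struct
 * object as an extra first argument; the result is stored into it
 * and that same object is handed back. */
struct pair *make_pair_into (struct pair *result, int a, int b)
{
    *result = make_pair (a, b);
    return result;
}
```

Because the result is the caller's own object, the ML-side return type can coincide with the type of the first argument, exactly as rule 2 for B states.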
381 :    
382 :     6. Run-time type information:
383 :    
384 :     For every object of type ('t, 'f, 'c) obj there is corresponding
385 :     run-time type information that describes values of type 't. RTI is
386 :     used mainly to keep track of size information (needed for pointer
387 :     arithmetic), but it also facilitates array bounds checking.
388 :    
389 :     Most of the time this information is kept completely behind the
390 :     scenes, but in some situations the programmer might want to use it
391 :     directly.
392 :    
393 :     In the part of the interface that has been described up to here,
394 :     there is really only one place that requires run-time type
395 :     information: Ptr.project. A voidptr together with type information
396 :     describing a non-void pointer's target type can be used to "cast" the
397 :     voidptr to that pointer type.
398 :    
399 :     RTI is used extensively in the other "light-weight" part of the
400 :     interface. (See below.) It can be extracted from existing objects
401 :     using T.typeof or can be constructed directly using the value
402 :     constructors of substructure(s) T (and Dim).
403 :    
404 :     Example, RTI for a 12-element array of pointers to constant ints:
405 :    
406 :     let open C open Dim in
407 :     T.arr (T.ro (T.ptr T.sint), dec dg1 dg2 dim)
408 :     end
409 :    
410 :     (Note: The "dec dg1 dg2 dim" in the example above is an
411 :     _expression_ that returns a Dim.dim value. And, by construction,
412 :     the type of that expression also happens to be "dec dg1 dg2 dim".)
413 :    
414 :     7. Light-weight interface:
415 :    
416 :     The concrete representation for values of obj-, ptr-, and fptr-type
417 :     carries run-time type information. This makes the interface
418 :     convenient to use, because RTI is hidden behind the scenes. It is
419 :     also somewhat inefficient because RTI must be tracked (and operated
420 :     upon) for most operations.
421 :    
422 :     Light-weight versions of these types (constructors carry a prime in
423 :     their names: "obj'", "ptr'", "fptr'") do not use RTI in their
424 :     concrete representations. This is more efficient for all
425 :     operations that don't need access to RTI. On the downside, it
426 :     means that RTI must be passed in explicitly by the programmer for
427 :     operations that do.
428 :    
429 :     To make passing of type information statically safe (i.e., to
430 :     disallow mixing a C value of one type with type information
431 :     corresponding to a different type), RTI itself has a static ML
432 :     type. In particular, the RTI for a value stored in a "('t, 'f, 'c)
433 :     obj" object will have type "('t, 'f) T.typ".
434 :    
435 :     Array subscript, to name one example, on light-weight array objects
436 :     enforces correct usage of RTI using ML's static typing:
437 :    
438 :     Arr.sub' : (('t, 'n) arr, 'f) T.typ ->
439 :                (('t, 'n) arr, 'f, 'c) obj' * int -> ('t, 'f, 'c) obj'
440 :    
441 :     7.1 Light vs. heavy:
442 :    
443 :     One can convert between light and heavy versions by using the
444 :     functions in substructures Light and Heavy.
445 :    
446 :     7.2 Slimmed-down RTI: Run-time size information
447 :    
448 :     Our RTI contains a lot of information that is not needed in many
449 :     situations. For example, we can extract RTI for a pointer's
450 :     element type from the RTI for the pointer type. In many cases all
451 :     we need is _size_ information (which, internally, is just a number).
452 :     Definitions pertaining to run-time size information are collected
453 :     in substructure S. Like RTI itself, we give static types to sizes:
454 :    
455 :     type 't size
456 :    
457 :     Size information can be obtained from RTI (but not vice versa):
458 :    
459 :     T.sizeof : ('t, 'f) T.typ -> 't S.size
460 :    
461 :     Light-weight pointer arithmetic uses size information for the
462 :     element type:
463 :    
464 :     Ptr.|+! : 't S.size -> ('t, 'f, 'c) ptr' * int -> ('t, 'f, 'c) ptr'
465 :    
466 :     (NB: C types are monomorphic. In ML programs we can precompute size
467 :     info for any monomorphic type, so with a bit of help from a
468 :     cross-module inliner and the compiler's value-propagation- and
469 :     constant-folding phases we should see machine code very similar to
470 :     what a C compiler would produce.)
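
Why size information is all that light-weight pointer arithmetic needs can be seen from what a C compiler does for p + n: it scales n by the element size and adds the result to the address. A minimal sketch of that computation, with a hypothetical function name and the size passed in explicitly (much as the light-weight interface passes a size value):

```c
#include <stddef.h>

/* Advance an address by n elements of elem_size bytes each.
 * For a typed pointer p, the C expression p + n computes the same
 * address with elem_size fixed at compile time. */
void *advance (void *p, size_t elem_size, ptrdiff_t n)
{
    return (char *) p + n * (ptrdiff_t) elem_size;
}
```

When elem_size is a compile-time constant, the multiplication folds away, which is the effect the remark about inlining and constant folding is hoping for on the ML side.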
471 :    
472 :     8. Struct- and union-types:
473 :    
474 :     A struct- or union-declaration in C declares a brand-new type. In
475 :     C, struct- and union-types are of class "one-and-a-half", so to
476 :     speak. They are not truly first-class because the only operations
477 :     on values of these types end up being what amounts to "copy"
478 :     operations from objects to other objects. Struct/union assignment
479 :     is clearly in this category and passing structs/unions as function
480 :     arguments is essentially the same. (Passing the argument amounts to
481 :     copying the struct/union into the object that gets allocated for
482 :     the corresponding formal parameter.) The only exception seems to
483 :     come from struct/union return values, but C compilers tend to
484 :     implement this by allocating a new (unnamed) struct object for
485 :     holding the return value, so that struct/union return also amounts
486 :     to copying into struct/union objects.
487 :    
488 :     For these reasons (and to avoid having to implement a struct/union
489 :     value type), this FFI treats struct/union types as second-class
490 :     types and provides copy operations separately. The treatment of
491 :     function calls involving struct/union types has already been
492 :     described above.
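
The claim that struct operations in C amount to copying between objects can be checked with a few lines of C. The sketch below uses a made-up struct point and made-up function names purely for illustration.

```c
struct point {
    int x;
    int y;
};

/* The parameter p is a fresh object initialized by copying the
 * argument, so changes made here never reach the caller's struct. */
int bump_x (struct point p)
{
    p.x += 1;
    return p.x;
}

/* Struct assignment copies the whole value from one object into
 * another; this is the kind of copy operation the FFI provides
 * separately. */
void copy_point (struct point *dst, const struct point *src)
{
    *dst = *src;
}
```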
493 :    
494 :     On the ML side, each struct/union type is implemented as an
495 :     abstract data type. The type definition as well as operations over
496 :     objects involving this type are generated by the ML-NLFFIGEN tool.
497 :    
498 :     Consider once again our introductory example:
499 :    
500 :     struct node {
501 :         int i;
502 :         struct node * next;
503 :     };
504 :    
505 :     The ML-side equivalent to this is an abstract type "s_node su"
506 :     (which will be the phantom type for struct node) and a
507 :     corresponding structure "S_node" that contains operations for this
508 :     type. (For a union, replace "s_" with "u_" and "S_" with "U_".)
509 :    
510 :     The signature for S_node generated by ML-NLFFIGEN will be the
511 :     following (note that it makes use of several type abbreviations
512 :     such as su_obj, sint_obj, etc. that are provided by the FFI):
513 :    
514 :     structure S_node : sig (* struct node *)
515 :         type tag = s_node
516 :    
517 :         (* size for this struct *)
518 :         val size : s_node su S.size
519 :    
520 :         (* RTI for this struct *)
521 :         val typ : s_node T.su_typ
522 :    
523 :         (* witness types for fields *)
524 :         type t_f_i = sint
525 :         type t_f_next = (s_node su, unit, rw) ptr
526 :    
527 :         (* RTI for fields *)
528 :         val typ_f_i : T.sint_typ
529 :         val typ_f_next : ((s_node su, unit, rw) ptr, unit) T.typ
530 :    
531 :         (* field accessors *)
532 :         val f_i : (s_node, 'c) su_obj -> 'c sint_obj
533 :         val f_next :
534 :             (s_node, 'c) su_obj ->
535 :             ((s_node su, unit, rw) ptr, unit, 'c) obj
536 :    
537 :         (* field accessors (lightweight variety) *)
538 :         val f_i' : (s_node, 'c) su_obj' -> 'c sint_obj'
539 :         val f_next' :
540 :             (s_node, 'c) su_obj' ->
541 :             ((s_node su, unit, rw) ptr, unit, 'c) obj'
542 :     end (* structure S_node *)
543 :    
544 :     We find RTI and size info for the new type, RTI for all the
545 :     fields' types, and access methods that map struct objects to
546 :     corresponding field objects. Access methods are provided both in
547 :     normal and in light-weight form.
548 :    
549 :     The access method for a field declared "const" maps struct objects
550 :     of arbitrary constness to field objects where 'c is instantiated
551 :     with "ro". The access method for other fields maps the constness
552 :     for the whole struct object to the constness of the field object.
553 :     The name of an access method is the name of the field prepended
554 :     with "f_" (and followed by "'" in case of the light-weight version).
555 :     The reader can probably infer the other naming conventions from the
556 :     example.
557 :    
558 :     Bitfields (not shown here) are special because they are not
559 :     first-class values and there are no ordinary objects that hold
560 :     bitfields. This FFI provides separate abstract types for signed and
561 :     unsigned bitfields, and access methods for C bitfields map the
562 :     struct object to such (ML-) bitfields.
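
The reason bitfields need special treatment is visible in C itself: a bitfield has no address, so there is no pointer to it and no ordinary object to fetch from or store into. All access goes through the enclosing struct, as in this sketch (struct flags and the accessor names are hypothetical):

```c
struct flags {
    unsigned int ready : 1;
    signed int   level : 4;   /* -8 .. 7 on two's-complement machines */
};

/* &f->ready would be rejected by the compiler, so reading and
 * writing are expressed as functions of the whole struct. */
unsigned int get_ready (const struct flags *f)
{
    return f->ready;
}

void set_ready (struct flags *f, unsigned int v)
{
    f->ready = v & 1u;
}

int get_level (const struct flags *f)
{
    return f->level;
}

void set_level (struct flags *f, int v)
{
    f->level = v;
}
```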
563 :    
564 :     8.1 Equivalence of struct/union types:
565 :    
566 :     It is not literally true that ML-NLFFIGEN will generate a brand-new
567 :     type for every struct or union it sees. Instead, it draws from
568 :     another infinite family of abstract "tag types" which has been
569 :     predefined. (This works in a way similar to Dim.dim.)
570 :    
571 :     As a result, two separate mentions of struct foo in different C
572 :     source files that belong to the same program will produce ML code
573 :     which still identifies these two struct foos.
574 :    
575 :     9. Global exports and their types:
576 :    
577 :     9.1 Global variables:
578 :    
579 :     Global variables will be represented by a corresponding thunkified
580 :     object. The thunk's name is the same as the variable's name
581 :     prepended with "g_".
582 :     Examples:
583 :    
584 :     C                        ML
585 :    
586 :     int i;                   val g_i : unit -> (sint, unit, rw) obj
587 :     const unsigned j;        val g_j : unit -> (uint, unit, ro) obj
588 :     int (**f)(void);         val g_f : unit ->
589 :                                (((unit -> sint) fptr, unit->sint, rw) ptr,
590 :                                 unit->sint, rw) obj
591 :    
592 :     (Fortunately, the types will all be generated by ML-NLFFIGEN, so
593 :     the programmer will not have to write down ugly things like the
594 :     type for f.)
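
A rough C picture of a "thunkified object" is a function of no arguments that hands back the address of the global each time it is called, instead of an address stored once. The sketch below only illustrates that idea; the initial value of j is an assumption added here so that there is something to read, and the C functions are stand-ins, not generated code.

```c
int i;
const unsigned j = 7;   /* initializer assumed for illustration */

/* Analogue of  val g_i : unit -> (sint, unit, rw) obj :
 * a writable location, looked up on every call. */
int *g_i (void)
{
    return &i;
}

/* Analogue of  val g_j : unit -> (uint, unit, ro) obj :
 * const in C becomes "ro" in the ML type. */
const unsigned *g_j (void)
{
    return &j;
}
```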
595 :    
596 :     9.2 Global functions:
597 :    
598 :     Exported C functions will be represented by three distinct ML
599 :     values:
600 :    
601 :       1. A thunkified fptr value of corresponding type. The name of
602 :          the thunk is "fptr_fn_" concatenated with the name of the
603 :          function.
604 :       2. An ML function that takes an argument list similar to the
605 :          fptr in 1., but where those arguments/results that have a
606 :          corresponding concrete ML representation (in MLRep, via
607 :          substructure Cvt) have already been translated and
608 :          light-weight struct/union objects (for passing/returning
609 :          structs and unions) have been translated to their heavy
610 :          versions. The name of the ML function is the name of the C
611 :          function prepended with "fn_".
612 :       3. An ML function like in 2., but with all arguments/results
613 :          that have a light-weight version having been translated to
614 :          that. The name of the ML function is the same as that in
615 :          2. but with a trailing apostrophe ("'") added.
616 :    
617 :     To see the difference between 1. and 2./3., consider a C function
618 :     from int to int. The ML fptr type would be
619 :    
620 :     (sint -> sint) fptr
621 :    
622 :     and calling it via "call" requires an abstract "sint" argument.
623 :     Type "sint" is not equal to its ML representation type
624 :     (MLRep.SInt.int = Int32.int), so in order to pass an ML Int32.int
625 :     value one must apply Cvt.c_sint "by hand".
626 :     (The reason for "sint" not being equal to Int32.int is that the
627 :     representation types for other abstract C types might also be
628 :     Int32.int. For example, the current implementation uses
629 :     MLRep.SShort.int = Int32.int which would force sint = sshort had the
630 :     C types not been abstract. But we definitely want to have types
631 :     sint and sshort be distinct!)
632 :    
633 :     9.3 Persistence of C values:
634 :    
635 :     C values are transient in that they do not stay valid across
636 :     SMLofNJ.export{ML,Fn} and a restart using the resulting heap
637 :     image. The only things that stay valid are the thunks for global
638 :     variables and global function pointers. (Since the generated
639 :     global ML functions that represent global C functions re-invoke
640 :     the function-pointer thunk every time they are called, they also
641 :     stay valid.)
642 :    
643 :     10. Functorization:
644 :    
645 :     The ML-NLFFIGEN tool produces a structure containing a functor for
646 :     every C source file it is presented with. The functor will at least
647 :     take the library argument shown in the example. However, there are
648 :     cases where it requires additional arguments.
649 :    
650 :     Extra functor arguments are required every time the C source file
651 :     refers to "incomplete pointer types" -- pointers to structs that
652 :     are not declared.
653 :    
654 :     For example, if the source file mentions "struct foo*" without
655 :     spelling out what "struct foo" is, then the resulting functor will
656 :     take an argument of the form:
657 :    
658 :     structure I_S_foo : POINTER_TO_INCOMPLETE_TYPE
659 :    
660 :     That is, the functor argument must be a structure satisfying
661 :     signature POINTER_TO_INCOMPLETE_TYPE.
662 :    
663 :     There are two ways of obtaining a matching structure for the
664 :     purpose of passing it to the functor:
665 :    
666 :     1. If the type is to be treated as "abstract", then a fresh
667 :     incomplete pointer type can be obtained by invoking functor
668 :     PointerToIncompleteType (without arguments).
669 :     If the same incomplete type is mentioned in more than one place,
670 :     make sure you generate only one fresh instantiation for it,
671 :     i.e., invoke PointerToIncompleteType only once and pass the
672 :     result to all functors that require it.
673 :    
674 :     2. If the type is incomplete in one file but gets spelled out in
675 :     another, then one can produce the matching structure from
676 :     that by applying functor PointerToCompleteType to the
677 :     structure S_foo that describes "struct foo".
678 :    
679 :     Suppose module Bar defines struct foo. Then we have a
680 :     structure Bar and a functor Bar.BarFn which, when applied,
681 :     would define structure S_foo. PointerToCompleteType could be
682 :     applied to this structure. However, there is a partial version
683 :     of the same structure (only containing a type definition and
684 :     some RTI) known as Bar.S_foo. The partial version is
685 :     sufficient for invoking PointerToCompleteType -- which is
686 :     important for breaking dependency cycles and avoiding the
687 :     chicken-and-egg problem in the case of mutually recursive
688 :     types involving incomplete pointers.
689 :    
690 :     The main point of using PointerToCompleteType is to let
691 :     client code "see" that 'c I_S_foo.iptr is the same as
692 :     (s_foo su, unit, 'c) ptr.
693 :    
694 :     Client code that is written without access to the real
695 :     definition of "struct foo", but that must still be able to
696 :     interact with other code that has such access, must itself be
697 :     functorized (leaving the instantiation of I_S_foo to _its_
698 :     clients).
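
As an illustration of the two options above, here is a minimal sketch.
It assumes two hypothetical generated structures A and B (from a.h and
b.h) whose functors A.AFn and B.BFn each require an I_S_foo argument, a
hypothetical structure Bar (from bar.h) that spells out "struct foo",
and a library handle lib obtained as in the earlier example. The
argument name "library" and the exact functor-application syntax are
assumptions; the generated code is the authority on the real argument
names.

```sml
(* Option 1: "struct foo" stays abstract.
 * Create ONE fresh incomplete pointer type and share it. *)
structure I_S_foo = PointerToIncompleteType ()

(* Option 2 (use instead of option 1): "struct foo" is spelled out in
 * bar.h. The partial structure Bar.S_foo suffices, so Bar.BarFn does
 * not have to be applied first. *)
(* structure I_S_foo = PointerToCompleteType (Bar.S_foo) *)

(* Hypothetical instantiations: both functors receive the same
 * I_S_foo, so "struct foo*" values can flow between AInst and BInst. *)
structure AInst = A.AFn (val library = lib
                         structure I_S_foo = I_S_foo)
structure BInst = B.BFn (val library = lib
                         structure I_S_foo = I_S_foo)
```

Invoking PointerToIncompleteType a second time would presumably yield a
distinct, incompatible type, which is why a single instantiation is
shared here.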
699 :    
700 :     ----------------------------------------------------------------
701 :    
702 :     Invoking ml-nlffigen:
703 :    
704 :     The ML-NLFFIGEN tool is a stand-alone program ml-nlffigen which can be
705 :     invoked from the shell command line. It takes one mandatory argument
706 :     <cfile> which is the file name of the C code that describes the
707 :     interface to be implemented.
708 :    
709 :     The mandatory argument can be preceded by any combination of the
710 :     following options:
711 :    
712 :     -sigfile <file> name of the signature file to be generated
713 :     (default: <cfile>.sig)
714 :     -strfile <file> name of the structure file to be generated
715 :     (default: <cfile>.sml)
716 :     -cmfile <file> name of the .cm-file to be generated
717 :     (default: <cfile>.cm;
718 :     This is the file that needs to be mentioned
719 :     in the client .cm-file. See node.h.cm
720 :     vs. node.cm in our example.)
721 :     -signame <name> name of the signature to be generated
722 :     (The default is obtained by taking <cfile>,
723 :     stripping the extension, capitalizing
724 :     all letters, and turning embedded dots
725 :     and dashes into underscores.
726 :     Example: f.oo-bar.h --> F_OO_BAR)
727 :     -strname <name> name of the structure to be generated.
728 :     If the structure's name is <foo>, then
729 :     the name of the functor contained therein
730 :     will be <foo>Fn.
731 :     (The default is obtained by taking <cfile>,
732 :     stripping the extension, dividing the
733 :     remainder into sections at dot- and
734 :     dash-boundaries, capitalizing the first
735 :     letter of each section, and then joining
736 :     them.
737 :     Example: foo-bar.h --> FooBar)
738 :     -allSU Normally the tool will treat all
739 :     struct or union definitions that are
740 :     not spelled out in <cfile> as
741 :     incomplete (even if <cfile> includes
742 :     a header file that spells them out).
743 :     This flag will force ml-nlffigen to
744 :     treat included header files the same
745 :     as <cfile>.
746 :     (Structs and unions whose tags start with
747 :     an underscore are _always_ treated as
748 :     incomplete.)
749 :     -width Target text width for pretty-printing
750 :     ML code. The pretty-printer occasionally
751 :     overruns this limit, though.
752 :     -lambdasplit <arg> places "(lambdasplit:<arg>)" after
753 :     the names of ML source files in the
754 :     generated .cm file. (This controls
755 :     the cross-module inlining machinery
756 :     of the SML/NJ compiler.)
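
The default-name rules for -signame and -strname can be restated as a
small SML sketch. This is only a reading of the rules as described
above, not the tool's actual code; the function names are made up for
illustration, and the sketch assumes <cfile> is a bare file name
without a directory part.

```sml
(* Separator characters named in the rules: dot and dash. *)
fun isSep c = (c = #"." orelse c = #"-")

(* Strip the final extension, e.g. "f.oo-bar.h" -> "f.oo-bar". *)
fun stripExt file = OS.Path.base file

(* -signame default: upper-case every letter and turn embedded
 * dots and dashes into underscores. *)
fun defaultSigName file =
    String.map (fn c => if isSep c then #"_" else Char.toUpper c)
               (stripExt file)

(* -strname default: split at dots and dashes, capitalize the first
 * letter of each section, then join the sections. *)
fun defaultStrName file =
    let fun cap s =
            case String.explode s of
                [] => ""
              | c :: cs => String.implode (Char.toUpper c :: cs)
    in
        String.concat (map cap (String.tokens isSep (stripExt file)))
    end
```

With the examples from the option list, defaultSigName "f.oo-bar.h"
gives "F_OO_BAR" and defaultStrName "foo-bar.h" gives "FooBar".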
