[smlnj] Annotation of /sml/trunk/src/system/README

Revision 645

1 : blume 537 Compiler Hacker's Guide to the new CM...
2 :     ========================================
3 : monnier 416
4 : blume 643 Last change: 12/5/2000
5 : blume 537
6 : monnier 416 * Libraries
7 :     -----------
8 :    
9 :     The new way of building the compiler is heavily library-oriented.
10 :     Aside from a tiny portion of code that is responsible for defining the
11 :     pervasive environment, _everything_ lives in libraries. Building the
12 :     compiler means compiling and stabilizing these libraries first. Some
13 :     of the libraries exist just for reasons of organizing the code, the
14 :     other ones are potentially useful in their own right. Therefore, as a
15 :     beneficial side-effect of compiling the compiler, you will end up with
16 :     stable versions of these libraries.
17 :    
18 :     At the moment, the following libraries are constructed when compiling
19 :     the compiler ("*" means that I consider the library potentially useful
20 :     in its own right):
21 :    
22 : blume 643 * $/basis.cm SML'97 Basis Library (pre-loaded)
23 :     * $/smlnj-lib.cm SML/NJ Utility Library
24 :     * $/html-lib.cm SML/NJ HTML Library
25 :     * $/pp-lib.cm SML/NJ Pretty-print Library
26 : monnier 416
27 : blume 643 * $/ml-yacc-lib.cm SML/NJ ML-Yacc runtime library
28 : blume 573
29 : blume 643 * $smlnj/compiler/{alpha32,hppa,ppc,sparc,x86}.cm
30 : blume 573 cross-compiler libraries, exporting
31 :     structure <Arch>Compiler
32 : blume 643 * $smlnj/compiler/current.cm structure Compiler (current arch)
33 :     * $smlnj/compiler/all.cm all cross-compilers and all cross-CMBs
34 : blume 573
35 : blume 643 * $smlnj/cm/minimal.cm minimal CM (pre-loaded)
36 :     * $smlnj/cm/full.cm full structure CM (see manual)
37 :     * $smlnj/cm/tools.cm CM tools library
38 : blume 573
39 : blume 643 * $smlnj/cmb/{alpha32,hppa,ppc,sparc,x86}-unix.cm
40 : blume 573 cross-bootstrap-compilers for Unix
41 :     (structure <Arch>UnixCMB)
42 : blume 643 * $smlnj/cmb/ppc-macos.cm ...for Mac (structure PPCMacOSCMB)
43 :     * $smlnj/cmb/x86-win32.cm ...for Windoze (structure X86Win32CMB)
44 :     * $smlnj/cmb/current.cm structure CMB (current arch/os)
45 : blume 573
46 : blume 643 * $smlnj/compiler.cm abbrev. for $smlnj/compiler/current.cm
47 :     * $smlnj/cm.cm abbrev. for $smlnj/cm/full.cm
48 :     * $smlnj/cmb.cm abbrev. for $smlnj/cmb/current.cm
49 : blume 573
50 : blume 643 * $comp-lib.cm Utility library for compiler
51 : blume 573
52 : blume 643 - $smlnj/viscomp/core.cm Compiler core functionality
53 :     - $smlnj/viscomp/{alpha32,hppa,ppc,sparc,x86}.cm
54 : blume 573 Machine-specific parts of compiler
55 :    
56 : blume 643 - $smlnj/internal/{intsys,cm-lib,cm-hook,host-compiler-0}.cm
57 : blume 573 Glue that holds the interactive system
58 :     together
59 :    
60 : blume 643 * $MLRISC/{MLRISC,Control,Lib,ALPHA,HPPA,PPC,SPARC,IA32}.cm
61 : blume 573 Various MLRISC bits
62 : blume 643 (Other MLRISC libraries such as
63 :     Graph, Visual, etc. do not currently
64 :     take part in the SML/NJ build.)
65 : blume 573
66 : blume 643 * ${mlyacc,mllex,mlburg}-tool.cm CM plug-in libraries for common tools
67 :     * ${grm,lex,burg}-ext.cm CM plug-in libraries for common file
68 : blume 573 extensions
69 :    
70 : blume 643 Paths of the form $/foo/... are shorthands for $foo/foo/..., paths of
71 :     the form $/singlearc can also be written as $singlearc (e.g.,
72 :     $basis.cm instead of $/basis.cm). The usefulness of the latter
73 :     shorthand is controversial, so you should probably try to avoid it.
74 :    
75 :     A more complete explanation of the $-notation can be found later in
76 :     this document or in the CM manual.
77 :    
78 :     To learn about the definitions of the $-anchors (and, thus, where in
79 :     the source tree the above libraries are defined), consult the
80 :     "pathconfig" file here in this directory.
81 :    
82 : monnier 416 * Before you can use the bootstrap compiler (CMB)...
83 :     ----------------------------------------------------
84 :    
85 :     To be able to use CMB at all, you must first say
86 :    
87 : blume 643 CM.autoload "$smlnj/cmb.cm";
88 : monnier 416
89 : blume 569 after you start sml. Alternatively -- and perhaps more conveniently --
90 : blume 643 you can provide "$smlnj/cmb.cm" as a command-line argument to sml:
91 : monnier 416
92 : blume 643 $ sml '$smlnj/cmb.cm'
93 : monnier 416
94 : blume 643 (Be sure to protect the dollar symbol which usually has its own
95 :     special meaning to the shell.)
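
The quoting advice can be checked with a small shell sketch. It is plain
POSIX sh and does not start sml; the variable names (literal, escaped,
expanded) are illustrative only, and smlnj is set to the empty string to
stand in for a shell that has no useful value under that name.

```shell
# Stand-in for a shell where $smlnj has no useful value.
smlnj=''

literal='$smlnj/cmb.cm'     # single quotes: the dollar is passed through untouched
escaped=\$smlnj/cmb.cm      # backslash: same effect
expanded="$smlnj/cmb.cm"    # double quotes: the shell substitutes $smlnj first

printf '%s\n' "$literal" "$escaped" "$expanded"
```

Only the first two forms would hand sml the anchored path it expects; the
third one has already lost the anchor by the time sml sees it.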
96 :    
97 : blume 569 * Compiling the compiler
98 :     ------------------------
99 : monnier 416
100 : blume 569 We are now back to the old scheme where a call to CMB.make() suffices to
101 :     build a bootable set of files (libraries in our case). CMB.make maintains
102 :     two parallel hierarchies of derived files:
103 : monnier 416
104 : blume 569 1. the binfile hierarchy ("binfiles"), containing compiled objects for
105 :     each individual ML source file; this hierarchy is rooted at
106 :     <prefix>.bin.<arch>-<opsys>
107 :     2. the stable library hierarchy ("boot files"), containing library files
108 :     for each library that participates in building SML/NJ; this hierarchy
109 :     is rooted at
110 :     <prefix>.boot.<arch>-<opsys>
111 : monnier 416
112 : blume 569 The default for <prefix> is "sml". It can be changed by using
113 :     CMB.make' with the new <prefix> as the optional string argument.
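
As an illustration of the naming scheme just described, the following shell
sketch prints the two root directory names for a given <prefix>, <arch> and
<opsys>. derived_dirs is a hypothetical helper written for this note, not
something that ships with SML/NJ; it only formats names and does not touch
the file system.

```shell
# derived_dirs PREFIX ARCH OPSYS
# Prints the binfile root and the bootfile root, one per line.
# An empty PREFIX falls back to "sml", mirroring the documented default.
derived_dirs() {
    prefix=${1:-sml}
    arch=$2
    opsys=$3
    printf '%s.bin.%s-%s\n'  "$prefix" "$arch" "$opsys"
    printf '%s.boot.%s-%s\n' "$prefix" "$arch" "$opsys"
}

derived_dirs '' x86 unix
```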
114 : monnier 416
115 : blume 643 CMB.make reuses existing bootfiles after it has verified that they are
116 :     consistent with their corresponding binfiles. Bootfiles do not need
117 :     to be deleted in order for CMB.make to work correctly.
118 : monnier 416
119 : blume 569 To bootstrap a new system (using the runtime system boot loader), the
120 :     bootfiles _must_ be present, the binfiles need not be present (but
121 :     their presence does not hurt either).
122 : monnier 416
123 : monnier 498 You can reduce the number of extra files compiled and stabilized
124 : blume 569 during CMB.make at the expense of not building any cross-compilers.
125 : monnier 498 For that, say
126 :     #set (CMB.symval "LIGHT") (SOME 1);
127 : blume 569 before running CMB.make.
128 : monnier 498
129 : monnier 416 * Making the heap image
130 :     -----------------------
131 :    
132 :     The heap image is made by running the "makeml" script that you find
133 :     here in this directory. By default it will try to refer to the
134 : monnier 498 sml.boot.<arch>-<os> directory. You can change this using the -boot
135 : monnier 416 argument (which takes the full name of the boot directory to be used).
136 :    
137 :     The "feel" of using makeml should be mostly as it used to be. However,
138 :     internally, there are some changes that you should be aware of:
139 :    
140 : blume 569 1. The script will make a heap image and build a separate library directory
141 : blume 643 that contains (hard) links to the library files in the bootfile directory.
142 : monnier 416
143 :     2. There is no "-full" option anymore. This functionality should
144 :     eventually be provided by a library with a sufficiently rich export
145 :     interface.
146 :    
147 :     3. No image will be generated if you use the -rebuild option.
148 :     Instead, the script quits after making new bin and new boot
149 :     directories. You must re-invoke makeml with a suitable "-boot"
150 :     option to actually make the image. The argument to "-rebuild"
151 :     is the <prefix> for the new bin and boot directories (see above).
152 :    
153 : blume 643 [Note: When the -rebuild option is specified, then the boot procedure
154 :     will not read static environments from the boot directory. Instead,
155 :     after the ML code has been loaded and linked, the system will invoke
156 :     CMB.make' with the argument that was given to -rebuild. After
157 :     CMB.make' is done, the system quits. In essence, makeml with -rebuild
158 :     acts as a bootstrap compiler that is not dependent on any usable
159 : blume 645 static environments.]
160 : blume 643
161 : blume 569 Makeml will not destroy the bootfile directory.
162 : monnier 416
163 :     * Testing a newly generated heap image
164 :     --------------------------------------
165 :    
166 :     If you use a new heap image by saying "sml @SMLload=..." then things
167 :     will not go as you may expect because along with the new heap image
168 :     should go those new stable libraries, but unless you do something
169 : blume 569 about it, the newly booted system will look for its stable libraries
170 : blume 643 in places where you stored your _old_ stable libraries. (After just
171 :     having done "makeml", these "places" would be within the boot file
172 :     hierarchy under <prefix>.boot.<arch>-<os>.)
173 : monnier 416
174 :     After you have made the new heap image, the new libraries are in a
175 :     separate directory whose name is derived from the name of the heap
176 : blume 569 image. (Actually, only the directory hierarchy is separate, the
177 : blume 643 library files themselves are hard links.) The "testml" script that
178 :     you also find here will run the heap image and instruct it to look for
179 :     its libraries in that new library directory by setting the
180 :     CM_PATHCONFIG environment variable to point to a different pathconfig
181 :     file under <prefix>.lib.
182 : monnier 416
183 : monnier 498 "testml" takes the <prefix> of the heap image as its first
184 :     argument. All other arguments are passed verbatim to the ML process.
185 :    
186 :     The <prefix> is the same as the one used when you did "makeml". If
187 :     you run "testml" without arguments, <prefix> defaults to "sml".
188 : blume 645 Thus, if you just said "makeml" without arguments you can also say
189 :     "testml" without arguments. (Note that you _must_ supply the <prefix>
190 : monnier 498 argument if you intend to pass any additional arguments.)
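
The argument convention above (first argument is the <prefix>, everything
else goes to the ML process) can be sketched as follows. This is not the
testml script itself; testml_args is a hypothetical function that only shows
how a "first argument or default" rule behaves, and why extra arguments
cannot be passed without naming the prefix.

```shell
# testml_args [PREFIX [ARG...]]
# Prints the prefix that would be used and the arguments left for ML.
testml_args() {
    prefix=sml                 # documented default
    if [ $# -gt 0 ]; then
        prefix=$1              # the first argument is always taken as the prefix
        shift
    fi
    printf 'prefix=%s\n' "$prefix"
    for arg in "$@"; do
        printf 'ml-arg=%s\n' "$arg"
    done
}

testml_args new foo.sml
```

Calling it with only "foo.sml" would report prefix=foo.sml, which is exactly
the mistake the note above warns about.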
191 :    
192 : monnier 416 * Installing a heap image for more permanent use
193 :     ------------------------------------------------
194 :    
195 : monnier 498 You can "install" a newly generated heap image by replacing the old
196 : blume 645 image with the new one AND AT THE SAME TIME replacing the old stable
197 : monnier 498 libraries with the new ones. To do this, run the "installml" script.
198 : monnier 416
199 : monnier 498 Like "testml", "installml" also expects the <prefix> as its first
200 :     argument. <prefix> defaults to "sml" if no argument is specified.
201 :    
202 :     "installml" patches the ../../lib/pathconfig file to reflect any
203 : blume 643 changes or additions to the path name mapping. (I say "patches"
204 :     because entries unrelated to the SML/NJ build process are retained in
205 :     their original form.) If you want to use a destination directory that
206 :     is different from ../../lib, then you must do this by hand (i.e.,
207 : blume 645 installml does not have an option for that).
208 : monnier 498
209 : blume 569 Thus, after a successful CMB.make, you should say
210 : monnier 498
211 :     ./makeml
212 :    
213 :     to make the new heap image + libraries, then
214 :    
215 :     ./testml
216 :    
217 :     to make sure everything works, and finally
218 :    
219 :     ./installml
220 :    
221 :     to replace your existing compiler with the one you just built and tested.
222 :    
223 : monnier 416 * Cross-compiling
224 :     -----------------
225 :    
226 : blume 643 All cross-compilers live in the "$smlnj/compiler/all.cm" library.
227 :     (The source tree for the "$smlnj" anchor -- see "pathconfig" -- is
228 :     src/system/smlnj, but this should normally not concern you.)
229 :     You must first say
230 : monnier 416
231 : blume 643 CM.autoload "$smlnj/compiler/all.cm";
232 : monnier 416
233 :     before you can access them. (This step corresponds to the old
234 :     CMB.retarget call.) After that, _all_ cross-compilers are available
235 :     at the same time. However, the ones that you are not using don't take
236 :     up any undue space because they only get loaded once you actually
237 : blume 645 mention them at top level. The names of the structures currently
238 : blume 643 exported by $smlnj/compiler/all.cm are:
239 : monnier 416
240 :     structure Alpha32UnixCMB
241 :     structure HppaUnixCMB
242 :     structure PPCMacOSCMB
243 :     structure PPCUnixCMB
244 :     structure SparcUnixCMB
245 :     structure X86UnixCMB
246 :     structure X86Win32CMB
247 :    
248 :     structure Alpha32Compiler
249 :     structure HppaCompiler
250 :     structure PPCCompiler
251 :     structure SparcCompiler
252 :     structure X86Compiler
253 :    
254 :     (PPCMacOSCMB is not very useful at the moment because there is no
255 :     implementation of the basis library for the MacOS.)
256 :    
257 : monnier 498 Alternatively, you can select just the one single structure that you
258 : blume 643 are interested in by auto-loading $smlnj/compiler/<arch>.cm or
259 :     $smlnj/cmb/<arch>-<os>.cm.
260 : monnier 498 <arch> currently ranges over "alpha32", "hppa", "ppc", "sparc", and "x86".
261 :     <os> can be either "unix" or "macos" or "win32".
262 :     (Obviously, not all combinations are valid.)
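
Which <arch>/<os> combinations are valid can be read off the list of
exported <Arch><OS>CMB structures. The shell sketch below encodes that list;
cmb_library is a hypothetical helper for illustration and is not part of CM
or of the sml command.

```shell
# cmb_library ARCH OS
# Prints the anchored name of the cross-bootstrap-compiler library for a
# valid combination, or fails with a message on stderr otherwise.
cmb_library() {
    case "$1-$2" in
        alpha32-unix|hppa-unix|ppc-unix|sparc-unix|x86-unix|ppc-macos|x86-win32)
            printf '$smlnj/cmb/%s-%s.cm\n' "$1" "$2"
            ;;
        *)
            echo "no cross-bootstrap-compiler for $1-$2" >&2
            return 1
            ;;
    esac
}

cmb_library alpha32 unix
```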
263 :    
264 : blume 643 Again, as with $smlnj/cmb.cm, you can specify the .cm file as an
265 : blume 569 argument to the sml command:
266 :    
267 : blume 643 $ sml '$smlnj/compiler/all.cm'
268 : blume 569
269 :     or
270 :    
271 : blume 643 $ sml '$smlnj/cmb/alpha32-unix.cm'
272 : blume 569
273 : blume 643 [Note: The command line for the "sml" command accepts configuration
274 :     parameters of the form "@SMLxxx...", mode switches of the form "-m"
275 :     and "-a", names of ML files -- which are passed to "use" -- and
276 :     arguments suitable for CM.make or CM.autoload. CM.autoload is the
277 :     default; the "-m" and "-a" mode switches can be used to change the
278 :     default -- even several times within the same command line.
279 :     A single argument "@CMslave" is also accepted, but it should not be
280 :     used directly as it is intended for use by the parallel compilation
281 :     facility within CM.]
282 :    
283 : monnier 416 * Path configuration
284 :     --------------------
285 :    
286 :     + Basics:
287 :    
288 :     One of the new features of CM is its handling of path names. In the
289 :     old CM, one particular point of trouble was the autoloader. It
290 :     analyzes a group or library and remembers the locations of associated
291 :     files. Later, when the necessity arises, those files will be read.
292 :     Therefore, one was asking for trouble if the current working directory
293 :     was changed between analysis- and load-time, or, worse, if files
294 :     actually moved about (as is often the case if build- and
295 :     installation-directories are different, or, to put it more generally,
296 :     if CM's state is frozen into a heap image and used in a different
297 :     environment).
298 :    
299 :     Maybe it would have been possible to work around most of these
300 :     problems by fixing the path-lookup mechanism in the old CM and using
301 :     it extensively. But path-lookup (as in the Unix-shell's "PATH") is
302 :     inherently dangerous because one can never be too sure what it will be
303 :     that is found on the path. A new file in one of the directories early
304 :     in the path can upset the program that hopes to find something under
305 :     the same name later on the path. Even when ignoring security-issues
306 :     like trojan horses and such, this definitely opens the door for
307 : monnier 498 various unpleasant surprises. (Who has never named a test version
308 : monnier 416 of a program "test" and found that it acts strangely only to discover
309 :     later that /bin/test was run instead?)
310 :    
311 :     Thus, the new scheme used by CM is a fixed mapping of what I call
312 :     "configuration anchors" to corresponding directories. The mapping can
313 :     be changed, but one must do so explicitly. In effect, it does not
314 :     depend on the contents of the file system. Here is how it works:
315 :    
316 : blume 643 If I specify a pathname that starts with a "$", then the first arc
317 :     between "$" and the first "/" is taken as the name of a so-called
318 :     "anchor". CM knows a mapping from anchor names to directory names and
319 :     replaces the prefix $<anchor> with the name of the corresponding
320 :     directory. Therefore, an anchored path has the general form
321 : monnier 416
322 : blume 643 $<anchor>/<path>
323 : monnier 416
324 : blume 643 It is important that there is at least one arc in <path>. In other
325 :     words, the form $<anchor> is NOT valid. (Actually, it currently is
326 :     valid, but CM interprets it in a different way.)
327 :    
328 :     Examples:
329 :    
330 :     $smlnj/compiler/all.cm
331 :     $basis.cm/basis.cm
332 :     $MLRISC/Control.cm
333 :    
334 :     The special case where <anchor> coincides with the first arc of <path>
335 :     can be abbreviated by omitting <anchor>. This leads to the shorthand
336 :    
337 :     $/<anchor>/<more>...
338 :    
339 :     for the longer
340 :    
341 :     $<anchor>/<anchor>/<more>...
342 :    
343 :     Examples:
344 :    
345 :     $/foo/bar/baz.cm (* same as $foo/foo/bar/baz.cm *)
346 :     $/basis.cm (* same as $basis.cm/basis.cm *)
347 :    
348 :     Currently, CM accepts one additional shorthand for the case where
349 :     <path> has precisely one arc that coincides with <anchor>. Here, the
350 :     slash "/" can be omitted, too.
351 :    
352 :     Examples:
353 :    
354 :     $basis.cm (* same as $/basis.cm or $basis.cm/basis.cm *)
355 :     $nw-ext.cm (* same as $/nw-ext.cm or $nw-ext.cm/nw-ext.cm *)
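
The two shorthands can be summarized by a small rewriting function. The
shell sketch below is only a model of the rules stated above, not CM's
actual implementation; expand_anchored is a hypothetical name, and paths
that do not start with "$" are passed through unchanged.

```shell
# expand_anchored PATH
# Rewrites the $-shorthands into the full $<anchor>/<path> form.
expand_anchored() {
    p=$1
    case $p in
        '$/'*)                      # $/<anchor>/<more>  ->  $<anchor>/<anchor>/<more>
            rest=${p#'$/'}
            anchor=${rest%%/*}
            printf '$%s/%s\n' "$anchor" "$rest"
            ;;
        '$'*/*)                     # already in full form
            printf '%s\n' "$p"
            ;;
        '$'*)                       # $<anchor>  ->  $<anchor>/<anchor>
            anchor=${p#'$'}
            printf '$%s/%s\n' "$anchor" "$anchor"
            ;;
        *)                          # not an anchored path
            printf '%s\n' "$p"
            ;;
    esac
}

expand_anchored '$/foo/bar/baz.cm'
expand_anchored '$basis.cm'
```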
356 :    
357 :     Previously, CM used "implicit" anchors where anchored paths simply
358 :     have the form
359 :    
360 :     <anchor>/<more>...
361 :    
362 :     The distinction between anchored paths and relative paths was made
363 :     based on whether or not <anchor> had a known mapping at the time it
364 :     was seen by CM. Since this is hard to read and fragile, support for
365 :     implicit anchors (while still there) is considered obsolete and will
366 :     be phased out soon. The meaning of an implicitly anchored path <path>
367 :     is the same as $/<path>.
368 :    
369 :     Recognition of implicit anchors can be turned off by issuing the
370 :     following command:
371 :    
372 :     CM.autoload "$smlnj/cm.cm";
373 :     #set CM.Control.implicit_anchors false;
374 :    
375 :     + Why anchored paths?
376 :    
377 :     The important point is that one can change the mapping of the anchor,
378 :     and the translation of the (anchored) path name will also change
379 :     accordingly -- even very late in the game. CM avoids "elaborating"
380 :     path names until it really needs them when it is time to open files.
381 :     CM is also willing to re-elaborate the same names if there is reason
382 :     to do so. Thus, the "basis.cm" library that was analyzed "here" but
383 :     then moved "there" will also be found "there" if the anchor has been
384 :     re-set accordingly.
385 :    
386 :     The anchor mapping is (re-)initialized at startup time by reading two
387 :     configuration files. Normally, those are the "../../lib/pathconfig"
388 :     file and the ".smlnj-pathconfig" file in your home directory (if such
389 :     exists). During an ongoing session, function CM.Anchor.anchor can be
390 :     used to query and modify the anchor mapping.
391 :    
392 : monnier 416 + Different configurations at different times:
393 :    
394 :     During compilation of the compiler, CMB uses a path configuration that
395 :     is read from the file "pathconfig" located here in this directory.
396 :    
397 : blume 643 At bootstrap time (while running "makeml"), the same anchors are
398 :     mapped to the corresponding sub-directory of the "boot" directory:
399 :     basis.cm is mapped to sml.boot.<arch>-<os>/basis.cm -- which means
400 :     that CM will look for a library named
401 :     sml.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.
402 : monnier 416
403 : blume 643 [Note, there are some anchors in "pathconfig" that have no
404 :     corresponding sub-directory of the boot directory. Examples are
405 :     "root.cm", "cm", and so on. The reason is that there are no stable
406 :     libraries whose description files are named using these anchors;
407 :     everything anchored at "$cm" is a group but not a library.]
408 :    
409 : monnier 416 By the way, you will perhaps notice that there is no file
410 : monnier 498 sml.boot.<arch>-<os>/basis.cm/basis.cm
411 : monnier 416 but there _is_ the corresponding stable archive
412 : monnier 498 sml.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm
413 : monnier 416 CM always looks for stable archives first.
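
The relationship between the two paths above can be written down as a
one-line transformation. The shell sketch below assumes that the pattern
<dir>/CM/<arch>-<os>/<file> seen in the basis.cm example holds for other
libraries as well; stable_archive is a hypothetical helper and expects a
path containing at least one "/".

```shell
# stable_archive DESCFILE ARCH OS
# Prints where the stable archive for DESCFILE would be looked up.
stable_archive() {
    desc=$1
    arch=$2
    os=$3
    dir=${desc%/*}       # directory part of the description file path
    file=${desc##*/}     # file name part
    printf '%s/CM/%s-%s/%s\n' "$dir" "$arch" "$os" "$file"
}

stable_archive sml.boot.x86-unix/basis.cm/basis.cm x86 unix
```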
414 :    
415 :     This mapping (from anchors to names in the boot directory) is the one
416 :     that will get frozen into the generated heap image at boot time.
417 :     Thus, unless it is changed, CM will look for its libraries in the boot
418 : blume 643 directory. The aforementioned "testml" script will make sure (by
419 :     setting the environment variable CM_PATHCONFIG) that the mapping be
420 :     changed to the one specified in a new "pathconfig" file which was
421 :     created by makeml and placed into the test library directory. It
422 :     points all anchors to the corresponding entry in the test library
423 :     directory. Thus, "testml" will let a new heap image run with its
424 :     corresponding new libraries.
425 : monnier 416
426 :     Normally, however, CM consults other pathconfig files at startup --
427 :     files that live in standard locations. These files are used to modify
428 :     the path configuration to let anchors point to their "usual" places.
429 :     The names of the files that are read (if present) are configurable via
430 :     environment variables. At the moment they default to
431 :     /usr/lib/smlnj-pathconfig
432 :     and
433 :     $HOME/.smlnj-pathconfig
434 :     The first one is configurable via CM_PATHCONFIG (and the default is
435 :     configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
436 :     configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
437 :     In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
438 :     before making the heap image. Therefore, heap images generated by
439 :     makeml will look for their global pathconfig file in
440 :    
441 : blume 643 ../../lib/pathconfig
442 : monnier 416
443 : blume 643 [Note: The "makeml" script will not re-set the CM_PATHCONFIG_DEFAULT
444 :     variable if it was already set before. If it does re-set the
445 :     variable, it uses an absolute path name instead of the relative path
446 :     that I used for illustration above.]
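
To make the lookup order concrete, here is a shell sketch that prints the
two pathconfig files a session would consult. It is a model only:
pathconfig_files is a hypothetical helper, it treats an empty variable like
an unset one (an assumption), and it hard-codes the defaults quoted above
rather than the CM_PATHCONFIG_DEFAULT value frozen into a particular heap
image.

```shell
# pathconfig_files
# Prints the global and then the local pathconfig file name.
pathconfig_files() {
    printf '%s\n' "${CM_PATHCONFIG:-/usr/lib/smlnj-pathconfig}"
    printf '%s\n' "${CM_LOCAL_PATHCONFIG:-${HOME-}/.smlnj-pathconfig}"
}

pathconfig_files
```

Setting CM_PATHCONFIG in the environment before calling the function changes
the first line of output, which mirrors what has to happen before sml is
started.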
447 :    
448 : monnier 429 For example, I always keep my "good" libraries in `pwd`/../../lib --
449 : blume 643 where both the main "install" script (in config/install.sh) and the
450 :     "installml" script (see above) also put them -- so I don't have to do
451 :     anything special about my pathconfig file.
452 : monnier 416
453 :     Once I have a new heap image and libraries working, I replace the old
454 :     "good" image with the new one:
455 :    
456 :     mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>
457 :    
458 : blume 645 After this I must also move all libraries from <image>.libs/* to their
459 : blume 573 corresponding position in ../../lib.
460 : monnier 416
461 : blume 573 Since this is cumbersome to do by hand, there is a script called
462 :     "installml" that automates this task. Using the script has the added
463 :     advantage that it will not clobber libraries that belong to
464 :     architectures other than the current one. (A rather heavy-handed "rm/mv" approach
465 :     will delete all stable libraries for all architectures.)
466 :     "installml" also patches the ../../lib/pathconfig file as necessary.
467 : monnier 416
468 :     Of course, you can organize things differently for yourself -- the
469 : blume 643 path configuration mechanism should be sufficiently flexible. If you
470 :     do so, you will have to set CM_PATHCONFIG. This must be done before
471 :     you start sml. If you want to change the pathname mapping at the time
472 :     sml is already running, then use the functions in CM.Anchor.
473 : monnier 416
474 :     * Libraries vs. Groups
475 :     ----------------------
476 :    
477 :     With the old CM, "group" was the primary concept while "library" and
478 :     "stabilization" could be considered afterthoughts. This has changed.
479 :     Now "library" is the primary concept, "stabilization" is semantically
480 :     significant, and "groups" are a secondary mechanism.
481 :    
482 :     Libraries are used to "structure the world"; groups are used to give
483 :     structure to libraries. Each group can be used either in precisely
484 :     one library (in which case it cannot be used at the interactive
485 :     toplevel) or at the toplevel (in which case it cannot be used in any
486 :     library). In other words, if you count the toplevel as a library,
487 :     then each group has a unique "owner" library. Of course, there still
488 :     is no limit on how many times a group can be mentioned as a member of
489 :     other groups -- as long as all these other groups belong to the same
490 :     owner library.
491 :    
492 : blume 643 Normally, collections of files that belong together should be made
493 :     into proper CM libraries. CM groups (aka "library components") should
494 :     be used only when there are namespace problems within a library.
495 : monnier 416
496 :     Aside from the fact that I find this design quite natural, there is
497 :     actually a technical reason for it: when you stabilize a library
498 :     (groups cannot be stabilized), then all its sub-groups (not
499 :     sub-libraries!) get "sucked into" the stable archive of the library.
500 :     In other words, even if you have n+1 CM description files (1 for the
501 :     library, n for n sub-groups), there will be just one file representing
502 :     the one stable archive (per architecture/os) for the whole thing. For
503 :     example, I structured the standard basis into one library with two
504 : blume 569 sub-groups, but once you compile it (CMB.make) there is only one
505 : monnier 416 stable file that represents the whole basis library. If groups were
506 :     allowed to appear in more than one library, then stabilization would
507 :     duplicate the group (its code, its environment data structures, and
508 :     even its dynamic state).
509 :    
510 :     There is a small change to the syntax of group description files: they
511 :     must explicitly state which library they belong to. CM will verify
512 :     that. The owner library is specified in parentheses after the "group"
513 :     keyword. If the specification is missing (that's the "old" syntax),
514 :     then the owner will be taken to be the interactive toplevel.
515 :    
516 : blume 643 * Pervasive environment, core environment, the init library "init.cmi"
517 : monnier 416 -------------------------------------------------------------------------
518 :    
519 : blume 569 CMB.make starts out by building and compiling the
520 : blume 643 "init library". This library cannot be described in the "usual" way
521 : blume 537 because it uses "magic" in three ways:
522 :     - it is used to later tie in the runtime system
523 : blume 643 - it binds the "_Core" structure
524 : blume 569 - it exports the "pervasive" environment
525 : monnier 416
526 : blume 537 The pervasive environment no longer includes the entire basis library
527 :     but only non-modular bindings (top-level bindings of variables and
528 :     types).
529 :    
530 : blume 569 CM cannot automatically determine dependencies (or exports) for the
531 : blume 643 init library source files, but it still does use its regular cutoff
532 : blume 569 recompilation mechanism. Therefore, dependencies must be given
533 :     explicitly. This is done by a special description file which
534 : blume 643 currently lives in smlnj/init/init.cmi (as an anchored path:
535 :     "$smlnj/init/init.cmi"). See the long comment at the beginning of
536 :     that file for more details.
537 : monnier 416
538 : blume 643 After it is built, $smlnj/init/init.cmi can be used as an "ordinary"
539 :     library by other libraries. (This is done, for example, by the
540 :     implementation of the Basis library.) Access to
541 :     "$smlnj/init/init.cmi" is protected by the privilege named
542 :     "primitive". Also, note that the .cmi-file is not automatically
543 :     recognized as a CM description file. ("cmi" should remind you of "CM
544 :     - Initial library".) Therefore, it must be given an explicit member
545 :     class:
546 : blume 537
547 : blume 643 $smlnj/init/init.cmi : cm
548 : blume 569
549 : monnier 416 * Autoloader
550 :     ------------
551 :    
552 :     The new system heavily relies on the autoloader. As a result, almost
553 : blume 569 no static environments need to get unpickled at bootstrap time. The
554 : monnier 416 construction of such environments is deferred until they become
555 : blume 632 necessary. Thanks to this, it was possible to reduce the size of the
556 : blume 569 heap image by more than one megabyte (depending on the architecture).
557 :     The downside (although not really terribly bad) is that there is a
558 :     short wait when you first touch an identifier that hasn't been touched
559 : monnier 416 before. (I acknowledge that the notion of "short" may depend on your
560 :     sense of urgency. :-)
561 :    
562 :     The reliance on the autoloader (and therefore CM's library mechanism)
563 :     means that in order to be able to use the system, your paths must be
564 :     properly configured.
565 :    
566 :     Two libraries get pre-registered at bootstrap time: the basis library
567 : blume 643 ("$/basis.cm") and CM itself ("$smlnj/cm/minimal.cm"). The latter is
568 :     crucial: without it one wouldn't be able to register any other
569 :     libraries via CM.autoload. The registration of $/basis.cm is a mere
570 :     convenience.
571 : monnier 416
572 :     Here are some other useful libraries that are not pre-registered but
573 :     which can easily be made accessible via CM.autoload (or, non-lazily,
574 :     via CM.make):
575 :    
576 : blume 643 $smlnj/cm.cm - provides the actual ("full") structure CM
577 : monnier 498 as described in the CM manual
578 : blume 643 $smlnj/cm/full.cm - same as $smlnj/cm.cm
579 :     $smlnj/compiler.cm - provides "structure Compiler"
580 :     $smlnj/compiler/current.cm - same as $smlnj/compiler.cm
581 :     $smlnj/cmb.cm - provides "structure CMB"
582 :     $smlnj/cmb/current.cm - same as $smlnj/cmb.cm
583 :     $smlnj/compiler/all.cm - provides "structure <Arch>Compiler" and
584 : monnier 416 "structure <Arch><OS>CMB" for various
585 :     values of <Arch> and <OS>
586 : blume 643 $smlnj-lib.cm - the SML/NJ library
587 : monnier 416
588 : blume 643 [Note: The fact that $smlnj/compiler.cm is not among the
589 :     pre-registered libraries may look like an oversight, and it can
590 :     inconvenience users who want to, for example, set compiler
591 :     flags. However, pre-registering this library significantly
592 :     increases the size of the heap image. Moreover, since the library can
593 :     easily be loaded by giving its name as a command-line argument, this
594 :     does not appear to be a big burden to me. Just create a shell
595 :     alias or a little wrapper script if you think you really need this.]
596 :    
597 : monnier 416 * Internal sharing
598 :     ------------------
599 :    
600 :     Dynamic values of loaded modules are shared. This is true even for
601 :     those modules that are used by the interactive compiler itself. If
602 :     you load a module from a library that is also used by the interactive
603 :     compiler, then "loading" means "loading the static environment" -- it
604 :     does not mean "loading the code and linking it". Instead, you get to
605 :     share the compiler's dynamic values (and therefore the executable
606 :     code as well).
607 :    
608 :     Of course, if you load a module that hasn't been loaded before and
609 :     also isn't used by the interactive system, then CM will get the code
610 :     and link (execute) it.
611 :    
612 :     * Access control
613 :     ----------------
614 :    
615 :     In some places, you will find that the "group" and "library" keywords
616 :     in description files are preceded by certain strings, sometimes in
617 :     parentheses. These strings are the names of "privileges". Don't
618 :     worry about them too much at the moment. For the time being, access
619 :     control is not enforced, but the infrastructure is in place.
620 :    
621 :     * Preprocessor
622 :     --------------
623 :    
624 :     The syntax of expressions in #if and #elif clauses is now more ML-ish
625 :     instead of C-ish. (Hey, this is ML after all!) In particular, you
626 :     must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
627 :     Unary minus is "~".
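
For illustration, here is a small sketch of a description file whose
members section uses the new operators. The file names and the
threshold 120 are made up for this example; only SMLNJ_VERSION and the
operators themselves come from the text above:

+----------------------------------------------------------+
|Library                                                   |
|  structure Foo                                           |
|is                                                        |
|(* hypothetical file names and version threshold *)       |
|#if SMLNJ_VERSION > 110 andalso not (SMLNJ_VERSION > 120) |
|  mid-foo.sml                                             |
|#elif SMLNJ_VERSION > 120                                 |
|  new-foo.sml                                             |
|#else                                                     |
|  old-foo.sml                                             |
|#endif                                                    |
+----------------------------------------------------------+

Exactly one of the three files gets included for any given version.
"orelse" and unary "~" are used in the same way.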
628 :    
629 :     A more interesting change is that you can now query the exports of
630 :     sources/subgroups/sublibraries:
631 :    
632 :     - Within the "members" section of the description (i.e., after "is"):
633 :     The expression
634 :     defined(<namespace> <name>)
635 :     is true if any of the included members preceding this clause exports
636 :     a symbol "<namespace> <name>".
637 : blume 632 - Within the "exports" section of the description (i.e., before "is"):
638 : monnier 416 The same expression is true if _any_ of the members exports the
639 :     named symbol.
640 :     (It would be more logical if the exports section followed the
641 :     members section, but for esthetic reasons I prefer the exports
642 :     section to come first.)
643 :    
644 :     Example:
645 :    
646 :     +--------------------------+
647 :     |Library                   |
648 :     |  structure Foo           |
649 :     |#if defined(structure Bar)|
650 :     |  structure Bar           |
651 :     |#endif                    |
652 :     |is                        |
653 :     |#if SMLNJ_VERSION > 110   |
654 :     |  new-foo.sml             |
655 :     |#else                     |
656 :     |  old-foo.sml             |
657 :     |#endif                    |
658 :     |#if defined(structure Bar)|
659 :     |  bar-client.sml          |
660 :     |#else                     |
661 :     |  no-bar-so-far.sml       |
662 :     |#endif                    |
663 :     +--------------------------+
664 :    
665 :     Here, the file "bar-client.sml" gets included if SMLNJ_VERSION is
666 :     greater than 110 and new-foo.sml exports a structure Bar _or_ if
667 :     SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar. Otherwise
668 :     "no-bar-so-far.sml" gets included instead. In addition, the export of
669 :     structure Bar is guarded by its own existence. (Structure Bar could
670 :     also be defined by "no-bar-so-far.sml" in which case it would get
671 :     exported regardless of the outcome of the other "defined" test.)
672 :    
673 :     Some things to note:
674 :    
675 :     - For the purpose of the pre-processor, order among members is
676 :     significant. (For the purpose of dependency analysis, order continues
677 :     to be insignificant.)
678 :     - As a consequence, in some cases pre-processor dependencies and
679 :     compilation dependencies may end up being opposites of each other.
680 :     (This is not a problem; it may very well be a feature.)
681 :    
682 :     * The Basis Library is no longer built-in
683 :     -----------------------------------------
684 :    
685 :     The SML'97 basis is no longer built-in. If you want to use it, you
686 :     must specify "$/basis.cm" as a member of your group/library.
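
A minimal sketch of a description file that pulls in the basis
explicitly. The library is written here as "$/basis.cm", the name it
goes by in the library lists elsewhere in this guide; structure Hello
and hello.sml are hypothetical:

+----------------------------------------+
|Library                                 |
|  structure Hello                       |
|is                                      |
|  (* hello.sml is a hypothetical file *)|
|  $/basis.cm                            |
|  hello.sml                             |
+----------------------------------------+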
687 :    
688 :     * No more aliases
689 :     -----------------
690 :    
691 :     The "alias" feature is no longer with us. At first I thought I could
692 :     keep it, but it turns out that it causes some fairly fundamental
693 :     problems with the autoloader. However, I don't think that this is a
694 :     big loss because path anchors make up for most of it. Moreover,
695 :     stable libraries can now easily be moved to convenient locations
696 :     without having to move large source trees at the same time. (See my
697 : blume 643 new config/install.sh script for examples of that.)
698 : monnier 416
699 : blume 573 It is possible to simulate aliases (in a way that is safer than the
700 :     original alias mechanism). For example, the root.cm file (which is the
701 :     root of the whole system as far as CMB.make is concerned) acts as an
702 : blume 643 alias for $smlnj/internal/intsys.cm. In this case, root.cm is a group
703 : blume 573 to avoid having a (trivial) stable library file built for it.
704 :    
705 :     A library can act as an "alias" for another library if it has a
706 :     verbatim copy of the export list and mentions the other library as its
707 : blume 643 only member. Examples of this are $smlnj/cm.cm (for
708 :     $smlnj/cm/full.cm), $smlnj/compiler.cm (for $smlnj/compiler/current.cm),
709 :     etc. The stable library file for such an "alias" is typically very
710 :     small because it basically just points to the other library. (For
711 :     example, the file representing $smlnj/cm.cm is currently 234 bytes
712 :     long.)
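
As a sketch, the description file for such an "alias" might look
roughly like this. It assumes that structure CM is the only export of
$smlnj/cm/full.cm; the real file may list more:

+--------------------------------------------------+
|Library                                           |
|  (* assumed: verbatim copy of the export list *) |
|  structure CM                                    |
|is                                                |
|  $smlnj/cm/full.cm                               |
+--------------------------------------------------+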
713 : blume 573
714 : monnier 416 * Don't use relative or absolute pathnames to refer to libraries
715 :     ----------------------------------------------------------------
716 :    
717 :     Don't use relative or absolute pathnames to refer to libraries. If
718 :     you do it anyway, you'll get an appropriate warning at the time when
719 : blume 569 you do CMB.make(). If you use relative or absolute pathnames to
720 : monnier 416 refer to library B from library A, you will be committed to keeping B
721 :     in the same relative (to A) or absolute location. This, clearly,
722 : blume 643 would be undesirable in many situations (although perhaps not always).
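
To illustrate, here is a sketch of a library A that refers to the
SML/NJ library by its anchored name. Structure A, a.sml and the
relative path shown in the comment are made up for this example:

+----------------------------------------------------------+
|Library                                                   |
|  structure A                                             |
|is                                                        |
|  a.sml                                                   |
|  (* anchored name: found via the path configuration *)   |
|  $/smlnj-lib.cm                                          |
|  (* avoid (hypothetical path): ../lib/smlnj-lib.cm *)    |
+----------------------------------------------------------+

With the anchored name, the stable library can be moved without
touching A's description file.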
