[smlnj] Annotation of /sml/trunk/src/system/README
Revision 652

1 : blume 537 Compiler Hacker's Guide to the new CM...
2 :     ========================================
3 : monnier 416
4 : blume 643 Last change: 12/5/2000
5 : blume 537
6 : monnier 416 * Libraries
7 :     -----------
8 :    
9 :     The new way of building the compiler is heavily library-oriented.
10 :     Aside from a tiny portion of code that is responsible for defining the
11 :     pervasive environment, _everything_ lives in libraries. Building the
12 :     compiler means compiling and stabilizing these libraries first. Some
13 :     of the libraries exist just for reasons of organizing the code, the
14 :     other ones are potentially useful in their own right. Therefore, as a
15 :     beneficial side-effect of compiling the compiler, you will end up with
16 :     stable versions of these libraries.
17 :    
18 :     At the moment, the following libraries are constructed when compiling
19 :     the compiler ("*" means that I consider the library potentially useful
20 :     in its own right):
21 :    
22 : blume 643 * $/basis.cm SML'97 Basis Library (pre-loaded)
23 :     * $/smlnj-lib.cm SML/NJ Utility Library
24 :     * $/html-lib.cm SML/NJ HTML Library
25 :     * $/pp-lib.cm SML/NJ Pretty-print Library
26 : monnier 416
27 : blume 643 * $/ml-yacc-lib.cm SML/NJ ML-Yacc runtime library
28 : blume 573
29 : blume 643 * $smlnj/compiler/{alpha32,hppa,ppc,sparc,x86}.cm
30 : blume 573 cross-compiler libraries, exporting
31 :     structure <Arch>Compiler
32 : blume 643 * $smlnj/compiler/current.cm structure Compiler (current arch)
33 :     * $smlnj/compiler/all.cm all cross-compilers and all cross-CMBs
34 : blume 573
35 : blume 652 * $smlnj/cm/full.cm structure CM (see manual)
36 : blume 643 * $smlnj/cm/tools.cm CM tools library
37 : blume 573
38 : blume 643 * $smlnj/cmb/{alpha32,hppa,ppc,sparc,x86}-unix.cm
39 : blume 573 cross-bootstrap-compilers for Unix
40 :     (structure <Arch>UnixCMB)
41 : blume 643 * $smlnj/cmb/ppc-macos.cm ...for Mac (structure PPCMacosCMB)
42 :     * $smlnj/cmb/x86-win32.cm ...for Windoze (structure X86Win32CMB)
43 :     * $smlnj/cmb/current.cm structure CMB (current arch/os)
44 : blume 573
45 : blume 643 * $smlnj/compiler.cm abbrev. for $smlnj/compiler/current.cm
46 :     * $smlnj/cm.cm abbrev. for $smlnj/cm/full.cm
47 :     * $smlnj/cmb.cm abbrev. for $smlnj/cmb/current.cm
48 : blume 573
49 : blume 643 * $comp-lib.cm Utility library for compiler
50 : blume 573
51 : blume 643 - $smlnj/viscomp/core.cm Compiler core functionality
52 :     - $smlnj/viscomp/{alpha32,hppa,ppc,sparc,x86}.cm
53 : blume 573 Machine-specific parts of compiler
54 :    
55 : blume 652 - $smlnj/internal/{intsys,cm-lib,host-compiler-0}.cm
56 : blume 573 Glue that holds the interactive system
57 :     together
58 :    
59 : blume 643 * $MLRISC/{MLRISC,Control,Lib,ALPHA,HPPA,PPC,SPARC,IA32}.cm
60 : blume 573 Various MLRISC bits
61 : blume 643 (Other MLRISC libraries such as
62 :     Graph, Visual, etc. do not currently
63 :     take part in the SML/NJ build.)
64 : blume 573
65 : blume 643 * ${mlyacc,mllex,mlburg}-tool.cm CM plug-in libraries for common tools
66 :     * ${grm,lex,burg}-ext.cm CM plug-in libraries for common file
67 : blume 573 extensions
68 :    
69 : blume 643 Paths of the form $/foo/... are shorthands for $foo/foo/..., paths of
70 :     the form $/singlearc can also be written as $singlearc (e.g.,
71 :     $basis.cm instead of $/basis.cm). The usefulness of the latter
72 :     shorthand is controversial, so you should probably try to avoid it.
73 :    
74 :     A more complete explanation of the $-notation can be found later in
75 :     this document or in the CM manual.
76 :    
77 :     To learn about the definitions of the $-anchors (and, thus, where in
78 :     the source tree the above libraries are defined), consult the
79 :     "pathconfig" file here in this directory.
80 :    
81 : monnier 416 * Before you can use the bootstrap compiler (CMB)...
82 :     ----------------------------------------------------
83 :    
84 :     To be able to use CMB at all, you must first say
85 :    
86 : blume 643 CM.autoload "$smlnj/cmb.cm";
87 : monnier 416
88 : blume 569 after you start sml. Alternatively -- and perhaps more conveniently --
89 : blume 643 you can provide "$smlnj/cmb.cm" as a command-line argument to sml:
90 : monnier 416
91 : blume 643 $ sml '$smlnj/cmb.cm'
92 : monnier 416
93 : blume 643 (Be sure to protect the dollar symbol which usually has its own
94 :     special meaning to the shell.)
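
To see why the quoting matters, here is a small shell sketch. The variable value "oops" is made up for illustration; the only point is that double quotes let the shell substitute $smlnj before sml ever sees the argument, while single quotes pass the text through unchanged.

```shell
# Hypothetical: pretend the shell already has a variable named "smlnj".
smlnj=oops

# Single quotes: the shell passes the text through untouched.
protected='$smlnj/cmb.cm'

# Double quotes: the shell replaces $smlnj with the variable's value.
unprotected="$smlnj/cmb.cm"

printf 'single quotes: %s\n' "$protected"
printf 'double quotes: %s\n' "$unprotected"
```

Only the single-quoted form reaches sml as an anchored path.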
95 :    
96 : blume 569 * Compiling the compiler
97 :     ------------------------
98 : monnier 416
99 : blume 569 We are now back to the old scheme where a call to CMB.make() suffices to
100 :     build a bootable set of files (libraries in our case). CMB.make maintains
101 :     two parallel hierarchies of derived files:
102 : monnier 416
103 : blume 569 1. the binfile hierarchy ("binfiles"), containing compiled objects for
104 :     each individual ML source file; this hierarchy is rooted at
105 :     <prefix>.bin.<arch>-<opsys>
106 :     2. the stable library hierarchy ("boot files"), containing library files
107 :     for each library that participates in building SML/NJ; this hierarchy
108 :     is rooted at
109 :     <prefix>.boot.<arch>-<opsys>
110 : monnier 416
111 : blume 569 The default for <prefix> is "sml". It can be changed by using
112 :     CMB.make' with the new <prefix> as the optional string argument.
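
As a quick illustration of the naming scheme, the following shell sketch prints the two hierarchy roots for a given <prefix>, <arch> and <opsys>. The function name derived_roots and the prefix "mysml" are made up for this example; nothing here is part of CMB itself.

```shell
# Print the roots of the two derived-file hierarchies for a given
# <prefix>, <arch> and <opsys>.  An empty <prefix> falls back to "sml".
derived_roots() {
    prefix=${1:-sml}
    arch=$2
    opsys=$3
    printf '%s.bin.%s-%s\n'  "$prefix" "$arch" "$opsys"
    printf '%s.boot.%s-%s\n' "$prefix" "$arch" "$opsys"
}

derived_roots sml x86 unix
derived_roots mysml sparc unix   # "mysml" is a made-up prefix
```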
113 : monnier 416
114 : blume 643 CMB.make reuses existing bootfiles after it has verified that they are
115 :     consistent with their corresponding binfiles. Bootfiles do not need
116 :     to be deleted in order for CMB.make to work correctly.
117 : monnier 416
118 : blume 569 To bootstrap a new system (using the runtime system boot loader), the
119 :     bootfiles _must_ be present, the binfiles need not be present (but
120 :     their presence does not hurt either).
121 : monnier 416
122 : monnier 498 You can reduce the number of extra files compiled and stabilized
123 : blume 569 during CMB.make at the expense of not building any cross-compilers.
124 : monnier 498 For that, say
125 :     #set (CMB.symval "LIGHT") (SOME 1);
126 : blume 569 before running CMB.make.
127 : monnier 498
128 : monnier 416 * Making the heap image
129 :     -----------------------
130 :    
131 :     The heap image is made by running the "makeml" script that you find
132 :     here in this directory. By default it will try to refer to the
133 : monnier 498 sml.boot.<arch>-<os> directory. You can change this using the -boot
134 : monnier 416 argument (which takes the full name of the boot directory to be used).
135 :    
136 :     The "feel" of using makeml should be mostly as it used to be. However,
137 :     internally, there are some changes that you should be aware of:
138 :    
139 : blume 569 1. The script will make a heap image and build a separate library directory
140 : blume 643 that contains (hard) links to the library files in the bootfile directory.
141 : monnier 416
142 :     2. There is no "-full" option anymore. This functionality should
143 :     eventually be provided by a library with a sufficiently rich export
144 :     interface.
145 :    
146 :     3. No image will be generated if you use the -rebuild option.
147 :     Instead, the script quits after making new bin and new boot
148 :     directories. You must re-invoke makeml with a suitable "-boot"
149 :     option to actually make the image. The argument to "-rebuild"
150 :     is the <prefix> for the new bin and boot directories (see above).
151 :    
152 : blume 643 [Note: When the -rebuild option is specified, then the boot procedure
153 :     will not read static environments from the boot directory. Instead,
154 :     after the ML code has been loaded and linked, the system will invoke
155 :     CMB.make' with the argument that was given to -rebuild. After
156 :     CMB.make' is done, the system quits. In essence, makeml with -rebuild
157 :     acts as a bootstrap compiler that is not dependent on any usable
158 : blume 645 static environments.]
159 : blume 643
160 : blume 569 Makeml will not destroy the bootfile directory.
161 : monnier 416
162 :     * Testing a newly generated heap image
163 :     --------------------------------------
164 :    
165 :     If you use a new heap image by saying "sml @SMLload=..." then things
166 :     will not go as you may expect because along with the new heap image
167 :     should go those new stable libraries, but unless you do something
168 : blume 569 about it, the newly booted system will look for its stable libraries
169 : blume 643 in places where you stored your _old_ stable libraries. (After just
170 :     having done "makeml", these "places" would be within the boot file
171 :     hierarchy under <prefix>.boot.<arch>-<os>.)
172 : monnier 416
173 :     After you have made the new heap image, the new libraries are in a
174 :     separate directory whose name is derived from the name of the heap
175 : blume 569 image. (Actually, only the directory hierarchy is separate; the
176 : blume 643 library files themselves are hard links.) The "testml" script that
177 :     you also find here will run the heap image and instruct it to look for
178 :     its libraries in that new library directory by setting the
179 :     CM_PATHCONFIG environment variable to point to a different pathconfig
180 :     file under <prefix>.lib.
181 : monnier 416
182 : monnier 498 "testml" takes the <prefix> of the heap image as its first
183 :     argument. All other arguments are passed verbatim to the ML process.
184 :    
185 :     The <prefix> is the same as the one used when you did "makeml". If
186 :     you run "testml" without arguments, <prefix> defaults to "sml".
187 : blume 645 Thus, if you just said "makeml" without arguments you can also say
188 :     "testml" without arguments. (Note that you _must_ supply the <prefix>
189 : monnier 498 argument if you intend to pass any additional arguments.)
190 :    
191 : monnier 416 * Installing a heap image for more permanent use
192 :     ------------------------------------------------
193 :    
194 : monnier 498 You can "install" a newly generated heap image by replacing the old
195 : blume 645 image with the new one AND AT THE SAME TIME replacing the old stable
196 : monnier 498 libraries with the new ones. To do this, run the "installml" script.
197 : monnier 416
198 : monnier 498 Like "testml", "installml" also expects the <prefix> as its first
199 :     argument. <prefix> defaults to "sml" if no argument is specified.
200 :    
201 :     "installml" patches the ../../lib/pathconfig file to reflect any
202 : blume 643 changes or additions to the path name mapping. (I say "patches"
203 :     because entries unrelated to the SML/NJ build process are retained in
204 :     their original form.) If you want to use a destination directory that
205 :     is different from ../../lib, then you must do this by hand (i.e.,
206 : blume 645 installml does not have an option for that).
207 : monnier 498
208 : blume 569 Thus, after a successful CMB.make, you should say
209 : monnier 498
210 :     ./makeml
211 :    
212 :     to make the new heap image + libraries, then
213 :    
214 :     ./testml
215 :    
216 :     to make sure everything works, and finally
217 :    
218 :     ./installml
219 :    
220 :     to replace your existing compiler with the one you just built and tested.
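
If you prefer to run the three steps in one go, a wrapper along the following lines stops at the first failure, so installml never runs after a broken makeml or testml. This is only a sketch: the function name make_test_install is made up, it uses the default <prefix> throughout, and it assumes that each script reports failure through a non-zero exit status (for testml, that means the status of the ML session you exit from).

```shell
# Run the three scripts in order, stopping at the first one that fails.
# Must be called from the directory that contains the scripts.
make_test_install() {
    ./makeml    || { echo "makeml failed" >&2; return 1; }
    ./testml    || { echo "testml failed" >&2; return 1; }
    ./installml || { echo "installml failed" >&2; return 1; }
}
```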
221 :    
222 : monnier 416 * Cross-compiling
223 :     -----------------
224 :    
225 : blume 643 All cross-compilers live in the "$smlnj/compiler/all.cm" library.
226 :     (The source tree for the "$smlnj" anchor -- see "pathconfig" -- is
227 :     src/system/smlnj, but this should normally not concern you.)
228 :     You must first say
229 : monnier 416
230 : blume 643 CM.autoload "$smlnj/compiler/all.cm";
231 : monnier 416
232 :     before you can access them. (This step corresponds to the old
233 :     CMB.retarget call.) After that, _all_ cross-compilers are available
234 :     at the same time. However, the ones that you are not using don't take
235 :     up any undue space because they only get loaded once you actually
236 : blume 645 mention them at top level. The names of the structures currently
237 : blume 643 exported by $smlnj/compiler/all.cm are:
238 : monnier 416
239 :     structure Alpha32UnixCMB
240 :     structure HppaUnixCMB
241 :     structure PPCMacOSCMB
242 :     structure PPCUnixCMB
243 :     structure SparcUnixCMB
244 :     structure X86UnixCMB
245 :     structure X86Win32CMB
246 :    
247 :     structure Alpha32Compiler
248 :     structure HppaCompiler
249 :     structure PPCCompiler
250 :     structure SparcCompiler
251 :     structure X86Compiler
252 :    
253 :     (PPCMacOSCMB is not very useful at the moment because there is no
254 :     implementation of the basis library for the MacOS.)
255 :    
256 : monnier 498 Alternatively, you can select just the one single structure that you
257 : blume 643 are interested in by auto-loading $smlnj/compiler/<arch>.cm or
258 :     $smlnj/cmb/<arch>-<os>.cm.
259 : monnier 498 <arch> currently ranges over "alpha32", "hppa", "ppc", "sparc", and "x86".
260 :     <os> can be either "unix" or "macos" or "win32".
261 :     (Obviously, not all combinations are valid.)
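
The valid combinations can be read off the library list at the top of this document. The following shell sketch encodes them and prints the matching $smlnj/cmb/<arch>-<os>.cm name; the function name cmb_library is made up, and the table simply mirrors the list above rather than anything CM checks itself.

```shell
# Map <arch> and <os> to the cross-bootstrap-compiler library name,
# rejecting combinations that do not appear in the library list.
cmb_library() {
    arch=$1 os=$2
    case "$arch-$os" in
        alpha32-unix|hppa-unix|ppc-unix|sparc-unix|x86-unix|ppc-macos|x86-win32)
            printf '$smlnj/cmb/%s-%s.cm\n' "$arch" "$os"
            ;;
        *)
            echo "no cross-bootstrap-compiler for $arch-$os" >&2
            return 1
            ;;
    esac
}

cmb_library alpha32 unix
```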
262 :    
263 : blume 643 Again, as with $smlnj/cmb.cm, you can specify the .cm file as an
264 : blume 569 argument to the sml command:
265 :    
266 : blume 643 $ sml '$smlnj/compiler/all.cm'
267 : blume 569
268 :     or
269 :    
270 : blume 643 $ sml '$smlnj/cmb/alpha32-unix.cm'
271 : blume 569
272 : blume 643 [Note: The command line for the "sml" command accepts configuration
273 :     parameters of the form "@SMLxxx...", mode switches of the form "-m"
274 :     and "-a", names of ML files -- which are passed to "use" -- and
275 :     arguments suitable for CM.make or CM.autoload. CM.autoload is the
276 :     default; the "-m" and "-a" mode switches can be used to change the
277 :     default -- even several times within the same command line.
278 :     A single argument "@CMslave" is also accepted, but it should not be
279 :     used directly as it is intended for use by the parallel compilation
280 :     facility within CM.]
281 :    
282 : monnier 416 * Path configuration
283 :     --------------------
284 :    
285 :     + Basics:
286 :    
287 :     One of the new features of CM is its handling of path names. In the
288 :     old CM, one particular point of trouble was the autoloader. It
289 :     analyzes a group or library and remembers the locations of associated
290 :     files. Later, when the necessity arises, those files will be read.
291 :     Therefore, one was asking for trouble if the current working directory
292 :     was changed between analysis- and load-time, or, worse, if files
293 :     actually moved about (as is often the case if build- and
294 :     installation-directories are different, or, to put it more generally,
295 :     if CM's state is frozen into a heap image and used in a different
296 :     environment).
297 :    
298 :     Maybe it would have been possible to work around most of these
299 :     problems by fixing the path-lookup mechanism in the old CM and using
300 :     it extensively. But path-lookup (as in the Unix-shell's "PATH") is
301 :     inherently dangerous because one can never be too sure what it will be
302 :     that is found on the path. A new file in one of the directories early
303 :     in the path can upset the program that hopes to find something under
304 :     the same name later on the path. Even when ignoring security-issues
305 :     like trojan horses and such, this definitely opens the door for
306 : monnier 498 various unpleasant surprises. (Who has never named a test version
307 : monnier 416 of a program "test" and found that it acts strangely only to discover
308 :     later that /bin/test was run instead?)
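
The effect is easy to reproduce. The shell sketch below uses two scratch directories as an "early" and a "late" path entry; the program name mytool is made up. The same lookup gives a different answer as soon as a file of that name appears earlier on the path.

```shell
# Two scratch directories stand in for an "early" and a "late" PATH entry.
# "mytool" is a made-up program name.
demo=$(mktemp -d)
mkdir "$demo/early" "$demo/late"

# At first, only the late directory provides mytool.
printf '#!/bin/sh\necho "late version"\n' > "$demo/late/mytool"
chmod +x "$demo/late/mytool"
before=$( PATH="$demo/early:$demo/late:$PATH"; mytool )

# Someone drops a file of the same name into the early directory...
printf '#!/bin/sh\necho "early version"\n' > "$demo/early/mytool"
chmod +x "$demo/early/mytool"
after=$( PATH="$demo/early:$demo/late:$PATH"; mytool )

echo "before: $before"
echo "after:  $after"
rm -rf "$demo"
```

Nothing about the caller changed between the two lookups; only the contents of the file system did.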
309 :    
310 :     Thus, the new scheme used by CM is a fixed mapping of what I call
311 :     "configuration anchors" to corresponding directories. The mapping can
312 :     be changed, but one must do so explicitly. In effect, it does not
313 :     depend on the contents of the file system. Here is how it works:
314 :    
315 : blume 643 If I specify a pathname that starts with a "$", then the first arc
316 :     between "$" and the first "/" is taken as the name of a so-called
317 :     "anchor". CM knows a mapping from anchor names to directory names and
318 :     replaces the prefix $<anchor> with the name of the corresponding
319 :     directory. Therefore, an anchored path has the general form
320 : monnier 416
321 : blume 643 $<anchor>/<path>
322 : monnier 416
323 : blume 643 It is important that there is at least one arc in <path>. In other
324 :     words, the form $<anchor> is NOT valid. (Actually, it currently is
325 :     valid, but CM interprets it in a different way.)
326 :    
327 :     Examples:
328 :    
329 :     $smlnj/compiler/all.cm
330 :     $basis.cm/basis.cm
331 :     $MLRISC/Control.cm
332 :    
333 :     The special case where <anchor> coincides with the first arc of <path>
334 :     can be abbreviated by omitting <anchor>. This leads to the shorthand
335 :    
336 :     $/<anchor>/<more>...
337 :    
338 :     for the longer
339 :    
340 :     $<anchor>/<anchor>/<more>...
341 :    
342 :     Examples:
343 :    
344 :     $/foo/bar/baz.cm (* same as $foo/foo/bar/baz.cm *)
345 :     $/basis.cm (* same as $basis.cm/basis.cm *)
346 :    
347 :     Currently, CM accepts one additional shorthand for the case where
348 :     <path> has precisely one arc that coincides with <anchor>. Here, the
349 :     slash "/" can be omitted, too.
350 :    
351 :     Examples:
352 :    
353 :     $basis.cm (* same as $/basis.cm or $basis.cm/basis.cm *)
354 :     $nw-ext.cm (* same as $/nw-ext.cm or $nw-ext.cm/nw-ext.cm *)
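
The shorthand rules so far can be summarized as a small shell function that rewrites any of the forms into the full $<anchor>/<path> form. This is only a model of the rules as described here, not CM's own code; the name expand_anchor is made up.

```shell
# Expand CM's $-shorthands into the full $<anchor>/<path> form.
expand_anchor() {
    p=$1
    case $p in
        '$/'*)
            # $/<anchor>/<more>  ->  $<anchor>/<anchor>/<more>
            rest=${p#'$/'}
            anchor=${rest%%/*}
            printf '$%s/%s\n' "$anchor" "$rest"
            ;;
        '$'*/*)
            # Already in full form.
            printf '%s\n' "$p"
            ;;
        '$'*)
            # $<anchor>  ->  $<anchor>/<anchor>
            anchor=${p#'$'}
            printf '$%s/%s\n' "$anchor" "$anchor"
            ;;
        *)
            # Not an anchored path at all; leave it alone.
            printf '%s\n' "$p"
            ;;
    esac
}

expand_anchor '$/foo/bar/baz.cm'
expand_anchor '$basis.cm'
```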
355 :    
356 :     Previously, CM used "implicit" anchors where anchored paths simply
357 :     have the form
358 :    
359 :     <anchor>/<more>...
360 :    
361 :     The distinction between anchored paths and relative paths was made
362 :     based on whether or not <anchor> had a known mapping at the time it
363 :     was seen by CM. Since this is hard to read and fragile, support for
364 :     implicit anchors (while still there) is considered obsolete and will
365 :     be phased out soon. The meaning of an implicitly anchored path <path>
366 :     is the same as $/<path>.
367 :    
368 :     Recognition of implicit anchors can be turned off by issuing the
369 :     following command:
370 :    
371 :     CM.autoload "$smlnj/cm.cm";
372 :     #set CM.Control.implicit_anchors false;
373 :    
374 :     + Why anchored paths?
375 :    
376 :     The important point is that one can change the mapping of the anchor,
377 :     and the translation of the (anchored) path name will also change
378 :     accordingly -- even very late in the game. CM avoids "elaborating"
379 :     path names until it really needs them when it is time to open files.
380 :     CM is also willing to re-elaborate the same names if there is reason
381 :     to do so. Thus, the "basis.cm" library that was analyzed "here" but
382 :     then moved "there" will also be found "there" if the anchor has been
383 :     re-set accordingly.
384 :    
385 :     The anchor mapping is (re-)initialized at startup time by reading two
386 :     configuration files. Normally, those are the "../../lib/pathconfig"
387 :     file and the ".smlnj-pathconfig" file in your home directory (if such
388 :     exists). During an ongoing session, function CM.Anchor.anchor can be
389 :     used to query and modify the anchor mapping.
390 :    
391 : monnier 416 + Different configurations at different times:
392 :    
393 :     During compilation of the compiler, CMB uses a path configuration that
394 :     is read from the file "pathconfig" located here in this directory.
395 :    
396 : blume 643 At bootstrap time (while running "makeml"), the same anchors are
397 :     mapped to the corresponding sub-directory of the "boot" directory:
398 :     basis.cm is mapped to sml.boot.<arch>-<os>/basis.cm -- which means
399 :     that CM will look for a library named
400 :     sml.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.
401 : monnier 416
402 : blume 643 [Note, there are some anchors in "pathconfig" that have no
403 :     corresponding sub-directory of the boot directory. Examples are
404 :     "root.cm", "cm", and so on. The reason is that there are no stable
405 :     libraries whose description files are named using these anchors;
406 :     everything anchored at "$cm" is a group but not a library.]
407 :    
408 : monnier 416 By the way, you will perhaps notice that there is no file
409 : monnier 498 sml.boot.<arch>-<os>/basis.cm/basis.cm
410 : monnier 416 but there _is_ the corresponding stable archive
411 : monnier 498 sml.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm
412 : monnier 416 CM always looks for stable archives first.
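
Assuming other libraries follow the same pattern as the basis.cm example above, the location of a stable archive can be derived from the description file's path like this (the function name stable_archive is made up for illustration):

```shell
# Given a description file path plus <arch> and <os>, print where the
# corresponding stable archive lives: <dir>/CM/<arch>-<os>/<file>.
stable_archive() {
    desc=$1 arch=$2 os=$3
    dir=$(dirname "$desc")
    file=$(basename "$desc")
    printf '%s/CM/%s-%s/%s\n' "$dir" "$arch" "$os" "$file"
}

stable_archive sml.boot.x86-unix/basis.cm/basis.cm x86 unix
```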
413 :    
414 :     This mapping (from anchors to names in the boot directory) is the one
415 :     that will get frozen into the generated heap image at boot time.
416 :     Thus, unless it is changed, CM will look for its libraries in the boot
417 : blume 643 directory. The aforementioned "testml" script will make sure (by
418 :     setting the environment variable CM_PATHCONFIG) that the mapping be
419 :     changed to the one specified in a new "pathconfig" file which was
420 :     created by makeml and placed into the test library directory. It
421 :     points all anchors to the corresponding entry in the test library
422 :     directory. Thus, "testml" will let a new heap image run with its
423 :     corresponding new libraries.
424 : monnier 416
425 :     Normally, however, CM consults other pathconfig files at startup --
426 :     files that live in standard locations. These files are used to modify
427 :     the path configuration to let anchors point to their "usual" places.
428 :     The names of the files that are read (if present) are configurable via
429 :     environment variables. At the moment they default to
430 :     /usr/lib/smlnj-pathconfig
431 :     and
432 :     $HOME/.smlnj-pathconfig
433 :     The first one is configurable via CM_PATHCONFIG (and the default is
434 :     configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
435 :     configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
436 :     In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
437 :     before making the heap image. Therefore, heap images generated by
438 :     makeml will look for their global pathconfig file in
439 :    
440 : blume 643 ../../lib/pathconfig
441 : monnier 416
442 : blume 643 [Note: The "makeml" script will not re-set the CM_PATHCONFIG_DEFAULT
443 :     variable if it was already set before. If it does re-set the
444 :     variable, it uses an absolute path name instead of the relative path
445 :     that I used for illustration above.]
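
The selection of the two files can be modeled by the following shell sketch. It is a model only: the function name pathconfig_files is made up, its two arguments stand for the defaults that were fixed when the heap image was made, and it assumes that an empty environment variable is treated like an unset one.

```shell
# Print the two pathconfig files that would be consulted at startup,
# global one first.  $1 and $2 are the boot-time defaults; if omitted,
# the built-in defaults named in the text are used.
pathconfig_files() {
    global_default=${1:-/usr/lib/smlnj-pathconfig}
    local_default=${2:-$HOME/.smlnj-pathconfig}
    printf '%s\n' "${CM_PATHCONFIG:-$global_default}"
    printf '%s\n' "${CM_LOCAL_PATHCONFIG:-$local_default}"
}

# A heap image made by makeml, illustrated with the relative path:
pathconfig_files ../../lib/pathconfig
```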
446 :    
447 : monnier 429 For example, I always keep my "good" libraries in `pwd`/../../lib --
448 : blume 643 where both the main "install" script (in config/install.sh) and the
449 :     "installml" script (see above) also put them -- so I don't have to do
450 :     anything special about my pathconfig file.
451 : monnier 416
452 :     Once I have a new heap image and libraries working, I replace the old
453 :     "good" image with the new one:
454 :    
455 :     mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>
456 :    
457 : blume 645 After this I must also move all libraries from <image>.libs/* to their
458 : blume 573 corresponding position in ../../lib.
459 : monnier 416
460 : blume 573 Since this is cumbersome to do by hand, there is a script called
461 :     "installml" that automates this task. Using the script has the added
462 :     advantage that it will not clobber libraries that belong to architectures
463 :     other than the current one. (A rather heavy-handed "rm/mv" approach
464 :     will delete all stable libraries for all architectures.)
465 :     "installml" also patches the ../../lib/pathconfig file as necessary.
466 : monnier 416
467 :     Of course, you can organize things differently for yourself -- the
468 : blume 643 path configuration mechanism should be sufficiently flexible. If you
469 :     do so, you will have to set CM_PATHCONFIG. This must be done before
470 :     you start sml. If you want to change the pathname mapping at the time
471 :     sml is already running, then use the functions in CM.Anchor.
472 : monnier 416
473 :     * Libraries vs. Groups
474 :     ----------------------
475 :    
476 :     With the old CM, "group" was the primary concept while "library" and
477 :     "stabilization" could be considered afterthoughts. This has changed.
478 :     Now "library" is the primary concept, "stabilization" is semantically
479 :     significant, and "groups" are a secondary mechanism.
480 :    
481 :     Libraries are used to "structure the world"; groups are used to give
482 :     structure to libraries. Each group can be used either in precisely
483 :     one library (in which case it cannot be used at the interactive
484 :     toplevel) or at the toplevel (in which case it cannot be used in any
485 :     library). In other words, if you count the toplevel as a library,
486 :     then each group has a unique "owner" library. Of course, there still
487 :     is no limit on how many times a group can be mentioned as a member of
488 :     other groups -- as long as all these other groups belong to the same
489 :     owner library.
490 :    
491 : blume 643 Normally, collections of files that belong together should be made
492 :     into proper CM libraries. CM groups (aka "library components") should
493 :     be used only when there are namespace problems within a library.
494 : monnier 416
495 :     Aside from the fact that I find this design quite natural, there is
496 :     actually a technical reason for it: when you stabilize a library
497 :     (groups cannot be stabilized), then all its sub-groups (not
498 :     sub-libraries!) get "sucked into" the stable archive of the library.
499 :     In other words, even if you have n+1 CM description files (1 for the
500 :     library, n for n sub-groups), there will be just one file representing
501 :     the one stable archive (per architecture/os) for the whole thing. For
502 :     example, I structured the standard basis into one library with two
503 : blume 569 sub-groups, but once you compile it (CMB.make) there is only one
504 : monnier 416 stable file that represents the whole basis library. If groups were
505 :     allowed to appear in more than one library, then stabilization would
506 :     duplicate the group (its code, its environment data structures, and
507 :     even its dynamic state).
508 :    
509 :     There is a small change to the syntax of group description files: they
510 :     must explicitly state which library they belong to. CM will verify
511 :     that. The owner library is specified in parentheses after the "group"
512 :     keyword. If the specification is missing (that's the "old" syntax),
513 :     then the owner will be taken to be the interactive toplevel.
514 :    
515 : blume 643 * Pervasive environment, core environment, the init library "init.cmi"
516 : monnier 416 -------------------------------------------------------------------------
517 :    
518 : blume 569 CMB.make starts out by building and compiling the
519 : blume 643 "init library". This library cannot be described in the "usual" way
520 : blume 537 because it uses "magic" in three ways:
521 :     - it is used to later tie in the runtime system
522 : blume 643 - it binds the "_Core" structure
523 : blume 569 - it exports the "pervasive" environment
524 : monnier 416
525 : blume 537 The pervasive environment no longer includes the entire basis library
526 :     but only non-modular bindings (top-level bindings of variables and
527 :     types).
528 :    
529 : blume 569 CM cannot automatically determine dependencies (or exports) for the
530 : blume 643 init library source files, but it still does use its regular cutoff
531 : blume 569 recompilation mechanism. Therefore, dependencies must be given
532 :     explicitly. This is done by a special description file which
533 : blume 643 currently lives in smlnj/init/init.cmi (as an anchored path:
534 :     "$smlnj/init/init.cmi"). See the long comment at the beginning of
535 :     that file for more details.
536 : monnier 416
537 : blume 643 After it is built, $smlnj/init/init.cmi can be used as an "ordinary"
538 :     library by other libraries. (This is done, for example, by the
539 :     implementation of the Basis library.) Access to
540 :     "$smlnj/init/init.cmi" is protected by the privilege named
541 :     "primitive". Also, note that the .cmi-file is not automatically
542 :     recognized as a CM description file. ("cmi" should remind you of "CM
543 :     - Initial library".) Therefore, it must be given an explicit member
544 :     class:
545 : blume 537
546 : blume 643 $smlnj/init/init.cmi : cm
547 : blume 569
548 : monnier 416 * Autoloader
549 :     ------------
550 :    
551 :     The new system heavily relies on the autoloader. As a result, almost
552 : blume 569 no static environments need to get unpickled at bootstrap time. The
553 : monnier 416 construction of such environments is deferred until they become
554 : blume 632 necessary. Thanks to this, it was possible to reduce the size of the
555 : blume 569 heap image by more than one megabyte (depending on the architecture).
556 :     The downside (although not really terribly bad) is that there is a
557 :     short wait when you first touch an identifier that hasn't been touched
558 : monnier 416 before. (I acknowledge that the notion of "short" may depend on your
559 :     sense of urgency. :-)
560 :    
561 :     The reliance on the autoloader (and therefore CM's library mechanism)
562 :     means that in order to be able to use the system, your paths must be
563 :     properly configured.
564 :    
565 : blume 652 Several libraries get pre-registered at bootstrap time. Here, at least
566 :     the following two should be included: the basis library ("$/basis.cm")
567 :     and CM itself ("$smlnj/cm.cm"). Currently, we also pre-register the
568 :     library exporting structure Compiler ($smlnj/compiler.cm) and the
569 :     SML/NJ library ($/smlnj-lib.cm).
570 : monnier 416
571 :     Here are some other useful libraries that are not pre-registered but
572 :     which can easily be made accessible via CM.autoload (or, non-lazily,
573 :     via CM.make):
574 :    
575 : blume 643 $smlnj/cmb.cm - provides "structure CMB"
576 :     $smlnj/cmb/current.cm - same as $smlnj/cmb.cm
577 :     $smlnj/compiler/all.cm - provides "structure <Arch>Compiler" and
578 : monnier 416 "structure <Arch><OS>CMB" for various
579 :     values of <Arch> and <OS>
580 :    
581 : blume 652 The file preloads.standard here in this directory currently includes
582 :     $smlnj/cmb.cm. This means that by doing ./makeml one obtains a heap
583 :     image with the bootstrap compiler being pre-registered as well. This
584 :     seems reasonable for compiler hackers. (The config/install.sh script
585 :     uses config/preloads where $smlnj/cmb.cm is not pre-registered. This
586 :     is appropriate as a setup for general users.)
587 : blume 643
588 : blume 652 [****** Note: The following is NO LONGER TRUE:
589 :     The fact that $smlnj/compiler.cm is not among the pre-registered
590 :     libraries seems like an oversight and could lead to some
591 :     inconveniences to users who want to, for example, set compiler flags.
592 :     However, pre-registration of this library significantly increases the
593 :     size of the heap image. Moreover, since the library can easily be
594 :     loaded by giving the string as a command line argument, this does not
595 :     really appear to be a big burden to me. Just create a shell alias or
596 :     a little wrapper script if you think you really need this.]
597 :    
598 : monnier 416 * Internal sharing
599 :     ------------------
600 :    
601 :     Dynamic values of loaded modules are shared. This is true even for
602 :     those modules that are used by the interactive compiler itself. If
603 :     you load a module from a library that is also used by the interactive
604 :     compiler, then "loading" means "loading the static environment" -- it
605 :     does not mean "loading the code and linking it". Instead, you get to
606 :     share the compiler's dynamic values (and therefore the executable
607 :     code as well).
608 :    
609 :     Of course, if you load a module that hasn't been loaded before and
610 :     also isn't used by the interactive system, then CM will get the code
611 :     and link (execute) it.
612 :    
613 :     * Access control
614 :     ----------------
615 :    
616 :     In some places, you will find that the "group" and "library" keywords
617 :     in description files are preceded by certain strings, sometimes in
618 :     parentheses. These strings are the names of "privileges". Don't
619 :     worry about them too much at the moment. For the time being, access
620 :     control is not enforced, but the infrastructure is in place.
621 :    
622 :     * Preprocessor
623 :     --------------
624 :    
625 :     The syntax of expressions in #if and #elif clauses is now more ML-ish
626 :     instead of C-ish. (Hey, this is ML after all!) In particular, you
627 :     must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
628 :     Unary minus is "~".
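
As a small sketch of the ML-ish syntax (the file names are hypothetical,
and the version thresholds are chosen only for illustration), a
condition that would formerly have been written with "!" and "||" might
now look like this in a members section:

  +----------------------------------------------------------+
  |#if not (SMLNJ_VERSION > 110) orelse SMLNJ_VERSION > 200  |
  |  compat-foo.sml    (* hypothetical file name *)          |
  |#else                                                     |
  |  plain-foo.sml     (* hypothetical file name *)          |
  |#endif                                                    |
  +----------------------------------------------------------+

A negative constant in such an expression would be written with "~",
for example "~1" rather than "-1".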
629 :    
630 :     A more interesting change is that you can now query the exports of
631 :     sources/subgroups/sublibraries:
632 :    
633 :     - Within the "members" section of the description (i.e., after "is"):
634 :     The expression
635 :     defined(<namespace> <name>)
636 :     is true if any of the included members preceding this clause exports
637 :     a symbol "<namespace> <name>".
638 : blume 632 - Within the "exports" section of the description (i.e., before "is"):
639 : monnier 416 The same expression is true if _any_ of the members exports the
640 :     named symbol.
641 :     (It would be more logical if the exports section followed the
642 :     members section, but for esthetic reasons I prefer the exports
643 :     section to come first.)
644 :    
645 :     Example:
646 :    
647 :     +--------------------------+
648 :     |Library                   |
649 :     |  structure Foo           |
650 :     |#if defined(structure Bar)|
651 :     |  structure Bar           |
652 :     |#endif                    |
653 :     |is                        |
654 :     |#if SMLNJ_VERSION > 110   |
655 :     |  new-foo.sml             |
656 :     |#else                     |
657 :     |  old-foo.sml             |
658 :     |#endif                    |
659 :     |#if defined(structure Bar)|
660 :     |  bar-client.sml          |
661 :     |#else                     |
662 :     |  no-bar-so-far.sml       |
663 :     |#endif                    |
664 :     +--------------------------+
665 :    
666 :     Here, the file "bar-client.sml" gets included if SMLNJ_VERSION is
667 :     greater than 110 and new-foo.sml exports a structure Bar _or_ if
668 :     SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar. Otherwise
669 :     "no-bar-so-far.sml" gets included instead. In addition, the export of
670 :     structure Bar is guarded by its own existence. (Structure Bar could
671 :     also be defined by "no-bar-so-far.sml" in which case it would get
672 :     exported regardless of the outcome of the other "defined" test.)
673 :    
674 :     Some things to note:
675 :    
676 :     - For the purpose of the pre-processor, order among members is
677 :     significant. (For the purpose of dependency analysis, order continues
678 :     not to be significant.)
679 :     - As a consequence, in some cases pre-processor dependencies and
680 :     compilation dependencies may end up being opposites of each other.
681 :     (This is not a problem; it may very well be a feature.)
682 :    
683 :     * The Basis Library is no longer built-in
684 :     -----------------------------------------
685 :    
686 :     The SML'97 basis is no longer built-in. If you want to use it, you
687 :     must specify "$/basis.cm" as a member of your group/library.
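
A minimal sketch of a library description that pulls in the Basis
Library explicitly. The structure and file names are hypothetical; the
library is named here by the anchored path "$/basis.cm" that the
library list at the top of this document uses:

  +------------------------------------------+
  |Library                                   |
  |  structure Hello     (* hypothetical *)  |
  |is                                        |
  |  $/basis.cm                              |
  |  hello.sml           (* hypothetical *)  |
  +------------------------------------------+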
688 :    
689 :     * No more aliases
690 :     -----------------
691 :    
692 :     The "alias" feature is no longer with us. At first I thought I could
693 :     keep it, but it turns out that it causes some fairly fundamental
694 :     problems with the autoloader. However, I don't think that this is a
695 :     big loss because path anchors make up for most of it. Moreover,
696 :     stable libraries can now easily be moved to convenient locations
697 :     without having to move large source trees at the same time. (See my
698 : blume 643 new config/install.sh script for examples of that.)
699 : monnier 416
700 : blume 573 It is possible to simulate aliases (in a way that is safer than the
701 :     original alias mechanism). For example, the root.cm file (which is the
702 :     root of the whole system as far as CMB.make is concerned) acts as an
703 : blume 643 alias for $smlnj/internal/intsys.cm. In this case, root.cm is a group
704 : blume 573 to avoid having a (trivial) stable library file built for it.
705 :    
706 :     A library can act as an "alias" for another library if it has a
707 :     verbatim copy of the export list and mentions the other library as its
708 : blume 643 only member. Examples for this are $smlnj/cm.cm (for
709 :     $smlnj/cm/full.cm), $smlnj/compiler.cm (for $smlnj/compiler/current.cm),
710 :     etc. The stable library file for such an "alias" is typically very
711 :     small because it basically just points to the other library. (For
712 :     example, the file representing $smlnj/cm.cm is currently 234 bytes
713 :     long.)
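
Such an "alias" library might be sketched as follows. The anchor
$mylib, the library names, and structure Foo are hypothetical; the
export list is assumed to repeat that of the library being aliased word
for word:

  +------------------------------------------------+
  |Library                                         |
  |  structure Foo         (* same export list *)  |
  |is                                              |
  |  $mylib/foo/full.cm    (* the only member *)   |
  +------------------------------------------------+

Saved as the description file for, say, $mylib/foo.cm, this would let
clients refer to the shorter name while the real code stays in
$mylib/foo/full.cm.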
714 : blume 573
715 : monnier 416 * Don't use relative or absolute pathnames to refer to libraries
716 :     ----------------------------------------------------------------
717 :    
718 :     Don't use relative or absolute pathnames to refer to libraries. If
719 :     you do it anyway, you'll get an appropriate warning at the time when
720 : blume 569 you do CMB.make(). If you use relative or absolute pathnames to
721 : monnier 416 refer to library B from library A, you will be committed to keeping B
722 :     in the same relative (to A) or absolute location. This, clearly,
723 : blume 643 would be undesirable in many situations (although perhaps not always).
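
To illustrate, here is a sketch of the description file for a
hypothetical library A that needs library B. The anchor $mylib and all
file names are assumptions made up for this example:

  +------------------------------------------------------+
  |Library                                               |
  |  structure A                                         |
  |is                                                    |
  |  a.sml                                               |
  |  $mylib/b.cm      (* anchored: B is free to move *)  |
  |(* ../b/b.cm       relative: pins B next to A *)      |
  +------------------------------------------------------+

The advice concerns references to libraries only; the source file
a.sml is still named relative to the description file, as in the
preprocessor example above.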
