[smlnj] Annotation of /sml/trunk/src/system/README
Revision 677

1 : blume 537 Compiler Hacker's Guide to the new CM...
2 :     ========================================
3 : monnier 416
4 : blume 643 Last change: 12/5/2000
5 : blume 537
6 : monnier 416 * Libraries
7 :     -----------
8 :    
9 :     The new way of building the compiler is heavily library-oriented.
10 :     Aside from a tiny portion of code that is responsible for defining the
11 :     pervasive environment, _everything_ lives in libraries. Building the
12 :     compiler means compiling and stabilizing these libraries first. Some
13 :     of the libraries exist just for reasons of organizing the code; the
14 :     other ones are potentially useful in their own right. Therefore, as a
15 :     beneficial side-effect of compiling the compiler, you will end up with
16 :     stable versions of these libraries.
17 :    
18 :     At the moment, the following libraries are constructed when compiling
19 :     the compiler ("*" means that I consider the library potentially useful
20 :     in its own right):
21 :    
22 : blume 643 * $/basis.cm SML'97 Basis Library (pre-loaded)
23 :     * $/smlnj-lib.cm SML/NJ Utility Library
24 :     * $/html-lib.cm SML/NJ HTML Library
25 :     * $/pp-lib.cm SML/NJ Pretty-print Library
26 : monnier 416
27 : blume 643 * $/ml-yacc-lib.cm SML/NJ ML-Yacc runtime library
28 : blume 573
29 : blume 643 * $smlnj/compiler/{alpha32,hppa,ppc,sparc,x86}.cm
30 : blume 573 cross-compiler libraries, exporting
31 :     structure <Arch>Compiler
32 : blume 643 * $smlnj/compiler/current.cm structure Compiler (current arch)
33 :     * $smlnj/compiler/all.cm all cross-compilers and all cross-CMBs
34 : blume 573
35 : blume 652 * $smlnj/cm/full.cm structure CM (see manual)
36 : blume 643 * $smlnj/cm/tools.cm CM tools library
37 : blume 573
38 : blume 643 * $smlnj/cmb/{alpha32,hppa,ppc,sparc,x86}-unix.cm
39 : blume 573 cross-bootstrap-compilers for Unix
40 :     (structure <Arch>UnixCMB)
41 : blume 643 * $smlnj/cmb/ppc-macos.cm ...for Mac (structure PPCMacosCMB)
42 :     * $smlnj/cmb/x86-win32.cm ...for Windoze (structure X86Win32CMB)
43 :     * $smlnj/cmb/current.cm structure CMB (current arch/os)
44 : blume 573
45 : blume 643 * $smlnj/compiler.cm abbrev. for $smlnj/compiler/current.cm
46 :     * $smlnj/cm.cm abbrev. for $smlnj/cm/full.cm
47 :     * $smlnj/cmb.cm abbrev. for $smlnj/cmb/current.cm
48 : blume 573
49 : blume 677 * $/comp-lib.cm Utility library for compiler
50 : blume 573
51 : blume 643 - $smlnj/viscomp/core.cm Compiler core functionality
52 :     - $smlnj/viscomp/{alpha32,hppa,ppc,sparc,x86}.cm
53 : blume 573 Machine-specific parts of compiler
54 :    
55 : blume 652 - $smlnj/internal/{intsys,cm-lib,host-compiler-0}.cm
56 : blume 573 Glue that holds the interactive system
57 :     together
58 :    
59 : blume 643 * $MLRISC/{MLRISC,Control,Lib,ALPHA,HPPA,PPC,SPARC,IA32}.cm
60 : blume 573 Various MLRISC bits
61 : blume 643 (Other MLRISC libraries such as
62 :     Graph, Visual, etc. do not currently
63 :     take part in the SML/NJ build.)
64 : blume 573
65 : blume 677 * $/{mlyacc,mllex,mlburg}-tool.cm CM plug-in libraries for common tools
66 :     * $/{grm,lex,burg}-ext.cm CM plug-in libraries for common file
67 : blume 573 extensions
68 :    
69 : blume 677 Paths of the form $/foo/<more> are shorthands for $foo/foo/<more>.
70 : blume 643
71 :     A more complete explanation of the $-notation can be found later in
72 :     this document or in the CM manual.
73 :    
74 :     To learn about the definitions of the $-anchors (and, thus, where in
75 :     the source tree the above libraries are defined), consult the
76 :     "pathconfig" file here in this directory.
77 :    
78 : monnier 416 * Before you can use the bootstrap compiler (CMB)...
79 :     ----------------------------------------------------
80 :    
81 :     To be able to use CMB at all, you must first say
82 :    
83 : blume 643 CM.autoload "$smlnj/cmb.cm";
84 : monnier 416
85 : blume 569 after you start sml. Alternatively -- and perhaps more conveniently --
86 : blume 643 you can provide "$smlnj/cmb.cm" as a command-line argument to sml:
87 : monnier 416
88 : blume 643 $ sml '$smlnj/cmb.cm'
89 : monnier 416
90 : blume 643 (Be sure to protect the dollar symbol which usually has its own
91 :     special meaning to the shell.)
92 :    
93 : blume 569 * Compiling the compiler
94 :     ------------------------
95 : monnier 416
96 : blume 569 We are now back to the old scheme where a call to CMB.make() suffices to
97 :     build a bootable set of files (libraries in our case). CMB.make maintains
98 :     two parallel hierarchies of derived files:
99 : monnier 416
100 : blume 569 1. the binfile hierarchy ("binfiles"), containing compiled objects for
101 :     each individual ML source file; this hierarchy is rooted at
102 :     <prefix>.bin.<arch>-<opsys>
103 :     2. the stable library hierarchy ("boot files"), containing library files
104 :     for each library that participates in building SML/NJ; this hierarchy
105 :     is rooted at
106 :     <prefix>.boot.<arch>-<opsys>
107 : monnier 416
108 : blume 569 The default for <prefix> is "sml". It can be changed by using
109 :     CMB.make' with the new <prefix> as the optional string argument.
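As a minimal sketch of the naming scheme above, the two root directories can be derived from <prefix>, <arch>, and <opsys> like this. The helper names binfile_root and bootfile_root are made up for illustration; they are not part of SML/NJ or its scripts.

```shell
# Hypothetical helpers (not part of SML/NJ): derive the roots of the two
# hierarchies that CMB.make maintains.
# usage: binfile_root <arch> <opsys> [<prefix>]   (prefix defaults to "sml")
binfile_root() {
    printf '%s.bin.%s-%s\n' "${3:-sml}" "$1" "$2"
}

# usage: bootfile_root <arch> <opsys> [<prefix>]
bootfile_root() {
    printf '%s.boot.%s-%s\n' "${3:-sml}" "$1" "$2"
}

binfile_root x86 unix            # prints sml.bin.x86-unix
bootfile_root sparc unix new     # prints new.boot.sparc-unix
```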
110 : monnier 416
111 : blume 643 CMB.make reuses existing bootfiles after it has verified that they are
112 :     consistent with their corresponding binfiles. Bootfiles do not need
113 :     to be deleted in order for CMB.make to work correctly.
114 : monnier 416
115 : blume 569 To bootstrap a new system (using the runtime system boot loader), the
116 :     bootfiles _must_ be present; the binfiles need not be present (but
117 :     their presence does not hurt either).
118 : monnier 416
119 : monnier 498 You can reduce the number of extra files compiled and stabilized
120 : blume 569 during CMB.make at the expense of not building any cross-compilers.
121 : monnier 498 For that, say
122 :     #set (CMB.symval "LIGHT") (SOME 1);
123 : blume 569 before running CMB.make.
124 : monnier 498
125 : monnier 416 * Making the heap image
126 :     -----------------------
127 :    
128 :     The heap image is made by running the "makeml" script that you find
129 :     here in this directory. By default it will try to refer to the
130 : monnier 498 sml.boot.<arch>-<os> directory. You can change this using the -boot
131 : monnier 416 argument (which takes the full name of the boot directory to be used).
132 :    
133 :     The "feel" of using makeml should be mostly as it used to be. However,
134 :     internally, there are some changes that you should be aware of:
135 :    
136 : blume 569 1. The script will make a heap image and build a separate library directory
137 : blume 643 that contains (hard) links to the library files in the bootfile directory.
138 : monnier 416
139 :     2. There is no "-full" option anymore. This functionality should
140 :     eventually be provided by a library with a sufficiently rich export
141 :     interface.
142 :    
143 :     3. No image will be generated if you use the -rebuild option.
144 :     Instead, the script quits after making new bin and new boot
145 :     directories. You must re-invoke makeml with a suitable "-boot"
146 :     option to actually make the image. The argument to "-rebuild"
147 :     is the <prefix> for the new bin and boot directories (see above).
148 :    
149 : blume 643 [Note: When the -rebuild option is specified, then the boot procedure
150 :     will not read static environments from the boot directory. Instead,
151 :     after the ML code has been loaded and linked, the system will invoke
152 :     CMB.make' with the argument that was given to -rebuild. After
153 :     CMB.make' is done, the system quits. In essence, makeml with -rebuild
154 :     acts as a bootstrap compiler that is not dependent on any usable
155 : blume 645 static environments.]
156 : blume 643
157 : blume 569 Makeml will not destroy the bootfile directory.
158 : monnier 416
159 :     * Testing a newly generated heap image
160 :     --------------------------------------
161 :    
162 :     If you use a new heap image by saying "sml @SMLload=..." then things
163 :     will not go as you may expect because along with the new heap image
164 :     should go those new stable libraries, but unless you do something
165 : blume 569 about it, the newly booted system will look for its stable libraries
166 : blume 643 in places where you stored your _old_ stable libraries. (After just
167 :     having done "makeml", these "places" would be within the boot file
168 :     hierarchy under <prefix>.boot.<arch>-<os>.)
169 : monnier 416
170 :     After you have made the new heap image, the new libraries are in a
171 :     separate directory whose name is derived from the name of the heap
172 : blume 569 image. (Actually, only the directory hierarchy is separate; the
173 : blume 643 library files themselves are hard links.) The "testml" script that
174 :     you also find here will run the heap image and instruct it to look for
175 :     its libraries in that new library directory by setting the
176 :     CM_PATHCONFIG environment variable to point to a different pathconfig
177 :     file under <prefix>.lib.
178 : monnier 416
179 : monnier 498 "testml" takes the <prefix> of the heap image as its first
180 :     argument. All other arguments are passed verbatim to the ML process.
181 :    
182 :     The <prefix> is the same as the one used when you did "makeml". If
183 :     you run "testml" without arguments, <prefix> defaults to "sml".
184 : blume 645 Thus, if you just said "makeml" without arguments you can also say
185 :     "testml" without arguments. (Note that you _must_ supply the <prefix>
186 : monnier 498 argument if you intend to pass any additional arguments.)
187 :    
188 : monnier 416 * Installing a heap image for more permanent use
189 :     ------------------------------------------------
190 :    
191 : monnier 498 You can "install" a newly generated heap image by replacing the old
192 : blume 645 image with the new one AND AT THE SAME TIME replacing the old stable
193 : monnier 498 libraries with the new ones. To do this, run the "installml" script.
194 : monnier 416
195 : monnier 498 Like "testml", "installml" also expects the <prefix> as its first
196 :     argument. <prefix> defaults to "sml" if no argument is specified.
197 :    
198 :     "installml" patches the ../../lib/pathconfig file to reflect any
199 : blume 643 changes or additions to the path name mapping. (I say "patches"
200 :     because entries unrelated to the SML/NJ build process are retained in
201 :     their original form.) If you want to use a destination directory that
202 :     is different from ../../lib, then you must do this by hand (i.e.,
203 : blume 645 installml does not have an option for that).
204 : monnier 498
205 : blume 569 Thus, after a successful CMB.make, you should say
206 : monnier 498
207 :     ./makeml
208 :    
209 :     to make the new heap image + libraries, then
210 :    
211 :     ./testml
212 :    
213 :     to make sure everything works, and finally
214 :    
215 :     ./installml
216 :    
217 :     to replace your existing compiler with the one you just built and tested.
218 :    
219 : monnier 416 * Cross-compiling
220 :     -----------------
221 :    
222 : blume 643 All cross-compilers live in the "$smlnj/compiler/all.cm" library.
223 :     (The source tree for the "$smlnj" anchor -- see "pathconfig" -- is
224 :     src/system/smlnj, but this should normally not concern you.)
225 :     You must first say
226 : monnier 416
227 : blume 643 CM.autoload "$smlnj/compiler/all.cm";
228 : monnier 416
229 :     before you can access them. (This step corresponds to the old
230 :     CMB.retarget call.) After that, _all_ cross-compilers are available
231 :     at the same time. However, the ones that you are not using don't take
232 :     up any undue space because they only get loaded once you actually
233 : blume 645 mention them at top level. The names of the structures currently
234 : blume 643 exported by $smlnj/compiler/all.cm are:
235 : monnier 416
236 :     structure Alpha32UnixCMB
237 :     structure HppaUnixCMB
238 :     structure PPCMacOSCMB
239 :     structure PPCUnixCMB
240 :     structure SparcUnixCMB
241 :     structure X86UnixCMB
242 :     structure X86Win32CMB
243 :    
244 :     structure Alpha32Compiler
245 :     structure HppaCompiler
246 :     structure PPCCompiler
247 :     structure SparcCompiler
248 :     structure X86Compiler
249 :    
250 :     (PPCMacOSCMB is not very useful at the moment because there is no
251 :     implementation of the basis library for the MacOS.)
252 :    
253 : monnier 498 Alternatively, you can select just the one single structure that you
254 : blume 643 are interested in by auto-loading $smlnj/compiler/<arch>.cm or
255 :     $smlnj/cmb/<arch>-<os>.cm.
256 : monnier 498 <arch> currently ranges over "alpha32", "hppa", "ppc", "sparc", and "x86".
257 :     <os> can be either "unix" or "macos" or "win32".
258 :     (Obviously, not all combinations are valid.)
259 :    
260 : blume 643 Again, as with $smlnj/cmb.cm, you can specify the .cm file as an
261 : blume 569 argument to the sml command:
262 :    
263 : blume 643 $ sml '$smlnj/compiler/all.cm'
264 : blume 569
265 :     or
266 :    
267 : blume 643 $ sml '$smlnj/cmb/alpha32-unix.cm'
268 : blume 569
269 : blume 643 [Note: The command line for the "sml" command accepts configuration
270 :     parameters of the form "@SMLxxx...", mode switches of the form "-m"
271 :     and "-a", names of ML files -- which are passed to "use" -- and
272 :     arguments suitable for CM.make or CM.autoload. CM.autoload is the
273 :     default; the "-m" and "-a" mode switches can be used to change the
274 :     default -- even several times within the same command line.
275 :     A single argument "@CMslave" is also accepted, but it should not be
276 :     used directly as it is intended for use by the parallel compilation
277 :     facility within CM.]
278 :    
279 : monnier 416 * Path configuration
280 :     --------------------
281 :    
282 :     + Basics:
283 :    
284 :     One of the new features of CM is its handling of path names. In the
285 :     old CM, one particular point of trouble was the autoloader. It
286 :     analyzes a group or library and remembers the locations of associated
287 :     files. Later, when the necessity arises, those files will be read.
288 :     Therefore, one was asking for trouble if the current working directory
289 :     was changed between analysis- and load-time, or, worse, if files
290 :     actually moved about (as is often the case if build- and
291 :     installation-directories are different, or, to put it more generally,
292 :     if CM's state is frozen into a heap image and used in a different
293 :     environment).
294 :    
295 :     Maybe it would have been possible to work around most of these
296 :     problems by fixing the path-lookup mechanism in the old CM and using
297 :     it extensively. But path-lookup (as in the Unix-shell's "PATH") is
298 :     inherently dangerous because one can never be too sure what it will be
299 :     that is found on the path. A new file in one of the directories early
300 :     in the path can upset the program that hopes to find something under
301 :     the same name later on the path. Even when ignoring security-issues
302 :     like trojan horses and such, this definitely opens the door for
303 : monnier 498 various unpleasant surprises. (Who has never named a test version
304 : monnier 416 of a program "test" and found that it acts strangely only to discover
305 :     later that /bin/test was run instead?)
306 :    
307 :     Thus, the new scheme used by CM is a fixed mapping of what I call
308 :     "configuration anchors" to corresponding directories. The mapping can
309 :     be changed, but one must do so explicitly. In effect, it does not
310 :     depend on the contents of the file system. Here is how it works:
311 :    
312 : blume 643 If I specify a pathname that starts with a "$", then the first arc
313 :     between "$" and the first "/" is taken as the name of a so-called
314 :     "anchor". CM knows a mapping from anchor names to directory names and
315 :     replaces the prefix $<anchor> with the name of the corresponding
316 :     directory. Therefore, an anchored path has the general form
317 : monnier 416
318 : blume 643 $<anchor>/<path>
319 : monnier 416
320 : blume 643 It is important that there is at least one arc in <path>. In other
321 :     words, the form $<anchor> is NOT valid. (Actually, it currently is
322 :     valid, but CM interprets it in a different way.)
323 :    
324 :     Examples:
325 :    
326 :     $smlnj/compiler/all.cm
327 :     $basis.cm/basis.cm
328 :     $MLRISC/Control.cm
329 :    
330 :     The special case where <anchor> coincides with the first arc of <path>
331 :     can be abbreviated by omitting <anchor>. This leads to the shorthand
332 :    
333 :     $/<anchor>/<more>...
334 :    
335 :     for the longer
336 :    
337 :     $<anchor>/<anchor>/<more>...
338 :    
339 :     Examples:
340 :    
341 :     $/foo/bar/baz.cm (* same as $foo/foo/bar/baz.cm *)
342 :     $/basis.cm (* same as $basis.cm/basis.cm *)
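The shorthand rule is purely textual, so it can be sketched as a small shell function. This only illustrates the rewriting described above; it is not how CM implements it, and the name expand_anchor_shorthand is invented for the example.

```shell
# Hypothetical helper: rewrite "$/<anchor>/<more>" into the long form
# "$<anchor>/<anchor>/<more>". Any other path is printed unchanged.
expand_anchor_shorthand() {
    case $1 in
        '$/'*)
            rest=${1#'$/'}        # everything after the leading "$/"
            first=${rest%%/*}     # first arc, which doubles as the anchor
            printf '$%s/%s\n' "$first" "$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

expand_anchor_shorthand '$/foo/bar/baz.cm'   # prints $foo/foo/bar/baz.cm
expand_anchor_shorthand '$/basis.cm'         # prints $basis.cm/basis.cm
expand_anchor_shorthand '$smlnj/cmb.cm'      # prints $smlnj/cmb.cm
```

Note the single quotes around every argument: as with the sml command line, the dollar sign must be protected from the shell.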
343 :    
344 : blume 672 There used to be a notion of "implicit" anchors where in the case that
345 :     <anchor> is a known anchor, paths of the form
346 : blume 643
347 :     <anchor>/<more>...
348 :    
349 : blume 672 were interpreted as if they had been written
350 : blume 643
351 : blume 672 $<anchor>/<anchor>/<more>...
352 : blume 643
353 : blume 672 This is no longer the case. <foo>/<bar>... now always means what it
354 :     seems to mean: a relative path starting with an arc named <foo>.
355 : blume 643
356 :     + Why anchored paths?
357 :    
358 :     The important point is that one can change the mapping of the anchor,
359 :     and the translation of the (anchored) path name will also change
360 :     accordingly -- even very late in the game. CM avoids "elaborating"
361 :     path names until it really needs them when it is time to open files.
362 :     CM is also willing to re-elaborate the same names if there is reason
363 :     to do so. Thus, the "basis.cm" library that was analyzed "here" but
364 :     then moved "there" will also be found "there" if the anchor has been
365 :     re-set accordingly.
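To make the "here" versus "there" point concrete, here is a rough shell model of anchor resolution. It assumes a mapping file with one "<anchor> <directory>" pair per line; that layout, the function name resolve_anchored, and the /here and /there directories are assumptions made for this sketch. It handles only the full form $<anchor>/<path>, not the $/ shorthand, and it says nothing about how CM itself stores or looks up the mapping.

```shell
# Hypothetical model of anchor resolution (not CM's implementation).
# usage: resolve_anchored <mapfile> <path>
# <mapfile> is assumed to hold lines of the form "<anchor> <directory>".
resolve_anchored() {
    mapfile_=$1
    case $2 in
        '$'*/*) ;;                            # $<anchor>/<path>: handled below
        *) printf '%s\n' "$2"; return 0 ;;    # not anchored: leave untouched
    esac
    anchored=${2#'$'}          # strip the leading "$"
    anchor=${anchored%%/*}     # first arc is the anchor name
    rest=${anchored#*/}        # remainder of the path
    while read -r name dir; do
        if [ "$name" = "$anchor" ]; then
            printf '%s/%s\n' "$dir" "$rest"
            return 0
        fi
    done < "$mapfile_"
    echo "unknown anchor: $anchor" >&2
    return 1
}

# The same anchored name resolves differently once the mapping changes.
map=$(mktemp)
printf 'basis.cm /here/basis.cm\n' > "$map"
resolve_anchored "$map" '$basis.cm/basis.cm'    # prints /here/basis.cm/basis.cm
printf 'basis.cm /there/basis.cm\n' > "$map"
resolve_anchored "$map" '$basis.cm/basis.cm'    # prints /there/basis.cm/basis.cm
rm -f "$map"
```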
366 :    
367 :     The anchor mapping is (re-)initialized at startup time by reading two
368 :     configuration files. Normally, those are the "../../lib/pathconfig"
369 :     file and the ".smlnj-pathconfig" file in your home directory (if such
370 :     exists). During an ongoing session, function CM.Anchor.anchor can be
371 :     used to query and modify the anchor mapping.
372 :    
373 : monnier 416 + Different configurations at different times:
374 :    
375 :     During compilation of the compiler, CMB uses a path configuration that
376 :     is read from the file "pathconfig" located here in this directory.
377 :    
378 : blume 643 At bootstrap time (while running "makeml"), the same anchors are
379 :     mapped to the corresponding sub-directory of the "boot" directory:
380 :     basis.cm is mapped to sml.boot.<arch>-<os>/basis.cm -- which means
381 :     that CM will look for a library named
382 :     sml.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.
383 : monnier 416
384 : blume 643 [Note, there are some anchors in "pathconfig" that have no
385 :     corresponding sub-directory of the boot directory. Examples are
386 :     "root.cm", "cm", and so on. The reason is that there are no stable
387 :     libraries whose description files are named using these anchors;
388 :     everything anchored at "$cm" is a group but not a library.]
389 :    
390 : monnier 416 By the way, you will perhaps notice that there is no file
391 : monnier 498 sml.boot.<arch>-<os>/basis.cm/basis.cm
392 : monnier 416 but there _is_ the corresponding stable archive
393 : monnier 498 sml.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm
394 : monnier 416 CM always looks for stable archives first.
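Generalizing from that single example, the location of a stable archive relative to its description file can be sketched as follows. The function name stable_archive_path is invented, and the rule is inferred only from the basis.cm case shown above; consult the CM manual for the authoritative layout.

```shell
# Hypothetical helper: given the path of a library description file,
# print where the corresponding stable archive would sit, following the
# pattern <dir>/CM/<arch>-<os>/<name> seen in the basis.cm example.
# usage: stable_archive_path <description-file> <arch> <os>
stable_archive_path() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    printf '%s/CM/%s-%s/%s\n' "$dir" "$2" "$3" "$base"
}

stable_archive_path sml.boot.x86-unix/basis.cm/basis.cm x86 unix
# prints sml.boot.x86-unix/basis.cm/CM/x86-unix/basis.cm
```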
395 :    
396 :     This mapping (from anchors to names in the boot directory) is the one
397 :     that will get frozen into the generated heap image at boot time.
398 :     Thus, unless it is changed, CM will look for its libraries in the boot
399 : blume 643 directory. The aforementioned "testml" script will make sure (by
400 :     setting the environment variable CM_PATHCONFIG) that the mapping be
401 :     changed to the one specified in a new "pathconfig" file which was
402 :     created by makeml and placed into the test library directory. It
403 :     points all anchors to the corresponding entry in the test library
404 :     directory. Thus, "testml" will let a new heap image run with its
405 :     corresponding new libraries.
406 : monnier 416
407 :     Normally, however, CM consults other pathconfig files at startup --
408 :     files that live in standard locations. These files are used to modify
409 :     the path configuration to let anchors point to their "usual" places.
410 :     The names of the files that are read (if present) are configurable via
411 :     environment variables. At the moment they default to
412 :     /usr/lib/smlnj-pathconfig
413 :     and
414 :     $HOME/.smlnj-pathconfig
415 :     The first one is configurable via CM_PATHCONFIG (and the default is
416 :     configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
417 :     configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
418 :     In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
419 :     before making the heap image. Therefore, heap images generated by
420 :     makeml will look for their global pathconfig file in
421 :    
422 : blume 643 ../../lib/pathconfig
423 : monnier 416
424 : blume 643 [Note: The "makeml" script will not re-set the CM_PATHCONFIG_DEFAULT
425 :     variable if it was already set before. If it does re-set the
426 :     variable, it uses an absolute path name instead of the relative path
427 :     that I used for illustration above.]
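The lookup of the two file names can be modelled roughly like this. The sketch hard-codes the defaults quoted above and ignores the fact that a heap image may carry different defaults (set at boot time via the *_DEFAULT variables); the function name pathconfig_files is made up, and treating an empty variable like an unset one is an assumption of the sketch.

```shell
# Hypothetical model: print the global and the local pathconfig file
# that a session would consult, one per line.
pathconfig_files() {
    printf '%s\n' "${CM_PATHCONFIG:-/usr/lib/smlnj-pathconfig}"
    printf '%s\n' "${CM_LOCAL_PATHCONFIG:-$HOME/.smlnj-pathconfig}"
}

pathconfig_files
```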
428 :    
429 : monnier 429 For example, I always keep my "good" libraries in `pwd`/../../lib --
430 : blume 643 where both the main "install" script (in config/install.sh) and the
431 :     "installml" script (see above) also put them -- so I don't have to do
432 :     anything special about my pathconfig file.
433 : monnier 416
434 :     Once I have a new heap image and libraries working, I replace the old
435 :     "good" image with the new one:
436 :    
437 :     mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>
438 :    
439 : blume 645 After this I must also move all libraries from <image>.libs/* to their
440 : blume 573 corresponding position in ../../lib.
441 : monnier 416
442 : blume 573 Since this is cumbersome to do by hand, there is a script called
443 :     "installml" that automates this task. Using the script has the added
444 :     advantage that it will not clobber libraries that belong to architectures
445 :     other than the current one. (A rather heavy-handed "rm/mv" approach
446 :     will delete all stable libraries for all architectures.)
447 :     "installml" also patches the ../../lib/pathconfig file as necessary.
448 : monnier 416
449 :     Of course, you can organize things differently for yourself -- the
450 : blume 643 path configuration mechanism should be sufficiently flexible. If you
451 :     do so, you will have to set CM_PATHCONFIG. This must be done before
452 :     you start sml. If you want to change the pathname mapping at the time
453 :     sml is already running, then use the functions in CM.Anchor.
454 : monnier 416
455 :     * Libraries vs. Groups
456 :     ----------------------
457 :    
458 :     With the old CM, "group" was the primary concept while "library" and
459 :     "stabilization" could be considered afterthoughts. This has changed.
460 :     Now "library" is the primary concept, "stabilization" is semantically
461 :     significant, and "groups" are a secondary mechanism.
462 :    
463 :     Libraries are used to "structure the world"; groups are used to give
464 :     structure to libraries. Each group can be used either in precisely
465 :     one library (in which case it cannot be used at the interactive
466 :     toplevel) or at the toplevel (in which case it cannot be used in any
467 :     library). In other words, if you count the toplevel as a library,
468 :     then each group has a unique "owner" library. Of course, there still
469 :     is no limit on how many times a group can be mentioned as a member of
470 :     other groups -- as long as all these other groups belong to the same
471 :     owner library.
472 :    
473 : blume 643 Normally, collections of files that belong together should be made
474 :     into proper CM libraries. CM groups (aka "library components") should
475 :     be used only when there are namespace problems within a library.
476 : monnier 416
477 :     Aside from the fact that I find this design quite natural, there is
478 :     actually a technical reason for it: when you stabilize a library
479 :     (groups cannot be stabilized), then all its sub-groups (not
480 :     sub-libraries!) get "sucked into" the stable archive of the library.
481 :     In other words, even if you have n+1 CM description files (1 for the
482 :     library, n for n sub-groups), there will be just one file representing
483 :     the one stable archive (per architecture/os) for the whole thing. For
484 :     example, I structured the standard basis into one library with two
485 : blume 569 sub-groups, but once you compile it (CMB.make) there is only one
486 : monnier 416 stable file that represents the whole basis library. If groups were
487 :     allowed to appear in more than one library, then stabilization would
488 :     duplicate the group (its code, its environment data structures, and
489 :     even its dynamic state).
490 :    
491 :     There is a small change to the syntax of group description files: they
492 :     must explicitly state which library they belong to. CM will verify
493 :     that. The owner library is specified in parentheses after the "group"
494 :     keyword. If the specification is missing (that's the "old" syntax),
495 :     then the owner will be taken to be the interactive toplevel.
496 :    
497 : blume 643 * Pervasive environment, core environment, the init library "init.cmi"
498 : monnier 416 -------------------------------------------------------------------------
499 :    
500 : blume 569 CMB.make starts out by building and compiling the
501 : blume 643 "init library". This library cannot be described in the "usual" way
502 : blume 537 because it uses "magic" in three ways:
503 :     - it is used to later tie in the runtime system
504 : blume 643 - it binds the "_Core" structure
505 : blume 569 - it exports the "pervasive" environment
506 : monnier 416
507 : blume 537 The pervasive environment no longer includes the entire basis library
508 :     but only non-modular bindings (top-level bindings of variables and
509 :     types).
510 :    
511 : blume 569 CM cannot automatically determine dependencies (or exports) for the
512 : blume 643 init library source files, but it still does use its regular cutoff
513 : blume 569 recompilation mechanism. Therefore, dependencies must be given
514 :     explicitly. This is done by a special description file which
515 : blume 643 currently lives in smlnj/init/init.cmi (as an anchored path:
516 :     "$smlnj/init/init.cmi"). See the long comment at the beginning of
517 :     that file for more details.
518 : monnier 416
519 : blume 643 After it is built, $smlnj/init/init.cmi can be used as an "ordinary"
520 :     library by other libraries. (This is done, for example, by the
521 :     implementation of the Basis library.) Access to
522 :     "$smlnj/init/init.cmi" is protected by the privilege named
523 :     "primitive". Also, note that the .cmi-file is not automatically
524 :     recognized as a CM description file. ("cmi" should remind you of "CM
525 :     - Initial library".) Therefore, it must be given an explicit member
526 :     class:
527 : blume 537
528 : blume 643 $smlnj/init/init.cmi : cm
529 : blume 569
530 : monnier 416 * Autoloader
531 :     ------------
532 :    
533 :     The new system heavily relies on the autoloader. As a result, almost
534 : blume 569 no static environments need to get unpickled at bootstrap time. The
535 : monnier 416 construction of such environments is deferred until they become
536 : blume 632 necessary. Thanks to this, it was possible to reduce the size of the
537 : blume 569 heap image by more than one megabyte (depending on the architecture).
538 :     The downside (although not really terribly bad) is that there is a
539 :     short wait when you first touch an identifier that hasn't been touched
540 : monnier 416 before. (I acknowledge that the notion of "short" may depend on your
541 :     sense of urgency. :-)
542 :    
543 :     The reliance on the autoloader (and therefore CM's library mechanism)
544 :     means that in order to be able to use the system, your paths must be
545 :     properly configured.
546 :    
547 : blume 652 Several libraries get pre-registered at bootstrap time. Here, at least
548 :     the following two should be included: the basis library ("$/basis.cm")
549 :     and CM itself ("$smlnj/cm.cm"). Currently, we also pre-register the
550 :     library exporting structure Compiler ($smlnj/compiler.cm) and the
551 :     SML/NJ library ($/smlnj-lib.cm).
552 : monnier 416
553 :     Here are some other useful libraries that are not pre-registered but
554 :     which can easily be made accessible via CM.autoload (or, non-lazily,
555 :     via CM.make):
556 :    
557 : blume 643 $smlnj/cmb.cm - provides "structure CMB"
558 :     $smlnj/cmb/current.cm - same as $smlnj/cmb.cm
559 :     $smlnj/compiler/all.cm - provides "structure <Arch>Compiler" and
560 : monnier 416 "structure <Arch><OS>CMB" for various
561 :     values of <Arch> and <OS>
562 :    
563 : blume 652 The file preloads.standard here in this directory currently includes
564 :     $smlnj/cmb.cm. This means that by doing ./makeml one obtains a heap
565 :     image with the bootstrap compiler being pre-registered as well. This
566 :     seems reasonable for compiler hackers. (The config/install.sh script
567 :     uses config/preloads where $smlnj/cmb.cm is not pre-registered. This
568 :     is appropriate as a setup for general users.)
569 : blume 643
570 : blume 652 [****** Note: The following is NO LONGER TRUE:
571 :     The fact that $smlnj/compiler.cm is not among the pre-registered
572 :     libraries seems like an oversight and could lead to some
573 :     inconveniences to users who want to, for example, set compiler flags.
574 :     However, pre-registration of this library significantly increases the
575 :     size of the heap image. Moreover, since the library can easily be
576 :     loaded by giving the string as a command line argument, this does not
577 :     really appear to be a big burden to me. Just create a shell alias or
578 :     a little wrapper script if you think you really need this.]
579 :    
580 : monnier 416 * Internal sharing
581 :     ------------------
582 :    
583 :     Dynamic values of loaded modules are shared. This is true even for
584 :     those modules that are used by the interactive compiler itself. If
585 :     you load a module from a library that is also used by the interactive
586 :     compiler, then "loading" means "loading the static environment" -- it
587 :     does not mean "loading the code and linking it". Instead, you get to
588 :     share the compiler's dynamic values (and therefore the executable
589 :     code as well).
590 :    
591 :     Of course, if you load a module that hasn't been loaded before and
592 :     also isn't used by the interactive system, then CM will get the code
593 :     and link (execute) it.
594 :    
595 :     * Access control
596 :     ----------------
597 :    
598 :     In some places, you will find that the "group" and "library" keywords
599 :     in description files are preceded by certain strings, sometimes in
600 :     parentheses. These strings are the names of "privileges". Don't
601 :     worry about them too much at the moment. For the time being, access
602 :     control is not enforced, but the infrastructure is in place.
603 :    
604 :     * Preprocessor
605 :     --------------
606 :    
607 :     The syntax of expressions in #if and #elif clauses is now more ML-ish
608 :     instead of C-ish. (Hey, this is ML after all!) In particular, you
609 :     must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
610 :     Unary minus is "~".
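To illustrate the ML-ish operators, here is a hypothetical fragment from the members section of a description file. The file names are made up for this sketch; SMLNJ_VERSION and the defined(...) form are the ones discussed in this section. In the old C-ish syntax the first condition would have been written with "&&".

```
#if defined(structure Bar) andalso SMLNJ_VERSION > 110
    new-bar-client.sml
#elif defined(structure Bar) orelse not (SMLNJ_VERSION > 110)
    old-bar-client.sml
#endif
```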
611 :    
612 :     A more interesting change is that you can now query the exports of
613 :     sources/subgroups/sublibraries:
614 :    
615 :     - Within the "members" section of the description (i.e., after "is"):
616 :     The expression
617 :     defined(<namespace> <name>)
618 :     is true if any of the included members preceding this clause exports
619 :     a symbol "<namespace> <name>".
620 : blume 632 - Within the "exports" section of the description (i.e., before "is"):
621 : monnier 416 The same expression is true if _any_ of the members exports the
622 :     named symbol.
623 :     (It would be more logical for the exports section to follow the
624 :     members section, but for esthetic reasons I prefer the exports
625 :     section to come first.)
626 :    
627 :     Example:
628 :    
629 :     +--------------------------+
630 :     |Library                   |
631 :     |  structure Foo           |
632 :     |#if defined(structure Bar)|
633 :     |  structure Bar           |
634 :     |#endif                    |
635 :     |is                        |
636 :     |#if SMLNJ_VERSION > 110   |
637 :     |  new-foo.sml             |
638 :     |#else                     |
639 :     |  old-foo.sml             |
640 :     |#endif                    |
641 :     |#if defined(structure Bar)|
642 :     |  bar-client.sml          |
643 :     |#else                     |
644 :     |  no-bar-so-far.sml       |
645 :     |#endif                    |
646 :     +--------------------------+
647 :    
648 :     Here, the file "bar-client.sml" gets included if SMLNJ_VERSION is
649 :     greater than 110 and new-foo.sml exports a structure Bar _or_ if
650 :     SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar. Otherwise
651 :     "no-bar-so-far.sml" gets included instead. In addition, the export of
652 :     structure Bar is guarded by its own existence. (Structure Bar could
653 :     also be defined by "no-bar-so-far.sml" in which case it would get
654 :     exported regardless of the outcome of the other "defined" test.)
655 :    
656 :     Some things to note:
657 :    
658 :     - For the purpose of the pre-processor, order among members is
659 :     significant. (For the purpose of dependency analysis, order remains
660 :     insignificant.)
661 :     - As a consequence, in some cases pre-processor dependencies and
662 :     compilation dependencies may end up being opposites of each other.
663 :     (This is not a problem; it may very well be a feature.)
664 :    
665 :     * The Basis Library is no longer built-in
666 :     -----------------------------------------
667 :    
668 :     The SML'97 basis is no longer built-in. If you want to use it, you
669 :     must specify "basis.cm" as a member of your group/library.
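A minimal sketch of a description file that pulls in the Basis explicitly might look like this. The names Hello and hello.sml are hypothetical. The library is written here as the text above names it; the library list at the top of this guide spells the anchored form $/basis.cm.

```
Library
    structure Hello
is
    basis.cm
    hello.sml
```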
670 :    
671 :     * No more aliases
672 :     -----------------
673 :    
674 :     The "alias" feature is no longer with us. At first I thought I could
675 :     keep it, but it turns out that it causes some fairly fundamental
676 :     problems with the autoloader. However, I don't think that this is a
677 :     big loss because path anchors make up for most of it. Moreover,
678 :     stable libraries can now easily be moved to convenient locations
679 :     without having to move large source trees at the same time. (See my
680 : blume 643 new config/install.sh script for examples of that.)
681 : monnier 416
682 : blume 573 It is possible to simulate aliases (in a way that is safer than the
683 :     original alias mechanism). For example, the root.cm file (which is the
684 :     root of the whole system as far as CMB.make is concerned) acts as an
685 : blume 643 alias for $smlnj/internal/intsys.cm. In this case, root.cm is a group
686 : blume 573 to avoid having a (trivial) stable library file built for it.
687 :    
688 :     A library can act as an "alias" for another library if it has a
689 :     verbatim copy of the export list and mentions the other library as its
690 : blume 643 only member. Examples of this are $smlnj/cm.cm (for
691 :     $smlnj/cm/full.cm), $smlnj/compiler.cm (for $smlnj/compiler/current.cm),
692 :     etc. The stable library file for such an "alias" is typically very
693 :     small because it basically just points to the other library. (For
694 :     example, the file representing $smlnj/cm.cm is currently 234 bytes
695 :     long.)
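As a sketch of such an "alias" library, assume a hypothetical library $mylibs/foo/full.cm that exports structure Foo and signature FOO. The anchor $mylibs and both exported names are assumptions made up for this illustration. A description file $mylibs/foo.cm acting as its alias would then repeat the export list verbatim and name the other library as its only member:

```
Library
    signature FOO
    structure Foo
is
    $mylibs/foo/full.cm
```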
696 : blume 573
697 : monnier 416 * Don't use relative or absolute pathnames to refer to libraries
698 :     ----------------------------------------------------------------
699 :    
700 :     Don't use relative or absolute pathnames to refer to libraries. If
701 :     you do it anyway, you'll get an appropriate warning at the time when
702 : blume 569 you do CMB.make(). If you use relative or absolute pathnames to
703 : monnier 416 refer to library B from library A, you will be committed to keeping B
704 :     in the same relative (to A) or absolute location. This, clearly,
705 : blume 643 would be undesirable in many situations (although perhaps not always).
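For illustration, a library A that uses the SML/NJ library would name it by its anchored path rather than by its location on disk. This is only a sketch: structure A and a.sml are hypothetical, while $/smlnj-lib.cm is the anchored name used elsewhere in this guide. Source files such as a.sml are not libraries and are still named relative to the description file.

```
Library
    structure A
is
    a.sml
    $/smlnj-lib.cm
```

Writing something like ../smlnj-lib/smlnj-lib.cm in place of the last member would tie A to that directory layout and trigger the warning described above.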
