[smlnj] Annotation of /sml/trunk/src/system/README
Revision 710 - (view) (download)

1 : blume 537 Compiler Hacker's Guide to the new CM...
2 :     ========================================
3 : monnier 416
4 : blume 643 Last change: 12/5/2000
5 : blume 537
6 : monnier 416 * Libraries
7 :     -----------
8 :    
9 :     The new way of building the compiler is heavily library-oriented.
10 :     Aside from a tiny portion of code that is responsible for defining the
11 :     pervasive environment, _everything_ lives in libraries. Building the
12 :     compiler means compiling and stabilizing these libraries first. Some
13 :     of the libraries exist just for reasons of organizing the code, the
14 :     other ones are potentially useful in their own right. Therefore, as a
15 :     beneficial side-effect of compiling the compiler, you will end up with
16 :     stable versions of these libraries.
17 :    
18 :     At the moment, the following libraries are constructed when compiling
19 :     the compiler ("*" means that I consider the library potentially useful
20 :     in its own right):
21 :    
22 : blume 643 * $/basis.cm SML'97 Basis Library (pre-loaded)
23 :     * $/smlnj-lib.cm SML/NJ Utility Library
24 :     * $/html-lib.cm SML/NJ HTML Library
25 :     * $/pp-lib.cm SML/NJ Pretty-print Library
26 : monnier 416
27 : blume 643 * $/ml-yacc-lib.cm SML/NJ ML-Yacc runtime library
28 : blume 573
29 : blume 643 * $smlnj/compiler/{alpha32,hppa,ppc,sparc,x86}.cm
30 : blume 573 cross-compiler libraries, exporting
31 :     structure <Arch>Compiler
32 : blume 643 * $smlnj/compiler/current.cm structure Compiler (current arch)
33 :     * $smlnj/compiler/all.cm all cross-compilers and all cross-CMBs
34 : blume 573
35 : blume 652 * $smlnj/cm/full.cm structure CM (see manual)
36 : blume 643 * $smlnj/cm/tools.cm CM tools library
37 : blume 573
38 : blume 643 * $smlnj/cmb/{alpha32,hppa,ppc,sparc,x86}-unix.cm
39 : blume 573 cross-bootstrap-compilers for Unix
40 :     (structure <Arch>UnixCMB)
41 : blume 643 * $smlnj/cmb/ppc-macos.cm ...for Mac (structure PPCMacosCMB)
42 :     * $smlnj/cmb/x86-win32.cm ...for Windoze (structure X86Win32CMB)
43 :     * $smlnj/cmb/current.cm structure CMB (current arch/os)
44 : blume 573
45 : blume 643 * $smlnj/compiler.cm abbrev. for $smlnj/compiler/current.cm
46 :     * $smlnj/cm.cm abbrev. for $smlnj/cm/full.cm
47 :     * $smlnj/cmb.cm abbrev. for $smlnj/cmb/current.cm
48 : blume 573
49 : blume 677 * $/comp-lib.cm Utility library for compiler
50 : blume 573
51 : blume 643 - $smlnj/viscomp/core.cm Compiler core functionality
52 :     - $smlnj/viscomp/{alpha32,hppa,ppc,sparc,x86}.cm
53 : blume 573 Machine-specific parts of compiler
54 :    
55 : blume 652 - $smlnj/internal/{intsys,cm-lib,host-compiler-0}.cm
56 : blume 573 Glue that holds the interactive system
57 :     together
58 :    
59 : blume 643 * $MLRISC/{MLRISC,Control,Lib,ALPHA,HPPA,PPC,SPARC,IA32}.cm
60 : blume 573 Various MLRISC bits
61 : blume 643 (Other MLRISC libraries such as
62 :     Graph, Visual, etc. do not currently
63 :     take part in the SML/NJ build.)
64 : blume 573
65 : blume 677 * $/{mlyacc,mllex,mlburg}-tool.cm CM plug-in libraries for common tools
66 :     * $/{grm,lex,burg}-ext.cm CM plug-in libraries for common file
67 : blume 573 extensions
68 :    
69 : blume 677 Paths of the form $/foo/<more> are shorthands for $foo/foo/<more>.
70 : blume 643
71 :     A more complete explanation of the $-notation can be found later in
72 :     this document or in the CM manual.
73 :    
74 :     To learn about the definitions of the $-anchors (and, thus, where in
75 :     the source tree the above libraries are defined), consult the
76 :     "pathconfig" file here in this directory.
77 :    
78 : monnier 416 * Before you can use the bootstrap compiler (CMB)...
79 :     ----------------------------------------------------
80 :    
81 :     To be able to use CMB at all, you must first say
82 :    
83 : blume 643 CM.autoload "$smlnj/cmb.cm";
84 : monnier 416
85 : blume 569 after you start sml. Alternatively -- and perhaps more conveniently --
86 : blume 643 you can provide "$smlnj/cmb.cm" as a command-line argument to sml:
87 : monnier 416
88 : blume 643 $ sml '$smlnj/cmb.cm'
89 : monnier 416
90 : blume 643 (Be sure to protect the dollar symbol which usually has its own
91 :     special meaning to the shell.)
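
The quoting issue can be checked without starting sml at all. The following is a minimal sketch, assuming a POSIX shell; the value given to the shell variable "smlnj" is hypothetical and only there to make the substitution visible:

```shell
# Hypothetical value: any shell variable named "smlnj" would be substituted.
smlnj=/somewhere/else

# Single quotes pass the anchored name through untouched.
quoted='$smlnj/cmb.cm'
# Double quotes let the shell expand $smlnj before sml ever sees the argument.
unquoted="$smlnj/cmb.cm"

printf 'single-quoted: %s\n' "$quoted"
printf 'double-quoted: %s\n' "$unquoted"
```

Only the single-quoted form reaches the program with the "$" intact.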
92 :    
93 : blume 569 * Compiling the compiler
94 :     ------------------------
95 : monnier 416
96 : blume 569 We are now back to the old scheme where a call to CMB.make() suffices to
97 :     build a bootable set of files (libraries in our case). CMB.make maintains
98 :     two parallel hierarchies of derived files:
99 : monnier 416
100 : blume 569 1. the binfile hierarchy ("binfiles"), containing compiled objects for
101 :     each individual ML source file; this hierarchy is rooted at
102 :     <prefix>.bin.<arch>-<opsys>
103 :     2. the stable library hierarchy ("boot files"), containing library files
104 :     for each library that participates in building SML/NJ; this hierarchy
105 :     is rooted at
106 :     <prefix>.boot.<arch>-<opsys>
107 : monnier 416
108 : blume 569 The default for <prefix> is "sml". It can be changed by using
109 :     CMB.make' with the new <prefix> as the optional string argument.
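
As an illustration of the naming scheme above, here is a small shell sketch that composes the two root directory names from <prefix>, <arch>, and <opsys>. The function name "derived_roots" is made up for this example; it is not part of CMB or of any script in this directory:

```shell
# Compose the names of the two derived-file hierarchies from their parts.
derived_roots() {
    prefix=${1:-sml}    # "sml" is the documented default prefix
    arch=$2
    opsys=$3
    printf '%s.bin.%s-%s\n'  "$prefix" "$arch" "$opsys"   # binfile hierarchy
    printf '%s.boot.%s-%s\n' "$prefix" "$arch" "$opsys"   # bootfile hierarchy
}

derived_roots sml x86 unix
derived_roots new sparc unix
```

With the default prefix on x86/unix this prints sml.bin.x86-unix and sml.boot.x86-unix; with prefix "new" on sparc/unix it prints new.bin.sparc-unix and new.boot.sparc-unix.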
110 : monnier 416
111 : blume 643 CMB.make reuses existing bootfiles after it has verified that they are
112 :     consistent with their corresponding binfiles. Bootfiles do not need
113 :     to be deleted in order for CMB.make to work correctly.
114 : monnier 416
115 : blume 569 To bootstrap a new system (using the runtime system boot loader), the
116 :     bootfiles _must_ be present, the binfiles need not be present (but
117 :     their presence does not hurt either).
118 : monnier 416
119 : monnier 498 You can reduce the number of extra files compiled and stabilized
120 : blume 569 during CMB.make at the expense of not building any cross-compilers.
121 : monnier 498 For that, say
122 :     #set (CMB.symval "LIGHT") (SOME 1);
123 : blume 569 before running CMB.make.
124 : monnier 498
125 : monnier 416 * Making the heap image
126 :     -----------------------
127 :    
128 :     The heap image is made by running the "makeml" script that you find
129 :     here in this directory. By default it will try to refer to the
130 : monnier 498 sml.boot.<arch>-<os> directory. You can change this using the -boot
131 : monnier 416 argument (which takes the full name of the boot directory to be used).
132 :    
133 :     The "feel" of using makeml should be mostly as it used to be. However,
134 :     internally, there are some changes that you should be aware of:
135 :    
136 : blume 569 1. The script will make a heap image and build a separate library directory
137 : blume 643 that contains (hard) links to the library files in the bootfile directory.
138 : monnier 416
139 :     2. There is no "-full" option anymore. This functionality should
140 :     eventually be provided by a library with a sufficiently rich export
141 :     interface.
142 :    
143 :     3. No image will be generated if you use the -rebuild option.
144 :     Instead, the script quits after making new bin and new boot
145 :     directories. You must re-invoke makeml with a suitable "-boot"
146 :     option to actually make the image. The argument to "-rebuild"
147 :     is the <prefix> for the new bin and boot directories (see above).
148 :    
149 : blume 643 [Note: When the -rebuild option is specified, then the boot procedure
150 :     will not read static environments from the boot directory. Instead,
151 :     after the ML code has been loaded and linked, the system will invoke
152 :     CMB.make' with the argument that was given to -rebuild. After
153 :     CMB.make' is done, the system quits. In essence, makeml with -rebuild
154 :     acts as a bootstrap compiler that is not dependent on any usable
155 : blume 645 static environments.]
156 : blume 643
157 : blume 569 Makeml will not destroy the bootfile directory.
158 : monnier 416
159 :     * Testing a newly generated heap image
160 :     --------------------------------------
161 :    
162 :     If you use a new heap image by saying "sml @SMLload=..." then things
163 :     will not go as you may expect because along with the new heap image
164 :     should go those new stable libraries, but unless you do something
165 : blume 569 about it, the newly booted system will look for its stable libraries
166 : blume 643 in places where you stored your _old_ stable libraries. (After just
167 :     having done "makeml", these "places" would be within the boot file
168 :     hierarchy under <prefix>.boot.<arch>-<os>.)
169 : monnier 416
170 :     After you have made the new heap image, the new libraries are in a
171 :     separate directory whose name is derived from the name of the heap
172 : blume 569 image. (Actually, only the directory hierarchy is separate, the
173 : blume 643 library files themselves are hard links.) The "testml" script that
174 :     you also find here will run the heap image and instruct it to look for
175 :     its libraries in that new library directory by setting the
176 :     CM_PATHCONFIG environment variable to point to a different pathconfig
177 :     file under <prefix>.lib.
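
To see what "only the directory hierarchy is separate" means in practice, here is a minimal shell sketch of hard-linking one file into a second directory tree. All directory and file names are hypothetical stand-ins created in a scratch directory; this is not what makeml literally executes:

```shell
# Hypothetical stand-ins for the boot directory and the library directory.
work=$(mktemp -d)
mkdir -p "$work/sml.boot.x86-unix/basis.cm" "$work/sml.lib/basis.cm"
echo 'stable archive placeholder' > "$work/sml.boot.x86-unix/basis.cm/archive"

# A hard link gives the same file a second name; no data is copied.
ln "$work/sml.boot.x86-unix/basis.cm/archive" "$work/sml.lib/basis.cm/archive"

# Both names refer to the same inode.
inode_boot=$(ls -i "$work/sml.boot.x86-unix/basis.cm/archive" | awk '{print $1}')
inode_lib=$(ls -i "$work/sml.lib/basis.cm/archive" | awk '{print $1}')
echo "boot inode: $inode_boot, lib inode: $inode_lib"

rm -rf "$work"
```

Because both names point at the same inode, removing one of the two directory trees leaves the file reachable through the other.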
178 : monnier 416
179 : monnier 498 "testml" takes the <prefix> of the heap image as its first
180 :     argument. All other arguments are passed verbatim to the ML process.
181 :    
182 :     The <prefix> is the same as the one used when you did "makeml". If
183 :     you run "testml" without arguments, <prefix> defaults to "sml".
184 : blume 645 Thus, if you just said "makeml" without arguments you can also say
185 :     "testml" without arguments. (Note that you _must_ supply the <prefix>
186 : monnier 498 argument if you intend to pass any additional arguments.)
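
The argument convention can be sketched in a few lines of shell. This is an illustration of the behavior described above, not the contents of the real "testml" script; the function name "split_testml_args" is invented for the example:

```shell
# Sketch of the argument convention described above; NOT the real testml.
split_testml_args() {
    if [ $# -gt 0 ]; then
        prefix=$1          # first argument names the heap image prefix
        shift
    else
        prefix=sml         # documented default
    fi
    rest=$*                # everything else would go to the ML process
}

split_testml_args
echo "prefix=$prefix rest=$rest"
split_testml_args new '$smlnj/cmb.cm'
echo "prefix=$prefix rest=$rest"
# Pitfall: without an explicit prefix, the first extra argument is taken as one.
split_testml_args '$smlnj/cmb.cm'
echo "prefix=$prefix rest=$rest"
```

The last call shows why the <prefix> cannot be omitted once additional arguments are present.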
187 :    
188 : monnier 416 * Installing a heap image for more permanent use
189 :     ------------------------------------------------
190 :    
191 : monnier 498 You can "install" a newly generated heap image by replacing the old
192 : blume 645 image with the new one AND AT THE SAME TIME replacing the old stable
193 : monnier 498 libraries with the new ones. To do this, run the "installml" script.
194 : monnier 416
195 : monnier 498 Like "testml", "installml" also expects the <prefix> as its first
196 :     argument. <prefix> defaults to "sml" if no argument is specified.
197 :    
198 :     "installml" patches the ../../lib/pathconfig file to reflect any
199 : blume 643 changes or additions to the path name mapping. (I say "patches"
200 :     because entries unrelated to the SML/NJ build process are retained in
201 :     their original form.) If you want to use a destination directory that
202 :     is different from ../../lib, then you must do this by hand (i.e.,
203 : blume 645 installml does not have an option for that).
204 : monnier 498
205 : blume 569 Thus, after a successful CMB.make, you should say
206 : monnier 498
207 :     ./makeml
208 :    
209 :     to make the new heap image + libraries, then
210 :    
211 :     ./testml
212 :    
213 :     to make sure everything works, and finally
214 :    
215 :     ./installml
216 :    
217 :     to replace your existing compiler with the one you just built and tested.
218 :    
219 : monnier 416 * Cross-compiling
220 :     -----------------
221 :    
222 : blume 643 All cross-compilers live in the "$smlnj/compiler/all.cm" library.
223 :     (The source tree for the "$smlnj" anchor -- see "pathconfig" -- is
224 :     src/system/smlnj, but this should normally not concern you.)
225 :     You must first say
226 : monnier 416
227 : blume 643 CM.autoload "$smlnj/compiler/all.cm";
228 : monnier 416
229 :     before you can access them. (This step corresponds to the old
230 :     CMB.retarget call.) After that, _all_ cross-compilers are available
231 :     at the same time. However, the ones that you are not using don't take
232 :     up any undue space because they only get loaded once you actually
233 : blume 645 mention them at top level. The names of the structures currently
234 : blume 643 exported by $smlnj/compiler/all.cm are:
235 : monnier 416
236 :     structure Alpha32UnixCMB
237 :     structure HppaUnixCMB
238 :     structure PPCMacOSCMB
239 :     structure PPCUnixCMB
240 :     structure SparcUnixCMB
241 :     structure X86UnixCMB
242 :     structure X86Win32CMB
243 :    
244 :     structure Alpha32Compiler
245 :     structure HppaCompiler
246 :     structure PPCCompiler
247 :     structure SparcCompiler
248 :     structure X86Compiler
249 :    
250 :     (PPCMacOSCMB is not very useful at the moment because there is no
251 :     implementation of the basis library for the MacOS.)
252 :    
253 : monnier 498 Alternatively, you can select just the one single structure that you
254 : blume 643 are interested in by auto-loading $smlnj/compiler/<arch>.cm or
255 :     $smlnj/cmb/<arch>-<os>.cm.
256 : monnier 498 <arch> currently ranges over "alpha32", "hppa", "ppc", "sparc", and "x86".
257 :     <os> can be either "unix" or "macos" or "win32".
258 :     (Obviously, not all combinations are valid.)
259 :    
260 : blume 643 Again, as with $smlnj/cmb.cm, you can specify the .cm file as an
261 : blume 569 argument to the sml command:
262 :    
263 : blume 643 $ sml '$smlnj/compiler/all.cm'
264 : blume 569
265 :     or
266 :    
267 : blume 643 $ sml '$smlnj/cmb/alpha32-unix.cm'
268 : blume 569
269 : blume 643 [Note: The command line for the "sml" command accepts configuration
270 :     parameters of the form "@SMLxxx...", mode switches of the form "-m"
271 :     and "-a", names of ML files -- which are passed to "use" -- and
272 :     arguments suitable for CM.make or CM.autoload. CM.autoload is the
273 :     default; the "-m" and "-a" mode switches can be used to change the
274 :     default -- even several times within the same command line.
275 :     A single argument "@CMslave" is also accepted, but it should not be
276 :     used directly as it is intended for use by the parallel compilation
277 :     facility within CM.]
278 :    
279 : monnier 416 * Path configuration
280 :     --------------------
281 :    
282 :     + Basics:
283 :    
284 :     One of the new features of CM is its handling of path names. In the
285 :     old CM, one particular point of trouble was the autoloader. It
286 :     analyzes a group or library and remembers the locations of associated
287 :     files. Later, when the necessity arises, those files will be read.
288 :     Therefore, one was asking for trouble if the current working directory
289 :     was changed between analysis- and load-time, or, worse, if files
290 :     actually moved about (as is often the case if build- and
291 :     installation-directories are different, or, to put it more generally,
292 :     if CM's state is frozen into a heap image and used in a different
293 :     environment).
294 :    
295 :     Maybe it would have been possible to work around most of these
296 :     problems by fixing the path-lookup mechanism in the old CM and using
297 :     it extensively. But path-lookup (as in the Unix-shell's "PATH") is
298 :     inherently dangerous because one can never be too sure what it will be
299 :     that is found on the path. A new file in one of the directories early
300 :     in the path can upset the program that hopes to find something under
301 :     the same name later on the path. Even when ignoring security-issues
302 :     like trojan horses and such, this definitely opens the door for
303 : monnier 498 various unpleasant surprises. (Who has never named a test version
304 : monnier 416 of a program "test" and found that it acts strangely only to discover
305 :     later that /bin/test was run instead?)
306 :    
307 :     Thus, the new scheme used by CM is a fixed mapping of what I call
308 :     "configuration anchors" to corresponding directories. The mapping can
309 :     be changed, but one must do so explicitly. In effect, it does not
310 :     depend on the contents of the file system. Here is how it works:
311 :    
312 : blume 643 If I specify a pathname that starts with a "$", then the first arc
313 :     between "$" and the first "/" is taken as the name of a so-called
314 :     "anchor". CM knows a mapping from anchor names to directory names and
315 :     replaces the prefix $<anchor> with the name of the corresponding
316 :     directory. Therefore, an anchored path has the general form
317 : monnier 416
318 : blume 643 $<anchor>/<path>
319 : monnier 416
320 : blume 643 It is important that there is at least one arc in <path>. In other
321 : blume 710 words, the form $<anchor> is NOT valid.
322 : blume 643
323 : blume 710 (Actually, under certain circumstances it _is_ valid -- and means what
324 :     it seems to mean, namely the directory denoted by the name that
325 :     <anchor> is mapped to. However, since directory names do not usually
326 :     occur by themselves, you can think of this form as being invalid.
327 :     There is one exception to this: "bind" specifications for .cm files.
328 :     See the CM manual for more details.)
329 :    
330 : blume 643 Examples:
331 :    
332 :     $smlnj/compiler/all.cm
333 :     $basis.cm/basis.cm
334 :     $MLRISC/Control.cm
335 :    
336 :     The special case where <anchor> coincides with the first arc of <path>
337 :     can be abbreviated by omitting <anchor>. This leads to the shorthand
338 :    
339 :     $/<anchor>/<more>...
340 :    
341 :     for the longer
342 :    
343 :     $<anchor>/<anchor>/<more>...
344 :    
345 :     Examples:
346 :    
347 :     $/foo/bar/baz.cm (* same as $foo/foo/bar/baz.cm *)
348 :     $/basis.cm (* same as $basis.cm/basis.cm *)
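
The shorthand rule is purely textual, so it can be mimicked with shell parameter expansion. A minimal sketch, assuming a POSIX shell; "expand_anchor_shorthand" is a name invented for this example and has nothing to do with CM's own implementation:

```shell
# Expand the "$/<anchor>/<more>" shorthand to "$<anchor>/<anchor>/<more>".
# Any other path is printed unchanged.
expand_anchor_shorthand() {
    case $1 in
        '$/'*)
            rest=${1#'$/'}        # drop the leading "$/"
            anchor=${rest%%/*}    # the first arc doubles as the anchor name
            printf '$%s/%s\n' "$anchor" "$rest"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

expand_anchor_shorthand '$/foo/bar/baz.cm'
expand_anchor_shorthand '$/basis.cm'
expand_anchor_shorthand '$smlnj/cmb.cm'
```

The first two calls reproduce the two examples above; the third shows that a path with an explicit anchor is left alone.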
349 :    
350 : blume 672 There used to be a notion of "implicit" anchors where in the case that
351 :     <anchor> is a known anchor, paths of the form
352 : blume 643
353 :     <anchor>/<more>...
354 :    
355 : blume 672 were interpreted as if they had been written
356 : blume 643
357 : blume 672 $<anchor>/<anchor>/<more>...
358 : blume 643
359 : blume 672 This is no longer the case. <foo>/<bar>... now always means what it
360 :     seems to mean: a relative path starting with an arc named <foo>.
361 : blume 643
362 :     + Why anchored paths?
363 :    
364 :     The important point is that one can change the mapping of the anchor,
365 : blume 710 and the translation of the (anchored) path name -- together with all
366 :     file names derived from it -- will also change accordingly -- even
367 :     very late in the game. CM avoids "elaborating" path names until it
368 :     really needs them when it is time to open files. CM is also willing
369 :     to re-elaborate the same names if there is reason to do so. Thus, the
370 :     "basis.cm" library that was analyzed "here" but then moved "there"
371 :     will also be found "there" if the anchor has been re-set accordingly.
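
The effect of re-setting an anchor can be modeled outside of CM. The sketch below resolves an anchored name against a small mapping file, changes the mapping, and resolves the same name again. The two-column "<anchor> <directory>" file layout and the function name "resolve_anchored" are assumptions made for illustration; consult the CM manual for the actual pathconfig format:

```shell
# Resolve "$<anchor>/<path>" against a mapping file.
# ASSUMPTION: one "<anchor> <directory>" pair per line (illustrative only).
resolve_anchored() {
    map=$1
    path=${2#'$'}            # strip the leading "$"
    anchor=${path%%/*}       # arc before the first "/"
    rest=${path#*/}          # everything after the first "/"
    dir=$(awk -v a="$anchor" '$1 == a { print $2; exit }' "$map")
    [ -n "$dir" ] || { echo "unknown anchor: $anchor" >&2; return 1; }
    printf '%s/%s\n' "$dir" "$rest"
}

map=$(mktemp)
echo 'basis.cm /here/basis.cm' > "$map"
first=$(resolve_anchored "$map" '$basis.cm/basis.cm')

# Re-set the anchor; the same anchored name now resolves elsewhere.
echo 'basis.cm /there/basis.cm' > "$map"
second=$(resolve_anchored "$map" '$basis.cm/basis.cm')

echo "$first"
echo "$second"
rm -f "$map"
```

The anchored name itself never changes; only the table it is looked up in does, which is why resolution can safely be repeated late.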
372 : blume 643
373 :     The anchor mapping is (re-)initialized at startup time by reading two
374 :     configuration files. Normally, those are the "../../lib/pathconfig"
375 :     file and the ".smlnj-pathconfig" file in your home directory (if such
376 :     exists). During an ongoing session, function CM.Anchor.anchor can be
377 :     used to query and modify the anchor mapping.
378 :    
379 : monnier 416 + Different configurations at different times:
380 :    
381 :     During compilation of the compiler, CMB uses a path configuration that
382 :     is read from the file "pathconfig" located here in this directory.
383 :    
384 : blume 643 At bootstrap time (while running "makeml"), the same anchors are
385 :     mapped to the corresponding sub-directory of the "boot" directory:
386 :     basis.cm is mapped to sml.boot.<arch>-<os>/basis.cm -- which means
387 :     that CM will look for a library named
388 :     sml.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.
389 : monnier 416
390 : blume 643 [Note, there are some anchors in "pathconfig" that have no
391 :     corresponding sub-directory of the boot directory. Examples are
392 :     "root.cm", "cm", and so on. The reason is that there are no stable
393 :     libraries whose description files are named using these anchors;
394 :     everything anchored at "$cm" is a group but not a library.]
395 :    
396 : monnier 416 By the way, you will perhaps notice that there is no file
397 : monnier 498 sml.boot.<arch>-<os>/basis.cm/basis.cm
398 : monnier 416 but there _is_ the corresponding stable archive
399 : monnier 498 sml.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm
400 : monnier 416 CM always looks for stable archives first.
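
The relationship between a description file name and its stable archive, as shown in the two paths above, can be written down as a small shell helper. "stable_archive_path" is a hypothetical name for this sketch and only mirrors the naming pattern; it does not inspect any files:

```shell
# Map a library description file to the stable archive CM looks for first:
#   <dir>/<file>  ->  <dir>/CM/<arch>-<os>/<file>
stable_archive_path() {
    desc=$1
    arch=$2
    os=$3
    printf '%s/CM/%s-%s/%s\n' \
        "$(dirname "$desc")" "$arch" "$os" "$(basename "$desc")"
}

stable_archive_path sml.boot.x86-unix/basis.cm/basis.cm x86 unix
```

For x86/unix this prints sml.boot.x86-unix/basis.cm/CM/x86-unix/basis.cm.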
401 :    
402 :     This mapping (from anchors to names in the boot directory) is the one
403 :     that will get frozen into the generated heap image at boot time.
404 :     Thus, unless it is changed, CM will look for its libraries in the boot
405 : blume 643 directory. The aforementioned "testml" script will make sure (by
406 :     setting the environment variable CM_PATHCONFIG) that the mapping be
407 :     changed to the one specified in a new "pathconfig" file which was
408 :     created by makeml and placed into the test library directory. It
409 :     points all anchors to the corresponding entry in the test library
410 :     directory. Thus, "testml" will let a new heap image run with its
411 :     corresponding new libraries.
412 : monnier 416
413 :     Normally, however, CM consults other pathconfig files at startup --
414 :     files that live in standard locations. These files are used to modify
415 :     the path configuration to let anchors point to their "usual" places.
416 :     The names of the files that are read (if present) are configurable via
417 :     environment variables. At the moment they default to
418 :     /usr/lib/smlnj-pathconfig
419 :     and
420 :     $HOME/.smlnj-pathconfig
421 :     The first one is configurable via CM_PATHCONFIG (and the default is
422 :     configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
423 :     configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
424 :     In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
425 :     before making the heap image. Therefore, heap images generated by
426 :     makeml will look for their global pathconfig file in
427 :    
428 : blume 643 ../../lib/pathconfig
429 : monnier 416
430 : blume 643 [Note: The "makeml" script will not re-set the CM_PATHCONFIG_DEFAULT
431 :     variable if it was already set before. If it does re-set the
432 :     variable, it uses an absolute path name instead of the relative path
433 :     that I used for illustration above.]
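
The precedence between the environment variable and the built-in default for the global pathconfig file can be sketched as follows. This is a model of the rule described above, written as a shell function with an invented name ("global_pathconfig"); treating an empty CM_PATHCONFIG like an unset one is an assumption, and the override path is hypothetical:

```shell
# $1 (optional): the default that was frozen into the heap image at boot
# time via CM_PATHCONFIG_DEFAULT; falls back to the stock default.
global_pathconfig() {
    baked_default=${1:-/usr/lib/smlnj-pathconfig}
    # ASSUMPTION: an empty CM_PATHCONFIG is treated like an unset one.
    printf '%s\n' "${CM_PATHCONFIG:-$baked_default}"
}

unset CM_PATHCONFIG
global_pathconfig                          # stock default
global_pathconfig ../../lib/pathconfig     # image built by makeml
CM_PATHCONFIG=/tmp/test.lib/pathconfig     # hypothetical override
global_pathconfig ../../lib/pathconfig
```

The last call prints the override, which is the mechanism "testml" relies on to redirect a freshly built image to its own libraries.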
434 :    
435 : monnier 429 For example, I always keep my "good" libraries in `pwd`/../../lib --
436 : blume 643 where both the main "install" script (in config/install.sh) and the
437 :     "installml" script (see above) also put them -- so I don't have to do
438 :     anything special about my pathconfig file.
439 : monnier 416
440 :     Once I have a new heap image and libraries working, I replace the old
441 :     "good" image with the new one:
442 :    
443 :     mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>
444 :    
445 : blume 645 After this I must also move all libraries from <image>.libs/* to their
446 : blume 573 corresponding position in ../../lib.
447 : monnier 416
448 : blume 573 Since this is cumbersome to do by hand, there is a script called
449 :     "installml" that automates this task. Using the script has the added
450 :     advantage that it will not clobber libraries that belong to architectures
451 :     other than the current one. (A rather heavy-handed "rm/mv" approach
452 :     will delete all stable libraries for all architectures.)
453 :     "installml" also patches the ../../lib/pathconfig file as necessary.
454 : monnier 416
455 :     Of course, you can organize things differently for yourself -- the
456 : blume 643 path configuration mechanism should be sufficiently flexible. If you
457 :     do so, you will have to set CM_PATHCONFIG. This must be done before
458 :     you start sml. If you want to change the pathname mapping at the time
459 :     sml is already running, then use the functions in CM.Anchor.
460 : monnier 416
461 :     * Libraries vs. Groups
462 :     ----------------------
463 :    
464 :     With the old CM, "group" was the primary concept while "library" and
465 :     "stabilization" could be considered afterthoughts. This has changed.
466 :     Now "library" is the primary concept, "stabilization" is semantically
467 :     significant, and "groups" are a secondary mechanism.
468 :    
469 :     Libraries are used to "structure the world"; groups are used to give
470 :     structure to libraries. Each group can be used either in precisely
471 :     one library (in which case it cannot be used at the interactive
472 :     toplevel) or at the toplevel (in which case it cannot be used in any
473 :     library). In other words, if you count the toplevel as a library,
474 :     then each group has a unique "owner" library. Of course, there still
475 :     is no limit on how many times a group can be mentioned as a member of
476 :     other groups -- as long as all these other groups belong to the same
477 :     owner library.
478 :    
479 : blume 643 Normally, collections of files that belong together should be made
480 :     into proper CM libraries. CM groups (aka "library components") should
481 :     be used only when there are namespace problems within a library.
482 : monnier 416
483 :     Aside from the fact that I find this design quite natural, there is
484 :     actually a technical reason for it: when you stabilize a library
485 :     (groups cannot be stabilized), then all its sub-groups (not
486 :     sub-libraries!) get "sucked into" the stable archive of the library.
487 :     In other words, even if you have n+1 CM description files (1 for the
488 :     library, n for n sub-groups), there will be just one file representing
489 :     the one stable archive (per architecture/os) for the whole thing. For
490 :     example, I structured the standard basis into one library with two
491 : blume 569 sub-groups, but once you compile it (CMB.make) there is only one
492 : monnier 416 stable file that represents the whole basis library. If groups were
493 :     allowed to appear in more than one library, then stabilization would
494 :     duplicate the group (its code, its environment data structures, and
495 :     even its dynamic state).
496 :    
497 :     There is a small change to the syntax of group description files: they
498 :     must explicitly state which library they belong to. CM will verify
499 :     that. The owner library is specified in parentheses after the "group"
500 :     keyword. If the specification is missing (that's the "old" syntax),
501 :     then the owner will be taken to be the interactive toplevel.
502 :    
503 : blume 643 * Pervasive environment, core environment, the init library "init.cmi"
504 : monnier 416 -------------------------------------------------------------------------
505 :    
506 : blume 569 CMB.make starts out by building and compiling the
507 : blume 643 "init library". This library cannot be described in the "usual" way
508 : blume 537 because it uses "magic" in three ways:
509 :     - it is used to later tie in the runtime system
510 : blume 643 - it binds the "_Core" structure
511 : blume 569 - it exports the "pervasive" environment
512 : monnier 416
513 : blume 537 The pervasive environment no longer includes the entire basis library
514 :     but only non-modular bindings (top-level bindings of variables and
515 :     types).
516 :    
517 : blume 569 CM cannot automatically determine dependencies (or exports) for the
518 : blume 643 init library source files, but it still does use its regular cutoff
519 : blume 569 recompilation mechanism. Therefore, dependencies must be given
520 :     explicitly. This is done by a special description file which
521 : blume 643 currently lives in smlnj/init/init.cmi (as an anchored path:
522 :     "$smlnj/init/init.cmi"). See the long comment at the beginning of
523 :     that file for more details.
524 : monnier 416
525 : blume 643 After it is built, $smlnj/init/init.cmi can be used as an "ordinary"
526 :     library by other libraries. (This is done, for example, by the
527 :     implementation of the Basis library.) Access to
528 :     "$smlnj/init/init.cmi" is protected by the privilege named
529 :     "primitive". Also, note that the .cmi-file is not automatically
530 :     recognized as a CM description file. ("cmi" should remind you of "CM
531 :     - Initial library".) Therefore, it must be given an explicit member
532 :     class:
533 : blume 537
534 : blume 643 $smlnj/init/init.cmi : cm
535 : blume 569
536 : monnier 416 * Autoloader
537 :     ------------
538 :    
539 :     The new system heavily relies on the autoloader. As a result, almost
540 : blume 569 no static environments need to get unpickled at bootstrap time. The
541 : monnier 416 construction of such environments is deferred until they become
542 : blume 632 necessary. Thanks to this, it was possible to reduce the size of the
543 : blume 569 heap image by more than one megabyte (depending on the architecture).
544 :     The downside (although not really terribly bad) is that there is a
545 :     short wait when you first touch an identifier that hasn't been touched
546 : monnier 416 before. (I acknowledge that the notion of "short" may depend on your
547 :     sense of urgency. :-)
548 :    
549 :     The reliance on the autoloader (and therefore CM's library mechanism)
550 :     means that in order to be able to use the system, your paths must be
551 :     properly configured.
552 :    
553 : blume 652 Several libraries get pre-registered at bootstrap time. Here, at least
554 : blume 710 the following two should be included: the basis library ($/basis.cm)
555 :     and CM itself ($smlnj/cm.cm). Currently, we also pre-register the
556 : blume 652 library exporting structure Compiler ($smlnj/compiler.cm) and the
557 :     SML/NJ library ($/smlnj-lib.cm).
558 : monnier 416
559 :     Here are some other useful libraries that are not pre-registered but
560 :     which can easily be made accessible via CM.autoload (or, non-lazily,
561 :     via CM.make):
562 :    
563 : blume 643 $smlnj/cmb.cm - provides "structure CMB"
564 :     $smlnj/cmb/current.cm - same as $smlnj/cmb.cm
565 :     $smlnj/compiler/all.cm - provides "structure <Arch>Compiler" and
566 : monnier 416 "structure <Arch><OS>CMB" for various
567 :     values of <Arch> and <OS>
568 :    
569 : blume 652 The file preloads.standard here in this directory currently includes
570 :     $smlnj/cmb.cm. This means that by doing ./makeml one obtains a heap
571 :     image with the bootstrap compiler being pre-registered as well. This
572 :     seems reasonable for compiler hackers. (The config/install.sh script
573 :     uses config/preloads where $smlnj/cmb.cm is not pre-registered. This
574 :     is appropriate as a setup for general users.)
575 : blume 643
576 : monnier 416 * Internal sharing
577 :     ------------------
578 :    
579 :     Dynamic values of loaded modules are shared. This is true even for
580 :     those modules that are used by the interactive compiler itself. If
581 :     you load a module from a library that is also used by the interactive
582 :     compiler, then "loading" means "loading the static environment" -- it
583 :     does not mean "loading the code and linking it". Instead, you get to
584 :     share the compiler's dynamic values (and therefore the executable
585 :     code as well).
586 :    
587 :     Of course, if you load a module that hasn't been loaded before and
588 :     also isn't used by the interactive system, then CM will get the code
589 :     and link (execute) it.
590 :    
591 :     * Access control
592 :     ----------------
593 :    
594 :     In some places, you will find that the "group" and "library" keywords
595 :     in description files are preceded by certain strings, sometimes in
596 :     parentheses. These strings are the names of "privileges". Don't
597 :     worry about them too much at the moment. For the time being, access
598 :     control is not enforced, but the infrastructure is in place.
599 :    
600 :     * Preprocessor
601 :     --------------
602 :    
603 :     The syntax of expressions in #if and #elif clauses is now more ML-ish
604 :     instead of C-ish. (Hey, this is ML after all!) In particular, you
605 :     must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
606 :     Unary minus is "~".
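
A small sketch of the ML-ish operators in a description file follows. The file names and version thresholds are made up for illustration, and the use of ML-style comments in description files is an assumption; only SMLNJ_VERSION is taken from this document.

```
Group is
#if SMLNJ_VERSION > 110 andalso not (SMLNJ_VERSION > 120)
    middle.sml      (* hypothetical source file *)
#elif SMLNJ_VERSION <= 110 orelse SMLNJ_VERSION > 120
    other.sml       (* hypothetical source file *)
#endif
```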
607 :    
608 :     A more interesting change is that you can now query the exports of
609 :     sources/subgroups/sublibraries:
610 :    
611 :     - Within the "members" section of the description (i.e., after "is"):
612 :     The expression
613 :     defined(<namespace> <name>)
614 :     is true if any of the included members preceding this clause exports
615 :     a symbol "<namespace> <name>".
616 : blume 632 - Within the "exports" section of the description (i.e., before "is"):
617 : monnier 416 The same expression is true if _any_ of the members exports the
618 :     named symbol.
619 :     (It would be more logical if the exports section followed the
620 :     members section, but for esthetic reasons I prefer the exports
621 :     section to come first.)
622 :    
623 :     Example:
624 :    
625 :     +--------------------------+
626 :     |Library |
627 :     | structure Foo |
628 :     |#if defined(structure Bar)|
629 :     | structure Bar |
630 :     |#endif |
631 :     |is |
632 :     |#if SMLNJ_VERSION > 110 |
633 :     | new-foo.sml |
634 :     |#else |
635 :     | old-foo.sml |
636 :     |#endif |
637 :     |#if defined(structure Bar)|
638 :     | bar-client.sml |
639 :     |#else |
640 :     | no-bar-so-far.sml |
641 :     |#endif |
642 :     +--------------------------+
643 :    
644 : blume 710 Here, the file bar-client.sml gets included if SMLNJ_VERSION is
645 : monnier 416 greater than 110 and new-foo.sml exports a structure Bar _or_ if
646 :     SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar. Otherwise
647 : blume 710 no-bar-so-far.sml gets included instead. In addition, the export of
648 : monnier 416 structure Bar is guarded by its own existence. (Structure Bar could
649 : blume 710 also be defined by no-bar-so-far.sml in which case it would get
650 : monnier 416 exported regardless of the outcome of the other "defined" test.)
651 :    
652 :     Some things to note:
653 :    
654 :     - For the purpose of the pre-processor, order among members is
655 :     significant. (For the purpose of dependency analysis, order continues
656 :     to be not significant.)
657 :     - As a consequence, in some cases pre-processor dependencies and
658 :     compilation-dependencies may end up being opposites of each other.
659 :     (This is not a problem; it may very well be a feature.)
660 :    
661 :     * The Basis Library is no longer built-in
662 :     -----------------------------------------
663 :    
664 :     The SML'97 basis is no longer built-in. If you want to use it, you
665 : blume 710 must specify $/basis.cm as a member of your group/library.
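
For illustration, a group that uses the basis might be described roughly as follows. The source file name is hypothetical, and the use of ML-style comments in description files is an assumption.

```
Group is
    $/basis.cm      (* the SML'97 basis must now be listed explicitly *)
    hello.sml       (* hypothetical source file that uses the basis *)
```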
666 : monnier 416
667 :     * No more aliases
668 :     -----------------
669 :    
670 :     The "alias" feature is no longer with us. At first I thought I could
671 :     keep it, but it turns out that it causes some fairly fundamental
672 :     problems with the autoloader. However, I don't think that this is a
673 :     big loss because path anchors make up for most of it. Moreover,
674 :     stable libraries can now easily be moved to convenient locations
675 :     without having to move large source trees at the same time. (See my
676 : blume 643 new config/install.sh script for examples of that.)
677 : monnier 416
678 : blume 573 It is possible to simulate aliases (in a way that is safer than the
679 :     original alias mechanism). For example, the root.cm file (which is the
680 :     root of the whole system as far as CMB.make is concerned) acts as an
681 : blume 643 alias for $smlnj/internal/intsys.cm. In this case, root.cm is a group
682 : blume 573 to avoid having a (trivial) stable library file built for it.
683 :    
684 :     A library can act as an "alias" for another library if it has a
685 :     verbatim copy of the export list and mentions the other library as its
686 : blume 643 only member. Examples of this are $smlnj/cm.cm (for
687 :     $smlnj/cm/full.cm), $smlnj/compiler.cm (for $smlnj/compiler/current.cm),
688 :     etc. The stable library file for such an "alias" is typically very
689 :     small because it basically just points to the other library. (For
690 :     example, the file representing $smlnj/cm.cm is currently 234 bytes
691 :     long.)
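
A sketch of what such an "alias" library could look like is shown below. Structure Foo and $/real-foo.cm are hypothetical names used only for illustration; they are not part of the system. The use of ML-style comments in description files is likewise an assumption.

```
Library
    structure Foo       (* verbatim copy of the export list of the real library *)
is
    $/real-foo.cm       (* the only member: the library being "aliased" *)
```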
692 : blume 573
693 : monnier 416 * Don't use relative or absolute pathnames to refer to libraries
694 :     ----------------------------------------------------------------
695 :    
696 :     Don't use relative or absolute pathnames to refer to libraries. If
697 :     you do it anyway, you'll get an appropriate warning at the time when
698 : blume 569 you do CMB.make(). If you use relative or absolute pathnames to
699 : monnier 416 refer to library B from library A, you will be committed to keeping B
700 :     in the same relative (to A) or absolute location. This, clearly,
701 : blume 643 would be undesirable in many situations (although perhaps not always).
