[smlnj] Annotation of /sml/trunk/src/system/README
Revision 416

1 : monnier 416 !!! ATTENTION !!!
2 :     As an SML/NJ compiler developer, please read this document carefully.
3 :     The new CM has a lot of good things to offer, but you must be aware
4 :     of the many changes that it incurs to the process of compiling the
5 :     SML/NJ compiler.
6 :     Matthias Blume (July 1999)
7 :     -------------------------------------------------------------------------
8 :    
9 :     * Libraries
10 :     -----------
11 :    
12 :     The new way of building the compiler is heavily library-oriented.
13 :     Aside from a tiny portion of code that is responsible for defining the
14 :     pervasive environment, _everything_ lives in libraries. Building the
15 :     compiler means compiling and stabilizing these libraries first. Some
16 :     of the libraries exist just for reasons of organizing the code, the
17 :     other ones are potentially useful in their own right. Therefore, as a
18 :     beneficial side-effect of compiling the compiler, you will end up with
19 :     stable versions of these libraries.
20 :    
21 :     At the moment, the following libraries are constructed when compiling
22 :     the compiler ("*" means that I consider the library potentially useful
23 :     in its own right):
24 :    
25 :       * basis.cm            - The SML'97 basis library
26 :       - cm-hook.cm          - an internal library for organizational purposes
27 :       - cm-lib.cm           - the library that implements CM's functionality
28 :       * comp-lib.cm         - a helper library for the compiler, MLRISC, and CM
29 :       * host-cm.cm          - the library that exports the public interface to
30 :                               the compilation manager (i.e., structure CM)
31 :       * host-cmb.cm         - the library that exports the public interface to
32 :                               the bootstrap compiler (i.e., structure CMB)
33 :       - host-compiler-0.cm
34 :                             - an internal library for organizational purposes
35 :       * host-compiler.cm
36 :                             - the library that exports the public interface to
37 :                               the visible compiler (i.e., structure Compiler)
38 :       - intsys.cm           - an internal library for organizational purposes
39 :                               (In fact, it's the "root" of the main hierarchy.)
40 :       * ml-yacc-lib.cm      - needs no further comment
41 :       * smlnj-lib.cm        - needs no further comment
42 :       * target-compilers.cm
43 :                             - library exporting target-specific versions of
44 :                               structure Compiler and of structure CMB
45 :                               (The existence of this library is the moral
46 :                               equivalent of "CMB.retarget" in the old CM.)
47 :       * viscomp-lib.cm      - library that implements the compiler
48 :                               (At the moment, its interface is rather thin. We
49 :                               should think about how to structure the interface
50 :                               in such a way that it becomes a useful equivalent
51 :                               to the old "full" compiler.)
52 :    
53 :     * Before you can use the bootstrap compiler (CMB)...
54 :     ----------------------------------------------------
55 :    
56 :     To be able to use CMB at all, you must first say
57 :    
58 :     CM.autoload "host-cmb.cm";
59 :    
60 :     after you start sml.
61 :    
62 :     * Compiling the compiler -- a two-step procedure
63 :     ------------------------------------------------
64 :    
65 :     Until now (with the old CM), once we managed to run CMB.make() to
66 :     completion we had a directory full of binfiles that were ready to be
67 :     used by the boot procedure. This is no longer the case.
68 :    
69 :     The boot procedure now wants to use stable libraries (except for the
70 :     part that makes up the pervasive environment). Having stable
71 :     libraries around during development of these very libraries would be a
72 :     bit annoying because if CM sees a stable library it will no longer
73 :     bother to check the corresponding source files -- even if they have
74 :     changed. Therefore, libraries are not stabilized until you think you
75 :     are ready for that. Thus, you should run:
76 :    
77 :     CMB.make ();
78 :    
79 :     until you no longer get compile errors. CMB.make will return true in
80 :     this case. Then you say:
81 :    
82 :     CMB.deliver ();
83 :    
84 :     This command creates a second directory parallel to the "bin"
85 :     directory -- the "boot" directory. It will hold everything necessary
86 :     to bootstrap a new heap image. You will probably find that
87 :     CMB.deliver() compiles a number of additional files even though
88 :     CMB.make completed successfully. This is because CMB.make compiles
89 :     just those modules that will actually go into the heap image, but
90 :     CMB.deliver must also build the remaining files -- files that are part
91 :     of libraries to be stabilized but which are not used by the compiler.
92 :    
93 :     After you have made the boot directory, if you want to continue
94 :     developing the compiler (i.e., make changes to some sources,
95 :     recompile, etc.), you must first get rid of that boot directory.
96 :     Running the "makeml" script (see below) will automatically remove the
97 :     boot directory.
98 :    
99 :     The names of "bin" and "boot" directories are
100 :    
101 :     <prefix>.bin.<arch>-<os>
102 :    
103 :     and
104 :    
105 :     <prefix>.boot.<arch>-<os>
106 :    
107 :     respectively, with "comp" being the default for <prefix>. To change
108 :     the prefix, use CMB.make' and CMB.deliver' with the new prefix
109 :     provided as the optional string argument to these functions.
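
The naming scheme above can be sketched in shell. This is only an illustration under stated assumptions: the helper names cmb_dir and ready_for_make, and the x86/linux values used with them, are made up for this sketch and are not part of CMB or makeml.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SML/NJ) that mirror the naming scheme
# <prefix>.bin.<arch>-<os> and <prefix>.boot.<arch>-<os> described above.

# cmb_dir KIND ARCH OS [PREFIX]
# KIND is "bin" or "boot"; PREFIX defaults to "comp".
cmb_dir() {
    kind=$1
    arch=$2
    os=$3
    prefix=${4:-comp}
    printf '%s.%s.%s-%s\n' "$prefix" "$kind" "$arch" "$os"
}

# ready_for_make ARCH OS [PREFIX]
# Succeeds only if no boot directory is left over in the current directory,
# because the boot directory must be removed before further CMB.make() runs.
ready_for_make() {
    boot=$(cmb_dir boot "$1" "$2" "${3:-comp}")
    if [ -e "$boot" ]; then
        echo "remove $boot before running CMB.make() again" >&2
        return 1
    fi
    return 0
}
```

For example, "cmb_dir bin x86 linux" prints comp.bin.x86-linux, and "cmb_dir boot x86 linux new" prints new.boot.x86-linux.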
110 :    
111 :     * Making the heap image
112 :     -----------------------
113 :    
114 :     The heap image is made by running the "makeml" script that you find
115 :     here in this directory. By default it will try to refer to the
116 :     comp.boot.<arch>-<os> directory. You can change this using the -boot
117 :     argument (which takes the full name of the boot directory to be used).
118 :    
119 :     The "feel" of using makeml should be mostly as it used to be. However,
120 :     internally, there are some changes that you should be aware of:
121 :    
122 :     1. The script will make a heap image and also move its associated
123 :     libraries into a separate directory.
124 :    
125 :     2. There is no "-full" option anymore. This functionality should
126 :     eventually be provided by a library with a sufficiently rich export
127 :     interface.
128 :    
129 :     3. No image will be generated if you use the -rebuild option.
130 :     Instead, the script quits after making new bin and new boot
131 :     directories. You must re-invoke makeml with a suitable "-boot"
132 :     option to actually make the image. The argument to "-rebuild"
133 :     is the <prefix> for the new bin and boot directories (see above).
134 :    
135 :     4. Unless you use "-rebuild", makeml will delete the boot directory
136 :     (thus readying you for further "CMB.make();" runs).
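
Point 3 implies two separate makeml invocations. The sketch below only prints them (a dry run) and does not call makeml. The function name rebuild_plan is hypothetical, and any other arguments that makeml may need are not shown here.

```shell
#!/bin/sh
# rebuild_plan PREFIX ARCH OS
# Prints (does not run) the two makeml invocations needed when "-rebuild"
# is used. Hypothetical helper, not part of the SML/NJ scripts.
rebuild_plan() {
    prefix=$1
    arch=$2
    os=$3
    # Step 1: build new bin and boot directories; no heap image is made.
    printf './makeml -rebuild %s\n' "$prefix"
    # Step 2: make the image from the boot directory created in step 1.
    printf './makeml -boot %s.boot.%s-%s\n' "$prefix" "$arch" "$os"
}
```

The second line uses the full boot directory name because "-boot" takes the full name, whereas "-rebuild" takes only the prefix.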
137 :    
138 :     * Testing a newly generated heap image
139 :     --------------------------------------
140 :    
141 :     If you run a new heap image by saying "sml @SMLload=..." then things
142 :     will not go as you may expect. The new stable libraries should go
143 :     along with the new heap image, but unless you do something about it,
144 :     the new CM will look for its stable libraries in the places where you
145 :     stored your _old_ stable libraries.
146 :    
147 :     After you have made the new heap image, the new libraries are in a
148 :     separate directory whose name is derived from the name of the heap
149 :     image. The "testrun" script that you also find here will run the heap
150 :     image and instruct it to look for its libraries in that new library
151 :     directory.
152 :     "testrun" takes the name of the heap image as its single argument. It
153 :     expects the library directory to be the one that makeml builds.
154 :    
155 :     * Installing a heap image for more permanent use
156 :     ------------------------------------------------
157 :    
158 :     Since you have been using the new CM already, it can be assumed that
159 :     you have already set up a correct pathname configuration. (For more
160 :     information on pathname configurations, see below.) With a correct
161 :     pathname configuration in place, you can "install" a newly generated
162 :     heap image by replacing the old image with the new one _AND AT THE
163 :     SAME TIME_ replacing the old stable libraries with the new ones.
164 :    
165 :     * Cross-compiling
166 :     -----------------
167 :    
168 :     All cross-compilers live in the "target-compilers.cm" library. You
169 :     must first say
170 :    
171 :     CM.autoload "target-compilers.cm";
172 :    
173 :     before you can access them. (This step corresponds to the old
174 :     CMB.retarget call.) After that, _all_ cross-compilers are available
175 :     at the same time. However, the ones that you are not using don't take
176 :     up any undue space because they only get loaded once you actually
177 :     mention them at the top-level. The names of the structures currently
178 :     exported by target-compilers.cm are:
179 :    
180 :     structure Alpha32UnixCMB
181 :     structure HppaUnixCMB
182 :     structure PPCMacOSCMB
183 :     structure PPCUnixCMB
184 :     structure SparcUnixCMB
185 :     structure X86UnixCMB
186 :     structure X86Win32CMB
187 :    
188 :     structure Alpha32Compiler
189 :     structure HppaCompiler
190 :     structure PPCCompiler
191 :     structure SparcCompiler
192 :     structure X86Compiler
193 :    
194 :     (PPCMacOSCMB is not very useful at the moment because there is no
195 :     implementation of the basis library for the MacOS.)
196 :    
197 :     * Path configuration
198 :     --------------------
199 :    
200 :     + Basics:
201 :    
202 :     One of the new features of CM is its handling of path names. In the
203 :     old CM, one particular point of trouble was the autoloader. It
204 :     analyzes a group or library and remembers the locations of associated
205 :     files. Later, when the necessity arises, those files will be read.
206 :     Therefore, one was asking for trouble if the current working directory
207 :     was changed between analysis- and load-time, or, worse, if files
208 :     actually moved about (as is often the case if build- and
209 :     installation-directories are different, or, to put it more generally,
210 :     if CM's state is frozen into a heap image and used in a different
211 :     environment).
212 :    
213 :     Maybe it would have been possible to work around most of these
214 :     problems by fixing the path-lookup mechanism in the old CM and using
215 :     it extensively. But path-lookup (as in the Unix-shell's "PATH") is
216 :     inherently dangerous because one can never be too sure what it will be
217 :     that is found on the path. A new file in one of the directories early
218 :     in the path can upset the program that hopes to find something under
219 :     the same name later on the path. Even when ignoring security-issues
220 :     like trojan horses and such, this definitely opens the door for
221 :     various unpleasant surprises. (Who has never named a test version
222 :     of a program "test" and found that it acts strangely, only to discover
223 :     later that /bin/test was run instead?)
224 :    
225 :     Thus, the new scheme used by CM is a fixed mapping of what I call
226 :     "configuration anchors" to corresponding directories. The mapping can
227 :     be changed, but one must do so explicitly. In effect, it does not
228 :     depend on the contents of the file system. Here is how it works:
229 :    
230 :     If I specify a relative pathname in one of CM's description files
231 :     where the first component (the first arc) of that pathname is known to
232 :     CM as a configuration anchor, then the corresponding directory
233 :     (according to CM's mapping) is prepended to the path. Suppose the
234 :     path name is "a/foo.sml" and "a" is a known anchor that maps to
235 :     "/usr/lib/smlnj", then the resulting complete pathname is
236 :     "/usr/lib/smlnj/a/foo.sml". The pathname can be a single arc (but
237 :     does not have to be). For example, the anchor "basis.cm" is typically
238 :     mapped to the directory where the basis library is stored.
239 :    
240 :     Now, the important point is that one can change the mapping of the
241 :     anchor, and the path name will also change accordingly -- even very
242 :     late in the game. CM avoids "elaborating" path names until it really
243 :     needs them when it is time to open files. CM is also willing to
244 :     re-elaborate the same names if there is reason to do so. Thus, the
245 :     "basis.cm" library that was analyzed "here" but then moved "there"
246 :     will also be found "there" if the anchor has been re-set accordingly.
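
The anchor rule can be modelled in a few lines of shell. This is a sketch of the idea, not CM's implementation: the function name resolve_path is hypothetical, and the mapping-file format used here (one "anchor directory" pair per line) is an assumption made for the illustration.

```shell
#!/bin/sh
# resolve_path MAP_FILE PATHNAME
# If the first arc of PATHNAME is a known anchor in MAP_FILE, prepend the
# directory the anchor maps to; otherwise print PATHNAME unchanged.
# Assumed map format (for this sketch only): "<anchor> <directory>" per line.
resolve_path() {
    map_file=$1
    path=$2
    arc=${path%%/*}    # first arc; the whole name if there is no "/"
    # Look up the anchor; if it is listed more than once, the last entry wins.
    dir=$(awk -v a="$arc" '$1 == a { d = $2 } END { print d }' "$map_file")
    if [ -n "$dir" ]; then
        printf '%s/%s\n' "$dir" "$path"
    else
        printf '%s\n' "$path"
    fi
}
```

With a map containing the line "a /usr/lib/smlnj", resolving "a/foo.sml" gives "/usr/lib/smlnj/a/foo.sml". Because the lookup happens on every call, editing the map and resolving again yields the new location, which is the late "re-elaboration" described above.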
247 :    
248 :     + Different configurations at different times:
249 :    
250 :     During compilation of the compiler, CMB uses a path configuration that
251 :     is read from the file "pathconfig" located here in this directory.
252 :     Warning: The names in that pathconfig file are relative pathnames and
253 :     will work only if you are in this directory. (This will typically be
254 :     the case since you are compiling the compiler. Normally, however, path
255 :     configurations should map anchors to absolute pathnames.)
256 :    
257 :     At bootstrap time, the same anchors are mapped to the corresponding
258 :     sub-directory of the "boot" directory: basis.cm is mapped to
259 :     comp.boot.<arch>-<os>/basis.cm -- which means that CM will look for a
260 :     library named comp.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.
261 :    
262 :     By the way, you will perhaps notice that there is no file
263 :     comp.boot.<arch>-<os>/basis.cm/basis.cm
264 :     but there _is_ the corresponding stable archive
265 :     comp.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm
266 :     CM always looks for stable archives first.
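
The relation between a description file name and its stable archive, as seen in the basis.cm example, can be written down as a small shell function. The name stable_archive is hypothetical, and extending the pattern from this one example to every library is an assumption.

```shell
#!/bin/sh
# stable_archive DESCRIPTION_FILE ARCH_OS
# Prints where the stable archive for DESCRIPTION_FILE would be expected,
# following the pattern <dir>/CM/<arch>-<os>/<name> shown above.
stable_archive() {
    desc=$1
    archos=$2
    dir=$(dirname "$desc")
    base=$(basename "$desc")
    printf '%s/CM/%s/%s\n' "$dir" "$archos" "$base"
}
```

For example, "stable_archive comp.boot.x86-linux/basis.cm/basis.cm x86-linux" prints comp.boot.x86-linux/basis.cm/CM/x86-linux/basis.cm.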
267 :    
268 :     This mapping (from anchors to names in the boot directory) is the one
269 :     that will get frozen into the generated heap image at boot time.
270 :     Thus, unless it is changed, CM will look for its libraries in the boot
271 :     directory. The aforementioned "testrun" script will make sure that
272 :     the mapping is changed to the one specified in a new "pathconfig" file
273 :     which was created by makeml and placed into the test library
274 :     directory. It points all anchors to the corresponding entry in the
275 :     test library directory. Thus, "testrun" will let a new heap image run
276 :     with its corresponding new libraries.
277 :    
278 :     Normally, however, CM consults other pathconfig files at startup --
279 :     files that live in standard locations. These files are used to modify
280 :     the path configuration to let anchors point to their "usual" places.
281 :     The names of the files that are read (if present) are configurable via
282 :     environment variables. At the moment they default to
283 :     /usr/lib/smlnj-pathconfig
284 :     and
285 :     $HOME/.smlnj-pathconfig
286 :     The first one is configurable via CM_PATHCONFIG (and the default is
287 :     configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
288 :     configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
289 :     In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
290 :     before making the heap image. Therefore, heap images generated by
291 :     makeml will look for their global pathconfig file in
292 :    
293 :     `pwd`/../../stable-libs/pathconfig
294 :    
295 :     For example, I always keep my "good" libraries in
296 :     `pwd`/../../stable-libs and let the pathconfig file point its anchors
297 :     to the members of that directory.
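
As a rough model of the startup lookup, the following shell function prints the two pathconfig file names that would be consulted, global first and local second. It is a sketch only: the function name pathconfig_files is hypothetical, the boot-time *_DEFAULT variables are not modelled, and an empty variable is treated like an unset one, which may differ from what CM does.

```shell
#!/bin/sh
# pathconfig_files
# Prints the global and then the local pathconfig file name, using the
# environment variables and the default locations named above.
pathconfig_files() {
    printf '%s\n' "${CM_PATHCONFIG:-/usr/lib/smlnj-pathconfig}"
    printf '%s\n' "${CM_LOCAL_PATHCONFIG:-$HOME/.smlnj-pathconfig}"
}
```

Note that a heap image generated by makeml has a different built-in global default (the stable-libs/pathconfig file mentioned above), so the first line of output matches only images with the stock default.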
298 :    
299 :     Once I have a new heap image and libraries working, I replace the old
300 :     "good" image with the new one:
301 :    
302 :     mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>
303 :    
304 :     and then:
305 :    
306 :     rm -r ../../stable-libs/*.cm
307 :     mv <image>.libs/*.cm ../../stable-libs
308 :    
309 :     Of course, you can organize things differently for yourself -- the
310 :     path configuration mechanism should be sufficiently flexible.
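
The install steps above can be wrapped in one shell function so that the image and the libraries are replaced together. This is a sketch under assumptions: the function name install_image is hypothetical, ROOT stands for the directory written as ../.. above, and "rm -rf" is used instead of "rm -r" so that an empty stable-libs directory is not an error.

```shell
#!/bin/sh
# install_image IMAGE ARCH_OS ROOT
# Moves IMAGE.ARCH_OS to ROOT/bin/.heap/sml.ARCH_OS and replaces the *.cm
# entries in ROOT/stable-libs with those from IMAGE.libs.
install_image() {
    image=$1
    archos=$2
    root=$3
    heap=$image.$archos
    libs=$image.libs
    # Check both halves first: image and libraries must be replaced together.
    [ -f "$heap" ] || { echo "no heap image $heap" >&2; return 1; }
    [ -d "$libs" ] || { echo "no library directory $libs" >&2; return 1; }
    mv "$heap" "$root/bin/.heap/sml.$archos" || return 1
    rm -rf "$root"/stable-libs/*.cm
    mv "$libs"/*.cm "$root/stable-libs" || return 1
}
```

Run from this directory, "install_image <image> <arch>-<osvariant> ../.." would perform the same three commands as shown above.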
311 :    
312 :     * Libraries vs. Groups
313 :     ----------------------
314 :    
315 :     With the old CM, "group" was the primary concept while "library" and
316 :     "stabilization" could be considered afterthoughts. This has changed.
317 :     Now "library" is the primary concept, "stabilization" is semantically
318 :     significant, and "groups" are a secondary mechanism.
319 :    
320 :     Libraries are used to "structure the world"; groups are used to give
321 :     structure to libraries. Each group can be used either in precisely
322 :     one library (in which case it cannot be used at the interactive
323 :     toplevel) or at the toplevel (in which case it cannot be used in any
324 :     library). In other words, if you count the toplevel as a library,
325 :     then each group has a unique "owner" library. Of course, there still
326 :     is no limit on how many times a group can be mentioned as a member of
327 :     other groups -- as long as all these other groups belong to the same
328 :     owner library.
329 :    
330 :     If you want to take a collection of files whose purpose fits that of a
331 :     library, then, please, make them into a library (i.e., not a group!).
332 :     The purpose of groups is to deal with name-space issues _within_
333 :     libraries.
334 :    
335 :     Aside from the fact that I find this design quite natural, there is
336 :     actually a technical reason for it: when you stabilize a library
337 :     (groups cannot be stabilized), then all its sub-groups (not
338 :     sub-libraries!) get "sucked into" the stable archive of the library.
339 :     In other words, even if you have n+1 CM description files (1 for the
340 :     library, n for n sub-groups), there will be just one file representing
341 :     the one stable archive (per architecture/os) for the whole thing. For
342 :     example, I structured the standard basis into one library with two
343 :     sub-groups, but once you compile it (CMB.deliver) there is only one
344 :     stable file that represents the whole basis library. If groups were
345 :     allowed to appear in more than one library, then stabilization would
346 :     duplicate the group (its code, its environment data structures, and
347 :     even its dynamic state).
348 :    
349 :     There is a small change to the syntax of group description files: they
350 :     must explicitly state which library they belong to. CM will verify
351 :     that. The owner library is specified in parentheses after the "group"
352 :     keyword. If the specification is missing (that's the "old" syntax),
353 :     then the owner will be taken to be the interactive toplevel.
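
To make the owner syntax concrete, the shell sketch below writes a small group description file. Everything in it is hypothetical: the file names helpers.cm and helper.sml, the owner library my-lib.cm, and structure Helper are invented for illustration, and the exact layout is an assumption based on the description above and on the Library example in the preprocessor section.

```shell
#!/bin/sh
# write_group_description FILE
# Writes a hypothetical group description whose owner library is named in
# parentheses after the "Group" keyword.
write_group_description() {
    cat > "$1" <<'EOF'
Group (my-lib.cm)
  structure Helper
is
  helper.sml
EOF
}
```

Dropping the "(my-lib.cm)" part would give the old syntax, in which case the owner is taken to be the interactive toplevel.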
354 :    
355 :     There are several examples of this throughout the system's source
356 :     hierarchy. One notable case is MLRISC. It should probably be made
357 :     into a library of its own, but I leave this job to Lal. At the moment
358 :     MLRISC.cm is a sub-group of viscomp-lib.cm.
359 :    
360 :     * Pervasive environment, core environment, other "primitive" environments
361 :     -------------------------------------------------------------------------
362 :    
363 :     Just a handful of files is compiled at the beginning in order to
364 :     establish a number of "primitive" environments -- including the
365 :     "pervasive" environment and the "core" environment. The pervasive
366 :     environment no longer includes the entire basis library but only
367 :     non-modular bindings (top-level bindings of variables and types).
368 :    
369 :     CM cannot automatically determine dependencies for these initial
370 :     source files, but it still does use its regular cutoff recompilation
371 :     mechanism. Therefore, dependencies must be given explicitly. This is
372 :     done by a special description file which currently lives in
373 :     Init/init.cmi. See the long comment at the beginning of that file for
374 :     more details.
375 :    
376 :     * Autoloader
377 :     ------------
378 :    
379 :     The new system heavily relies on the autoloader. As a result, almost
380 :     no static environments need to get unpickled at bootstrap time. The
381 :     construction of such environments is deferred until they become
382 :     necessary. Because of this, I was able to reduce the size of the heap
383 :     image by more than one megabyte (depending on the architecture). The
384 :     downside (although not really terribly bad) is that there is a short
385 :     wait when you first touch an identifier that hasn't been touched
386 :     before. (I acknowledge that the notion of "short" may depend on your
387 :     sense of urgency. :-)
388 :    
389 :     The reliance on the autoloader (and therefore CM's library mechanism)
390 :     means that in order to be able to use the system, your paths must be
391 :     properly configured.
392 :    
393 :     Two libraries get pre-registered at bootstrap time: the basis library
394 :     ("basis.cm") and CM itself ("host-cm.cm"). The latter is crucial:
395 :     without it one wouldn't be able to register any other libraries
396 :     via CM.autoload. The registration of basis.cm is a mere convenience.
397 :    
398 :     Here are some other useful libraries that are not pre-registered but
399 :     which can easily be made accessible via CM.autoload (or, non-lazily,
400 :     via CM.make):
401 :    
402 :       host-compiler.cm    - provides "structure Compiler"
403 :       host-cmb.cm         - provides "structure CMB"
404 :       target-compilers.cm - provides "structure <Arch>Compiler" and
405 :                             "structure <Arch><OS>CMB" for various
406 :                             values of <Arch> and <OS>
407 :       smlnj-lib.cm        - the SML/NJ library
409 :     * Internal sharing
410 :     ------------------
411 :    
412 :     Dynamic values of loaded modules are shared. This is true even for
413 :     those modules that are used by the interactive compiler itself. If
414 :     you load a module from a library that is also used by the interactive
415 :     compiler, then "loading" means "loading the static environment" -- it
416 :     does not mean "loading the code and linking it". Instead, you get to
417 :     share the compiler's dynamic values (and therefore the executable
418 :     code as well).
419 :    
420 :     Of course, if you load a module that hasn't been loaded before and
421 :     also isn't used by the interactive system, then CM will get the code
422 :     and link (execute) it.
423 :    
424 :     * Access control
425 :     ----------------
426 :    
427 :     In some places, you will find that the "group" and "library" keywords
428 :     in description files are preceded by certain strings, sometimes in
429 :     parentheses. These strings are the names of "privileges". Don't
430 :     worry about them too much at the moment. For the time being, access
431 :     control is not enforced, but the infrastructure is in place.
432 :    
433 :     * Preprocessor
434 :     --------------
435 :    
436 :     The syntax of expressions in #if and #elif clauses is now more ML-ish
437 :     instead of C-ish. (Hey, this is ML after all!) In particular, you
438 :     must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
439 :     Unary minus is "~".
440 :    
441 :     A more interesting change is that you can now query the exports of
442 :     sources/subgroups/sublibraries:
443 :    
444 :     - Within the "members" section of the description (i.e., after "is"):
445 :       The expression
446 :           defined(<namespace> <name>)
447 :       is true if any of the included members preceding this clause exports
448 :       a symbol "<namespace> <name>".
449 :     - Within the "exports" section of the description (i.e., before "is"):
450 :       The same expression is true if _any_ of the members exports the
451 :       named symbol.
452 :       (It would be more logical if the exports section followed the
453 :       members section, but for esthetic reasons I prefer the exports
454 :       section to come first.)
455 :    
456 :     Example:
457 :    
458 :       +--------------------------+
459 :       |Library                   |
460 :       |  structure Foo           |
461 :       |#if defined(structure Bar)|
462 :       |  structure Bar           |
463 :       |#endif                    |
464 :       |is                        |
465 :       |#if SMLNJ_VERSION > 110   |
466 :       |  new-foo.sml             |
467 :       |#else                     |
468 :       |  old-foo.sml             |
469 :       |#endif                    |
470 :       |#if defined(structure Bar)|
471 :       |  bar-client.sml          |
472 :       |#else                     |
473 :       |  no-bar-so-far.sml       |
474 :       |#endif                    |
475 :       +--------------------------+
476 :    
477 :     Here, the file "bar-client.sml" gets included if SMLNJ_VERSION is
478 :     greater than 110 and new-foo.sml exports a structure Bar _or_ if
479 :     SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar. Otherwise
480 :     "no-bar-so-far.sml" gets included instead. In addition, the export of
481 :     structure Bar is guarded by its own existence. (Structure Bar could
482 :     also be defined by "no-bar-so-far.sml" in which case it would get
483 :     exported regardless of the outcome of the other "defined" test.)
484 :    
485 :     Some things to note:
486 :    
487 :     - For the purpose of the pre-processor, order among members is
488 :     significant. (For the purpose of dependency analysis, order continues
489 :     to be not significant).
490 :     - As a consequence, in some cases pre-processor dependencies and
491 :     compilation-dependencies may end up being opposites of each other.
492 :     (This is not a problem; it may very well be a feature.)
493 :    
494 :     * The Basis Library is no longer built-in
495 :     -----------------------------------------
496 :    
497 :     The SML'97 basis is no longer built-in. If you want to use it, you
498 :     must specify "basis.cm" as a member of your group/library.
499 :    
500 :     * No more aliases
501 :     -----------------
502 :    
503 :     The "alias" feature is no longer with us. At first I thought I could
504 :     keep it, but it turns out that it causes some fairly fundamental
505 :     problems with the autoloader. However, I don't think that this is a
506 :     big loss because path anchors make up for most of it. Moreover,
507 :     stable libraries can now easily be moved to convenient locations
508 :     without having to move large source trees at the same time. (See my
509 :     new build/install.sh script for examples of that.)
510 :    
511 :     * Don't use relative or absolute pathnames to refer to libraries
512 :     ----------------------------------------------------------------
513 :    
514 :     Don't use relative or absolute pathnames to refer to libraries. If
515 :     you do it anyway, you'll get an appropriate warning at the time when
516 :     you do CMB.deliver(). If you use relative or absolute pathnames to
517 :     refer to library B from library A, you will be committed to keeping B
518 :     in the same relative (to A) or absolute location. This, clearly,
519 :     would be undesirable.
