SCM Repository
View of /sml/trunk/src/system/README
Revision 573
Thu Mar 9 15:23:52 2000 UTC (20 years, 11 months ago) by blume
File size: 24977 byte(s)
merging back changes from blume_devel_v110_26_2
Compiler Hacker's Guide to the new CM...
========================================

Last change: 3/2000

* Libraries
-----------

The new way of building the compiler is heavily library-oriented.
Aside from a tiny portion of code that is responsible for defining the
pervasive environment, _everything_ lives in libraries.  Building the
compiler means compiling and stabilizing these libraries first.

Some of the libraries exist just for reasons of organizing the code;
the others are potentially useful in their own right.  Therefore, as a
beneficial side-effect of compiling the compiler, you will end up with
stable versions of these libraries.

At the moment, the following libraries are constructed when compiling
the compiler ("*" means that I consider the library potentially useful
in its own right):

  * basis.cm                    SML'97 Basis Library (pre-loaded)
  * smlnj-lib.cm                SML/NJ Utility Library
  * html-lib.cm                 SML/NJ HTML Library
  * pp-lib.cm                   SML/NJ Pretty-print Library
  * ml-yacc-lib.cm              SML/NJ ML-Yacc runtime library
  * smlnj/compiler/{alpha32,hppa,ppc,sparc,x86}.cm
                                cross-compiler libraries, exporting
                                structure <Arch>Compiler
  * smlnj/compiler/current.cm   structure Compiler (current arch)
  * smlnj/compiler/all.cm       all cross-compilers and all cross-CMBs
  * smlnj/cm/minimal.cm         minimal CM (pre-loaded)
  * smlnj/cm/full.cm            full structure CM (see manual)
  * smlnj/cm/tools.cm           CM tools library
  * smlnj/cmb/{alpha32,hppa,ppc,sparc,x86}-unix.cm
                                cross-bootstrap-compilers for Unix
                                (structure <Arch>UnixCMB)
  * smlnj/cmb/ppc-macos.cm      ...for Mac (structure PPCMacosCMB)
  * smlnj/cmb/x86-win32.cm      ...for Windoze (structure X86Win32CMB)
  * smlnj/cmb/current.cm        structure CMB (current arch/os)
  * smlnj/compiler.cm           abbrev. for smlnj/compiler/current.cm
  * smlnj/cm.cm                 abbrev. for smlnj/cm/full.cm
  * smlnj/cmb.cm                abbrev. for smlnj/cmb/current.cm
  * comp-lib.cm                 Utility library for compiler
  - smlnj/viscomp/core.cm       Compiler core functionality
  - smlnj/viscomp/{alpha32,hppa,ppc,sparc,x86}.cm
                                Machine-specific parts of compiler
  - smlnj/internal/{intsys,cm-lib,cm-hook,host-compiler-0}.cm
                                Glue that holds the interactive system
                                together
  * MLRISC/{MLRISC,Control,Lib,ALPHA,HPPA,PPC,SPARC,IA32}.cm
                                Various MLRISC bits
  * {mlyacc,mllex,mlburg}-tool.cm
                                CM plug-in libraries for common tools
  * {grm,lex,burg}-ext.cm       CM plug-in libraries for common file
                                extensions

* Before you can use the bootstrap compiler (CMB)...
----------------------------------------------------

To be able to use CMB at all, you must first say

        CM.autoload "smlnj/cmb.cm";

after you start sml.  Alternatively -- and perhaps more conveniently --
you can provide "smlnj/cmb.cm" as a command-line argument to sml:

        $ sml smlnj/cmb.cm

* Compiling the compiler
------------------------

We are now back to the old scheme where a call to CMB.make() suffices
to build a bootable set of files (libraries in our case).  CMB.make
maintains two parallel hierarchies of derived files:

  1. the binfile hierarchy ("binfiles"), containing compiled objects
     for each individual ML source file; this hierarchy is rooted at
     <prefix>.bin.<arch>-<opsys>
  2. the stable library hierarchy ("boot files"), containing library
     files for each library that participates in building SML/NJ;
     this hierarchy is rooted at <prefix>.boot.<arch>-<opsys>

The default for <prefix> is "sml".  It can be changed by using
CMB.make' with the new <prefix> as the optional string argument.

CMB.make uses bootfiles after it has verified that they are consistent
with their corresponding binfiles.  Bootfiles do not need to be deleted
in order for CMB.make to work correctly.

To bootstrap a new system (using the runtime system boot loader), the
bootfiles _must_ be present; the binfiles need not be present (but
their presence does not hurt either).
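To make the naming scheme of the two hierarchies concrete, here is a small Python sketch that derives both roots from <prefix>, <arch> and <opsys>.  Python is used purely for illustration (CMB.make itself is SML code), the function name is hypothetical, and the sketch only computes names: it does not model how CMB.make checks bootfiles against binfiles.

```python
from typing import Tuple


def derived_file_roots(arch: str, opsys: str, prefix: str = "sml") -> Tuple[str, str]:
    """Return (binfile_root, bootfile_root) for one target.

    Hypothetical helper mirroring the naming scheme
      <prefix>.bin.<arch>-<opsys>  and  <prefix>.boot.<arch>-<opsys>,
    with <prefix> defaulting to "sml" as for plain CMB.make.
    """
    target = f"{arch}-{opsys}"
    return (f"{prefix}.bin.{target}", f"{prefix}.boot.{target}")


print(derived_file_roots("x86", "unix"))            # ('sml.bin.x86-unix', 'sml.boot.x86-unix')
# A different prefix (as given to CMB.make') changes both roots together.
print(derived_file_roots("sparc", "unix", "test"))  # ('test.bin.sparc-unix', 'test.boot.sparc-unix')
```

Keeping both roots keyed by the same <prefix> is what lets makeml's -boot/-rebuild options and testml/installml refer to a whole build by one name.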
You can reduce the number of extra files compiled and stabilized
during CMB.make at the expense of not building any cross-compilers.
For that, say

        #set (CMB.symval "LIGHT") (SOME 1);

before running CMB.make.

* Making the heap image
-----------------------

The heap image is made by running the "makeml" script that you find
here in this directory.  By default it will try to refer to the
sml.boot.<arch>-<os> directory.  You can change this using the -boot
argument (which takes the full name of the boot directory to be used).

The "feel" of using makeml should be mostly as it used to be.  However,
internally, there are some changes that you should be aware of:

  1. The script will make a heap image and build a separate library
     directory that contains links to the library files in the
     bootfile directory.
  2. There is no "-full" option anymore.  This functionality should
     eventually be provided by a library with a sufficiently rich
     export interface.
  3. No image will be generated if you use the -rebuild option.
     Instead, the script quits after making new bin and new boot
     directories.  You must re-invoke makeml with a suitable "-boot"
     option to actually make the image.  The argument to "-rebuild" is
     the <prefix> for the new bin and boot directories (see above).

Makeml will not destroy the bootfile directory.

* Testing a newly generated heap image
--------------------------------------

If you run a new heap image by saying "sml @SMLload=...", things will
not go as you may expect: the new heap image belongs together with the
new stable libraries, but unless you do something about it, the newly
booted system will look for its stable libraries in the places where
you stored your _old_ stable libraries.

After you have made the new heap image, the new libraries are in a
separate directory whose name is derived from the name of the heap
image.  (Actually, only the directory hierarchy is separate; the
library files themselves are hard links.)
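The idea of a separate directory hierarchy whose files are hard links to the bootfiles can be illustrated with the Python sketch below.  It is not how makeml works internally (makeml is a shell script); it merely walks a source tree, recreates each directory under a new root, and links every file with os.link, so no library data is copied.  The directory names in the demo are made up for the example.

```python
import os
import tempfile


def mirror_with_hard_links(src_root: str, dst_root: str) -> int:
    """Recreate the directory tree of src_root under dst_root.

    Directories are created anew; every file is hard-linked (os.link)
    instead of copied, so both trees share the same file data.
    Returns the number of links created.
    """
    links = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))
            links += 1
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical boot directory holding one stable archive.
        boot = os.path.join(tmp, "sml.boot.x86-unix")
        archive_dir = os.path.join(boot, "basis.cm", "CM", "x86-unix")
        os.makedirs(archive_dir)
        with open(os.path.join(archive_dir, "basis.cm"), "w") as f:
            f.write("stable archive bytes")
        print(mirror_with_hard_links(boot, os.path.join(tmp, "sml.libs")))  # 1
```

Hard links make the mirrored tree cheap: removing the test directory later does not affect the files still reachable from the boot directory.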
The "testml" script that you also find here will run the heap image
and instruct it to look for its libraries in that new library
directory.  "testml" takes the <prefix> of the heap image as its first
argument.  All other arguments are passed verbatim to the ML process.

The <prefix> is the same as the one used when you did "makeml".  If you
run "testml" without arguments, <prefix> defaults to "sml".  Thus, if
you just said "makeml" without argument, you can also say "testml"
without argument.  (Note that you _must_ supply the <prefix> argument
if you intend to pass any additional arguments.)

* Installing a heap image for more permanent use
------------------------------------------------

You can "install" a newly generated heap image by replacing the old
image with the new one _AND AT THE SAME TIME_ replacing the old stable
libraries with the new ones.  To do this, run the "installml" script.

Like "testml", "installml" also expects the <prefix> as its first
argument.  <prefix> defaults to "sml" if no argument is specified.

"installml" patches the ../../lib/pathconfig file to reflect any
changes or additions to the path name mapping.

Thus, after a successful CMB.make, you should say

        ./makeml

to make the new heap image + libraries, then

        ./testml

to make sure everything works, and finally

        ./installml

to replace your existing compiler with the one you just built and
tested.

* Cross-compiling
-----------------

All cross-compilers live in the "smlnj/compiler/all.cm" library.  You
must first say

        CM.autoload "smlnj/compiler/all.cm";

before you can access them.  (This step corresponds to the old
CMB.retarget call.)  After that, _all_ cross-compilers are available
at the same time.  However, the ones that you are not using don't take
up any undue space because they only get loaded once you actually
mention them at the top-level.
The names of the structures currently exported by smlnj/compiler/all.cm
are:

        structure Alpha32UnixCMB
        structure HppaUnixCMB
        structure PPCMacOSCMB
        structure PPCUnixCMB
        structure SparcUnixCMB
        structure X86UnixCMB
        structure X86Win32CMB
        structure Alpha32Compiler
        structure HppaCompiler
        structure PPCCompiler
        structure SparcCompiler
        structure X86Compiler

(PPCMacOSCMB is not very useful at the moment because there is no
implementation of the basis library for the MacOS.)

Alternatively, you can select just the one single structure that you
are interested in by auto-loading smlnj/compiler/<arch>.cm or
smlnj/cmb/<arch>-<os>.cm.  <arch> currently ranges over "alpha32",
"hppa", "ppc", "sparc", and "x86".  <os> can be either "unix" or
"macos" or "win32".  (Obviously, not all combinations are valid.)
Again, as with smlnj/cmb.cm, you can specify the .cm file as an
argument to the sml command:

        $ sml smlnj/compiler/all.cm

or

        $ sml smlnj/cmb/alpha32-unix.cm

* Path configuration
--------------------

+ Basics:

One of the new features of CM is its handling of path names.  In the
old CM, one particular point of trouble was the autoloader.  It
analyzes a group or library and remembers the locations of associated
files.  Later, when the necessity arises, those files will be read.
Therefore, one was asking for trouble if the current working directory
was changed between analysis- and load-time, or, worse, if files
actually moved about (as is often the case if build- and
installation-directories are different, or, to put it more generally,
if CM's state is frozen into a heap image and used in a different
environment).

Maybe it would have been possible to work around most of these
problems by fixing the path-lookup mechanism in the old CM and using it
extensively.  But path-lookup (as in the Unix-shell's "PATH") is
inherently dangerous because one can never be too sure what it will be
that is found on the path.
A new file in one of the directories early in the path can upset the
program that hopes to find something under the same name later on the
path.  Even when ignoring security issues like trojan horses and such,
this definitely opens the door for various unpleasant surprises.  (Who
has never named a test version of a program "test" and found that it
acts strangely, only to discover later that /bin/test was run
instead?)

Thus, the new scheme used by CM is a fixed mapping of what I call
"configuration anchors" to corresponding directories.  The mapping can
be changed, but one must do so explicitly.  In effect, it does not
depend on the contents of the file system.

Here is how it works: If I specify a relative pathname in one of CM's
description files where the first component (the first arc) of that
pathname is known to CM as a configuration anchor, then the
corresponding directory (according to CM's mapping) is prepended to
the path.  Suppose the path name is "a/foo.sml" and "a" is a known
anchor that maps to "/usr/lib/smlnj"; then the resulting complete
pathname is "/usr/lib/smlnj/a/foo.sml".  The pathname can be a single
arc (but does not have to be).  For example, the anchor "basis.cm" is
typically mapped to the directory where the basis library is stored.

Now, the important point is that one can change the mapping of the
anchor, and the path name will also change accordingly -- even very
late in the game.  CM avoids "elaborating" path names until it really
needs them, i.e., when it is time to open files.  CM is also willing to
re-elaborate the same names if there is reason to do so.  Thus, the
"basis.cm" library that was analyzed "here" but then moved "there"
will also be found "there" if the anchor has been re-set accordingly.

+ Different configurations at different times:

During compilation of the compiler, CMB uses a path configuration that
is read from the file "pathconfig" located here in this directory.
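The anchor rule from "+ Basics" can be modeled in a few lines of Python.  This is a sketch under stated assumptions, not CM's code: the mapping is a plain dictionary, paths use "/" as separator, and the text format accepted by the hypothetical parse_pathconfig (one "anchor directory" pair per line, "#" comments) is an assumption made for illustration.  Elaboration happens only when elaborate is called, so re-setting an anchor afterwards changes the result for the same name.

```python
from typing import Dict


def parse_pathconfig(text: str) -> Dict[str, str]:
    """Build an anchor -> directory mapping from pathconfig-style text.

    ASSUMED format, for illustration only: one "anchor directory" pair
    per line; blank lines and lines starting with '#' are ignored.
    """
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"malformed pathconfig line: {raw!r}")
        anchor, directory = parts
        mapping[anchor] = directory
    return mapping


def elaborate(path: str, anchors: Dict[str, str]) -> str:
    """Expand a description-file path using the *current* anchor mapping.

    If the first arc is a known anchor, the anchor's directory is
    prepended to the whole path (the anchor arc itself stays in place).
    Other relative paths are returned unchanged.
    """
    first_arc = path.split("/", 1)[0]
    if first_arc in anchors:
        return anchors[first_arc].rstrip("/") + "/" + path
    return path


anchors = parse_pathconfig("a /usr/lib/smlnj\nbasis.cm /usr/lib/smlnj/basis\n")
print(elaborate("a/foo.sml", anchors))  # /usr/lib/smlnj/a/foo.sml
# Re-setting an anchor later changes what the same name elaborates to.
anchors["basis.cm"] = "sml.boot.x86-unix/basis.cm"
print(elaborate("basis.cm", anchors))   # sml.boot.x86-unix/basis.cm/basis.cm
```

The directory /usr/lib/smlnj/basis is hypothetical; the second mapping imitates the bootstrap-time configuration, where basis.cm points into a boot directory.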
At bootstrap time, the same anchors are mapped to the corresponding
sub-directory of the "boot" directory: basis.cm is mapped to
sml.boot.<arch>-<os>/basis.cm -- which means that CM will look for a
library named sml.boot.<arch>-<os>/basis.cm/basis.cm -- and so forth.

By the way, you will perhaps notice that there is no file

        sml.boot.<arch>-<os>/basis.cm/basis.cm

but there _is_ the corresponding stable archive

        sml.boot.<arch>-<os>/basis.cm/CM/<arch>-<os>/basis.cm

CM always looks for stable archives first.

This mapping (from anchors to names in the boot directory) is the one
that will get frozen into the generated heap image at boot time.
Thus, unless it is changed, CM will look for its libraries in the boot
directory.  The aforementioned "testml" script will make sure that the
mapping is changed to the one specified in a new "pathconfig" file
which was created by makeml and placed into the test library
directory.  It points all anchors to the corresponding entry in the
test library directory.  Thus, "testml" will let a new heap image run
with its corresponding new libraries.

Normally, however, CM consults other pathconfig files at startup --
files that live in standard locations.  These files are used to modify
the path configuration to let anchors point to their "usual" places.
The names of the files that are read (if present) are configurable via
environment variables.  At the moment they default to

        /usr/lib/smlnj-pathconfig

and

        $HOME/.smlnj-pathconfig

The first one is configurable via CM_PATHCONFIG (and the default is
configurable at boot time via CM_PATHCONFIG_DEFAULT); the last is
configurable via CM_LOCAL_PATHCONFIG and CM_LOCAL_PATHCONFIG_DEFAULT.
In fact, the makeml script sets the CM_PATHCONFIG_DEFAULT variable
before making the heap image.
Therefore, heap images generated by makeml will look for their global
pathconfig file in

        `pwd`/../../lib/pathconfig

For example, I always keep my "good" libraries in `pwd`/../../lib --
where both the main "install" script and the "installml" script (see
above) also put them -- so I don't have to do anything special about my
pathconfig file.

Once I have a new heap image and libraries working, I replace the old
"good" image with the new one:

        mv <image>.<arch>-<osvariant> ../../bin/.heap/sml.<arch>-<osvariant>

After this you must also move all libraries from <image>.libs/* to
their corresponding position in ../../lib.  Since this is cumbersome
to do by hand, there is a script called "installml" that automates this
task.  Using the script has the added advantage that it will not
clobber libraries that belong to architectures other than the current
one.  (A rather heavy-handed "rm/mv" approach will delete all stable
libraries for all architectures.)  "installml" also patches the
../../lib/pathconfig file as necessary.

Of course, you can organize things differently for yourself -- the
path configuration mechanism should be sufficiently flexible.

* Libraries vs. Groups
----------------------

With the old CM, "group" was the primary concept while "library" and
"stabilization" could be considered afterthoughts.  This has changed.
Now "library" is the primary concept, "stabilization" is semantically
significant, and "groups" are a secondary mechanism.

Libraries are used to "structure the world"; groups are used to give
structure to libraries.  Each group can be used either in precisely
one library (in which case it cannot be used at the interactive
toplevel) or at the toplevel (in which case it cannot be used in any
library).  In other words, if you count the toplevel as a library,
then each group has a unique "owner" library.
Of course, there still is no limit on how many times a group can be
mentioned as a member of other groups -- as long as all these other
groups belong to the same owner library.

If you want to take a collection of files whose purpose fits that of a
library, then, please, make them into a library (i.e., not a group!).
The purpose of groups is to deal with name-space issues _within_
libraries.

Aside from the fact that I find this design quite natural, there is
actually a technical reason for it: when you stabilize a library
(groups cannot be stabilized), then all its sub-groups (not
sub-libraries!) get "sucked into" the stable archive of the library.
In other words, even if you have n+1 CM description files (1 for the
library, n for n sub-groups), there will be just one file representing
the one stable archive (per architecture/os) for the whole thing.  For
example, I structured the standard basis into one library with two
sub-groups, but once you compile it (CMB.make) there is only one
stable file that represents the whole basis library.  If groups were
allowed to appear in more than one library, then stabilization would
duplicate the group (its code, its environment data structures, and
even its dynamic state).

There is a small change to the syntax of group description files: they
must explicitly state which library they belong to.  CM will verify
that.  The owner library is specified in parentheses after the "group"
keyword.  If the specification is missing (that's the "old" syntax),
then the owner will be taken to be the interactive toplevel.

* Pervasive environment, core environment, the init group "init.cmi"
--------------------------------------------------------------------

CMB.make starts out by building and compiling the "init group".
This group cannot be described in the "usual" way because it uses
"magic" in three ways:

  - it is used to later tie in the runtime system
  - it exports the "core" environment
  - it exports the "pervasive" environment

The pervasive environment no longer includes the entire basis library
but only non-modular bindings (top-level bindings of variables and
types).

CM cannot automatically determine dependencies (or exports) for the
init group source files, but it still does use its regular cutoff
recompilation mechanism.  Therefore, dependencies must be given
explicitly.  This is done by a special description file which
currently lives in ../system/smlnj/init/init.cmi.  See the long
comment at the beginning of that file for more details.

After it is built, smlnj/init/init.cmi can be used as an "ordinary"
library by other libraries.  (This is done, for example, by the
implementation of the Basis library.)  Access to "smlnj/init/init.cmi"
is protected by the privilege named "primitive".  Also, note that the
.cmi-file is not automatically recognized as a CM description file.
Therefore, it must be given an explicit member class:

        smlnj/init/init.cmi : cm

* Autoloader
------------

The new system heavily relies on the autoloader.  As a result, almost
no static environments need to get unpickled at bootstrap time.  The
construction of such environments is deferred until they become
necessary.  Thanks to this, it was possible to reduce the size of the
heap image by more than one megabyte (depending on the architecture).
The downside (although not really terribly bad) is that there is a
short wait when you first touch an identifier that hasn't been touched
before.  (I acknowledge that the notion of "short" may depend on your
sense of urgency. :-)

The reliance on the autoloader (and therefore CM's library mechanism)
means that in order to be able to use the system, your paths must be
properly configured.
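The deferred construction described above (and the fact that unused cross-compilers cost nothing until mentioned) follows a familiar lazy-loading pattern.  The Python class below is a toy model of that pattern only; it is not CM's implementation, and the class and method names are hypothetical.  Registration records a loader function, the first lookup runs it (the "short wait"), and later lookups hit a cache.

```python
from typing import Any, Callable, Dict


class Autoloader:
    """Toy model of on-demand loading (not CM's implementation).

    Registering a name only records a loader function; the loader runs
    the first time the name is looked up, and the result is cached so
    later lookups are immediate.
    """

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Cheap: nothing is loaded at registration time.
        self._loaders[name] = loader

    def is_loaded(self, name: str) -> bool:
        return name in self._cache

    def lookup(self, name: str) -> Any:
        # Raises KeyError if the name was never registered.
        if name not in self._cache:
            # First touch: pay the loading cost exactly once.
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


def load_env(label: str) -> str:
    # Stand-in for unpickling a static environment.
    print(f"loading {label} ...")
    return f"<static environment of {label}>"


al = Autoloader()
al.register("structure X86Compiler", lambda: load_env("X86Compiler"))
al.register("structure SparcCompiler", lambda: load_env("SparcCompiler"))
al.lookup("structure X86Compiler")  # prints "loading X86Compiler ..."
al.lookup("structure X86Compiler")  # cached: prints nothing
print(al.is_loaded("structure SparcCompiler"))  # False
```

In this model, a missing or misconfigured registration surfaces only at lookup time, which is one way to see why correct path configuration matters so much for an autoloader-based system.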
Two libraries get pre-registered at bootstrap time: the basis library
("basis.cm") and CM itself ("smlnj/cm/minimal.cm").  The latter is
crucial: without it one wouldn't be able to register any other
libraries via CM.autoload.  The registration of basis.cm is a mere
convenience.

Here are some other useful libraries that are not pre-registered but
which can easily be made accessible via CM.autoload (or, non-lazily,
via CM.make):

  smlnj/cm.cm                - provides the actual ("full") structure CM
                               as described in the CM manual
  smlnj/cm/full.cm           - same as smlnj/cm.cm
  smlnj/compiler.cm          - provides "structure Compiler"
  smlnj/compiler/current.cm  - same as smlnj/compiler.cm
  smlnj/cmb.cm               - provides "structure CMB"
  smlnj/cmb/current.cm       - same as smlnj/cmb.cm
  smlnj/compiler/all.cm      - provides "structure <Arch>Compiler" and
                               "structure <Arch><OS>CMB" for various
                               values of <Arch> and <OS>
  smlnj-lib.cm               - the SML/NJ library

* Internal sharing
------------------

Dynamic values of loaded modules are shared.  This is true even for
those modules that are used by the interactive compiler itself.  If you
load a module from a library that is also used by the interactive
compiler, then "loading" means "loading the static environment" -- it
does not mean "loading the code and linking it".  Instead, you get to
share the compiler's dynamic values (and therefore the executable code
as well).

Of course, if you load a module that hasn't been loaded before and also
isn't used by the interactive system, then CM will get the code and
link (execute) it.

* Access control
----------------

In some places, you will find that the "group" and "library" keywords
in description files are preceded by certain strings, sometimes in
parentheses.  These strings are the names of "privileges".  Don't worry
about them too much at the moment.  For the time being, access control
is not enforced, but the infrastructure is in place.
* Preprocessor
--------------

The syntax of expressions in #if and #elif clauses is now more ML-ish
instead of C-ish.  (Hey, this is ML after all!)  In particular, you
must use "andalso", "orelse", and "not" instead of "&&", "||" and "!".
Unary minus is "~".

A more interesting change is that you can now query the exports of
sources/subgroups/sublibraries:

  - Within the "members" section of the description (i.e., after "is"):
    The expression

        defined(<namespace> <name>)

    is true if any of the included members preceding this clause
    exports a symbol "<namespace> <name>".

  - Within the "exports" section of the description (i.e., before "is"):
    The same expression is true if _any_ of the members exports the
    named symbol.
    (It would be more logical if the exports section would follow the
    members section, but for esthetic reasons I prefer the exports
    section to come first.)

Example:

  +--------------------------+
  |Library                   |
  |  structure Foo           |
  |#if defined(structure Bar)|
  |  structure Bar           |
  |#endif                    |
  |is                        |
  |#if SMLNJ_VERSION > 110   |
  |  new-foo.sml             |
  |#else                     |
  |  old-foo.sml             |
  |#endif                    |
  |#if defined(structure Bar)|
  |  bar-client.sml          |
  |#else                     |
  |  no-bar-so-far.sml       |
  |#endif                    |
  +--------------------------+

Here, the file "bar-client.sml" gets included if SMLNJ_VERSION is
greater than 110 and new-foo.sml exports a structure Bar _or_ if
SMLNJ_VERSION <= 110 and old-foo.sml exports structure Bar.  Otherwise
"no-bar-so-far.sml" gets included instead.  In addition, the export of
structure Bar is guarded by its own existence.  (Structure Bar could
also be defined by "no-bar-so-far.sml", in which case it would get
exported regardless of the outcome of the other "defined" test.)

Some things to note:

  - For the purpose of the pre-processor, order among members is
    significant.  (For the purpose of dependency analysis, order
    continues to be not significant.)
  - As a consequence, in some cases pre-processor dependencies and
    compilation-dependencies may end up to be opposites of each other.
    (This is not a problem; it may very well be a feature.)

* The Basis Library is no longer built-in
-----------------------------------------

The SML'97 basis is no longer built-in.  If you want to use it, you
must specify "basis.cm" as a member of your group/library.

* No more aliases
-----------------

The "alias" feature is no longer with us.  At first I thought I could
keep it, but it turns out that it causes some fairly fundamental
problems with the autoloader.  However, I don't think that this is a
big loss because path anchors make up for most of it.  Moreover,
stable libraries can now easily be moved to convenient locations
without having to move large source trees at the same time.  (See my
new build/install.sh script for examples of that.)

It is possible to simulate aliases (in a way that is safer than the
original alias mechanism).  For example, the root.cm file (which is
the root of the whole system as far as CMB.make is concerned) acts as
an alias for smlnj/internal/intsys.cm.  In this case, root.cm is a
group to avoid having a (trivial) stable library file built for it.

A library can act as an "alias" for another library if it has a
verbatim copy of the export list and mentions the other library as its
only member.  Examples for this are smlnj/cm.cm (for smlnj/cm/full.cm),
smlnj/compiler.cm (for smlnj/compiler/current.cm), etc.

* Don't use relative or absolute pathnames to refer to libraries
----------------------------------------------------------------

Don't use relative or absolute pathnames to refer to libraries.  If
you do it anyway, you'll get an appropriate warning at the time when
you do CMB.make().

If you use relative or absolute pathnames to refer to library B from
library A, you will be committed to keeping B in the same relative (to
A) or absolute location.  This, clearly, would be undesirable.
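The rule above boils down to: a library reference should start with a configuration anchor.  The Python sketch below classifies a reference accordingly.  It is only an illustration of the rule; CMB.make's actual check and warning text are not reproduced here, the function name and messages are hypothetical, and treating "smlnj" as an anchor in the demo is an assumption about the configuration.

```python
from typing import Optional, Set


def library_reference_warning(path: str, anchors: Set[str]) -> Optional[str]:
    """Classify how a description file refers to a library.

    Returns None if the first arc of `path` is a configuration anchor;
    otherwise returns a (hypothetical) warning for an absolute or plain
    relative pathname, which ties the library to a fixed location.
    """
    if path.startswith("/"):
        return f"absolute pathname used for library: {path}"
    first_arc = path.split("/", 1)[0]
    if first_arc in anchors:
        return None
    return f"relative pathname used for library: {path}"


known = {"basis.cm", "smlnj", "smlnj-lib.cm"}  # assumed anchor set
for ref in ["basis.cm", "smlnj/cm/full.cm", "../util/util.cm", "/home/me/b.cm"]:
    print(ref, "->", library_reference_warning(ref, known) or "ok (anchored)")
```

Anchored references stay valid when the library moves, because only the anchor's mapping has to be updated.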