[smlnj] View of /sml/trunk/src/compiler/README
Revision 69 - (download) (annotate)
Fri Apr 3 00:06:55 1998 UTC (22 years, 3 months ago) by monnier
Original Path: sml/branches/SMLNJ/src/compiler/README
File size: 14870 byte(s)
This README file describes the overall structure of the current version of 
the SML/NJ (v110.4) & FLINT/ML (v1.4) compiler source tree. Please send 
your questions, comments, and suggestions to flint@cs.yale.edu (or contact
Zhong Shao at shao-zhong@cs.yale.edu). 

   Some informal implementation notes. 

   This file. It gives an overview of the overall compiler structure.

   The standard Makefile for compiling the compiler. It is similar
   in spirit to the sources.cm file used by CM.make, except that
   all-files.cm is designed only for bootstrapping the compiler
   itself (i.e., CMB.make). The binfiles produced by CMB.make
   are placed in a single bin directory, e.g., bin.x86-unix or
   bin.sparc-unix. Right now, the list in all-files.cm is just the
   list in sources.cm plus all the glue files in the 1-TopLevel/bootstrap
   directory (which are used to bootstrap the interactive compiler).

buildcm* compiler-name
   A script for building the sml-cm version of the compiler. If you
   have built an SML heap image named sml.x86-unix, type
   "buildcm sml.x86-unix" to get the CM version of the compiler,
   probably named "sml-cm.x86-unix".

buildcm2* compiler-name
   Scripts for building an sml-cm compiler that knows where to
   find the library, ml-lex, ml-yacc, etc. You need to adjust
   the top-level directory name in the script.

   This file contains the usual makefile for CM.make. It is not
   used to build the interactive compiler, but it can be useful
   for debugging purposes. For example, you can type CM.make()
   to immediately build a new, interactive visible compiler. To
   access the newly built compiler, you use the 
   function to compile ML programs. Notice that none of the bootstrap
   glue files are in sources.cm.

xmakeml* [-full] [-elab]
   A script for building the interactive compiler. The default path
   of bin files is ./bin.$arch-$os. There are two command-line options:
   if you add the "-full" option, it will build a compiler whose 
   components are visible to the top-level interactive environment;
   if you add the "-elab" option, it will re-elaborate all the ML 
   programs to recreate the static environments (this is useful if
   your new compiler has changed the representations of the bindings
   in the static environments).

xrun* compiler-name
   A script for running the compiler. If you have a heap image
   named "sml.x86-unix", you can type "xrun sml.x86-unix" to run the
   compiler. Similarly, you can type "xrun sml-cm.x86-unix" to run 
   the CM version of the sml compiler. The xrun script uses the 
   runtime system in the ../../bin/.run directory. 
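
   The script entries above can be strung together into one session.
   The sketch below only prints the commands; it does not run them.
   The helper name show_build_steps is hypothetical, the ordering
   (xmakeml first, then buildcm, then xrun) is an assumption, and the
   heap image names follow the pattern of this README's own examples
   rather than anything the scripts are known to guarantee.

```shell
# Dry run: print the commands described above for one architecture/OS pair.
# Nothing is executed, so this works without the SML/NJ scripts installed.
show_build_steps() {
    target="$1-$2"                # e.g. x86-unix
    echo "xmakeml"                # default binfile path: ./bin.$target
    echo "buildcm sml.$target"    # assumed to produce sml-cm.$target
    echo "xrun sml-cm.$target"    # run the CM version of the compiler
}

steps=$(show_build_steps x86 unix)
echo "$steps"
```

   Printing rather than executing keeps the sketch harmless on a
   machine where the heap images have not been built yet.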

   The current source code is organized as a two-level directory tree.
   Apart from a few files which are placed immediately inside the 0-Boot 
   directory (i.e., 0-Boot/*.{sig,sml}), all source files can be grep-ed
   by typing "grep xxx */*/*.{sig,sml}", assuming you are looking for
   binding "xxx". 
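
   The search can be tried on a throwaway tree. This is a minimal
   sketch: the directory names 3-Semant/main and 9-MiscUtil/util come
   from this README, but the file names and their contents are made up
   for illustration. The two globs are spelled out separately so the
   command also works in shells without brace expansion.

```shell
# Build a scratch two-level tree; file names and contents are hypothetical.
tree=$(mktemp -d)
mkdir -p "$tree/3-Semant/main" "$tree/9-MiscUtil/util"
echo 'signature FOO = sig val xxx : int end'  > "$tree/3-Semant/main/foo.sig"
echo 'structure Foo = struct val xxx = 1 end' > "$tree/3-Semant/main/foo.sml"
echo 'signature BAR = sig val yyy : int end'  > "$tree/9-MiscUtil/util/bar.sig"
echo 'structure Bar = struct val yyy = 2 end' > "$tree/9-MiscUtil/util/bar.sml"

# Same search as "grep xxx */*/*.{sig,sml}", run from the top of the tree.
# -l prints only the names of the files that mention the binding.
matches=$(cd "$tree" && grep -l xxx */*/*.sig */*/*.sml)
echo "$matches"

rm -rf "$tree"
```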

   The following directories are organized based on the compilation phases.
   Within each phase, the "main" sub-directory always contains the top-level
   module and some important data structures for that particular compilation
   phase.

   File name conventions: 
     *.sig --- the ML signature file 
     *.sml --- the ML source program (occasionally with signatures)
     *.grm --- ML-Yacc file
     *.lex --- ML-Lex file
     *.cm  --- the CM makefile
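
   The conventions above map directly onto a shell "case" statement.
   This is a small sketch; the function name describe_file is
   hypothetical, and the "unknown" fallback is an addition for files
   that the list does not cover.

```shell
# Classify a file by its extension, following the conventions listed above.
describe_file() {
    case "$1" in
        *.sig) echo "ML signature file" ;;
        *.sml) echo "ML source program" ;;
        *.grm) echo "ML-Yacc file" ;;
        *.lex) echo "ML-Lex file" ;;
        *.cm)  echo "CM makefile" ;;
        *)     echo "unknown" ;;      # not part of the README's list
    esac
}

kind=$(describe_file ast/ast.sml)
echo "ast/ast.sml: $kind"
```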

   The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library. 
   When recompiling the compiler (i.e., via CMB.make), files in this 
   directory are always compiled first. More specifically, their order
   of compilation is as follows:
       (0)  build the initial primitive static environment 
              (see 3-Semant/statenv/prim.sml)
       (1)  compile assembly.sig and dummy.sml, these two files
            make up the static environment for the runtime structure
            (coming from the ../runtime/kernel/globals.c file). The
            dynamic object from executing dummy.sml is discarded, and
            replaced by a hard-wired object coming from the runtime
       (2)  compile core.sml, which defines a bunch of useful exceptions
            and utility functions such as polymorphic equality, string
            equality, delay and force primitives, etc.
       (4)  files in all-files.cm (must follow the exact order)
       (5)  files in pervasive.cm (must follow the exact order)

   This directory contains the top-level glue files for different versions
   of the batch and interactive compiler. To understand how the compiler
   is organized, read the main directory.

   Utility files for the Compilation Manager CM and CMB.

   How to bootstrap an interactive compiler. Details are in boot.sml and
   shareglue.sml. Before building an interactive compiler, one should
   already have a visible compiler (for that particular architecture);
   see the viscomp directory. To build a compiler for the SPARC architecture,
   all we need to do is to load and run the IntSparc (in sparcglue.sml)

   A top-level environment includes a static environment, a dynamic
   environment, and a symbolic environment. The definitions of static
   environments are in the 3-Semant/statenv directory, as they are mostly
   used by elaboration and type checking.

   How the top-level interactive loop is organized. The evalloop.sml file
   contains the details on how an ML program is compiled from source code
   to binary code and then executed.

   The top-level compiler structure is shown in compile.sig and
   compile.sml. The compile.sml file contains details on how ML programs
   are compiled into the FLINT intermediate format. The details of how
   FLINT gets compiled into binary code segments are not covered there;
   they are described in the 4-FLINT/main/flintcomp.sml file. The
   CODEGENERATOR signature in codes.sig defines the interface to this
   FLINT code generator.
   Note: all uses of the compilation facility go through the "compile"
   function defined in compile.sml. The common intermediate formats are
   stated in the compbasic.sig and compbasic.sml files. The version.sml
   file defines the version numbers.

   How to build the visible compiler viscomp --- this is essentially
   deciding what to export to the outside world. All the Compiler
   control flags are defined in the control.sig and control.sml files
   placed in this directory.

   Phase 1 of the compilation process. Turning the SML source code into
   the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
   The frontend.sig and frontend.sml files in the main directory contain 
   the big picture on the front end.

   This phase does semantic analysis; more specifically, it does the
   elaboration (of concrete syntax into abstract syntax) and type-checking
   of the core and module languages. The semantic objects are defined in
   main/bindings.sml. The result is the Abstract Syntax, defined in the
   main/absyn.sml file.

   Definition of several data structures and utility functions. They are
   used by the code that does semantic analysis. The env.sig and env.sml
   files define the underlying data structures used to represent the
   static environment.

   How to turn a piece of code in the Concrete Syntax into one in the
   Abstract Syntax. The top-level organization is in the
   elabtop.sml file.

   Definition of Abstract Syntax.

   Top-level view of what semantic objects we have.

   Top-level view of the elaboration process. Notice that each piece
   of core-ML program is first translated into the Abstract Syntax
   and then type-checked. The type-checking does change the contents
   of the abstract syntax, as certain type information won't be known
   until type-checking is done.

   Utility functions for elaboration of modules. The module.sig and
   module.sml files contain the definitions of module-level semantic objects.

   How to write the static environments into a file! This is important
   if you want to create the *.bin file. It is also useful for inferring
   a unique persistent id for each compilation unit (useful for detecting
   the cut-off compilation dependencies).

   The definition of Static Environment. The SC-ed version of Static
   Environment is used to avoid environment blow-up in the pickling.
   The prim.sml file contains the list of primitive operators and primitive
   types exported in the initial static environment (i.e., PrimEnv).
   During bootstrapping, PrimEnv is the first environment you have to
   set up before you can compile files in the 0-Boot directory.

   This directory contains all the data structures and utility functions
   used in type-checking the Core-ML language.

   The type-checking and type-inference code for the core-ML programs.
   It is performed on Abstract Syntax and it produces Abstract Syntax

   This phase translates the Abstract Syntax into the intermediate
   Lambda language (i.e., FLINT). During the translation, it compiles
   the Pattern Matches (see the mcomp directory). Then it does a bunch
   of optimizations on FLINT, does representation analysis, converts
   the FLINT code into CPS, and finally does closure conversion.

   The closure conversion step. Check out the Shao/Appel LFP94 paper for
   the detailed algorithm.

   Definition of CPS, plus how to convert the FLINT code into the
   CPS code. The compilation of the Switch statement is done in this

   The CPS-based optimizations (check Appel's "Compiling with
   Continuations" book for details). Eventually, all optimizations
   in this directory will be migrated into FLINT.

   This directory defines the FLINT language. The detailed definitions
   of primitive tycs, primitive operators, kinds, type constructors,
   and types are in the 4-FLINT/kernel directory.

   Definition of the kernel data structures used in the FLINT language.
   This includes: deBruijn indices, primitive tycs, primitive operators,
   FLINT kinds, FLINT constructors, and FLINT types. When you write
   code that manipulates the FLINT code, please restrict yourself to
   using only the functions defined in the LTYEXTERN interface.

   The flintcomp.sml file describes how the FLINT code gets compiled into
   the optimized and closure-converted CPS code (eventually, it should
   produce optimized, closure-converted, and type-safe FLINT code).

   The FLINT-based optimizations, such as contraction, type
   specialization, etc.

   An older version of the Lambda language (not in A-Normal form).

   Code for performing the representation analysis on FLINT.

   Translation of Abstract Syntax into the PLambda code, then to the FLINT
   code. All semantic objects used in the elaboration are translated into
   the FLINT types as well. The translation phase also does match
   compilation. The translation from PLambda to FLINT does the (partial)
   type-based argument flattening.

   Alpha32 new code generator
   Alpha32 new code generator (with special patches)
   Compilation of CPS into the MLRISC abstract machine code
   HPPA new code generator
   The big picture of the codegenerator; including important
   files on machine specifications and runtime tagging schemes.

   The old code generator. May eventually go away after Lal's new
   code generator becomes stable on all platforms. Each code generator 
   should produce a structure of signature CODEGENERATOR (defined in 
   the 1-TopLevel/main/codes.sig file).
   This directory contains the machine-independent parts of the
   old code generator. Some important signatures are also here.
   Compilation of CPS into the abstract machine in the old code 
   generator. Probably the spill.sml and limit.sml files should
   not be placed here. A counterpart of this in the new 
   code generator is the 6-NewCGen/cpscompile directory.
   MIPS code generator for both little endian and big endian
   RS6000 code generator
   SPARC code generator
   X86 code generator

   Lal George's new code generator generator (MLRISC).

   Contains various kinds of utility programs.
   Bignum packages. I have no clue how stable this is.
   Some code for implementation of the lazy evaluation primitives.
   Pretty printing. Very ad hoc; needs major cleanup.
   The time and the space profiler.
   Important utility functions including the Inputsource (for 
   reading in a program), and various Hashtable and Dictionary


0. statenv   : symbol -> binding
   dynenv    : pid -> object
   symenv    : pid -> flint 
1. Parsing   : source -> ast
2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
3. FLINT     : absyn -> FLINT -> CPS -> CLO
4. CodeGen   : CPS -> csegments (via MLRISC)
5. OldCGen   : CPS -> csegments (spilling, limit check, codegen)

B. How to recover the all-files.cm (or sources.cm) file after making 
   dramatic changes to the directory structure. Notice that the difference
   between all-files.cm and sources.cm is just the bootstrap glue files.

   1. ls -1 [1-6,9]*/*/*.{sig,sml} | grep -i -v glue | grep -v obsol > xxx
   2. Add 7-MLRISC/MLRISC.cm
   3. Fix ml.lex.* and ml.grm.* files
   4. Add 9-MiscUtil/util/UTIL.cm
   5. Add ../ml-yacc/lib/sources.cm
   6. Delete 9-MiscUtil/util/intmap.sig
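
   Steps 1 and 6 can be rehearsed on a scratch tree before touching the
   real sources. This is a sketch under stated assumptions: the
   directory "3-Semant/obsolete" and the file "old.sml" are invented to
   give the "obsol" filter something to drop, the other names are taken
   from this README, and the two globs are written out separately so
   the command also works in shells without brace expansion.

```shell
# Scratch tree standing in for the compiler sources (empty files only).
tree=$(mktemp -d)
mkdir -p "$tree/1-TopLevel/bootstrap" "$tree/1-TopLevel/main" \
         "$tree/3-Semant/obsolete" "$tree/9-MiscUtil/util"
touch "$tree/1-TopLevel/bootstrap/sparcglue.sml" \
      "$tree/1-TopLevel/main/compile.sig" \
      "$tree/1-TopLevel/main/compile.sml" \
      "$tree/3-Semant/obsolete/old.sml" \
      "$tree/9-MiscUtil/util/intmap.sig"

# Step 1: list the sources, dropping glue files and obsolete code.
# Step 6: drop 9-MiscUtil/util/intmap.sig from the resulting list.
list=$(cd "$tree" && ls -1 [1-6,9]*/*/*.sig [1-6,9]*/*/*.sml \
         | grep -i -v glue | grep -v obsol \
         | grep -v '^9-MiscUtil/util/intmap\.sig$')
echo "$list"

rm -rf "$tree"
```

   Steps 2 to 5 are manual additions to the list and are not covered
   by the sketch.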
