[smlnj] Diff of /sml/trunk/src/compiler/README

revision 17, Wed Mar 11 21:00:18 1998 UTC revision 24, Thu Mar 12 00:49:58 1998 UTC
# Line 1  Line 1 
1  ============================================================================  ============================================================================
2  This README file describes the overall structure of the current version of  The Overall Structure of the Current FLINT/ML Compiler
3  the SML/NJ (v110.3) & FLINT/ML (v1.3) compiler source tree. Please send      <based On SML/NJ version 109.33 11/29/97>
4  your questions, comments, and suggestions to flint@cs.yale.edu (or contact  
5  Zhong Shao at shao-zhong@cs.yale.edu).  
6    
7  ============================================================================  ============================================================================
8    The Overall Structure of the Current FLINT/ML Compiler
9        <based On SML/NJ version 109.31+ 9/22/97>
10    
11  NOTES  NOTES
12     Some informal implementation notes.     Some informal half-baked notes. Just don't want to discard them.
13    
14  README  README
15     This file. It gives an overview of the overall compiler structure.     This file which gives an overview of the compiler structure.
16    
17  all-files.cm  all-files.cm
18     The standard Makefile for compiling the compiler. It is similar     The standard Makefile for compiling the compiler. It is similar
19     to the idea of sources.cm used by CM.make, except that     to the idea of sources.cm used by CM.make, except that
20     all-files.cm is designed for bootstrapping the compiler itself     all-files.cm is designed for CMB.make only. The resulting binfiles
21     only (i.e., CMB.make). The resulting binfiles from doing CMB.make     from doing CMB.make are placed in a single bin directory, eg.
22     are placed in a single bin directory, eg. bin.x86-unix or     bin.x86-unix or bin.sparc-unix. Right now, all-files.cm is
23     bin.sparc-unix. Right now, the list in all-files.cm is just the     just whatever in sources.cm plus all the bootstrap glue files.
    list in sources.cm plus all the glue files in the 1-TopLevel/bootstrap  
    directory (which are used to bootstrap the interactive compiler).  
   
 buildcm* compiler-name  
    A script for building the sml-cm version of the compiler. Suppose  
    you have build a SML heap image named sml.x86-unix, you type  
    "buildcm sml.x86-unix" to get the cm version of the compiler,  
    probably named "sml-cm.x86-unix".  
24    
25  buildcm2* compiler-name  buildcm
26     Scripts for building a sml-cm compiler that knows where to     Scripts for building the sml-cm version of the compiler.
27    
28    buildcm2
29       Scripts for building a sml-cm compiler that knows to where to
30     find the library and ml-lex and ml-yacc, etc. Need to adjust     find the library and ml-lex and ml-yacc, etc. Need to adjust
31     the top-level directory name there.     the top-level directory name there.
32    
33  sources.cm  sources.cm
34     This file contains the usual makefile for CM.make. It is not     This file contains the usual makefile for CM.make. It is not
35     used to build up the interactive compiler. But it can be     used to build up the interactive compiler. But it can be
36     useful for debugging purpose. For example, you can type CM.make()     useful for debugging purpose as doing CM.make immediately
37     to immediately build up a new, interactive visible compiler. To     build up a interactive visible compiler. Notice all the
38     access the newly built compiler, you use the     bootstrap glue files are not here, because CM.make() does
39         "XXXVisComp.Interact.useFile"     run them which will cause problems.
40     function to compile ML programs. Notice all the bootstrap glue  
41     files are not in sources.cm.  xmakeml
42       Scripts for building the interactive compiler. The default path
43  xmakeml* [-full] [-elab]     of bin files is ./bin.$arch-$os. If you add the "-full" option,
44     A script for building the interactive compiler. The default path     it will build a compiler whose components are visible to the
45     of bin files is ./bin.$arch-$os. There are two command-line options:     top-level interactive environment.
46     if you add the "-full" option, it will build a compiler whose  
47     components are visible to the top-level interactive environment;  xrun
48     if you add the "-elab" option, it will re-elaborate all the ML     Scripts for running the interactive copmiler.
    programs to recreate the static environments (this is useful, if  
    your new compiler has changed the representations of the bindings  
    in the static environments).  
   
 xrun* compiler-name  
    A script for running the copmiler. Suppose you have a heap image  
    named "sml.x86-unix", you can type "xrun sml.x86-unix" to run the  
    compiler. Similarly, you can type "xrun sml-cm.x86-unix" to run  
    the CM version of the sml compiler. The xrun script uses the  
    runtime system in the ../../bin/.run directory.  
49    
50  ============================================================================  ============================================================================
51  Tips:  Tips:
52     The current source code is organized as a two-level directory tree.     To find a particular binding XXX, type
53     Apart from a few files which are placed immediately inside the 0-Boot          grep XXX */*.{sig,sml,lex,grm} */*/*.{sig,sml,lex,grm}
54     directory (i.e., 0-Boot/*.{sig,sml}), all source files can be grep-ed     If you like, make the above into a script.
    by typing "grep xxx */*/*.{sig,sml}", assuming you are looking for  
    binding "xxx".  
   
    The following directories is organized based on the compilation phases.  
    Within each phase, the "main" sub-directory always contains the top-level  
    module and some important data structures for that particular compilation  
    phase.  
   
    File name conventions:  
      *.sig --- the ML signature file  
      *.sml --- the ML source program (occasionally with signatures)  
      *.grm --- ML-Yacc file  
      *.lex --- ML-Lex file  
      *.cm  --- the CM makefile  
55    
56  0-Boot  0-Boot
57     The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.     The SML97 Basis Library. When recompiling the compiler (via CMB.make),
58     When recompiling the compiler (i.e., via CMB.make), files in this     files in this directory are always compiled first. Files in this directory
59     directory are always compiled first. More specifically, their order     are compiled in the following order:
60     of compilation is as follows:         (0)  build the primitive environment (from 3-Semant/prim/prim.sml)
61         (0)  build the initial primitive static environment         (1)  assembly.sig
62                (see 3-Semant/statenv/prim.sml)         (2)  dummy.sml
63         (1)  compile assembly.sig and dummy.sml, these two files         (3)  core.sml
64              make up the static environment for the runtime structure         (4)  files in all-files.cm (following the exact order)
65              (coming from the ../runtime/kernel/globals.c file). The         (5)  files in pervasive.cm (following the exact order)
66              dynamic object from executing dummy.sml is discarded, and  
             replaced by a hard-wired object coming from the runtime  
             system.  
        (2)  compile core.sml, which defines a bunch of useful exceptions  
             and utilty functions such as polymorphic equality, string  
             equality, delay and force primitives, etc.  
        (4)  files in all-files.cm (must follow the exact order)  
        (5)  files in pervasive.cm (must follow the exact order)  
   
 1/TopLevel  
    This directory contains the top-level glue files for different versions  
    of the batch and interactive compiler.  To understand, how the compiler  
    is organized, you can read the main directory.  
67  1-TopLevel/batch/  1-TopLevel/batch/
68     Utility files for the Compilation Manager CM and CMB;     Utility files for the Compilation Manager CM and CMB;
69  1-TopLevel/bootstrap/  1-TopLevel/bootstrap/
# Line 109  Line 71 
71     shareglue.sml. Before building an interactive compiler, one should have     shareglue.sml. Before building an interactive compiler, one should have
72     already gotten a visible compiler (for that particular architecture),     already gotten a visible compiler (for that particular architecture),
73     see the viscomp directory. To build a compiler for SPARC architecture,     see the viscomp directory. To build a compiler for SPARC architecture,
74     all we need to do is to load and run the IntSparc (in sparcglue.sml)     all we need to do is the load and run the IntSparc (in sparcglue.sml)
75     structure.     structure.
76  1-TopLevel/environ/  1-TopLevel/environ/
77     A top-level environment include static environment, dynamic environment     A top-level environment include static environment, dynamic environment
# Line 120  Line 82 
82     How the top-level interactive loop is organized. The evalloop.sml contains     How the top-level interactive loop is organized. The evalloop.sml contains
83     the details on how a ML program is compiled from source code to binary     the details on how a ML program is compiled from source code to binary
84     code and then later being executed.     code and then later being executed.
85  1-TopLevel/main/  1-TopLevel/misc/
86     The top-level compiler structure is shown in the compile.sig and     Compiler control flags and version numbers are here.
    compile.sml. The compile.sml contains details on how ML programs  
    are compiled into the FLINT intermediate format, but the details  
    on how FLINT gets compiled into the binary code segments are not  
    detailed here, instead, they are described in the  
    4-FLINT/main/flintcomp.sml file. The CODEGENERATOR signature  
    in codes.sig defines the interface about this FLINT code generator.  
    Note: all the uses of the compilation facility goes throught the "compile"  
    function defined in the compile.sml. The common intermediate formats are  
    stated in the compbasic.sig and compbasic.sml files. The version.sml  
    defines the version numbers.  
87  1-TopLevel/viscomp/  1-TopLevel/viscomp/
88     How to build the visible compiler viscomp --- this is essentially     How to build the visible compiler viscomp --- this is essentially
89     deciding what to export to the outside world. All the Compiler     deciding what to export to the outside world.
    control flags are defined in the control.sig and control.sml files  
    placed in this directory.  
90    
91  2-FrontEnd/  2-FrontEnd
92     Phase 1 of the compilation process. Turning the SML source code into     Phase 1 of the compilation process. Turning the SML source code into
93     the Concrete Synatx. The definition of concrete syntax is in ast/ast.sml.     the Concrete Synatx. The definition of concrete syntax is in ast.sml.
94     The frontend.sig and frontend.sml files in the main directory contain     The frontend.sig and frontend.sml files contains the big picture on
95     the big picture on the front end.     how the lexer and parser are organized.
96    
97  3-Semant  3-Semant
98     This phase does semantic analysis, more specifically, it does the     So-called semantic analysis, but really doing the elaboration and
99     elaboration (of concrete syntax into abstract syntax) and type-checking     type-checking of the core and module languages. The semantic objects
100     of the core and module languages. The semantic objects are defined in     are defined in bindings.sml. The result is the Abstract Syntax, defined
101     main/bindings.sml. The result is the Abstract Syntax, defined the     in the absyn directory
102     main/absyn.sml file.  3-Semant/absyn/
103       Definition of Abstract Syntax
104  3-Semant/basics/  3-Semant/basics/
105     Definition of several data structures and utility functions. They are     Definition of semantic objects for the core language including type
106     used by the code that does semantic analysis. The env.sig and env.sml     bindings, value bindings, and constructor bindings.
107     files defines the underlying data structures used to represent  the  3-Semant/bindings.sml
108     static environment.     Top-level view of what semantic objects we have
109  3-Semant/elaborate/  3-Semant/elaborate/
110     How to turn a piece of code in the Concrete Syntax into one in the     How to turn a piece of code in the Concrete Syntax into one in the
111     Abstract Syntax. The top-level organization is in the following     Abstract Syntax. The top-level organization is in the following
112     elabtop.sml file.     elabtop.sml file.
113  3-Semant/main/absyn.sml  3-Semant/elabtop.sml
    Definition of Abstract Syntax  
 3-Semant/main/bindings.sml  
    Top-level view of what semantic objects we have  
 3-Semant/main/elabtop.sml  
114     Top-level view of the elaboration process. Notice that each piece     Top-level view of the elaboration process. Notice that each piece
115     of core-ML program is first translated into the Abstract Syntax,     of core-ML program is first translated into the Abstract Syntax,
116     and gets type-checked. The type-checking does change the contents     and gets type-checked. The type-checking does change the contents
# Line 177  Line 124 
124     if you want to create the *.bin file. It is also useful to infer     if you want to create the *.bin file. It is also useful to infer
125     a unique persistant id for each compilation unit (useful to detect     a unique persistant id for each compilation unit (useful to detect
126     the cut-off compilation dependencies).     the cut-off compilation dependencies).
127    3-Semant/prim/
128       Here is the list of primitive operators and primitive types. All
129       are synthesized, and then later get put in the primEnv environment.
130       PrimEnv was used as the initial environment when elaborating files
131       in the 0-Boot directory.
132  3-Semant/statenv/  3-Semant/statenv/
133     The definition of Static Environment. The SC-ed version of Static     The definition of Static Environment. The SC-ed version of Static
134     Environment is used to avoid environment blow-up in the pickling.     Environment is used to avoid environment blow-up.
    The prim.sml contains the list of primitive operators and primitive  
    types exported in the initial static environment (i.e., PrimEnv).  
    During bootstrapping, PrimEnv is the first environment you have to  
    set up before you can compile files in the 0-Boot directory.  
 3-Semant/types/  
    This directory contains all the data structures and utility functions  
    used in type-checking the Core-ML language.  
135  3-Semant/typing/  3-Semant/typing/
136     The type-checking and type-inference code for the core-ML programs.     The type-checking and type-inference code for the core-ML programs.
137     It is performed on Abstract Syntax and it produces Abstract Syntax     It is performed on Abstract Syntax and it produces Abstract Syntax
138     also.     also.
139    
140  4-FLINT  4-Translate
141     This phase translates the Abstract Syntax into the intermediate     This phase translates the Abstract Syntax into the intermediate
142     Lambda language (i.e., FLINT). During the translation, it compiles     Lambda language (i.e., FLINT). During the translation, it compiles
143     the Pattern Matches (see the mcomp directory). Then it does a bunch     the Pattern Matches (see the mcomp directory).
144     of optimizations on FLINT; then it does representation analysis,  4-Translate/lambda/
145     and it converts the FLINT code into CPS, finally it does closure     Definition of the intermediate languages. How to type-check it
146     conversion.     and how to print it.
147  4-FLINT/clos/  4-Translate/mcomp/
148       Code for compiling pattern matches.
149    4-Translate/opt/
150       Code for optimization and representation analysis of the Lambda code.
151    4-Translate/plambda/
152       An older version of the Lambda language (not in the A-Normal form)
153    4-Translate/trans/
154       Translation of Abstract Syntax into the Lambda code. Of course, the
155       semantic objects used in the elaboration have to be translated into
156       the Lambda types as well.
157    4-Translate/type/
158       Definiton of the Lambda types, constructors and kinds. A bunch of
159       utility functions on how to manipulate the type environment, deBruijn
160       indices, etc.
161    
162    5-CPS
163       This phase turns the Lambda code into Continuation Passing Style,
164       does the closure conversion, and then feed it into the code generator.
165    5-CPS/clos/
166     The closure conversion step. Check out Shao/Appel LFP94 paper for     The closure conversion step. Check out Shao/Appel LFP94 paper for
167     the detailed algorithm.     the detailed algorithm
168  4-FLINT/cps/  5-CPS/conv/
169     Definition of CPS plus on how to convert the FLINT code into the     Converting the Lambda code into the CPS code. The main nontrivial
170     CPS code. The compilation of the Switch statement is done in this     step is the compilation of the Switch statement.
171     phase.  5-CPS/obsol/
 4-FLINT/cpsopt/  
    The CPS-based optimizations (check Appel's "Compiling with  
    Continuations" book for details). Eventually, all optimizations  
    in this directory will be migrated into FLINT.  
 4-FLINT/flint/  
    This directory defines the FLINT language. The detailed definitions  
    of primitive tycs, primitive operators, kinds, type constructors,  
    and types are in the 4-FLINT/kernel directory.  
 4-FLINT/kernel/  
    Definiton of the kernel data structures used in the FLINT language.  
    This includes: deBruijn indices, primitive tycs, primitive operators,  
    FLINT kinds, FLINT constructors, and FLINT types. When you write  
    code that manipulates the FLINT code, please restrict yourself to  
    use the functions defined in the LTYEXTERN interface only.  
 4-FLINT/lambda/  
    Definition of the OLD lambda language, should go away soon.  
 4-FLINT/main/  
    The flintcomp.sml describes how the FLINT code gets compiled into  
    the optimized and closure-converted CPS code (eventually, it should  
    produce optimized, closure-converted, adn type-safe FLINT code).  
 4-FLINT/obsol/  
172     All files in this directory are currently not up-to-date. They are     All files in this directory are currently not up-to-date. They are
173     either obsolete or are not compatible with recent changes made to     either obsolete or are not compatible with recent changes made to
174     the CPS language.     the CPS language.
175  4-FLINT/opt/  5-CPS/opt/
176     The FLINT-based optimizations, such as contraction, type     The CPS-based optimizations. Check out the Appel Compiling with
177     specializations, etc.     Continuation book for the details
178  4-FLINT/plambda/  5-CPS/spillgen/
179     An older version of the Lambda language (not in the A-Normal form)     This is the OLD code generator, currently used by the SPARC, MIPS,
180  4-FLINT/reps/     RS6000 and X86 architectures. A counterpart of this is the new
181     Code for the representation analysis of the FLINT code.     code generator in the 7-NewCGen/cpscompile directory.
182  4-FLINT/trans/  5-CPS/top/
183     Translation of Abstract Syntax into the PLambda code, then to the FLINT     The toplevel organization of all the above phases.
    code. All semantic objects used in the elaboration are translated into  
    the FLINT types as well. The translation phase also does match  
    compilation. The translation from PLambda to FLINT does the (partial)  
    type-based argument flattening.  
184    
185  5-CodeGen  6-CodeGen
186     The old code generator. May eventually go away after Lal's new     The old code generator. May eventually go away after Lal's new
187     code generator becomes stable on all platforms. Each code generator     code generator become stable. Each code generator should produce
188     should produce a structure of signature CODEGENERATOR (defined in     a structure of signature CODEGENERATOR (defined in the
189     the 1-Toplevel/main/codes.sig file).     1-Toplevel/misc/codes.sig file)
190  5-CodeGen/coder/  6-CodeGen/bytecode/
191       These code might be obsolete. It is supposed to generate some
192       kind of byte code.
193    6-CodeGen/coder/
194     This directory contains the machine-independent parts of the     This directory contains the machine-independent parts of the
195     old code generator. Some important signatures are also here.     old code generator. Some important signatures are also here.
196  5-CodeGen/cpsgen/  6-CodeGen/mips/
    Compilation of CPS into the abstract machine in the old code  
    generator. Probably the spill.sml and limit.sml files should  
    not be placed here. A counterpart of this in the new  
    code generator is the 6-NewCGen/cpscompile directory.  
 5-CodeGen/mips/  
197     MIPS code generator for both little endian and big endian     MIPS code generator for both little endian and big endian
198  5-CodeGen/rs6000/  6-CodeGen/rs6000/
199     RS6000 code generator     RS6000 code generator
200  5-CodeGen/sparc/  6-CodeGen/sparc/
201     SPARC code generator     SPARC code generator
202  5-CodeGen/x86/  6-CodeGen/x86/
203     X86 code generator     X86 code generator
204    
205  6-NewCGen/alpha32/  7-NewCGen/MLRISC/
206       Lal George's new code generator ML RISC
207    7-NewCGen/alpha32/
208     Alpha32 new code generator     Alpha32 new code generator
209  6-NewCGen/alpha32x/  7-NewCGen/alpha32x/
210     Alpha32 new code generator (with special patches)     Alpha32 new code generator (with special patches)
211  6-NewCGen/cpscompile/  7-NewCGen/cpscompile/
212     Compilation of CPS into the MLRISC abstract machine code     Compilation of CPS into the MLRISC abstract machine code
213  6-NewCGen/hppa/  7-NewCGen/hppa/
214     HPPA new code genrator     HPPA new code genrator
215    
216  7-MLRISC  9-Misc/
    Lal George's new code generator generator (MLRISC).  
   
 9-MiscUtil/  
217     Contains various kinds of utility programs     Contains various kinds of utility programs
218  9-MiscUtil/bignums/  9-Misc/bignums/
219     Bignum packages. I have no clue how stable this is.     Bignum packages. I have no clue how stable this is.
220  9-MiscUtil/fixityparse  9-Misc/print/
 9-MiscUtil/lazycomp  
    Some code for implementation of the lazy evaluation primitives.  
 9-MiscUtil/print/  
221     Pretty printing. Very Adhoc, needs major clean up.     Pretty printing. Very Adhoc, needs major clean up.
222  9-MiscUtil/profile/  9-Misc/profile/
223     The time and the space profiler.     The time and the space profiler.
224  9-MiscUtil/util/  9-Misc/util/
225     Important utility functions including the Inputsource (for     Important utility functions including the Inputsource (for
226     reading in a program), and various Hashtable and Dictionary     reading in a program), and various Hashtable and Dictionary
227     implementations.     implementations.
   
 ============================================================================  
 A. SUMMARY:  
   
 0. statenv   : symbol -> binding  
    dynenv    : pid -> object  
    symenv    : pid -> flint  
 1. Parsing   : source -> ast  
 2. Elaborator: ast + statenv -> absyn + pickle + newstatenv  
 3. FLINT     : absyn -> FLINT -> CPS -> CLO  
 4. CodeGen   : CPS -> csegments (spilling, limit check, codegen)  
 5. NewCGen   : CPS -> csegments  (via MLRISC)  
   
 ============================================================================  
 B. How to recover the all-files.cm (or sources.cm) file after making  
    dramatic changes to the directory structure. Notice that the difference  
    between all-files.cm and sources.cm is just the bootstrap glue files.  
   
    1. ls -1 [1-6,9]*/*/*.{sig,sml} | grep -i -v glue | grep -v obsol > xxx  
    2. Add 7-MLRISC/MLRISC.cm  
    3. Fix ml.lex.* and ml.grm.* files  
    4. Add 9-MiscUtil/util/UTIL.cm  
    5. Add ../ml-yacc/lib/sources.cm  
    6. Delete 9-MiscUtil/util/intmap.sig  
              9-MiscUtil/util/intmap.sml  
              9-MiscUtil/util/sort.sml  
              9-MiscUtil/util/sortedlist.sml  
228  ============================================================================  ============================================================================
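
The Tips section of revision 24 suggests turning its grep command into a script. The following is a minimal sketch of one way to do that, not something taken from the SML/NJ tree. The function name find_binding and its optional ROOT argument are hypothetical, and the sketch assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do). It searches the same one- and two-level locations as the README's glob patterns.

```shell
# find_binding NAME [ROOT]
# Hypothetical helper: look for NAME in every *.sig, *.sml, *.lex and *.grm
# file that sits one or two directories below ROOT (default: current dir).
# This mirrors the README's patterns */*.{...} and */*/*.{...}.
find_binding() {
    if [ -z "$1" ]; then
        echo "usage: find_binding NAME [ROOT]" >&2
        return 2
    fi
    (
        # Run in a subshell so the caller's working directory is untouched.
        cd "${2:-.}" || exit 2
        # Depth 2 corresponds to */file, depth 3 to */*/file.
        # /dev/null is passed as an extra file so grep always prints the
        # file name, even when only one source file is found.
        find . -mindepth 2 -maxdepth 3 -type f \
            \( -name '*.sig' -o -name '*.sml' -o -name '*.lex' -o -name '*.grm' \) \
            -exec grep -n -- "$1" /dev/null {} +
    )
}
```

Run from the compiler source directory, "find_binding XXX" prints each match as path:line:text. Using find instead of shell brace expansion keeps the sketch usable in a plain POSIX sh and avoids errors when a pattern such as */*.lex matches nothing.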
