Diff of /sml/branches/SMLNJ/src/compiler/README


revision 24, Thu Mar 12 00:49:58 1998 UTC revision 45, Sun Mar 22 20:11:09 1998 UTC
# Line 1  Line 1 
1  ============================================================================  ============================================================================
2  The Overall Structure of the Current FLINT/ML Compiler  This README file describes the overall structure of the current version of
3      <based On SML/NJ version 109.33 11/29/97>  the SML/NJ (v110.3) & FLINT/ML (v1.3) compiler source tree. Please send
4    your questions, comments, and suggestions to flint@cs.yale.edu (or contact
5    Zhong Shao at shao-zhong@cs.yale.edu).
   
6  ============================================================================  ============================================================================
 The Overall Structure of the Current FLINT/ML Compiler  
     <based On SML/NJ version 109.31+ 9/22/97>  
7    
8  NOTES  NOTES
9     Some informal half-baked notes. Just don't want to discard them.     Some informal implementation notes.
10    
11  README  README
12     This file which gives an overview of the compiler structure.     This file. It gives an overview of the overall compiler structure.
13    
14  all-files.cm  all-files.cm
15     The standard Makefile for compiling the compiler. It is similar     The standard Makefile for compiling the compiler. It is similar
16     to the idea of sources.cm used by CM.make, except that     to the idea of sources.cm used by CM.make, except that
17     all-files.cm is designed for CMB.make only. The resulting binfiles     all-files.cm is designed for bootstrapping the compiler itself
18     from doing CMB.make are placed in a single bin directory, eg.     only (i.e., CMB.make). The resulting binfiles from doing CMB.make
19     bin.x86-unix or bin.sparc-unix. Right now, all-files.cm is     are placed in a single bin directory, eg. bin.x86-unix or
20     just whatever in sources.cm plus all the bootstrap glue files.     bin.sparc-unix. Right now, the list in all-files.cm is just the
21       list in sources.cm plus all the glue files in the 1-TopLevel/bootstrap
22  buildcm     directory (which are used to bootstrap the interactive compiler).
23     Scripts for building the sml-cm version of the compiler.  
24    buildcm* compiler-name
25       A script for building the sml-cm version of the compiler. Suppose
26       you have built an SML heap image named sml.x86-unix, you type
27       "buildcm sml.x86-unix" to get the cm version of the compiler,
28       probably named "sml-cm.x86-unix".
29    
30  buildcm2  buildcm2* compiler-name
31     Scripts for building a sml-cm compiler that knows to where to     Scripts for building a sml-cm compiler that knows where to
32     find the library and ml-lex and ml-yacc, etc. Need to adjust     find the library and ml-lex and ml-yacc, etc. Need to adjust
33     the top-level directory name there.     the top-level directory name there.
34    
35  sources.cm  sources.cm
36     This file contains the usual makefile for CM.make. It is not     This file contains the usual makefile for CM.make. It is not
37     used to build up the interactive compiler. But it can be     used to build up the interactive compiler. But it can be
38     useful for debugging purpose as doing CM.make immediately     useful for debugging purpose. For example, you can type CM.make()
39     build up a interactive visible compiler. Notice all the     to immediately build up a new, interactive visible compiler. To
40     bootstrap glue files are not here, because CM.make() does     access the newly built compiler, you use the
41     run them which will cause problems.         "XXXVisComp.Interact.useFile"
42       function to compile ML programs. Notice all the bootstrap glue
43  xmakeml     files are not in sources.cm.
44     Scripts for building the interactive compiler. The default path  
45     of bin files is ./bin.$arch-$os. If you add the "-full" option,  xmakeml* [-full] [-elab]
46     it will build a compiler whose components are visible to the     A script for building the interactive compiler. The default path
47     top-level interactive environment.     of bin files is ./bin.$arch-$os. There are two command-line options:
48       if you add the "-full" option, it will build a compiler whose
49  xrun     components are visible to the top-level interactive environment;
50     Scripts for running the interactive copmiler.     if you add the "-elab" option, it will re-elaborate all the ML
51       programs to recreate the static environments (this is useful, if
52       your new compiler has changed the representations of the bindings
53       in the static environments).
54    
55    xrun* compiler-name
56       A script for running the compiler. Suppose you have a heap image
57       named "sml.x86-unix", you can type "xrun sml.x86-unix" to run the
58       compiler. Similarly, you can type "xrun sml-cm.x86-unix" to run
59       the CM version of the sml compiler. The xrun script uses the
60       runtime system in the ../../bin/.run directory.
61    
62  ============================================================================  ============================================================================
63  Tips:  Tips:
64     To find a particular binding XXX, type     The current source code is organized as a two-level directory tree.
65          grep XXX */*.{sig,sml,lex,grm} */*/*.{sig,sml,lex,grm}     Apart from a few files which are placed immediately inside the 0-Boot
66     If you like, make the above into a script.     directory (i.e., 0-Boot/*.{sig,sml}), all source files can be grep-ed
67       by typing "grep xxx */*/*.{sig,sml}", assuming you are looking for
68  0-Boot     binding "xxx".
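A minimal sketch of the grep tip above, wrapped as a shell function. The name `findbinding` is hypothetical and not part of the source tree. It assumes you run it from the compiler directory, and it uses `find` rather than brace expansion so that files placed directly inside a top-level directory are covered too.

```shell
# findbinding NAME [ROOT]: hypothetical helper, not part of the source tree.
# Prints every line in a .sig or .sml file under ROOT (default: current
# directory) that mentions NAME, prefixed with file name and line number.
findbinding() {
    name=$1
    root=${2:-.}
    # /dev/null forces grep to print file names even for a single match file.
    find "$root" -type f \( -name '*.sig' -o -name '*.sml' \) \
        -exec grep -n -- "$name" /dev/null {} +
}
```

For example, `findbinding xxx` searches the whole tree for "xxx". The .grm and .lex files are left out here, matching the grep pattern in the newer revision of the tip.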
69     The SML97 Basis Library. When recompiling the compiler (via CMB.make),  
70     files in this directory are always compiled first. Files in this directory     The following directories are organized based on the compilation phases.
71     are compiled in the following order:     Within each phase, the "main" sub-directory always contains the top-level
72         (0)  build the primitive environment (from 3-Semant/prim/prim.sml)     module and some important data structures for that particular compilation
73         (1)  assembly.sig     phase.
74         (2)  dummy.sml  
75         (3)  core.sml     File name conventions:
76         (4)  files in all-files.cm (following the exact order)       *.sig --- the ML signature file
77         (5)  files in pervasive.cm (following the exact order)       *.sml --- the ML source program (occasionally with signatures)
78         *.grm --- ML-Yacc file
79         *.lex --- ML-Lex file
80         *.cm  --- the CM makefile
81    
82    0-Basis
83       The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.
84       When recompiling the compiler (i.e., via CMB.make), files in this
85       directory are always compiled first. More specifically, their order
86       of compilation is as follows:
87           (0)  build the initial primitive static environment
88                  (see 3-Semant/statenv/prim.sml)
89           (1)  compile assembly.sig and dummy.sml, these two files
90                make up the static environment for the runtime structure
91                (coming from the ../runtime/kernel/globals.c file). The
92                dynamic object from executing dummy.sml is discarded, and
93                replaced by a hard-wired object coming from the runtime
94                system.
95           (2)  compile core.sml, which defines a bunch of useful exceptions
96                and utility functions such as polymorphic equality, string
97                equality, delay and force primitives, etc.
98           (4)  files in all-files.cm (must follow the exact order)
99           (5)  files in pervasive.cm (must follow the exact order)
100    
101    1-TopLevel
102       This directory contains the top-level glue files for different versions
103       of the batch and interactive compiler.  To understand how the compiler
104       is organized, you can read the main directory.
105  1-TopLevel/batch/  1-TopLevel/batch/
106     Utility files for the Compilation Manager CM and CMB;     Utility files for the Compilation Manager CM and CMB;
107  1-TopLevel/bootstrap/  1-TopLevel/bootstrap/
# Line 71  Line 109 
109     shareglue.sml. Before building an interactive compiler, one should have     shareglue.sml. Before building an interactive compiler, one should have
110     already gotten a visible compiler (for that particular architecture),     already gotten a visible compiler (for that particular architecture),
111     see the viscomp directory. To build a compiler for SPARC architecture,     see the viscomp directory. To build a compiler for SPARC architecture,
112     all we need to do is the load and run the IntSparc (in sparcglue.sml)     all we need to do is to load and run the IntSparc (in sparcglue.sml)
113     structure.     structure.
114  1-TopLevel/environ/  1-TopLevel/environ/
115     A top-level environment include static environment, dynamic environment     A top-level environment include static environment, dynamic environment
# Line 82  Line 120 
120     How the top-level interactive loop is organized. The evalloop.sml contains     How the top-level interactive loop is organized. The evalloop.sml contains
121     the details on how a ML program is compiled from source code to binary     the details on how a ML program is compiled from source code to binary
122     code and then later being executed.     code and then later being executed.
123  1-TopLevel/misc/  1-TopLevel/main/
124     Compiler control flags and version numbers are here.     The top-level compiler structure is shown in the compile.sig and
125       compile.sml. The compile.sml contains details on how ML programs
126       are compiled into the FLINT intermediate format, but the details
127       on how FLINT gets compiled into the binary code segments are not
128       detailed here, instead, they are described in the
129       4-FLINT/main/flintcomp.sml file. The CODEGENERATOR signature
130       in codes.sig defines the interface about this FLINT code generator.
131       Note: all the uses of the compilation facility go through the "compile"
132       function defined in the compile.sml. The common intermediate formats are
133       stated in the compbasic.sig and compbasic.sml files. The version.sml
134       defines the version numbers.
135  1-TopLevel/viscomp/  1-TopLevel/viscomp/
136     How to build the visible compiler viscomp --- this is essentially     How to build the visible compiler viscomp --- this is essentially
137     deciding what to export to the outside world.     deciding what to export to the outside world. All the Compiler
138       control flags are defined in the control.sig and control.sml files
139       placed in this directory.
140    
141  2-FrontEnd  2-Parse/
142     Phase 1 of the compilation process. Turning the SML source code into     Phase 1 of the compilation process. Turning the SML source code into
143     the Concrete Syntax. The definition of concrete syntax is in ast.sml.     the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
144     The frontend.sig and frontend.sml files contains the big picture on     The frontend.sig and frontend.sml files in the main directory contain
145     how the lexer and parser are organized.     the big picture on the front end.
146    
147  3-Semant  3-Semant
148     So-called semantic analysis, but really doing the elaboration and     This phase does semantic analysis, more specifically, it does the
149     type-checking of the core and module languages. The semantic objects     elaboration (of concrete syntax into abstract syntax) and type-checking
150     are defined in bindings.sml. The result is the Abstract Syntax, defined     of the core and module languages. The semantic objects are defined in
151     in the absyn directory     main/bindings.sml. The result is the Abstract Syntax, defined in the
152  3-Semant/absyn/     main/absyn.sml file.
    Definition of Abstract Syntax  
153  3-Semant/basics/  3-Semant/basics/
154     Definition of semantic objects for the core language including type     Definition of several data structures and utility functions. They are
155     bindings, value bindings, and constructor bindings.     used by the code that does semantic analysis. The env.sig and env.sml
156  3-Semant/bindings.sml     files define the underlying data structures used to represent the
157     Top-level view of what semantic objects we have     static environment.
158  3-Semant/elaborate/  3-Semant/elaborate/
159     How to turn a piece of code in the Concrete Syntax into one in the     How to turn a piece of code in the Concrete Syntax into one in the
160     Abstract Syntax. The top-level organization is in the following     Abstract Syntax. The top-level organization is in the following
161     elabtop.sml file.     elabtop.sml file.
162  3-Semant/elabtop.sml  3-Semant/main/absyn.sml
163       Definition of Abstract Syntax
164    3-Semant/main/bindings.sml
165       Top-level view of what semantic objects we have
166    3-Semant/main/elabtop.sml
167     Top-level view of the elaboration process. Notice that each piece     Top-level view of the elaboration process. Notice that each piece
168     of core-ML program is first translated into the Abstract Syntax,     of core-ML program is first translated into the Abstract Syntax,
169     and gets type-checked. The type-checking does change the contents     and gets type-checked. The type-checking does change the contents
# Line 124  Line 177 
177     if you want to create the *.bin file. It is also useful to infer     if you want to create the *.bin file. It is also useful to infer
178     a unique persistent id for each compilation unit (useful to detect     a unique persistent id for each compilation unit (useful to detect
179     the cut-off compilation dependencies).     the cut-off compilation dependencies).
 3-Semant/prim/  
    Here is the list of primitive operators and primitive types. All  
    are synthesized, and then later get put in the primEnv environment.  
    PrimEnv was used as the initial environment when elaborating files  
    in the 0-Boot directory.  
180  3-Semant/statenv/  3-Semant/statenv/
181     The definition of Static Environment. The SC-ed version of Static     The definition of Static Environment. The SC-ed version of Static
182     Environment is used to avoid environment blow-up.     Environment is used to avoid environment blow-up in the pickling.
183       The prim.sml contains the list of primitive operators and primitive
184       types exported in the initial static environment (i.e., PrimEnv).
185       During bootstrapping, PrimEnv is the first environment you have to
186       set up before you can compile files in the 0-Boot directory.
187    3-Semant/types/
188       This directory contains all the data structures and utility functions
189       used in type-checking the Core-ML language.
190  3-Semant/typing/  3-Semant/typing/
191     The type-checking and type-inference code for the core-ML programs.     The type-checking and type-inference code for the core-ML programs.
192     It is performed on Abstract Syntax and it produces Abstract Syntax     It is performed on Abstract Syntax and it produces Abstract Syntax
193     also.     also.
194    
195  4-Translate  4-FLINT
196     This phase translates the Abstract Syntax into the intermediate     This phase translates the Abstract Syntax into the intermediate
197     Lambda language (i.e., FLINT). During the translation, it compiles     Lambda language (i.e., FLINT). During the translation, it compiles
198     the Pattern Matches (see the mcomp directory).     the Pattern Matches (see the mcomp directory). Then it does a bunch
199  4-Translate/lambda/     of optimizations on FLINT; then it does representation analysis,
200     Definition of the intermediate languages. How to type-check it     and it converts the FLINT code into CPS, finally it does closure
201     and how to print it.     conversion.
202  4-Translate/mcomp/  4-FLINT/clos/
    Code for compiling pattern matches.  
 4-Translate/opt/  
    Code for optimization and representation analysis of the Lambda code.  
 4-Translate/plambda/  
    An older version of the Lambda language (not in the A-Normal form)  
 4-Translate/trans/  
    Translation of Abstract Syntax into the Lambda code. Of course, the  
    semantic objects used in the elaboration have to be translated into  
    the Lambda types as well.  
 4-Translate/type/  
    Definiton of the Lambda types, constructors and kinds. A bunch of  
    utility functions on how to manipulate the type environment, deBruijn  
    indices, etc.  
   
 5-CPS  
    This phase turns the Lambda code into Continuation Passing Style,  
    does the closure conversion, and then feed it into the code generator.  
 5-CPS/clos/  
203     The closure conversion step. Check out Shao/Appel LFP94 paper for     The closure conversion step. Check out Shao/Appel LFP94 paper for
204     the detailed algorithm     the detailed algorithm.
205  5-CPS/conv/  4-FLINT/cps/
206     Converting the Lambda code into the CPS code. The main nontrivial     Definition of CPS plus on how to convert the FLINT code into the
207     step is the compilation of the Switch statement.     CPS code. The compilation of the Switch statement is done in this
208  5-CPS/obsol/     phase.
209    4-FLINT/cpsopt/
210       The CPS-based optimizations (check Appel's "Compiling with
211       Continuations" book for details). Eventually, all optimizations
212       in this directory will be migrated into FLINT.
213    4-FLINT/flint/
214       This directory defines the FLINT language. The detailed definitions
215       of primitive tycs, primitive operators, kinds, type constructors,
216       and types are in the 4-FLINT/kernel directory.
217    4-FLINT/kernel/
218       Definition of the kernel data structures used in the FLINT language.
219       This includes: deBruijn indices, primitive tycs, primitive operators,
220       FLINT kinds, FLINT constructors, and FLINT types. When you write
221       code that manipulates the FLINT code, please restrict yourself to
222       use the functions defined in the LTYEXTERN interface only.
223    4-FLINT/lambda/
224       Definition of the OLD lambda language, should go away soon.
225    4-FLINT/main/
226       The flintcomp.sml describes how the FLINT code gets compiled into
227       the optimized and closure-converted CPS code (eventually, it should
228       produce optimized, closure-converted, and type-safe FLINT code).
229    4-FLINT/obsol/
230     All files in this directory are currently not up-to-date. They are     All files in this directory are currently not up-to-date. They are
231     either obsolete or are not compatible with recent changes made to     either obsolete or are not compatible with recent changes made to
232     the CPS language.     the CPS language.
233  5-CPS/opt/  4-FLINT/opt/
234     The CPS-based optimizations. Check out the Appel Compiling with     The FLINT-based optimizations, such as contraction, type
235     Continuation book for the details     specializations, etc.
236  5-CPS/spillgen/  4-FLINT/plambda/
237     This is the OLD code generator, currently used by the SPARC, MIPS,     An older version of the Lambda language (not in the A-Normal form)
238     RS6000 and X86 architectures. A counterpart of this is the new  4-FLINT/reps/
239     code generator in the 7-NewCGen/cpscompile directory.     Code for the representation analysis of the FLINT code.
240  5-CPS/top/  4-FLINT/trans/
241     The toplevel organization of all the above phases.     Translation of Abstract Syntax into the PLambda code, then to the FLINT
242       code. All semantic objects used in the elaboration are translated into
243       the FLINT types as well. The translation phase also does match
244       compilation. The translation from PLambda to FLINT does the (partial)
245       type-based argument flattening.
246    
247    5-CodeGen/alpha32/
248       Alpha32 new code generator
249    5-CodeGen/alpha32x/
250       Alpha32 new code generator (with special patches)
251    5-CodeGen/cpscompile/
252       Compilation of CPS into the MLRISC abstract machine code
253    5-CodeGen/hppa/
254       HPPA new code generator
255    5-CodeGen/main/
256       The big picture of the codegenerator; including important
257       files on machine specifications and runtime tagging schemes.
258    
259  6-CodeGen  6-OldCGen
260     The old code generator. May eventually go away after Lal's new     The old code generator. May eventually go away after Lal's new
261     code generator become stable. Each code generator should produce     code generator becomes stable on all platforms. Each code generator
262     a structure of signature CODEGENERATOR (defined in the     should produce a structure of signature CODEGENERATOR (defined in
263     1-Toplevel/misc/codes.sig file)     the 1-Toplevel/main/codes.sig file).
264  6-CodeGen/bytecode/  6-OldCGen/coder/
    These code might be obsolete. It is supposed to generate some  
    kind of byte code.  
 6-CodeGen/coder/  
265     This directory contains the machine-independent parts of the     This directory contains the machine-independent parts of the
266     old code generator. Some important signatures are also here.     old code generator. Some important signatures are also here.
267  6-CodeGen/mips/  6-OldCGen/cpsgen/
268       Compilation of CPS into the abstract machine in the old code
269       generator. Probably the spill.sml and limit.sml files should
270       not be placed here. A counterpart of this in the new
271       code generator is the 5-CodeGen/cpscompile directory.
272    6-OldCGen/mips/
273     MIPS code generator for both little endian and big endian     MIPS code generator for both little endian and big endian
274  6-CodeGen/rs6000/  6-OldCGen/rs6000/
275     RS6000 code generator     RS6000 code generator
276  6-CodeGen/sparc/  6-OldCGen/sparc/
277     SPARC code generator     SPARC code generator
278  6-CodeGen/x86/  6-OldCGen/x86/
279     X86 code generator     X86 code generator
280    
281  7-NewCGen/MLRISC/  7-MLRISC
282     Lal George's new code generator ML RISC     Lal George's new code generator generator (MLRISC).
 7-NewCGen/alpha32/  
    Alpha32 new code generator  
 7-NewCGen/alpha32x/  
    Alpha32 new code generator (with special patches)  
 7-NewCGen/cpscompile/  
    Compilation of CPS into the MLRISC abstract machine code  
 7-NewCGen/hppa/  
    HPPA new code genrator  
283    
284  9-Misc/  9-MiscUtil/
285     Contains various kinds of utility programs     Contains various kinds of utility programs
286  9-Misc/bignums/  9-MiscUtil/bignums/
287     Bignum packages. I have no clue how stable this is.     Bignum packages. I have no clue how stable this is.
288  9-Misc/print/  9-MiscUtil/fixityparse
289    9-MiscUtil/lazycomp
290       Some code for implementation of the lazy evaluation primitives.
291    9-MiscUtil/print/
292     Pretty printing. Very Adhoc, needs major clean up.     Pretty printing. Very Adhoc, needs major clean up.
293  9-Misc/profile/  9-MiscUtil/profile/
294     The time and the space profiler.     The time and the space profiler.
295  9-Misc/util/  9-MiscUtil/util/
296     Important utility functions including the Inputsource (for     Important utility functions including the Inputsource (for
297     reading in a program), and various Hashtable and Dictionary     reading in a program), and various Hashtable and Dictionary
298     implementations.     implementations.
299    
300    ============================================================================
301    A. SUMMARY:
302    
303    0. statenv   : symbol -> binding
304       dynenv    : pid -> object
305       symenv    : pid -> flint
306    1. Parsing   : source -> ast
307    2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
308    3. FLINT     : absyn -> FLINT -> CPS -> CLO
309    4. CodeGen   : CPS -> csegments (via MLRISC)
310    5. OldCGen   : CPS -> csegments (spilling, limit check, codegen)
311    
312    ============================================================================
313    B. How to recover the all-files.cm (or sources.cm) file after making
314       dramatic changes to the directory structure. Notice that the difference
315       between all-files.cm and sources.cm is just the bootstrap glue files.
316    
317       1. ls -1 [1-6,9]*/*/*.{sig,sml} | grep -i -v glue | grep -v obsol > xxx
318       2. Add 7-MLRISC/MLRISC.cm
319       3. Fix ml.lex.* and ml.grm.* files
320       4. Add 9-MiscUtil/util/UTIL.cm
321       5. Add ../ml-yacc/lib/sources.cm
322       6. Delete 9-MiscUtil/util/intmap.sig
323                 9-MiscUtil/util/intmap.sml
324                 9-MiscUtil/util/sort.sml
325                 9-MiscUtil/util/sortedlist.sml
326  ============================================================================  ============================================================================
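Step 1 of section B can be sketched as a reusable shell function. This is an illustrative variant, not the script used by the authors: the name `list_sources` is hypothetical, and it swaps the `ls` glob for `find` with `-mindepth`/`-maxdepth` (supported by GNU and BSD find, but not required by POSIX) so that it does not depend on brace expansion.

```shell
# list_sources [ROOT]: hypothetical helper mirroring step 1 above.
# Prints the .sig/.sml files exactly two directory levels below ROOT whose
# top-level directory name starts with 1-6 or 9, skipping any path that
# mentions "glue" (case-insensitive) or "obsol".
list_sources() {
    root=${1:-.}
    ( cd "$root" &&
      find . -mindepth 3 -maxdepth 3 -type f \
           \( -name '*.sig' -o -name '*.sml' \) \
           -path './[1-69]*' |
      sed 's|^\./||' |
      grep -i -v glue |
      grep -v obsol |
      sort )
}
```

Running `list_sources > xxx` from the compiler directory gives a starting list comparable to step 1. Steps 2 through 6 (adding the .cm files, fixing the ml.lex.* and ml.grm.* entries, and deleting the listed util files) still have to be applied by hand.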

