
[smlnj] Annotation of /sml/branches/SMLNJ/src/compiler/README

Revision 16 - (view) (download)
Original Path: sml/trunk/src/compiler/README

1 : monnier 16 ============================================================================
2 :     This README file describes the overall structure of the current version of
3 :     the SML/NJ (v110.3) & FLINT/ML (v1.3) compiler source tree. Please send
4 :     your questions, comments, and suggestions to flint@cs.yale.edu (or contact
5 :     Zhong Shao at shao-zhong@cs.yale.edu).
6 :     ============================================================================
7 :    
8 :     NOTES
9 :     Some informal implementation notes.
10 :    
11 :     README
12 :     This file. It gives an overview of the overall compiler structure.
13 :    
14 :     all-files.cm
15 :     The standard Makefile for compiling the compiler. It is similar
16 :     to the idea of sources.cm used by CM.make, except that
17 :     all-files.cm is designed for bootstrapping the compiler itself
18 :     only (i.e., CMB.make). The resulting binfiles from doing CMB.make
19 :     are placed in a single bin directory, e.g., bin.x86-unix or
20 :     bin.sparc-unix. Right now, the list in all-files.cm is just the
21 :     list in sources.cm plus all the glue files in the 1-TopLevel/bootstrap
22 :     directory (which are used to bootstrap the interactive compiler).
23 :    
24 :     buildcm* compiler-name
25 :     A script for building the sml-cm version of the compiler. Suppose
26 :     you have built an SML heap image named sml.x86-unix; you then type
27 :     "buildcm sml.x86-unix" to get the CM version of the compiler,
28 :     probably named "sml-cm.x86-unix".
29 :    
30 :     buildcm2* compiler-name
31 :     A script for building an sml-cm compiler that knows where to
32 :     find the library, ml-lex, ml-yacc, etc. You need to adjust
33 :     the top-level directory name in the script.
34 :    
35 :     sources.cm
36 :     This file contains the usual makefile for CM.make. It is not
37 :     used to build the interactive compiler, but it can be
38 :     useful for debugging purposes. For example, you can type CM.make()
39 :     to immediately build a new, interactive visible compiler. To
40 :     access the newly built compiler, you use the
41 :     "XXXVisComp.Interact.useFile"
42 :     function to compile ML programs. Note that none of the bootstrap
43 :     glue files are in sources.cm.
44 :    
45 :     xmakeml* [-full] [-elab]
46 :     A script for building the interactive compiler. The default path
47 :     of bin files is ./bin.$arch-$os. There are two command-line options:
48 :     if you add the "-full" option, it will build a compiler whose
49 :     components are visible to the top-level interactive environment;
50 :     if you add the "-elab" option, it will re-elaborate all the ML
51 :     programs to recreate the static environments (this is useful, if
52 :     your new compiler has changed the representations of the bindings
53 :     in the static environments).
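
The default binfile path above follows the pattern ./bin.$arch-$os. As a minimal sketch of how such a name is put together (bin_dir is a hypothetical helper written for illustration; it is not taken from the xmakeml script):

```shell
# Hypothetical helper, not part of xmakeml: build the default binfile
# directory name from an architecture tag and an OS tag.
bin_dir() {
    arch=$1
    os=$2
    printf './bin.%s-%s\n' "$arch" "$os"
}

bin_dir x86 unix
bin_dir sparc unix
```

The two calls produce the directory names used as examples in the all-files.cm entry (bin.x86-unix and bin.sparc-unix).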
54 :    
55 :     xrun* compiler-name
56 :     A script for running the compiler. Suppose you have a heap image
57 :     named "sml.x86-unix", you can type "xrun sml.x86-unix" to run the
58 :     compiler. Similarly, you can type "xrun sml-cm.x86-unix" to run
59 :     the CM version of the sml compiler. The xrun script uses the
60 :     runtime system in the ../../bin/.run directory.
61 :    
62 :     ============================================================================
63 :     Tips:
64 :     The current source code is organized as a two-level directory tree.
65 :     Apart from a few files which are placed immediately inside the 0-Boot
66 :     directory (i.e., 0-Boot/*.{sig,sml}), all source files can be grep-ed
67 :     by typing "grep xxx */*/*.{sig,sml}", assuming you are looking for
68 :     binding "xxx".
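
The grep tip above can be tried on a throwaway tree. This is only a sketch: the file names and contents are made up for the demonstration, and find_binding is a hypothetical wrapper. The patterns are spelled out instead of using *.{sig,sml} so that the command also works in shells without brace expansion.

```shell
# Toy tree (hypothetical files) mirroring the two-level layout.
demo=$(mktemp -d)
mkdir -p "$demo/0-Boot" "$demo/3-Semant/main" "$demo/4-FLINT/main"
echo 'structure Core = struct end'   > "$demo/0-Boot/core.sml"
echo 'structure Absyn = struct end'  > "$demo/3-Semant/main/absyn.sml"
echo 'signature FLINTCOMP = sig end' > "$demo/4-FLINT/main/flintcomp.sig"

# List the two-level source files that mention the given name.
find_binding() {
    (cd "$demo" && grep -l "$1" */*/*.sig */*/*.sml 2>/dev/null)
}

find_binding Absyn
# Files directly inside 0-Boot sit one level up, so the two-level
# pattern does not see them.
find_binding Core || echo "Core: not found two levels down; try 0-Boot/*.sml"
```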
69 :    
70 :     The following directories are organized based on the compilation phases.
71 :     Within each phase, the "main" sub-directory always contains the top-level
72 :     module and some important data structures for that particular compilation
73 :     phase.
74 :    
75 :     File name conventions:
76 :     *.sig --- the ML signature file
77 :     *.sml --- the ML source program (occasionally with signatures)
78 :     *.grm --- ML-Yacc file
79 :     *.lex --- ML-Lex file
80 :     *.cm --- the CM makefile
81 :    
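
The naming conventions above map directly onto a shell case statement. A small sketch, where describe_file is a hypothetical helper introduced only for illustration:

```shell
# Hypothetical helper: report what kind of file a name denotes,
# following the file name conventions listed above.
describe_file() {
    case $1 in
        *.sig) echo "ML signature file" ;;
        *.sml) echo "ML source program" ;;
        *.grm) echo "ML-Yacc file" ;;
        *.lex) echo "ML-Lex file" ;;
        *.cm)  echo "CM makefile" ;;
        *)     echo "unknown" ;;
    esac
}

describe_file compile.sig
describe_file sources.cm
```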
82 :     0-Boot
83 :     The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.
84 :     When recompiling the compiler (i.e., via CMB.make), files in this
85 :     directory are always compiled first. More specifically, their order
86 :     of compilation is as follows:
87 :     (0) build the initial primitive static environment
88 :     (see 3-Semant/statenv/prim.sml)
89 :     (1) compile assembly.sig and dummy.sml, these two files
90 :     make up the static environment for the runtime structure
91 :     (coming from the ../runtime/kernel/globals.c file). The
92 :     dynamic object from executing dummy.sml is discarded, and
93 :     replaced by a hard-wired object coming from the runtime
94 :     system.
95 :     (2) compile core.sml, which defines a number of useful exceptions
96 :     and utility functions such as polymorphic equality, string
97 :     equality, delay and force primitives, etc.
98 :     (4) files in all-files.cm (must follow the exact order)
99 :     (5) files in pervasive.cm (must follow the exact order)
100 :    
101 :     1-TopLevel
102 :     This directory contains the top-level glue files for different versions
103 :     of the batch and interactive compiler. To understand how the compiler
104 :     is organized, you can read the main directory.
105 :     1-TopLevel/batch/
106 :     Utility files for the Compilation Manager CM and CMB;
107 :     1-TopLevel/bootstrap/
108 :     How to bootstrap an interactive compiler. Details are in boot.sml and
109 :     shareglue.sml. Before building an interactive compiler, one should have
110 :     already gotten a visible compiler (for that particular architecture),
111 :     see the viscomp directory. To build a compiler for SPARC architecture,
112 :     all we need to do is to load and run the IntSparc (in sparcglue.sml)
113 :     structure.
114 :     1-TopLevel/environ/
115 :     A top-level environment includes a static environment, a dynamic
116 :     environment, and a symbolic environment. The definitions of static
117 :     environments are in the 3-Semant/statenv directory, as they are mostly
118 :     used by elaboration and type checking.
119 :     1-TopLevel/interact/
120 :     How the top-level interactive loop is organized. The evalloop.sml file
121 :     contains the details on how an ML program is compiled from source code
122 :     to binary code and then later executed.
123 :     1-TopLevel/main/
124 :     The top-level compiler structure is shown in compile.sig and
125 :     compile.sml. compile.sml contains details on how ML programs
126 :     are compiled into the FLINT intermediate format. How FLINT gets
127 :     compiled into the binary code segments is not covered there;
128 :     it is described in the
129 :     4-FLINT/main/flintcomp.sml file. The CODEGENERATOR signature
130 :     in codes.sig defines the interface to this FLINT code generator.
131 :     Note: all uses of the compilation facility go through the "compile"
132 :     function defined in compile.sml. The common intermediate formats are
133 :     defined in the compbasic.sig and compbasic.sml files. The version.sml
134 :     file defines the version numbers.
135 :     1-TopLevel/viscomp/
136 :     How to build the visible compiler viscomp --- this is essentially
137 :     deciding what to export to the outside world. All the Compiler
138 :     control flags are defined in the control.sig and control.sml files
139 :     placed in this directory.
140 :    
141 :     2-FrontEnd/
142 :     Phase 1 of the compilation process: turning the SML source code into
143 :     the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
144 :     The frontend.sig and frontend.sml files in the main directory contain
145 :     the big picture on the front end.
146 :    
147 :     3-Semant
148 :     This phase does semantic analysis, more specifically, it does the
149 :     elaboration (of concrete syntax into abstract syntax) and type-checking
150 :     of the core and module languages. The semantic objects are defined in
151 :     main/bindings.sml. The result is the Abstract Syntax, defined in the
152 :     main/absyn.sml file.
153 :     3-Semant/basics/
154 :     Definition of several data structures and utility functions. They are
155 :     used by the code that does semantic analysis. The env.sig and env.sml
156 :     files define the underlying data structures used to represent the
157 :     static environment.
158 :     3-Semant/elaborate/
159 :     How to turn a piece of code in the Concrete Syntax into one in the
160 :     Abstract Syntax. The top-level organization is in the
161 :     elabtop.sml file listed below.
162 :     3-Semant/main/absyn.sml
163 :     Definition of Abstract Syntax
164 :     3-Semant/main/bindings.sml
165 :     Top-level view of what semantic objects we have
166 :     3-Semant/main/elabtop.sml
167 :     Top-level view of the elaboration process. Notice that each piece
168 :     of core-ML program is first translated into the Abstract Syntax,
169 :     and gets type-checked. The type-checking does change the contents
170 :     of abstract syntax, as certain type information won't be known
171 :     until type-checking is done.
172 :     3-Semant/modules/
173 :     Utility functions for the elaboration of modules. The module.sig and
174 :     module.sml files contain the definitions of module-level semantic objects.
175 :     3-Semant/pickle/
176 :     How to write the static environments into a file. This is important
177 :     if you want to create the *.bin file. It is also used to infer
178 :     a unique persistent id for each compilation unit (useful for
179 :     detecting cut-off compilation dependencies).
180 :     3-Semant/statenv/
181 :     The definition of Static Environment. The SC-ed version of Static
182 :     Environment is used to avoid environment blow-up in the pickling.
183 :     The prim.sml contains the list of primitive operators and primitive
184 :     types exported in the initial static environment (i.e., PrimEnv).
185 :     During bootstrapping, PrimEnv is the first environment you have to
186 :     set up before you can compile files in the 0-Boot directory.
187 :     3-Semant/types/
188 :     This directory contains all the data structures and utility functions
189 :     used in type-checking the Core-ML language.
190 :     3-Semant/typing/
191 :     The type-checking and type-inference code for the core-ML programs.
192 :     It is performed on Abstract Syntax and it produces Abstract Syntax
193 :     also.
194 :    
195 :     4-FLINT
196 :     This phase translates the Abstract Syntax into the intermediate
197 :     Lambda language (i.e., FLINT). During the translation, it compiles
198 :     the Pattern Matches (see the mcomp directory). It then performs a
199 :     series of optimizations on FLINT, does representation analysis,
200 :     converts the FLINT code into CPS, and finally does closure
201 :     conversion.
202 :     4-FLINT/clos/
203 :     The closure conversion step. See the Shao/Appel LFP94 paper for
204 :     the detailed algorithm.
205 :     4-FLINT/cps/
206 :     Definition of CPS, plus how to convert the FLINT code into the
207 :     CPS code. The compilation of the Switch statement is done in this
208 :     phase.
209 :     4-FLINT/cpsopt/
210 :     The CPS-based optimizations (check Appel's "Compiling with
211 :     Continuations" book for details). Eventually, all optimizations
212 :     in this directory will be migrated into FLINT.
213 :     4-FLINT/flint/
214 :     This directory defines the FLINT language. The detailed definitions
215 :     of primitive tycs, primitive operators, kinds, type constructors,
216 :     and types are in the 4-FLINT/kernel directory.
217 :     4-FLINT/kernel/
218 :     Definition of the kernel data structures used in the FLINT language.
219 :     This includes: deBruijn indices, primitive tycs, primitive operators,
220 :     FLINT kinds, FLINT constructors, and FLINT types. When you write
221 :     code that manipulates the FLINT code, please restrict yourself to
222 :     using only the functions defined in the LTYEXTERN interface.
223 :     4-FLINT/lambda/
224 :     Definition of the OLD lambda language; it should go away soon.
225 :     4-FLINT/main/
226 :     The flintcomp.sml file describes how the FLINT code gets compiled into
227 :     the optimized and closure-converted CPS code (eventually, it should
228 :     produce optimized, closure-converted, and type-safe FLINT code).
229 :     4-FLINT/obsol/
230 :     All files in this directory are currently not up-to-date. They are
231 :     either obsolete or are not compatible with recent changes made to
232 :     the CPS language.
233 :     4-FLINT/opt/
234 :     The FLINT-based optimizations, such as contraction, type
235 :     specializations, etc.
236 :     4-FLINT/plambda/
237 :     An older version of the Lambda language (not in the A-Normal form)
238 :     4-FLINT/reps/
239 :     Code for the representation analysis of the FLINT code.
240 :     4-FLINT/trans/
241 :     Translation of Abstract Syntax into the PLambda code, then to the FLINT
242 :     code. All semantic objects used in the elaboration are translated into
243 :     the FLINT types as well. The translation phase also does match
244 :     compilation. The translation from PLambda to FLINT does the (partial)
245 :     type-based argument flattening.
246 :    
247 :     5-CodeGen
248 :     The old code generator. May eventually go away after Lal's new
249 :     code generator becomes stable on all platforms. Each code generator
250 :     should produce a structure of signature CODEGENERATOR (defined in
251 :     the 1-TopLevel/main/codes.sig file).
252 :     5-CodeGen/coder/
253 :     This directory contains the machine-independent parts of the
254 :     old code generator. Some important signatures are also here.
255 :     5-CodeGen/cpsgen/
256 :     Compilation of CPS into the abstract machine in the old code
257 :     generator. Probably the spill.sml and limit.sml files should
258 :     not be placed here. A counterpart of this in the new
259 :     code generator is the 6-NewCGen/cpscompile directory.
260 :     5-CodeGen/mips/
261 :     MIPS code generator for both little endian and big endian
262 :     5-CodeGen/rs6000/
263 :     RS6000 code generator
264 :     5-CodeGen/sparc/
265 :     SPARC code generator
266 :     5-CodeGen/x86/
267 :     X86 code generator
268 :    
269 :     6-NewCGen/alpha32/
270 :     Alpha32 new code generator
271 :     6-NewCGen/alpha32x/
272 :     Alpha32 new code generator (with special patches)
273 :     6-NewCGen/cpscompile/
274 :     Compilation of CPS into the MLRISC abstract machine code
275 :     6-NewCGen/hppa/
276 :     HPPA new code generator
277 :    
278 :     7-MLRISC
279 :     Lal George's new code generator generator (MLRISC).
280 :    
281 :     9-MiscUtil/
282 :     Contains various kinds of utility programs
283 :     9-MiscUtil/bignums/
284 :     Bignum packages. I have no clue how stable this is.
285 :     9-MiscUtil/fixityparse
286 :     9-MiscUtil/lazycomp
287 :     Some code for implementation of the lazy evaluation primitives.
288 :     9-MiscUtil/print/
289 :     Pretty printing. Very ad hoc; needs a major cleanup.
290 :     9-MiscUtil/profile/
291 :     The time and the space profiler.
292 :     9-MiscUtil/util/
293 :     Important utility functions including the Inputsource (for
294 :     reading in a program), and various Hashtable and Dictionary
295 :     implementations.
296 :    
297 :     ============================================================================
298 :     A. SUMMARY:
299 :    
300 :     0. statenv : symbol -> binding
301 :     dynenv : pid -> object
302 :     symenv : pid -> flint
303 :     1. Parsing : source -> ast
304 :     2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
305 :     3. FLINT : absyn -> FLINT -> CPS -> CLO
306 :     4. CodeGen : CPS -> csegments (spilling, limit check, codegen)
307 :     5. NewCGen : CPS -> csegments (via MLRISC)
308 :    
309 :     ============================================================================
310 :     B. How to recover the all-files.cm (or sources.cm) file after making
311 :     dramatic changes to the directory structure. Notice that the difference
312 :     between all-files.cm and sources.cm is just the bootstrap glue files.
313 :    
314 :     1. ls -1 [1-6,9]*/*/*.{sig,sml} | grep -i -v glue | grep -v obsol > xxx
315 :     2. Add 7-MLRISC/MLRISC.cm
316 :     3. Fix ml.lex.* and ml.grm.* files
317 :     4. Add 9-MiscUtil/util/UTIL.cm
318 :     5. Add ../ml-yacc/lib/sources.cm
319 :     6. Delete 9-MiscUtil/util/intmap.sig
320 :     9-MiscUtil/util/intmap.sml
321 :     9-MiscUtil/util/sort.sml
322 :     9-MiscUtil/util/sortedlist.sml
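
Step 1 of the recipe above can be rehearsed on a scratch tree before running it on the real sources. This is a sketch under stated assumptions: the toy file names are invented, list_sources is a hypothetical wrapper, and the pattern [1-69] is used in place of [1-6,9] (same digits, without the literal comma) with the two extensions spelled out so that no brace expansion is needed.

```shell
# Scratch tree with hypothetical files: one glue file, one obsolete file,
# one file outside the [1-6,9] phase directories, and two ordinary sources.
demo=$(mktemp -d)
mkdir -p "$demo/1-TopLevel/bootstrap" "$demo/3-Semant/main" \
         "$demo/4-FLINT/obsol" "$demo/7-MLRISC/library"
touch "$demo/1-TopLevel/bootstrap/sparcglue.sml" \
      "$demo/3-Semant/main/absyn.sml" \
      "$demo/3-Semant/main/elabtop.sig" \
      "$demo/4-FLINT/obsol/old.sml" \
      "$demo/7-MLRISC/library/lib.sml"

# Same filtering as step 1: drop glue files and anything under obsol.
list_sources() {
    (cd "$demo" && ls -1 [1-69]*/*/*.sig [1-69]*/*/*.sml \
        | grep -i -v glue | grep -v obsol)
}

list_sources > "$demo/xxx"
cat "$demo/xxx"
```

Only the two ordinary source files survive the filters; the remaining steps (adding the .cm files and deleting the listed util files) are then applied by hand to the resulting list.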
323 :     ============================================================================
