Annotation of /sml/trunk/src/compiler/README



Revision 93
Original Path: sml/branches/SMLNJ/src/compiler/README

1 : monnier 16 ============================================================================
2 : monnier 45 This README file describes the overall structure of the current version of
3 : monnier 69 the SML/NJ (v110.4) & FLINT/ML (v1.4) compiler source tree. Please send
4 : monnier 93 your questions, comments, and suggestions to sml-nj@research.bell-labs.com.
5 : monnier 16 ============================================================================
6 :    
7 :     NOTES
8 : monnier 45 Some informal implementation notes.
9 : monnier 16
10 :     README
11 : monnier 45 This file. It gives an overview of the overall compiler structure.
12 : monnier 16
13 :     all-files.cm
14 :     The standard Makefile for compiling the compiler. It is similar
15 :     to the idea of sources.cm used by CM.make, except that
16 : monnier 45 all-files.cm is designed for bootstrapping the compiler itself
17 :     only (i.e., CMB.make). The resulting binfiles from doing CMB.make
18 :     are placed in a single bin directory, e.g., bin.x86-unix or
19 :     bin.sparc-unix. Right now, the list in all-files.cm is just the
20 : monnier 93 list in sources.cm plus all the glue files in the TopLevel/bootstrap
21 : monnier 45 directory (which are used to bootstrap the interactive compiler).
22 : monnier 16
23 :     sources.cm
24 :     This file contains the usual makefile for CM.make. It is not
25 :     used to build up the interactive compiler. But it can be
26 : monnier 45 useful for debugging purposes. For example, you can type CM.make()
27 :     to immediately build up a new, interactive visible compiler. To
28 :     access the newly built compiler, you use the
29 :     "XXXVisComp.Interact.useFile"
30 :     function to compile ML programs. Note that none of the bootstrap glue
31 :     files are in sources.cm.
32 : monnier 16
33 : monnier 93 makeml* [-full] [-elab]
34 : monnier 45 A script for building the interactive compiler. The default path
35 :     of bin files is ./bin.$arch-$os. There are two command-line options:
36 :     if you add the "-full" option, it will build a compiler whose
37 :     components are visible to the top-level interactive environment;
38 :     if you add the "-elab" option, it will re-elaborate all the ML
39 :     programs to recreate the static environments (this is useful, if
40 :     your new compiler has changed the representations of the bindings
41 :     in the static environments).
42 : monnier 16
43 :     ============================================================================
44 :     Tips:
45 : monnier 45 The current source code is organized as a two-level directory tree.
46 : monnier 93 All source files can be grep-ed by typing "grep xxx */*/*.{sig,sml}",
47 :     assuming you are looking for binding "xxx".
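A minimal sketch of the grep tip above, written so that it also works in shells without brace expansion (the `{sig,sml}` form needs a bash-like shell). The helper name `find_binding` and the scratch tree in the demo are hypothetical and exist only for illustration; the helper just wraps the grep command from the tip.

```shell
# find_binding NAME [ROOT]: search every *.sig and *.sml file exactly two
# directory levels below ROOT (default: the current directory) for NAME.
# The function body runs in a subshell, so the cd does not leak out.
find_binding() (
    cd "${2:-.}" || exit 1
    # The two globs spell out */*/*.{sig,sml} for shells without brace
    # expansion. -n adds line numbers; /dev/null keeps file names in the
    # output even when only one file matches.
    grep -n -- "$1" */*/*.sig */*/*.sml /dev/null 2>/dev/null
)

# Demo on a throwaway tree that mimics the two-level layout.
demo=$(mktemp -d)
mkdir -p "$demo/Semant/main" "$demo/Parse/ast"
echo 'structure Absyn = struct end' > "$demo/Semant/main/absyn.sml"
echo 'signature AST = sig end'      > "$demo/Parse/ast/ast.sig"
find_binding Absyn "$demo"
rm -rf "$demo"
```

Run from the compiler source directory, `find_binding xxx` is equivalent to the one-line grep above, with line numbers added to each match.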
48 : monnier 16
49 : monnier 45 The following directories are organized based on the compilation phases.
50 :     Within each phase, the "main" sub-directory always contains the top-level
51 :     module and some important data structures for that particular compilation
52 :     phase.
53 :    
54 :     File name conventions:
55 :     *.sig --- the ML signature file
56 :     *.sml --- the ML source program (occasionally with signatures)
57 :     *.grm --- ML-Yacc file
58 :     *.lex --- ML-Lex file
59 :     *.cm --- the CM makefile
60 :    
61 : monnier 93 PervEnv
62 : monnier 45 The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.
63 :     When recompiling the compiler (i.e., via CMB.make), files in this
64 :     directory are always compiled first. More specifically, their order
65 :     of compilation is as follows:
66 :     (0) build the initial primitive static environment
67 : monnier 93 (see Semant/statenv/prim.sml)
68 : monnier 45 (1) compile assembly.sig and dummy.sml, these two files
69 :     make up the static environment for the runtime structure
70 :     (coming from the ../runtime/kernel/globals.c file). The
71 :     dynamic object from executing dummy.sml is discarded, and
72 :     replaced by a hard-wired object coming from the runtime
73 :     system.
74 :     (2) compile core.sml, which defines a bunch of useful exceptions
75 :     and utility functions such as polymorphic equality, string
76 :     equality, delay and force primitives, etc.
77 :     (4) files in all-files.cm (must follow the exact order)
78 :     (5) files in pervasive.cm (must follow the exact order)
79 :    
80 : monnier 93 TopLevel
81 : monnier 45 This directory contains the top-level glue files for different versions
82 :     of the batch and interactive compiler. To understand how the compiler
83 :     is organized, you can read the main directory.
84 : monnier 93 TopLevel/batch/
85 : monnier 16 Utility files for the Compilation Manager CM and CMB;
86 : monnier 93 TopLevel/bootstrap/
87 : monnier 16 How to bootstrap an interactive compiler. Details are in boot.sml and
88 :     shareglue.sml. Before building an interactive compiler, one should have
89 :     already gotten a visible compiler (for that particular architecture),
90 :     see the viscomp directory. To build a compiler for SPARC architecture,
91 : monnier 45 all we need to do is to load and run the IntSparc (in sparcglue.sml)
92 : monnier 16 structure.
93 : monnier 93 TopLevel/environ/
94 : monnier 16 A top-level environment includes a static environment, a dynamic environment,
95 :     and a symbolic environment. The definitions of static environments are in
96 : monnier 93 the Semant/statenv directory, as they are mostly used by the elaboration
97 : monnier 16 and type checking.
98 : monnier 93 TopLevel/interact/
99 : monnier 16 How the top-level interactive loop is organized. The evalloop.sml file contains
100 :     the details on how an ML program is compiled from source code to binary
101 :     code and then executed.
102 : monnier 93 TopLevel/main/
103 : monnier 45 The top-level compiler structure is shown in the compile.sig and
104 :     compile.sml files. The compile.sml file contains details on how ML programs
105 :     are compiled into the FLINT intermediate format. The details
106 :     on how FLINT gets compiled into the binary code segments are not
107 :     covered there; instead, they are described in the
108 : monnier 93 FLINT/main/flintcomp.sml file. The CODEGENERATOR signature
109 : monnier 45 in codes.sig defines the interface to this FLINT code generator.
110 :     Note: all uses of the compilation facility go through the "compile"
111 :     function defined in compile.sml. The common intermediate formats are
112 :     defined in the compbasic.sig and compbasic.sml files. The version.sml
113 :     file defines the version numbers.
114 : monnier 93 TopLevel/viscomp/
115 : monnier 16 How to build the visible compiler viscomp --- this is essentially
116 : monnier 45 deciding what to export to the outside world. All the Compiler
117 :     control flags are defined in the control.sig and control.sml files
118 :     placed in this directory.
119 : monnier 16
120 : monnier 93 Parse/
121 : monnier 16 Phase 1 of the compilation process. Turning the SML source code into
122 : monnier 45 the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
123 :     The frontend.sig and frontend.sml files in the main directory contain
124 :     the big picture on the front end.
125 : monnier 16
126 : monnier 93 Semant
127 : monnier 45 This phase does semantic analysis, more specifically, it does the
128 :     elaboration (of concrete syntax into abstract syntax) and type-checking
129 :     of the core and module languages. The semantic objects are defined in
130 :     main/bindings.sml. The result is the Abstract Syntax, defined in the
131 :     main/absyn.sml file.
132 : monnier 93 Semant/basics/
133 : monnier 45 Definition of several data structures and utility functions. They are
134 :     used by the code that does semantic analysis. The env.sig and env.sml
135 :     files define the underlying data structures used to represent the
136 :     static environment.
137 : monnier 93 Semant/elaborate/
138 : monnier 16 How to turn a piece of code in the Concrete Syntax into one in the
139 :     Abstract Syntax. The top-level organization is in the following
140 :     elabtop.sml file.
141 : monnier 93 Semant/main/absyn.sml
142 : monnier 45 Definition of Abstract Syntax
143 : monnier 93 Semant/main/bindings.sml
144 : monnier 45 Top-level view of what semantic objects we have
145 : monnier 93 Semant/main/elabtop.sml
146 : monnier 16 Top-level view of the elaboration process. Notice that each piece
147 :     of core-ML program is first translated into the Abstract Syntax,
148 :     and gets type-checked. The type-checking does change the contents
149 :     of abstract syntax, as certain type information won't be known
150 :     until type-checking is done.
151 : monnier 93 Semant/modules/
152 : monnier 16 Utility functions for elaborations of modules. The module.sig and
153 :     module.sml files contain the definitions of module-level semantic objects.
154 : monnier 93 Semant/pickle/
155 : monnier 16 How to write the static environments into a file! This is important
156 :     if you want to create the *.bin file. It is also useful to infer
157 :     a unique persistent id for each compilation unit (useful to detect
158 :     the cut-off compilation dependencies).
159 : monnier 93 Semant/statenv/
160 :     The definition of Static Environment. The CM-ed version of Static
161 : monnier 45 Environment is used to avoid environment blow-up in the pickling.
162 :     The prim.sml contains the list of primitive operators and primitive
163 :     types exported in the initial static environment (i.e., PrimEnv).
164 :     During bootstrapping, PrimEnv is the first environment you have to
165 : monnier 93 set up before you can compile files in the Boot directory.
166 :     Semant/types/
167 : monnier 45 This directory contains all the data structures and utility functions
168 :     used in type-checking the Core-ML language.
169 : monnier 93 Semant/typing/
170 : monnier 16 The type-checking and type-inference code for the core-ML programs.
171 :     It is performed on Abstract Syntax and it produces Abstract Syntax
172 :     also.
173 :    
174 : monnier 93 FLINT
175 : monnier 16 This phase translates the Abstract Syntax into the intermediate
176 :     Lambda language (i.e., FLINT). During the translation, it compiles
177 : monnier 45 the Pattern Matches (see the mcomp directory). Then it does a bunch
178 :     of optimizations on FLINT; then it does representation analysis,
179 :     and it converts the FLINT code into CPS, finally it does closure
180 :     conversion.
181 : monnier 93 FLINT/clos/
182 : monnier 16 The closure conversion step. Check out the Shao/Appel LFP94 paper for
183 : monnier 45 the detailed algorithm.
184 : monnier 93 FLINT/cps/
185 : monnier 45 Definition of CPS, plus how to convert the FLINT code into the
186 :     CPS code. The compilation of the Switch statement is done in this
187 :     phase.
188 : monnier 93 FLINT/cpsopt/
189 : monnier 45 The CPS-based optimizations (check Appel's "Compiling with
190 :     Continuations" book for details). Eventually, all optimizations
191 :     in this directory will be migrated into FLINT.
192 : monnier 93 FLINT/flint/
193 : monnier 45 This directory defines the FLINT language. The detailed definitions
194 :     of primitive tycs, primitive operators, kinds, type constructors,
195 : monnier 93 and types are in the FLINT/kernel directory.
196 :     FLINT/kernel/
197 : monnier 45 Definition of the kernel data structures used in the FLINT language.
198 :     This includes: deBruijn indices, primitive tycs, primitive operators,
199 :     FLINT kinds, FLINT constructors, and FLINT types. When you write
200 :     code that manipulates the FLINT code, please restrict yourself to
201 :     use the functions defined in the LTYEXTERN interface only.
202 : monnier 93 FLINT/main/
203 : monnier 45 The flintcomp.sml describes how the FLINT code gets compiled into
204 :     the optimized and closure-converted CPS code (eventually, it should
205 :     produce optimized, closure-converted, and type-safe FLINT code).
206 : monnier 93 FLINT/opt/
207 : monnier 45 The FLINT-based optimizations, such as contraction, type
208 :     specializations, etc.
209 : monnier 93 FLINT/plambda/
210 : monnier 45 An older version of the Lambda language (not in the A-Normal form)
211 : monnier 93 FLINT/reps/
212 : monnier 69 Code for performing the representation analysis on FLINT
213 : monnier 93 FLINT/trans/
214 : monnier 45 Translation of Abstract Syntax into the PLambda code, then to the FLINT
215 :     code. All semantic objects used in the elaboration are translated into
216 :     the FLINT types as well. The translation phase also does match
217 :     compilation. The translation from PLambda to FLINT does the (partial)
218 :     type-based argument flattening.
219 : monnier 16
220 : monnier 93 CodeGen/alpha32/
221 : monnier 45 Alpha32 new code generator
222 : monnier 93 CodeGen/alpha32x/
223 : monnier 45 Alpha32 new code generator (with special patches)
224 : monnier 93 CodeGen/cpscompile/
225 : monnier 45 Compilation of CPS into the MLRISC abstract machine code
226 : monnier 93 CodeGen/hppa/
227 : monnier 45 HPPA new code generator
228 : monnier 93 CodeGen/main/
229 : monnier 45 The big picture of the code generator, including important
230 :     files on machine specifications and runtime tagging schemes.
231 :    
232 : monnier 93 OldCGen
233 : monnier 16 The old code generator. May eventually go away after Lal's new
234 : monnier 45 code generator becomes stable on all platforms. Each code generator
235 :     should produce a structure of signature CODEGENERATOR (defined in
236 : monnier 93 the TopLevel/main/codes.sig file).
237 :     OldCGen/coder/
238 : monnier 16 This directory contains the machine-independent parts of the
239 :     old code generator. Some important signatures are also here.
240 : monnier 93 OldCGen/cpsgen/
241 : monnier 45 Compilation of CPS into the abstract machine in the old code
242 :     generator. Probably the spill.sml and limit.sml files should
243 :     not be placed here. A counterpart of this in the new
244 : monnier 93 code generator is the CodeGen/cpscompile directory.
245 :     OldCGen/mips/
246 : monnier 16 MIPS code generator for both little endian and big endian
247 : monnier 93 OldCGen/rs6000/
248 : monnier 16 RS6000 code generator
249 : monnier 93 OldCGen/sparc/
250 : monnier 16 SPARC code generator
251 : monnier 93 OldCGen/x86/
252 : monnier 16 X86 code generator
253 :    
254 : monnier 93 MLRISC
255 :     Lal George's new MLRISC-based code generators (MLRISC).
256 : monnier 16
257 : monnier 93 MiscUtil/
258 : monnier 16 Contains various kinds of utility programs
259 : monnier 93 MiscUtil/bignums/
260 : monnier 16 Bignum packages. I have no clue how stable this is.
261 : monnier 93 MiscUtil/fixityparse
262 :     MiscUtil/lazycomp
263 : monnier 45 Some code for implementation of the lazy evaluation primitives.
264 : monnier 93 MiscUtil/print/
265 : monnier 16 Pretty printing. Very ad hoc; needs major cleanup.
266 : monnier 93 MiscUtil/profile/
267 : monnier 16 The time and the space profiler.
268 : monnier 93 MiscUtil/util/
269 : monnier 16 Important utility functions including the Inputsource (for
270 :     reading in a program), and various Hashtable and Dictionary
271 :     implementations.
272 : monnier 45
273 : monnier 16 ============================================================================
274 : monnier 93 A. SUMMARY of PHASES:
275 : monnier 45
276 : monnier 93 0. statenv : symbol -> binding
277 :     dynenv : pid -> object
278 :     symenv : pid -> flint
279 :     1. Parsing : source -> ast
280 :     2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
281 :     3. FLINT : absyn -> FLINT -> CPS -> CLO
282 :     4. CodeGen : CPS -> csegments (via MLRISC)
283 :     5. OldCGen : CPS -> csegments (spilling, limit check, codegen)
284 : monnier 45
285 :     ============================================================================
286 : monnier 93 B. CREATING all-files.cm
287 :    
288 :     How to recover the all-files.cm (or sources.cm) file after making
289 : monnier 45 dramatic changes to the directory structure. Notice that the difference
290 :     between all-files.cm and sources.cm is just the bootstrap glue files.
291 :    
292 : monnier 93 1. ls -1 [TopLevel,Parse,Semant,FLINT,CodeGen,OldCGen,MiscUtil]*/*/*.{sig,sml} \
293 :     | grep -i -v glue | grep -v obsol > xxx
294 :     2. Add ../MLRISC/MLRISC.cm
295 :     3. Remove ml.lex.* and ml.grm.* files
296 :     4. Add ../comp-lib/UTIL.cm
297 : monnier 45 5. Add ../ml-yacc/lib/sources.cm
298 :     ============================================================================
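The five steps in section B can be sketched as one shell function. This is an illustration under stated assumptions, not the maintainers' procedure: the helper name `list_sources` is hypothetical, it prints to standard output instead of the scratch file xxx, and it names the seven phase directories explicitly instead of using the `[...]*` pattern from step 1. It only produces the list of members; wrapping that list in a valid CM description file is left to the reader.

```shell
# list_sources ROOT: print the candidate member list described in steps 1-5.
# Hypothetical helper. The body runs in a subshell, so the cd stays local.
list_sources() (
    cd "$1" || exit 1
    # Step 1: every *.sig and *.sml file two levels down in the phase
    # directories, minus glue files and anything marked obsolete.
    # Step 3: the last grep drops the ml.lex.* and ml.grm.* files.
    for dir in TopLevel Parse Semant FLINT CodeGen OldCGen MiscUtil; do
        ls -1 "$dir"/*/*.sig "$dir"/*/*.sml 2>/dev/null
    done | grep -i -v glue | grep -v obsol |
        grep -v -e '/ml\.lex\.' -e '/ml\.grm\.'
    # Steps 2, 4, and 5: append the library description files.
    echo ../MLRISC/MLRISC.cm
    echo ../comp-lib/UTIL.cm
    echo ../ml-yacc/lib/sources.cm
)
```

Calling `list_sources . > xxx` from the compiler source directory gives a starting point comparable to the result of the manual steps. Because the glue files are filtered out, the output corresponds to sources.cm; the bootstrap glue files would still have to be added to obtain all-files.cm.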
