[smlnj] Annotation of /sml/branches/SMLNJ/src/compiler/README
Revision 113

============================================================================
This README file describes the overall structure of the current version of
the SML/NJ (v110.4) & FLINT/ML (v1.4) compiler source tree. Please send
your questions, comments, and suggestions to sml-nj@research.bell-labs.com.
============================================================================

NOTES
        Some informal implementation notes.

README
        This file. It gives an overview of the overall compiler structure.

all-files.cm
        The standard Makefile for compiling the compiler. It is similar
        to the idea of sources.cm used by CM.make, except that
        all-files.cm is designed only for bootstrapping the compiler
        itself (i.e., CMB.make). The binfiles resulting from CMB.make
        are placed in a single bin directory, e.g., bin.x86-unix or
        bin.sparc-unix.

        CM's preprocessor directives are used in such a way that the
        compiler, the compilation manager, and the batch compilation
        manager for the current architecture are built.

        It is also possible to build additional binfiles that are useful
        for "retargeting" the compiler. This is optional (and turned off
        by default) because it is not strictly necessary: if the binfiles
        for the cross-compiler are not present when CMB.retarget is
        invoked, CMB.retarget creates them.

sources.cm
        This file is an alias for viscomp-lib.cm and makes it possible to
        use CMB.CM.make () for experimenting with the compiler. (Don't use
        CM.make, because it has a different idea of where the binfiles
        live.) This can be useful for debugging purposes: typing
        CMB.CM.make () immediately builds a new, interactive visible
        compiler. To compile ML programs with the newly built compiler,
        use the "XXXVisComp.Interact.useFile" function. Note that none of
        the bootstrap glue is in sources.cm (viscomp-lib.cm).

viscomp-lib.cm
        This file specifies the "library" of visible compilers for various
        supported architectures.

compman-lib.cm
        This file specifies the "library" of compilation managers for
        various supported architectures.

makeml* [-full] [-rebuild dir]
        A script for building the interactive compiler. The default path
        of bin files is ./bin.$arch-$os. There are two command-line
        options. The "-full" option builds a compiler whose components
        are visible to the top-level interactive environment.

        The "-rebuild dir" option recompiles the compiler, using "dir" as
        the new binfile directory, and then loads the static and symbolic
        environments from the newly created batch of binfiles. (This
        supersedes the -elab option and is useful if your new compiler has
        changed the representations of the bindings in the environments.
        Unlike with -elab, a fresh set of usable binfiles is ready after
        such a "rebuild".)

        There are some environment variables that are sensed during
        bootstrap. They determine the defaults for various parameters
        used by the compilation manager. Internal "fallback" defaults are
        used for variables that are not defined at the time of bootstrap.
            CM_YACC_DEFAULT  -- shell command to run ml-yacc
            CM_LEX_DEFAULT   -- shell command to run ml-lex
            CM_BURG_DEFAULT  -- shell command to run ml-burg
            CM_RCSCO_DEFAULT -- shell command to check out a file under RCS
            CM_PATH_DEFAULT  -- ':'-separated list of directories that are
                                on CM's search path
            ...
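One way to supply these variables is to export them from the shell that starts the bootstrap. This is a minimal sketch, assuming a POSIX shell and assuming the bootstrap (for example makeml) is run from that same shell. The values are illustrative assumptions, not SML/NJ's own defaults. This includes the use of RCS's `co` command and the two search-path directories.

```shell
#!/bin/sh
# Sketch: set CM's default tool commands before bootstrapping.
# All values are illustrative assumptions: the ml-yacc, ml-lex and ml-burg
# executables are assumed to be on $PATH, "co" is RCS's checkout command,
# and the search-path directories are made up. Each variable is assigned
# only if it is still unset or empty, so values already present in your
# environment take precedence. Any variable not mentioned here falls back
# to CM's internal default.
: "${CM_YACC_DEFAULT:=ml-yacc}"
: "${CM_LEX_DEFAULT:=ml-lex}"
: "${CM_BURG_DEFAULT:=ml-burg}"
: "${CM_RCSCO_DEFAULT:=co}"
: "${CM_PATH_DEFAULT:=.:/usr/local/smlnj/lib}"   # ':'-separated directories
export CM_YACC_DEFAULT CM_LEX_DEFAULT CM_BURG_DEFAULT CM_RCSCO_DEFAULT \
       CM_PATH_DEFAULT

# Then start the bootstrap from this same shell, for example:
# ./makeml
```

Using `${VAR:=value}` rather than a plain assignment means a value you have already exported (say, a locally built ml-yacc) is not silently overridden.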

Retarget/<arch>-<os>.{cm,sml}
        WARNING!
        After you run 'CMB.retarget { cpu = "<arch>", os = "<os>" };'
        you can access the "CMB" structure for the newly loaded cross
        compiler as <Arch><Os>CMB. The original structure CMB will *not*
        be redefined! For further details on retargeting, see
        Retarget/README.

============================================================================
Tips:
The current source code is organized as a two-level directory tree.
All source files can be grepped by typing "grep xxx */*/*.{sig,sml}",
assuming you are looking for the binding "xxx".
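The grep tip above can be wrapped in a small shell function. This is a sketch under a few assumptions:

* `find_binding` is a hypothetical name, not part of SML/NJ.
* It is meant to be run from the compiler source directory.
* It lists the two patterns separately, because the `{sig,sml}` brace form is a bash/csh feature that a plain POSIX sh does not expand.

```shell
#!/bin/sh
# Hypothetical helper (not part of SML/NJ): search every signature and
# source file two levels below the current directory for a binding.
#   -n  print line numbers
#   -w  match whole words only, so "env" does not also match "statenv"
#   -s  stay quiet if one of the patterns matches no files
find_binding() {
  grep -n -w -s -- "$1" */*/*.sig */*/*.sml
}
```

For example, `find_binding CODEGENERATOR` lists every line that mentions that signature name as a whole word, prefixed by file name and line number.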

The following directories are organized by compilation phase. Within
each phase, the "main" subdirectory always contains the top-level module
and some important data structures for that particular compilation phase.

File name conventions:
        *.sig --- the ML signature file
        *.sml --- the ML source program (occasionally with signatures)
        *.grm --- ML-Yacc file
        *.lex --- ML-Lex file
        *.cm  --- the CM makefile

PervEnv
        The SML/NJ Initial Bootstrapping Library and the SML97 Basis
        Library. When recompiling the compiler (i.e., via CMB.make), files
        in this directory are always compiled first, in the following
        order:
        (0) build the initial primitive static environment
            (see Semant/statenv/prim.sml)
        (1) compile assembly.sig and dummy.sml; these two files make up
            the static environment for the runtime structure (coming from
            the ../runtime/kernel/globals.c file). The dynamic object from
            executing dummy.sml is discarded and replaced by a hard-wired
            object coming from the runtime system.
        (2) compile core.sml, which defines a number of useful exceptions
            and utility functions such as polymorphic equality, string
            equality, delay and force primitives, etc.
        (3) files in all-files.cm (must follow the exact order)
        (4) files in pervasive.cm (must follow the exact order)

TopLevel
        This directory contains the top-level glue files for different
        versions of the batch and interactive compiler. To understand how
        the compiler is organized, start by reading the main subdirectory.
TopLevel/batch/
        Utility files for the Compilation Manager CM and CMB.
TopLevel/bootstrap/
        How to bootstrap an interactive compiler. Details are in boot.sml
        and shareglue.sml. Before building an interactive compiler, one
        should already have a visible compiler (for that particular
        architecture); see the viscomp directory. To build a compiler for
        the SPARC architecture, all we need to do is load and run the
        IntSparc structure (in sparcglue.sml).
TopLevel/environ/
        A top-level environment includes a static environment, a dynamic
        environment, and a symbolic environment. The definitions of static
        environments are in the Semant/statenv directory, as they are
        mostly used by elaboration and type checking.
TopLevel/interact/
        How the top-level interactive loop is organized. The file
        evalloop.sml contains the details of how an ML program is compiled
        from source code to binary code and then executed.
TopLevel/main/
        The top-level compiler structure is shown in compile.sig and
        compile.sml. compile.sml contains the details of how ML programs
        are compiled into the FLINT intermediate format. How FLINT is then
        compiled into binary code segments is described instead in the
        FLINT/main/flintcomp.sml file. The CODEGENERATOR signature in
        codes.sig defines the interface to this FLINT code generator.
        Note: all uses of the compilation facility go through the
        "compile" function defined in compile.sml. The common intermediate
        formats are defined in compbasic.sig and compbasic.sml, and
        version.sml defines the version numbers.
TopLevel/viscomp/
        How to build the visible compiler viscomp --- this is essentially
        deciding what to export to the outside world. All the compiler
        control flags are defined in the control.sig and control.sml files
        in this directory.

Parse/
        Phase 1 of the compilation process: turning the SML source code
        into Concrete Syntax. The definition of concrete syntax is in
        ast/ast.sml. The frontend.sig and frontend.sml files in the main
        subdirectory give the big picture of the front end.

Semant
        This phase does semantic analysis; more specifically, it does the
        elaboration (of concrete syntax into abstract syntax) and type
        checking of the core and module languages. The semantic objects
        are defined in main/bindings.sml. The result is the Abstract
        Syntax, defined in the main/absyn.sml file.
Semant/basics/
        Definitions of several data structures and utility functions used
        by the code that does semantic analysis. The env.sig and env.sml
        files define the underlying data structures used to represent the
        static environment.
Semant/elaborate/
        How to turn a piece of code in Concrete Syntax into one in
        Abstract Syntax. The top-level organization is in
        Semant/main/elabtop.sml (described below).
Semant/main/absyn.sml
        Definition of Abstract Syntax.
Semant/main/bindings.sml
        Top-level view of the semantic objects we have.
Semant/main/elabtop.sml
        Top-level view of the elaboration process. Note that each piece of
        core-ML program is first translated into Abstract Syntax and then
        type-checked. Type checking does change the contents of the
        abstract syntax, as certain type information is not known until
        type checking is done.
Semant/modules/
        Utility functions for the elaboration of modules. The module.sig
        and module.sml files contain the definitions of module-level
        semantic objects.
Semant/pickle/
        How to write static environments into a file. This is important
        if you want to create the *.bin files. It is also useful for
        inferring a unique persistent id for each compilation unit, which
        helps detect cut-off compilation dependencies.
Semant/statenv/
        The definition of the Static Environment. The CM-ed version of the
        Static Environment is used to avoid environment blow-up during
        pickling. prim.sml contains the list of primitive operators and
        primitive types exported in the initial static environment (i.e.,
        PrimEnv). During bootstrapping, PrimEnv is the first environment
        you have to set up before you can compile files in the Boot
        directory.
Semant/types/
        This directory contains all the data structures and utility
        functions used in type-checking the Core-ML language.
Semant/typing/
        The type-checking and type-inference code for core-ML programs.
        It operates on Abstract Syntax and also produces Abstract Syntax.

FLINT
        This phase translates the Abstract Syntax into the intermediate
        Lambda language (i.e., FLINT). During the translation, it compiles
        the pattern matches (see the mcomp directory). It then performs a
        number of optimizations on FLINT, does representation analysis,
        converts the FLINT code into CPS, and finally does closure
        conversion.
FLINT/clos/
        The closure conversion step. See the Shao/Appel LFP'94 paper for
        the detailed algorithm.
FLINT/cps/
        Definition of CPS, plus how to convert FLINT code into CPS code.
        The compilation of the Switch statement is done in this phase.
FLINT/cpsopt/
        The CPS-based optimizations (see Appel's "Compiling with
        Continuations" book for details). Eventually, all optimizations
        in this directory will be migrated into FLINT.
FLINT/flint/
        This directory defines the FLINT language. The detailed
        definitions of primitive tycs, primitive operators, kinds, type
        constructors, and types are in the FLINT/kernel directory.
FLINT/kernel/
        Definition of the kernel data structures used in the FLINT
        language. This includes: de Bruijn indices, primitive tycs,
        primitive operators, FLINT kinds, FLINT constructors, and FLINT
        types. When you write code that manipulates FLINT code, please
        restrict yourself to the functions defined in the LTYEXTERN
        interface.
FLINT/main/
        flintcomp.sml describes how FLINT code gets compiled into
        optimized and closure-converted CPS code (eventually, it should
        produce optimized, closure-converted, and type-safe FLINT code).
FLINT/opt/
        The FLINT-based optimizations, such as contraction, type
        specialization, etc.
FLINT/plambda/
        An older version of the Lambda language (not in A-Normal form).
FLINT/reps/
        Code for performing representation analysis on FLINT.
FLINT/trans/
        Translation of Abstract Syntax into PLambda code, and then into
        FLINT code. All semantic objects used in the elaboration are
        translated into FLINT types as well. The translation phase also
        does match compilation. The translation from PLambda to FLINT
        does (partial) type-based argument flattening.

CodeGen/alpha32/
        Alpha32 new code generator
CodeGen/alpha32x/
        Alpha32 new code generator (with special patches)
CodeGen/cpscompile/
        Compilation of CPS into the MLRISC abstract machine code
CodeGen/hppa/
        HPPA new code generator
CodeGen/main/
        The big picture of the code generator, including important files
        on machine specifications and runtime tagging schemes.

OldCGen
        The old code generator. It may eventually go away once Lal's new
        code generator becomes stable on all platforms. Each code
        generator should produce a structure of signature CODEGENERATOR
        (defined in the TopLevel/main/codes.sig file).
OldCGen/coder/
        This directory contains the machine-independent parts of the old
        code generator. Some important signatures are also here.
OldCGen/cpsgen/
        Compilation of CPS into the abstract machine in the old code
        generator. The spill.sml and limit.sml files probably should not
        be placed here. The counterpart of this directory in the new code
        generator is CodeGen/cpscompile.
OldCGen/mips/
        MIPS code generator for both little-endian and big-endian machines
OldCGen/rs6000/
        RS6000 code generator
OldCGen/sparc/
        SPARC code generator
OldCGen/x86/
        X86 code generator

MLRISC
        Lal George's new MLRISC-based code generators (MLRISC).

MiscUtil/
        Contains various kinds of utility programs.
MiscUtil/bignums/
        Bignum packages. I have no clue how stable this is.
MiscUtil/fixityparse/
MiscUtil/lazycomp/
        Some code for implementing the lazy evaluation primitives.
MiscUtil/print/
        Pretty printing. Very ad hoc; needs a major cleanup.
MiscUtil/profile/
        The time and space profilers.
MiscUtil/util/
        Important utility functions, including the Inputsource (for
        reading in a program) and various Hashtable and Dictionary
        implementations.

============================================================================
A. SUMMARY of PHASES:

0. statenv    : symbol -> binding
   dynenv     : pid -> object
   symenv     : pid -> flint
1. Parsing    : source -> ast
2. Elaborator : ast + statenv -> absyn + pickle + newstatenv
3. FLINT      : absyn -> FLINT -> CPS -> CLO
4. CodeGen    : CPS -> csegments (via MLRISC)
5. OldCGen    : CPS -> csegments (spilling, limit check, codegen)

============================================================================
B. CREATING all-files.cm

How to recover the all-files.cm (or sources.cm) file after making
dramatic changes to the directory structure. Note that the only
difference between all-files.cm and sources.cm is the bootstrap glue
files.

1. ls -1 [TopLevel,Parse,Semant,FLINT,CodeGen,OldCGen,MiscUtil]*/*/*.{sig,sml} \
        | grep -i -v glue | grep -v obsol > xxx
2. Add ../MLRISC/MLRISC.cm
3. Remove the ml.lex.* and ml.grm.* entries from the list
4. Add ../comp-lib/UTIL.cm
5. Add ../ml-yacc/lib/sources.cm
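Steps 1 to 5 can be combined into a single shell function. This is a sketch, not a script shipped with SML/NJ. The name `gen_all_files` is hypothetical, and the function assumes a POSIX shell run from the compiler source directory. It spells out the seven phase directories explicitly instead of using the bracket pattern from step 1. In the shell, `[...]` is a character class, so that pattern matches any directory whose name starts with one of the listed letters.

```shell
#!/bin/sh
# Sketch of steps 1-5 for regenerating the all-files.cm member list.
# gen_all_files is a hypothetical helper, not part of SML/NJ; run it from
# the compiler source directory and redirect its output, e.g.
#   gen_all_files > xxx
gen_all_files() {
  # Step 1: every .sig/.sml file two levels below the phase directories,
  # minus bootstrap glue and obsolete files.
  # Step 3: also drop the files generated by ml-lex and ml-yacc.
  # Sorting in the C locale keeps the list stable across machines.
  for dir in TopLevel Parse Semant FLINT CodeGen OldCGen MiscUtil; do
    for f in "$dir"/*/*.sig "$dir"/*/*.sml; do
      # The shell leaves an unmatched pattern as-is; skip it.
      if [ -e "$f" ]; then echo "$f"; fi
    done
  done | grep -i -v glue | grep -v obsol |
    grep -v -e '/ml\.lex\.' -e '/ml\.grm\.' | LC_ALL=C sort

  # Steps 2, 4 and 5: the libraries that are added by hand.
  echo ../MLRISC/MLRISC.cm
  echo ../comp-lib/UTIL.cm
  echo ../ml-yacc/lib/sources.cm
}
```

The output is only the member list. Any surrounding CM header, and any ordering constraints, still have to be checked by hand against the existing all-files.cm.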
============================================================================
