[smlnj] Annotation of /sml/trunk/src/compiler/README
Revision 144

============================================================================
This README file describes the overall structure of the current version of
the SML/NJ (v110.4) & FLINT/ML (v1.4) compiler source tree. Please send
your questions, comments, and suggestions to sml-nj@research.bell-labs.com.
============================================================================

NOTES
    Some informal implementation notes.

README
    This file. It gives an overview of the overall compiler structure.

all-files.cm
    The standard Makefile for compiling the compiler. It is similar
    to the idea of sources.cm used by CM.make, except that
    all-files.cm is designed for bootstrapping the compiler itself
    only (i.e., CMB.make). The resulting binfiles from doing CMB.make
    are placed in a single bin directory, e.g., bin.x86-unix or
    bin.sparc-unix.

    CM's preprocessor directives are used in such a way that the compiler,
    the compilation manager, and the batch compilation manager
    for the current architecture will be built.

    In addition, it is possible to build additional binfiles
    that are useful for "retargeting" the compiler. This is optional (and
    turned off by default), because it is not strictly necessary. If the
    binfiles for the cross-compiler are not present at the time of
    CMB.retarget, then CMB.retarget will create them.

sources.cm
    This file is an alias for viscomp-lib.cm and makes it possible to
    use CMB.CM.make (); for experimenting with the compiler.
    (Don't use CM.make because it has a different idea of where the binfiles
    live.)
    This can be useful for debugging purposes. You can type CMB.CM.make()
    to immediately build up a new, interactive visible compiler. To
    access the newly built compiler, you use the
    "XXXVisComp.Interact.useFile"
    function to compile ML programs. Notice that none of the bootstrap glue
    is in sources.cm (viscomp-lib.cm).

viscomp-lib.cm
    This file specifies the "library" of visible compilers for various
    supported architectures.

makeml* [-full] [-rebuild dir]
    A script for building the interactive compiler. The default path
    of bin files is ./bin.$arch-$os. There are two command-line options:
    if you add the "-full" option, it will build a compiler whose
    components are visible to the top-level interactive environment.

    If you add the "-rebuild dir" option, it will recompile the compiler,
    using "dir" as the new binfile directory. It then proceeds by loading
    the static and symbolic environments from the newly created batch of
    binfiles. (This supersedes the -elab option and is useful if
    your new compiler has changed the representations of the bindings
    in the environments. Unlike with -elab, there will be a fresh set
    of usable binfiles ready after such a "rebuild".)

    There are some environment variables that are sensed during bootstrap.
    They determine the defaults for various parameters used by the compilation
    manager. Internal "fallback" defaults are used for variables that are
    not defined at the time of bootstrap.
        CM_YACC_DEFAULT  -- shell command to run ml-yacc
        CM_LEX_DEFAULT   -- shell command to run ml-lex
        CM_BURG_DEFAULT  -- shell command to run ml-burg
        CM_RCSCO_DEFAULT -- shell command to check out a file under RCS
        CM_PATH_DEFAULT  -- ':'-separated list of directories that are on
                            CM's search path
        ...

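As a rough illustration of the bootstrap variables listed above, the sketch below exports a few of them before a bootstrap. The command names and directory paths are hypothetical placeholders, not values taken from this source tree; only the variable names come from the list above.

```shell
# Hypothetical values: adjust the commands and paths to your installation.
export CM_YACC_DEFAULT="ml-yacc"
export CM_LEX_DEFAULT="ml-lex"
export CM_BURG_DEFAULT="ml-burg"
# ':'-separated search path, as described above (example directories only).
export CM_PATH_DEFAULT="/usr/local/smlnj/lib:/opt/smlnj/lib"

# Show the search path one directory per line as a quick sanity check.
printf '%s\n' "$CM_PATH_DEFAULT" | tr ':' '\n'

# Afterwards, bootstrap as usual, for example with: ./makeml -full
```

Any variable left unset simply falls back to the internal default mentioned above.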
Retarget/<arch>-<os>.{cm,sml}
    WARNING!
    After you do a 'CMB.retarget { cpu = "<arch>", os = "<os>" };'
    you can access the "CMB" structure for the newly-loaded cross compiler
    as <Arch><Os>CMB. The original structure CMB will *not* be redefined!
    For further details on retargeting see Retarget/README.

============================================================================
Tips:
The current source code is organized as a two-level directory tree.
All source files (except those in Retarget/*, which are not part of the
ordinary compiler) can be grep-ed by typing "grep xxx */*/*.{sig,sml}",
assuming you are looking for binding "xxx".

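The grep tip above can be wrapped in a small helper. This is only a convenience sketch: the function name find_binding is made up for illustration, and it spells the pattern out as two globs so that it does not rely on brace expansion.

```shell
# find_binding NAME: search every two-level *.sig and *.sml file for NAME.
# Run it from the compiler source root (the directory holding this README).
find_binding() {
    # -n prints line numbers; -w matches NAME as a whole word only.
    # Errors from a glob that matches nothing are discarded.
    grep -n -w -- "$1" */*/*.sig */*/*.sml 2>/dev/null
}
```

For example, "find_binding compile" lists the places where the word "compile" occurs, including its definition in TopLevel/main/compile.sml.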
The following directories are organized based on the compilation phases.
Within each phase, the "main" sub-directory always contains the top-level
module and some important data structures for that particular compilation
phase.

File name conventions:
    *.sig --- the ML signature file
    *.sml --- the ML source program (occasionally with signatures)
    *.grm --- ML-Yacc file
    *.lex --- ML-Lex file
    *.cm  --- the CM makefile

PervEnv
    The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.
    When recompiling the compiler (i.e., via CMB.make), files in this
    directory are always compiled first. More specifically, their order
    of compilation is as follows:
      (0) build the initial primitive static environment
          (see Semant/statenv/prim.sml)
      (1) compile assembly.sig and dummy.sml; these two files
          make up the static environment for the runtime structure
          (coming from the ../runtime/kernel/globals.c file). The
          dynamic object from executing dummy.sml is discarded and
          replaced by a hard-wired object coming from the runtime
          system.
      (2) compile core.sml, which defines a bunch of useful exceptions
          and utility functions such as polymorphic equality, string
          equality, delay and force primitives, etc.
      (4) files in all-files.cm (must follow the exact order)
      (5) files in pervasive.cm (must follow the exact order)

TopLevel
    This directory contains the top-level glue files for different versions
    of the batch and interactive compiler. To understand how the compiler
    is organized, read the main directory.
TopLevel/batch/
    Utility files for the Compilation Manager CM and CMB.
TopLevel/bootstrap/
    How to bootstrap an interactive compiler. Details are in boot.sml and
    shareglue.sml. Before building an interactive compiler, one should have
    already gotten a visible compiler (for that particular architecture);
    see the viscomp directory. To build a compiler for the SPARC architecture,
    all we need to do is to load and run the IntSparc structure (in
    sparcglue.sml).
TopLevel/environ/
    A top-level environment includes a static environment, a dynamic
    environment, and a symbolic environment. The definitions of static
    environments are in the Semant/statenv directory, as they are mostly
    used by elaboration and type checking.
TopLevel/interact/
    How the top-level interactive loop is organized. The evalloop.sml file
    contains the details on how an ML program is compiled from source code
    to binary code and then later executed.
TopLevel/main/
    The top-level compiler structure is shown in compile.sig and
    compile.sml. The compile.sml file contains details on how ML programs
    are compiled into the FLINT intermediate format. How FLINT gets
    compiled into the binary code segments is not covered there; it is
    described in the FLINT/main/flintcomp.sml file. The CODEGENERATOR
    signature in codes.sig defines the interface to this FLINT code
    generator. Note: all uses of the compilation facility go through the
    "compile" function defined in compile.sml. The common intermediate
    formats are stated in the compbasic.sig and compbasic.sml files. The
    version.sml file defines the version numbers.
TopLevel/viscomp/
    How to build the visible compiler viscomp --- this is essentially
    deciding what to export to the outside world. All the Compiler
    control flags are defined in the control.sig and control.sml files
    placed in this directory.

Parse/
    Phase 1 of the compilation process: turning the SML source code into
    the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
    The frontend.sig and frontend.sml files in the main directory contain
    the big picture on the front end.

Semant
    This phase does semantic analysis; more specifically, it does the
    elaboration (of concrete syntax into abstract syntax) and type-checking
    of the core and module languages. The semantic objects are defined in
    main/bindings.sml. The result is the Abstract Syntax, defined in the
    main/absyn.sml file.
Semant/basics/
    Definition of several data structures and utility functions. They are
    used by the code that does semantic analysis. The env.sig and env.sml
    files define the underlying data structures used to represent the
    static environment.
Semant/elaborate/
    How to turn a piece of code in the Concrete Syntax into one in the
    Abstract Syntax. The top-level organization is in the elabtop.sml
    file (see Semant/main/elabtop.sml).
Semant/main/absyn.sml
    Definition of Abstract Syntax
Semant/main/bindings.sml
    Top-level view of what semantic objects we have
Semant/main/elabtop.sml
    Top-level view of the elaboration process. Notice that each piece
    of core-ML program is first translated into the Abstract Syntax
    and then type-checked. The type-checking does change the contents
    of abstract syntax, as certain type information won't be known
    until type-checking is done.
Semant/modules/
    Utility functions for elaboration of modules. The module.sig and
    module.sml files contain the definitions of module-level semantic
    objects.
Semant/pickle/
    How to write the static environments into a file! This is important
    if you want to create the *.bin file. It is also useful to infer
    a unique persistent id for each compilation unit (useful to detect
    the cut-off compilation dependencies).
Semant/statenv/
    The definition of Static Environment. The CM-ed version of Static
    Environment is used to avoid environment blow-up in the pickling.
    The prim.sml file contains the list of primitive operators and primitive
    types exported in the initial static environment (i.e., PrimEnv).
    During bootstrapping, PrimEnv is the first environment you have to
    set up before you can compile files in the Boot directory.
Semant/types/
    This directory contains all the data structures and utility functions
    used in type-checking the Core-ML language.
Semant/typing/
    The type-checking and type-inference code for the core-ML programs.
    It is performed on Abstract Syntax and also produces Abstract Syntax.

FLINT
    This phase translates the Abstract Syntax into the intermediate
    Lambda language (i.e., FLINT). During the translation, it compiles
    the Pattern Matches (see the mcomp directory). It then performs a
    number of optimizations on FLINT, does representation analysis,
    converts the FLINT code into CPS, and finally does closure
    conversion.
FLINT/clos/
    The closure conversion step. See the Shao/Appel LFP94 paper for
    the detailed algorithm.
FLINT/cps/
    Definition of CPS, plus how to convert the FLINT code into the
    CPS code. The compilation of the Switch statement is done in this
    phase.
FLINT/cpsopt/
    The CPS-based optimizations (check Appel's "Compiling with
    Continuations" book for details). Eventually, all optimizations
    in this directory will be migrated into FLINT.
FLINT/flint/
    This directory defines the FLINT language. The detailed definitions
    of primitive tycs, primitive operators, kinds, type constructors,
    and types are in the FLINT/kernel directory.
FLINT/kernel/
    Definition of the kernel data structures used in the FLINT language.
    This includes: deBruijn indices, primitive tycs, primitive operators,
    FLINT kinds, FLINT constructors, and FLINT types. When you write
    code that manipulates the FLINT code, please restrict yourself to
    the functions defined in the LTYEXTERN interface.
FLINT/main/
    The flintcomp.sml file describes how the FLINT code gets compiled into
    the optimized and closure-converted CPS code (eventually, it should
    produce optimized, closure-converted, and type-safe FLINT code).
FLINT/opt/
    The FLINT-based optimizations, such as contraction, type
    specializations, etc.
FLINT/plambda/
    An older version of the Lambda language (not in the A-Normal form)
FLINT/reps/
    Code for performing the representation analysis on FLINT
FLINT/trans/
    Translation of Abstract Syntax into the PLambda code, then to the FLINT
    code. All semantic objects used in the elaboration are translated into
    the FLINT types as well. The translation phase also does match
    compilation. The translation from PLambda to FLINT does the (partial)
    type-based argument flattening.

CodeGen/alpha32/
    Alpha32 new code generator
CodeGen/alpha32x/
    Alpha32 new code generator (with special patches)
CodeGen/cpscompile/
    Compilation of CPS into the MLRISC abstract machine code
CodeGen/hppa/
    HPPA new code generator
CodeGen/main/
    The big picture of the code generator, including important
    files on machine specifications and runtime tagging schemes.

OldCGen
    The old code generator. May eventually go away after Lal's new
    code generator becomes stable on all platforms. Each code generator
    should produce a structure of signature CODEGENERATOR (defined in
    the TopLevel/main/codes.sig file).
OldCGen/coder/
    This directory contains the machine-independent parts of the
    old code generator. Some important signatures are also here.
OldCGen/cpsgen/
    Compilation of CPS into the abstract machine in the old code
    generator. Probably the spill.sml and limit.sml files should
    not be placed here. A counterpart of this in the new
    code generator is the CodeGen/cpscompile directory.
OldCGen/mips/
    MIPS code generator for both little endian and big endian
OldCGen/rs6000/
    RS6000 code generator
OldCGen/sparc/
    SPARC code generator
OldCGen/x86/
    X86 code generator

MLRISC
    Lal George's new MLRISC-based code generators (MLRISC).

MiscUtil/
    Contains various kinds of utility programs
MiscUtil/bignums/
    Bignum packages. I have no clue how stable this is.
MiscUtil/fixityparse
MiscUtil/lazycomp
    Some code for implementation of the lazy evaluation primitives.
MiscUtil/print/
    Pretty printing. Very ad hoc; needs major cleanup.
MiscUtil/profile/
    The time and the space profiler.
MiscUtil/util/
    Important utility functions including the Inputsource (for
    reading in a program), and various Hashtable and Dictionary
    implementations.

============================================================================
A. SUMMARY of PHASES:

    0. statenv   : symbol -> binding
       dynenv    : pid -> object
       symenv    : pid -> flint
    1. Parsing   : source -> ast
    2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
    3. FLINT     : absyn -> FLINT -> CPS -> CLO
    4. CodeGen   : CPS -> csegments (via MLRISC)
    5. OldCGen   : CPS -> csegments (spilling, limit check, codegen)

============================================================================
B. CREATING all-files.cm

How to recover the all-files.cm (or sources.cm) file after making
dramatic changes to the directory structure. Notice that the difference
between all-files.cm and sources.cm is just the bootstrap glue files.

    1. ls -1 [TopLevel,Parse,Semant,FLINT,CodeGen,OldCGen,MiscUtil]*/*/*.{sig,sml} \
         | grep -i -v glue | grep -v obsol > xxx
    2. Add ../MLRISC/MLRISC.cm
    3. Remove ml.lex.* and ml.grm.* files
    4. Add ../comp-lib/UTIL.cm
    5. Add ../ml-yacc/lib/sources.cm
============================================================================
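Steps 1 and 3 of section B can be combined into one small shell function, sketched below. It assumes the bracketed list in step 1 is meant as a list of top-level directory names, so directories outside that list are skipped. The function name list_sources is hypothetical. The entries from steps 2, 4, and 5 still have to be added by hand.

```shell
# list_sources: print the candidate source files for all-files.cm.
# Run it from the compiler source root.
list_sources() {
    for dir in TopLevel Parse Semant FLINT CodeGen OldCGen MiscUtil; do
        # Two-level layout: <phase>/<subdir>/<file>.{sig,sml}
        ls -1 "$dir"/*/*.sig "$dir"/*/*.sml 2>/dev/null
    done \
        | grep -i -v glue \
        | grep -v obsol \
        | grep -v -e 'ml\.lex\.' -e 'ml\.grm\.'   # step 3: drop generated lexer/parser files
}
```

Running "list_sources > xxx" produces the same kind of file list as step 1, already filtered as in step 3.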
