[smlnj] Annotation of /sml/branches/primop-branch-3/compiler/README
Revision 245
Original Path: sml/branches/SMLNJ/src/compiler/README

1 :     ============================================================================
2 :     This README file describes the overall structure of the current version of
3 :     the SML/NJ (v110.4) & FLINT/ML (v1.4) compiler source tree. Please send
4 :     your questions, comments, and suggestions to sml-nj@research.bell-labs.com.
5 :     ============================================================================
6 :    
7 :     NOTES
8 :     Some informal implementation notes.
9 :    
10 :     README
11 :     This file. It gives an overview of the overall compiler structure.
12 :    
13 :     all-files.cm
14 :     The standard Makefile for compiling the compiler. It is similar
15 :     to the idea of sources.cm used by CM.make, except that
16 :     all-files.cm is designed for bootstrapping the compiler itself
17 :     only (i.e., CMB.make). The resulting binfiles from doing CMB.make
18 :     are placed in a single bin directory, e.g., bin.x86-unix or
19 :     bin.sparc-unix.
20 :    
21 :     CM's preprocessor directives are used in such a way that the compiler,
22 :     the compilation manager, and the batch compilation manager
23 :     for the current architecture will be built.
24 :    
25 :     In addition to that, it is possible to also build additional binfiles
26 :     that are useful for "retargeting" the compiler. This is optional (and
27 :     turned off by default), because it is not strictly necessary. If the
28 :     binfiles for the cross-compiler are not present at the time of
29 :     CMB.retarget, then CMB.retarget will create them.
30 :    
31 :     sources.cm
32 :     This file is an alias for viscomp-lib.cm and makes it possible to
33 :     use CMB.CM.make (); for experimenting with the compiler.
34 :     (Don't use CM.make because it has a different idea of where the binfiles
35 :     live.)
36 :     This can be useful for debugging purposes. You can type CMB.CM.make()
37 :     to immediately build up a new, interactive visible compiler. To
38 :     access the newly built compiler, you use the
39 :     "XXXVisComp.Interact.useFile"
40 :     function to compile ML programs. Notice that none of the bootstrap glue
41 :     is in sources.cm (viscomp-lib.cm).
42 :    
43 :     viscomp-lib.cm
44 :     This file specifies the "library" of visible compilers for various
45 :     supported architectures.
46 :    
47 :     makeml* [-full] [-rebuild dir]
48 :     A script for building the interactive compiler. The default path
49 :     of bin files is ./bin.$arch-$os. There are two command-line options:
50 :     if you add the "-full" option, it will build a compiler whose
51 :     components are visible to the top-level interactive environment.
52 :    
53 :     If you add the "-rebuild dir" option, it will recompile the compiler,
54 :     using "dir" as the new binfile directory. It then proceeds by loading
55 :     the static and symbolic environments from the newly created batch of
56 :     binfiles. (This supersedes the -elab option and is useful if
57 :     your new compiler has changed the representations of the bindings
58 :     in the environments. Unlike with -elab, there will be a fresh set
59 :     of usable binfiles ready after such a "rebuild".)
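
As a rough illustration of the binfile directory defaults described above, here is a
minimal shell sketch. The helper pick_bindir is hypothetical and is not part of makeml;
it only models the documented behavior that the default directory is ./bin.$arch-$os
and that "-rebuild dir" names a different one. The arch/os values are example inputs.

```shell
# Hypothetical helper (not part of makeml): models only the documented defaults.
pick_bindir() {
  arch=$1; os=$2; shift 2
  bindir="./bin.${arch}-${os}"          # documented default binfile path
  while [ $# -gt 0 ]; do
    case $1 in
      -rebuild) bindir=$2; shift 2 ;;   # "-rebuild dir": dir becomes the binfile directory
      *) shift ;;                       # other options (e.g. -full) do not affect the path
    esac
  done
  echo "$bindir"
}

pick_bindir x86 unix
pick_bindir sparc unix -full -rebuild ./bin.new
```

The sketch assumes "-rebuild" is always followed by a directory argument, as in the
usage line above.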
60 :    
61 :     There are some environment variables that are sensed during bootstrap.
62 :     They determine the defaults for various parameters used by the compilation
63 :     manager. Internal "fallback" defaults are used for variables that are
64 :     not defined at the time of bootstrap.
65 :     CM_YACC_DEFAULT -- shell command to run ml-yacc
66 :     CM_LEX_DEFAULT -- shell command to run ml-lex
67 :     CM_BURG_DEFAULT -- shell command to run ml-burg
68 :     CM_RCSCO_DEFAULT -- shell command to checkout a file under RCS
69 :     CM_PATH_DEFAULT -- ':'-separated list of directories that are on
70 :     CM's search path
71 :     ...
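
A minimal sketch of setting some of these variables in the shell before a bootstrap
run. The values are assumptions chosen for illustration (tool names expected to be on
the PATH, and made-up library directories); substitute the ones for your installation.

```shell
# Example values only; these are assumptions, not defaults taken from the compiler.
export CM_YACC_DEFAULT="ml-yacc"
export CM_LEX_DEFAULT="ml-lex"
export CM_BURG_DEFAULT="ml-burg"
# ':'-separated search path; both directories are hypothetical.
export CM_PATH_DEFAULT="/usr/local/smlnj/lib:/usr/local/lib/sml"

# Show what a bootstrap started from this shell would inherit.
env | grep '^CM_.*_DEFAULT=' | sort
```

Variables left unset here would fall back to the internal defaults mentioned above.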
72 :    
73 :     Retarget/<arch>-<os>.{cm,sml}
74 :     WARNING!
75 :     After you do a 'CMB.retarget { cpu = "<arch>", os = "<os>" };'
76 :     you can access the "CMB" structure for the newly-loaded cross compiler
77 :     as <Arch><Os>CMB. The original structure CMB will *not* be redefined!
78 :     For further details on retargeting see Retarget/README.
79 :    
80 :     ============================================================================
81 :     Tips:
82 :     The current source code is organized as a two-level directory tree.
83 :     All source files (except those in Retarget/*, which are not part of the
84 :     ordinary compiler) can be grep-ed by typing "grep xxx */*/*.{sig,sml}",
85 :     assuming you are looking for binding "xxx".
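
The search tip can be tried on a throwaway tree. The sketch below builds a small
two-level layout with made-up file contents and then looks for the binding "Absyn".
It spells the pattern out as two globs instead of {sig,sml} so that it also runs in
shells without brace expansion; the directory and file contents are assumptions for
the demonstration only.

```shell
# Build a throwaway two-level tree (hypothetical files) to try the search on.
demo=$(mktemp -d)
mkdir -p "$demo/Semant/main" "$demo/FLINT/main" "$demo/Retarget"
printf 'signature ABSYN = sig type exp end\n' > "$demo/Semant/main/absyn.sig"
printf 'structure Absyn = struct datatype exp = E end\n' > "$demo/Semant/main/absyn.sml"
printf 'structure FlintComp = struct end\n' > "$demo/FLINT/main/flintcomp.sml"
printf 'structure Absyn = struct end\n' > "$demo/Retarget/x.sml"

# Same idea as "grep xxx */*/*.{sig,sml}", listing only the matching file names.
# Files directly under Retarget/ are one level deep, so the globs skip them.
hits=$(cd "$demo" && grep -l 'Absyn' */*/*.sig */*/*.sml)
echo "$hits"

rm -rf "$demo"
```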
86 :    
87 :     The following directories are organized based on the compilation phases.
88 :     Within each phase, the "main" sub-directory always contains the top-level
89 :     module and some important data structures for that particular compilation
90 :     phase.
91 :    
92 :     File name conventions:
93 :     *.sig --- the ML signature file
94 :     *.sml --- the ML source program (occasionally with signatures)
95 :     *.grm --- ML-Yacc file
96 :     *.lex --- ML-Lex file
97 :     *.cm --- the CM makefile
98 :    
99 :     PervEnv
100 :     The SML/NJ Initial Bootstrapping Library and the SML97 Basis Library.
101 :     When recompiling the compiler (i.e., via CMB.make), files in this
102 :     directory are always compiled first. More specifically, their order
103 :     of compilation is as follows:
104 :     (0) build the initial primitive static environment
105 :     (see Semant/statenv/prim.sml)
106 :     (1) compile assembly.sig and dummy.sml; these two files
107 :     make up the static environment for the runtime structure
108 :     (coming from the ../runtime/kernel/globals.c file). The
109 :     dynamic object from executing dummy.sml is discarded, and
110 :     replaced by a hard-wired object coming from the runtime
111 :     system.
112 :     (2) compile core.sml, which defines a bunch of useful exceptions
113 :     and utility functions such as polymorphic equality, string
114 :     equality, delay and force primitives, etc.
115 :     (4) files in all-files.cm (must follow the exact order)
116 :     (5) files in pervasive.cm (must follow the exact order)
117 :    
118 :     TopLevel
119 :     This directory contains the top-level glue files for different versions
120 :     of the batch and interactive compiler. To understand how the compiler
121 :     is organized, read the main directory.
122 :     TopLevel/batch/
123 :     Utility files for the Compilation Manager CM and CMB.
124 :     TopLevel/bootstrap/
125 :     How to bootstrap an interactive compiler. Details are in boot.sml and
126 :     shareglue.sml. Before building an interactive compiler, one should
127 :     already have a visible compiler for that particular architecture
128 :     (see the viscomp directory). To build a compiler for the SPARC architecture,
129 :     all we need to do is load and run the IntSparc structure (in
130 :     sparcglue.sml).
131 :     TopLevel/environ/
132 :     A top-level environment includes a static environment, a dynamic environment,
133 :     and a symbolic environment. The definitions of static environments are in
134 :     the Semant/statenv directory, as they are mostly used by elaboration
135 :     and type checking.
136 :     TopLevel/interact/
137 :     How the top-level interactive loop is organized. The evalloop.sml file contains
138 :     the details of how an ML program is compiled from source code to binary
139 :     code and then executed.
140 :     TopLevel/main/
141 :     The top-level compiler structure is shown in compile.sig and
142 :     compile.sml. The compile.sml file contains details on how ML programs
143 :     are compiled into the FLINT intermediate format. How FLINT gets
144 :     compiled into the binary code segments is not covered there;
145 :     instead, it is described in the
146 :     FLINT/main/flintcomp.sml file. The CODEGENERATOR signature
147 :     in codes.sig defines the interface to this FLINT code generator.
148 :     Note: all uses of the compilation facility go through the "compile"
149 :     function defined in compile.sml. The common intermediate formats are
150 :     stated in the compbasic.sig and compbasic.sml files. The version.sml
151 :     file defines the version numbers.
152 :     TopLevel/viscomp/
153 :     How to build the visible compiler viscomp --- this is essentially
154 :     deciding what to export to the outside world. All the Compiler
155 :     control flags are defined in the control.sig and control.sml files
156 :     placed in this directory.
157 :    
158 :     Parse/
159 :     Phase 1 of the compilation process: turning the SML source code into
160 :     the Concrete Syntax. The definition of concrete syntax is in ast/ast.sml.
161 :     The frontend.sig and frontend.sml files in the main directory contain
162 :     the big picture on the front end.
163 :    
164 :     Semant
165 :     This phase does semantic analysis; more specifically, it does the
166 :     elaboration (of concrete syntax into abstract syntax) and type-checking
167 :     of the core and module languages. The semantic objects are defined in
168 :     main/bindings.sml. The result is the Abstract Syntax, defined in the
169 :     main/absyn.sml file.
170 :     Semant/basics/
171 :     Definition of several data structures and utility functions. They are
172 :     used by the code that does semantic analysis. The env.sig and env.sml
173 :     files define the underlying data structures used to represent the
174 :     static environment.
175 :     Semant/elaborate/
176 :     How to turn a piece of code in the Concrete Syntax into one in the
177 :     Abstract Syntax. The top-level organization is in the
178 :     elabtop.sml file described below.
179 :     Semant/main/absyn.sml
180 :     Definition of Abstract Syntax
181 :     Semant/main/bindings.sml
182 :     Top-level view of what semantic objects we have
183 :     Semant/main/elabtop.sml
184 :     Top-level view of the elaboration process. Notice that each piece
185 :     of a core-ML program is first translated into the Abstract Syntax
186 :     and then type-checked. The type-checking does change the contents
187 :     of abstract syntax, as certain type information won't be known
188 :     until type-checking is done.
189 :     Semant/modules/
190 :     Utility functions for the elaboration of modules. The module.sig and
191 :     module.sml files contain the definitions of module-level semantic objects.
192 :     Semant/pickle/
193 :     How to write the static environments into a file. This is important
194 :     if you want to create the *.bin file. It is also useful to infer
195 :     a unique persistent id for each compilation unit (useful for detecting
196 :     the cut-off compilation dependencies).
197 :     Semant/statenv/
198 :     The definition of Static Environment. The CM-ed version of Static
199 :     Environment is used to avoid environment blow-up in the pickling.
200 :     The prim.sml contains the list of primitive operators and primitive
201 :     types exported in the initial static environment (i.e., PrimEnv).
202 :     During bootstrapping, PrimEnv is the first environment you have to
203 :     set up before you can compile files in the Boot directory.
204 :     Semant/types/
205 :     This directory contains all the data structures and utility functions
206 :     used in type-checking the Core-ML language.
207 :     Semant/typing/
208 :     The type-checking and type-inference code for the core-ML programs.
209 :     It is performed on Abstract Syntax and it produces Abstract Syntax
210 :     also.
211 :    
212 :     FLINT
213 :     This phase translates the Abstract Syntax into the intermediate
214 :     Lambda language (i.e., FLINT). During the translation, it compiles
215 :     the Pattern Matches (see the mcomp directory). Then it does a bunch
216 :     of optimizations on FLINT, performs representation analysis,
217 :     converts the FLINT code into CPS, and finally does closure
218 :     conversion.
219 :     FLINT/clos/
220 :     The closure conversion step. See the Shao/Appel LFP94 paper for
221 :     the detailed algorithm.
222 :     FLINT/cps/
223 :     Definition of CPS, plus how to convert the FLINT code into the
224 :     CPS code. The compilation of the Switch statement is done in this
225 :     phase.
226 :     FLINT/cpsopt/
227 :     The CPS-based optimizations (check Appel's "Compiling with
228 :     Continuations" book for details). Eventually, all optimizations
229 :     in this directory will be migrated into FLINT.
230 :     FLINT/flint/
231 :     This directory defines the FLINT language. The detailed definitions
232 :     of primitive tycs, primitive operators, kinds, type constructors,
233 :     and types are in the FLINT/kernel directory.
234 :     FLINT/kernel/
235 :     Definition of the kernel data structures used in the FLINT language.
236 :     This includes: deBruijn indices, primitive tycs, primitive operators,
237 :     FLINT kinds, FLINT constructors, and FLINT types. When you write
238 :     code that manipulates the FLINT code, please restrict yourself to
239 :     using only the functions defined in the LTYEXTERN interface.
240 :     FLINT/main/
241 :     The flintcomp.sml file describes how the FLINT code gets compiled into
242 :     the optimized and closure-converted CPS code (eventually, it should
243 :     produce optimized, closure-converted, and type-safe FLINT code).
244 :     FLINT/opt/
245 :     The FLINT-based optimizations, such as contraction, type
246 :     specializations, etc.
247 :     FLINT/plambda/
248 :     An older version of the Lambda language (not in the A-Normal form)
249 :     FLINT/reps/
250 :     Code for performing the representation analysis on FLINT
251 :     FLINT/trans/
252 :     Translation of Abstract Syntax into the PLambda code, then to the FLINT
253 :     code. All semantic objects used in the elaboration are translated into
254 :     the FLINT types as well. The translation phase also does match
255 :     compilation. The translation from PLambda to FLINT does the (partial)
256 :     type-based argument flattening.
257 :    
258 :     CodeGen/alpha32/
259 :     Alpha32 new code generator
260 :     CodeGen/alpha32x/
261 :     Alpha32 new code generator (with special patches)
262 :     CodeGen/cpscompile/
263 :     Compilation of CPS into the MLRISC abstract machine code
264 :     CodeGen/hppa/
265 :     HPPA new code generator
266 :     CodeGen/main/
267 :     The big picture of the code generator, including important
268 :     files on machine specifications and runtime tagging schemes.
269 :    
270 :     OldCGen
271 :     The old code generator. May eventually go away after Lal's new
272 :     code generator becomes stable on all platforms. Each code generator
273 :     should produce a structure of signature CODEGENERATOR (defined in
274 :     the TopLevel/main/codes.sig file).
275 :     OldCGen/coder/
276 :     This directory contains the machine-independent parts of the
277 :     old code generator. Some important signatures are also here.
278 :     OldCGen/cpsgen/
279 :     Compilation of CPS into the abstract machine in the old code
280 :     generator. The spill.sml and limit.sml files probably should
281 :     not be placed here. The counterpart of this directory in the new
282 :     code generator is the CodeGen/cpscompile directory.
283 :     OldCGen/mips/
284 :     MIPS code generator for both little endian and big endian
285 :     OldCGen/rs6000/
286 :     RS6000 code generator
287 :     OldCGen/sparc/
288 :     SPARC code generator
289 :     OldCGen/x86/
290 :     X86 code generator
291 :    
292 :     MLRISC
293 :     Lal George's new MLRISC-based code generators (MLRISC).
294 :    
295 :     MiscUtil/
296 :     Contains various kinds of utility programs
297 :     MiscUtil/bignums/
298 :     Bignum packages. I have no clue how stable this is.
299 :     MiscUtil/fixityparse
300 :     MiscUtil/lazycomp
301 :     Some code for implementing the lazy evaluation primitives.
302 :     MiscUtil/print/
303 :     Pretty printing. Very ad hoc; needs a major cleanup.
304 :     MiscUtil/profile/
305 :     The time and the space profiler.
306 :     MiscUtil/util/
307 :     Important utility functions including the Inputsource (for
308 :     reading in a program), and various Hashtable and Dictionary
309 :     implementations.
310 :    
311 :     ============================================================================
312 :     A. SUMMARY of PHASES:
313 :    
314 :     0. statenv : symbol -> binding
315 :     dynenv : pid -> object
316 :     symenv : pid -> flint
317 :     1. Parsing : source -> ast
318 :     2. Elaborator: ast + statenv -> absyn + pickle + newstatenv
319 :     3. FLINT : absyn -> FLINT -> CPS -> CLO
320 :     4. CodeGen : CPS -> csegments (via MLRISC)
321 :     5. OldCGen : CPS -> csegments (spilling, limit check, codegen)
322 :    
323 :     ============================================================================
324 :     B. CREATING all-files.cm
325 :    
326 :     How to recover the all-files.cm (or sources.cm) file after making
327 :     dramatic changes to the directory structure. Notice that the difference
328 :     between all-files.cm and sources.cm is just the bootstrap glue files.
329 :    
330 :     1. ls -1 {TopLevel,Parse,Semant,FLINT,CodeGen,OldCGen,MiscUtil}*/*/*.{sig,sml} \
331 :     | grep -i -v glue | grep -v obsol > xxx
332 :     2. Add ../MLRISC/MLRISC.cm
333 :     3. Remove ml.lex.* and ml.grm.* files
334 :     4. Add ../comp-lib/UTIL.cm
335 :     5. Add ../ml-yacc/lib/sources.cm
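
The steps above can be sketched on a throwaway tree as follows. The directory and file
names created here are made up for the demonstration, and the result is only a plain
member list, not a complete all-files.cm; any header or other CM syntax the real file
needs is not shown. The globs are written out without brace expansion so the sketch
also runs in plain POSIX sh.

```shell
# Throwaway tree with hypothetical files, including ones that must be filtered out.
demo=$(mktemp -d)
mkdir -p "$demo/Parse/main" "$demo/Parse/parse" \
         "$demo/TopLevel/bootstrap" "$demo/FLINT/obsol"
: > "$demo/Parse/main/frontend.sig"
: > "$demo/Parse/main/frontend.sml"
: > "$demo/Parse/parse/ml.lex.sml"
: > "$demo/Parse/parse/ml.grm.sig"
: > "$demo/Parse/parse/ml.grm.sml"
: > "$demo/TopLevel/bootstrap/sparcglue.sml"
: > "$demo/FLINT/obsol/old.sml"

# Steps 1 and 3: list the sources, drop glue and obsolete files, then drop
# the ml.lex.* and ml.grm.* files.
list=$(cd "$demo" && ls -1 */*/*.sig */*/*.sml \
  | grep -i -v glue | grep -v obsol \
  | grep -v -e '/ml\.lex\.' -e '/ml\.grm\.')

# Steps 2, 4 and 5: append the library descriptions named in the steps.
members=$(printf '%s\n' "$list" \
  ../MLRISC/MLRISC.cm ../comp-lib/UTIL.cm ../ml-yacc/lib/sources.cm)
echo "$members"

rm -rf "$demo"
```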
336 :     ============================================================================
