Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /sml/trunk/ckit/doc/overview.html
ViewVC logotype

Annotation of /sml/trunk/ckit/doc/overview.html

Parent Directory Parent Directory | Revision Log Revision Log


Revision 597 - (view) (download) (as text)

1 : dbm 597 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
2 :     <html>
3 :     <head>
4 :     <title>ckit Overview</title>
5 :     </head>
6 :    
7 :     <body>
8 :     <center>
9 :     <h1>ckit: A Front End for C in SML</h1>
10 :     </center>
11 :     <h3>1. Getting Started</h3>
12 :     <p>
13 :     On unpacking the ckit sources, you should see a src directory, a doc directory
14 :     and a README file (and possibly other directories, depending on the distribution).
15 :     <p>
16 :     The src directory contains the following subdirectories:
17 :     <dl>
18 :     <dt>parser/
19 :     <dd>lexer and parser, parse trees.
20 :     <dt>ast/
21 :     <dd>abstract syntax trees (Ast), type-checker, pretty-printer.
22 :     <dt>variants/
23 :     <dd>flags for controlling the parser and type-checker.
24 :     </dl>
25 :     To build the system, cd to src, run SML/NJ and type
26 :     <pre>
27 :     - CM.make();
28 :     </pre>
29 :     To test the parser on "test.c", type
30 :     <pre>
31 :     - ParseToAst.fileToAst "test.c";
32 :     </pre>
33 :     This parses and typechecks "test.c" and returns an abstract syntax tree for
34 :     "test.c". Alternatively, to parse, type-check and then pretty-print "test.c",
35 :     type
36 :     <pre>
37 :     - ParseToAst.fileToC "test.c";
38 :     </pre>
39 :     <p>
40 :     <h3>2. Using the Frontend</h3>
41 :     <p>
42 :     C source programs are processed in two steps. The lexer and parser
43 :     translate the source to parse trees (Parser.parseFile), and the
44 :     "elaboration" or static semantics phase (BuildAst.makeAst) performs
45 :     type checking and translates to abstract syntax. The parse tree
46 :     datatypes are defined in parse/parse-tree-sig.sml and the abstract
47 :     syntax types in ast/ast-sig.sml. These definitions are fairly
48 :     straightforward and should be self-explanatory.
49 :     <p>
50 :     Top level driving functions are in module <code>ParseToAst</code> (see
51 :     ast/parse-to-ast-sig.sml). The following subsections describe some
52 :     commonly used ckit functions.
53 :     <p>
54 :     <h4>2.1. <code>ParseToAst.fileToAst: string -> ParseToAst.astBundle</code></h4>
55 :     <p>
56 :     This is the main function to parse a file and produce abstract syntax.
57 :     When applied to a string (the C source file name), it produces a
58 :     bundle of information of type <code>astBundle</code>:
59 :     <pre>
60 :     type astBundle =
61 :     {ast: Ast.ast,
62 :     tidtab: Bindings.tidBinding Tidtab.uidtab,
63 :     errorCount: int,
64 :     warningCount: int,
65 :     auxiliaryInfo: {astTypes: Tables.astIndexTab,
66 :     implicits: Tables.astIndexTab,
67 :     env: State.symtab}}
68 :     </pre>
69 :     where:
70 :     <menu>
71 :     <li> <code>ast</code> is the abstract syntax tree.
72 :     <li> <code>tidtab</code> is the type identifier table that maps type identifiers into their meanings.
73 :     <li> <code>errorCount</code> is the count of all errors encountered during parsing and type checking.
74 :     <li> <code>warningCount</code> is the count of all warnings encountered during parsing and type checking.
75 :     <li> <code>astTypes</code> is a table mapping ast indexes into the types of the corresponding ast expressions.
76 :     <li> <code>env</code> is used to carry over global symbol information in some mult-file parsing applications.
77 :     </menu>
78 :    
79 :     <h4>2.2. <code>ParseToAst.fileToC : string -> unit</code></h4>
80 :     <p>
81 :     Process a file and pretty print the resulting ast.
82 :    
83 :     <h4>2.3. <code>Parser.parseFile : Error.errorState -> string -> ParseTree.externalDecl list</code></h4>
84 :     To get a hold of a parse tree (parser/parse-tree-sig.sml),
85 :     use <code>Parser.parseFile</code> (see parser/parser-sig.sml).
86 :     This function takes an <code>errorState</code> and the
87 :     name of a (preprocessed) C source file and returns a list of external
88 :     declaration parse trees corresponding to the top-level declarations in
89 :     the source file. See parser/parse-tree-sig.sml for definitions of
90 :     the parse tree types and parser/util/error-sig.sml for documentation
91 :     on Error.errorState.
92 :     <p>
93 :    
94 :     <h3>3. System Structure</h3>
95 :     <p>
96 :     The frontend consists of a number of phases.
97 :     The first phase consists of a lexer/parser (written using ml-lex and ml-yacc
98 :     respectively). The output of this phase is a data-structure (parse tree)
99 :     that is a simple "unprocessed" form that closely follows the structure of C's
100 :     grammar. The next phase inputs the parse tree data-structure, type checks it,
101 :     and produces a "processed" abstract syntax tree representation (Ast).
102 :     <p>
103 :     <h4>3.1. The Lexer and Parser</h4>
104 :     <p>
105 :     These are built using ml-lex and ml-yacc. The lex and yacc files can be found
106 :     in src/parser/grammar/[c.lex,c.grm]. The parser performs only a minimal amount
107 :     of syntactic processing. Many syntactic restrictions are enforced during the
108 :     type-checking phase e.g restrictions on the number and combination of type
109 :     specifiers used in a type.
110 :     <p>
111 :     Similarly, most scoping issues are addressed during type-checking.
112 :     One exception is typedef. This must be handled during parsing because typedefs
113 :     introduce new types and these can dramatically alter the shape of parse trees.
114 :     In principle, the scoping of typedefs could be delayed till later processing,
115 :     but in practice this is not feasible: in particular, if typedefs are not
116 :     processed during parsing, then we cannot distinguish between
117 :     declaration forms and expressions. Consider, the following program.
118 :     <pre>
119 :     char x;
120 :     f() {
121 :     typedef int x;
122 :     {
123 :     x * x;
124 :     }
125 :     }
126 :     </pre>
127 :     Here, "<code>x * x</code>" declares <code>x</code> as a pointer to an integer.
128 :     However, if the typedef is commented out, then
129 :     "<code>x * x</code>" is interpreted as an expression.
130 :     <p>
131 :     The treatment of typedefs involves a subtle interaction between the parser and
132 :     lexer. When the parser recognizes a typedef for an identifier, it communicates
133 :     to the lexer that the identifier should now be treated as a "type".
134 :     Parser lookahead introduces additional complication: we cannot lex a token until
135 :     any preceding typedefs have been processed. In particular, we must limit
136 :     lookahead to one symbol. In fact, this only works because C's grammar requires
137 :     typedefs to end in a semicolon --- this semicolon acts as a buffer so that a
138 :     typedef will be completely processed before any use of the new type is lexed.
139 :     Note that typedefs can be scoped (e.g. see the above program), and so the parser
140 :     must tell the lexer to undo the effect of a typedef when the typedef's scope is
141 :     exited. Another complication is the error recovery mechanism of ml-yacc.
142 :     <p>
143 :     The parser produces parse trees (see src/parser/parse-tree-sig.sml).
144 :     This data structure is a simple "unprocessed" form that closely follows the
145 :     structure of C's grammar. These parse trees are built up by the actions of the
146 :     ml-yacc grammar.
147 :     <p>
148 :     Any language extensions is likely to involve extensions to the lexer,
149 :     parser and to the parse tree datatype. When extending the lexer and
150 :     parser, care must be taken to preserve the interaction between the
151 :     lexer, the parser, and the use of one-token lookahead. Extensions to
152 :     the parse tree datatype are supported via a collection of "Ext"
153 :     constructors in the parse tree datatypes. The file
154 :     extensions/c/parse-tree-ext.sml contains the default "empty extension"
155 :     for standard C.
156 :     <p>
157 :     Files:
158 :     <dl>
159 :     <dt>parser/parser-tree-sig.sml, parser-tree.sml
160 :     <dd>definition of parse tree types
161 :     <dt>parser/grammar/c.lex, c.grm
162 :     <dd>lex and yacc specifications
163 :     <dt>parser/util/sourcemap-sig.sml, sourcemap.sml
164 :     <dd>mapping source file locations
165 :     <dt>parser/util/error-sig.sml, error.sml
166 :     <dd>error reporting functions
167 :     </dl>
168 :     <p>
169 :     <h4>3.2. Abstract Syntax Trees (AST'S) And BuildAst</h4>
170 :     <p>
171 :     BuildAst (src/ast/build-ast.sml) consumes parse trees and builds up abstract
172 :     syntax trees (Ast's) while performing type checking. Ast's (src/ast/ast.sml)
173 :     are defined so that each of the major syntactic categories (statements,
174 :     expressions, and declarations) have a unique integer index associated with them.
175 :     These indices are used to associate information with specific parts of the
176 :     code. Care must be taken to preserve their uniqueness when performing code
177 :     transformations.
178 :     <p>
179 :     Objects (global variables, local variables, functions, etc) and struct/union
180 :     fields are assigned globally unique integers called program identifiers
181 :     (pids). This simplifies treatment of scope in Ast. Similarly, types introduced
182 :     by structs, unions, enums and typedefs are assigned globally unique
183 :     integers called type identifiers (tids).
184 :     <p>
185 :     BuildAst performs the following tasks:
186 :     <ol>
187 :     <li> Scoping: scoping of variables, structs, unions, fields and enums
188 :     is resolved.
189 :     <p>
190 :     <li> Type Checking: Full ANSIC C type checking is performed, and
191 :     appropriate errors and warnings are generated. Errors and warnings
192 :     are suppressed in the case where there are parse errors. The
193 :     behaviour of the type checker can be customized using a collection of
194 :     flags in the TypeCheckControl structure defined in
195 :     src/variants/ansic/config.sml. BuildAst incrementally constructs a
196 :     mapping between expression indices and types that records the type of
197 :     each expression.
198 :     <p>
199 :     <li> Type Sizes And Memory Layout: BuildAst computes the sizes of the
200 :     objects declared in the program. It also optionally reduces sizeof
201 :     expressions to integer constants (the flag BuildAst.reduce_sizeof can
202 :     be used to enable this feature; the default setting does not reduce
203 :     sizeof constructs). BuildAst also computes the layout and alignment
204 :     properties of all objects, including the offsets for fields of
205 :     structs. Type size and memory layout is architecture and compiler
206 :     specific. The behaviour of this aspect of BuildAst is specified in
207 :     Sizes structure defined in src/variants/ansic/config.sml.
208 :     <p>
209 :     <li> Initializer Normalization: The meaning of an object initializer
210 :     is partly determined by the type of the object begin initialized.
211 :     BuildAst normalizes initializers so that they are easier to implement.
212 :     Moreover, certain aspects of the type of an object are inferred from
213 :     an initializer (e.g. int x[] = {1,2,3}).
214 :     </ol>
215 :     Files:
216 :     <dl>
217 :     <dt>ast/ast-sig.sml, ast.sml
218 :     <dd>definition of abstract syntax datatypes.
219 :     <dt>ast/build-ast.sml
220 :     <dd>translation from parse trees to abstract syntax, with type
221 :     checking and other static semantics processing.
222 :     <dt>extensions/c/
223 :     <dd>dummy extension structures for C
224 :     <dt>variants/ansic/config.sml
225 :     <dd>various flags controlling error checking, type checking, etc.
226 :     </dl>
227 :     <p>
228 :     <h4>3.3. Pretty Printer for AST</h4>
229 :     Ast comes equipped with a pretty-printer (ast/pp/pp-ast-sig.sml). Not
230 :     only is this useful for debugging purposes, but it also is an integral
231 :     component of source-to-source applications of the frontend. When
232 :     pretty printing Ast, pids and tids can be optionally printed. The
233 :     following flags control this behavior:
234 :     <pre>
235 :     PPLib.suppressPidUnderscores: controls printing of pids
236 :     PPLib.suppressPidGlobalUnderscores: controls printing of pids for global objects
237 :     PPLib.suppressTidUnderscores: controls printing of tids.
238 :     </pre>
239 :     Files:
240 :     <dl>
241 :     <dt>pp/pp-ast-fn.sml
242 :     <dd>the generic pretty printing code for ast
243 :     <dt>pp/pp-ast-sig.sml
244 :     <dd>pretty printing signature
245 :     <dt>pp/pp-ast.sml
246 :     <dd>default pretty printer
247 :     <dt>pp/pp-ast-adornment.sml
248 :     <dd>pretty printer for printing ast interspersed with adornment info
249 :     <dt>pp/pp-lib.sml
250 :     <dd>pretty printing for identifiers; some pretty printing combinators.
251 :     </dl>
252 :     <p>
253 :     <h4>3.4. AST-UTILS [Not distributed yet]</h4>
254 :     <p>
255 :     Files:
256 :     <dl>
257 :     <dt>ast-utils/copy/
258 :     <dd>copying ast types
259 :     <dt>ast-utils/equality/
260 :     <dd>equality for ast types
261 :     <dt>ast-utils/simplifier/
262 :     <dd>ast simplifier
263 :     </dl>
264 :     <p>
265 :    
266 :     <h3>4. Location Info</h5>
267 :     <p>
268 :     Program phrases (expressions, declarations, statements) are annotated
269 :     in the abstract syntax with source code locations, which are
270 :     represented by a data structure that determines a region within a
271 :     source file. See src/parser/sourcemap-sig.sml.
272 :    
273 :     <hr>
274 :     <address><a href="mailto:dbm@research.bell-labs.com">Dave MacQueen</a></address>
275 :     <!-- Created: Tue Dec 7 11:11:32 EST 1999 -->
276 :     <!-- hhmts start -->
277 :     Last modified: Tue Dec 7 15:39:13 EST 1999
278 :     <!-- hhmts end -->
279 :     </body>
280 :     </html>

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0