SCM Repository
Annotation of /sml/trunk/ckit/doc/overview
Parent Directory
|
Revision Log
Revision 597 - (view) (download)
1 : | dbm | 597 | ckit: A FRONT END FOR C IN SML |
2 : | ------------------------------ | ||
3 : | |||
4 : | 1. GETTING STARTED | ||
5 : | ------------------ | ||
6 : | |||
7 : | On unpacking the ckit sources, you should see a src directory, a doc directory | ||
8 : | and a README file (and possibly other directories, depending on the distribution). | ||
9 : | |||
10 : | The src directory contains the following subdirectories: | ||
11 : | |||
12 : | parser: lexer and parser, parse trees. | ||
13 : | ast: abstract syntax trees (Ast), type-checker, pretty-printer. | ||
14 : | variants: flags for controlling the parser and type-checker. | ||
15 : | |||
16 : | To build the system, cd to src, run SML/NJ and type | ||
17 : | |||
18 : | - CM.make(); | ||
19 : | |||
20 : | To test the parser on "test.c", type | ||
21 : | |||
22 : | - ParseToAst.fileToAst "test.c"; | ||
23 : | |||
24 : | This parses and typechecks "test.c" and returns an abstract syntax tree for | ||
25 : | "test.c". Alternatively, to parse, type-check and then pretty-print "test.c", | ||
26 : | type | ||
27 : | |||
28 : | - ParseToAst.fileToC "test.c"; | ||
29 : | |||
30 : | |||
31 : | 2. USING THE FRONTEND | ||
32 : | --------------------- | ||
33 : | |||
34 : | The following describes some commonly used ckit functions. | ||
35 : | |||
36 : | 2.1: ParseToAst.fileToAst: string -> ParseToAst.astBundle | ||
37 : | |||
38 : | This is the main function to parse a file and produce abstract syntax. | ||
39 : | When applied to a string (file name), it produces a bundle of information of | ||
40 : | type astBundle: | ||
41 : | |||
42 : | type astBundle = | ||
43 : | {ast: Ast.ast, | ||
44 : | tidtab: Bindings.tidBinding Tidtab.uidtab, | ||
45 : | errorCount: int, | ||
46 : | warningCount: int, | ||
47 : | auxiliaryInfo: {astTypes: Tables.astIndexTab, | ||
48 : | implicits: Tables.astIndexTab, | ||
49 : | env: State.symtab}} | ||
50 : | |||
51 : | where: | ||
52 : | ast is the abstract syntax tree. | ||
53 : | tidtab is the type identifier table that maps type identifiers into their meanings. | ||
54 : | errorCount is the count of all errors encountered during parsing and type checking. | ||
55 : | warningCount is the count of all warnings encountered during parsing and type checking. | ||
56 : | astTypes is a table mapping ast indexes into the types of the corresponding ast expressions. | ||
57 : | env is used to carry over global symbol information in some mult-file parsing applications. | ||
58 : | |||
59 : | 2.2 ParseToAst.fileToC | ||
60 : | |||
61 : | |||
62 : | 2.3 Parser.parseFile | ||
63 : | val parseFile : string -> ParseTree.externalDecl list * Error.errorStream | ||
64 : | |||
65 : | 2.4 errorStream | ||
66 : | -- not used for much | ||
67 : | -- contains count of warnings and errors | ||
68 : | |||
69 : | Top level driving functions are in file parse-to-ast.sml. Generally | ||
70 : | use ParseToAst.fileToAst. It returns a record of type BuildAst.ProgramInfo. | ||
71 : | |||
72 : | Example: | ||
73 : | |||
74 : | val {ast={ast: Ast.ast, | ||
75 : | tidtab: Bindings.tidBinding Tidtab.uidtab, | ||
76 : | errstrm: Error.errorStream}, | ||
77 : | aidtab: Tables.aidtab, | ||
78 : | opaidtab: Tables.aidtab, | ||
79 : | env: State.symtab} = ParseToAst.fileToAst ("file"); | ||
80 : | |||
81 : | Ast.ast is the abstract syntax type for translation units (a list of top-level | ||
82 : | C declarations). For further information, read the code. | ||
83 : | |||
84 : | To get a hold of parse trees (parser/parse-tree.sml), which is the raw data | ||
85 : | structure produced by the parser: | ||
86 : | |||
87 : | val (parseTree, errorStream) = Parser.parseFile "file"; | ||
88 : | |||
89 : | See parse/util/error.sml for the definition of the errorStream type. | ||
90 : | |||
91 : | |||
92 : | |||
93 : | 3. SYSTEM STRUCTURE | ||
94 : | ------------------- | ||
95 : | |||
96 : | The frontend consists of a number of phases. | ||
97 : | The first phase consists of a lexer/parser (written using ml-lex and ml-yacc | ||
98 : | respectively). The output of this phase is a data-structure (parse tree) | ||
99 : | that is a simple "unprocessed" form that closely follows the structure of C's | ||
100 : | grammar. The next phase inputs the parse tree data-structure, type checks it, | ||
101 : | and produces a "processed" abstract syntax tree representation (Ast). | ||
102 : | |||
103 : | 3.1 THE LEXER AND PARSER | ||
104 : | |||
105 : | These are built using ml-lex and ml-yacc. The lex and yacc files can be found | ||
106 : | in src/parser/grammar/[c.lex,c.grm]. The parser performs only a minimal amount | ||
107 : | of syntactic processing. Many syntactic restrictions are enforced during the | ||
108 : | type-checking phase e.g restrictions on the number and combination of type | ||
109 : | specifiers used in a type. | ||
110 : | |||
111 : | Similarly, most scoping issues are addressed during type-checking. | ||
112 : | One exception is typedef. This must be handled during parsing because typedefs | ||
113 : | introduce new types and these can dramatically alter the shape of parse trees. | ||
114 : | In principle, the scoping of typedefs could be delayed till later processing, | ||
115 : | but in practice this is not feasible: in particular, if typedefs are not | ||
116 : | processed during parsing, then we cannot distinguish between | ||
117 : | declaration forms and expressions. Consider, the following program. | ||
118 : | |||
119 : | char x; | ||
120 : | f() { | ||
121 : | typedef int x; | ||
122 : | { | ||
123 : | x * x; | ||
124 : | } | ||
125 : | } | ||
126 : | |||
127 : | Here, "x * x" declares x as a pointer to an integer. | ||
128 : | However, if the typedef is commented out, then | ||
129 : | "x * x" is interpreted as an expression. | ||
130 : | |||
131 : | The treatment of typedefs involves a subtle interaction between the parser and | ||
132 : | lexer. When the parser recognizes a typedef for an identifier, it communicates | ||
133 : | to the lexer that the identifier should now be treated as a "type". | ||
134 : | Parser lookahead introduces additional complication: we cannot lex a token until | ||
135 : | any preceding typedefs have been processed. In particular, we must limit | ||
136 : | lookahead to one symbol. In fact, this only works because C's grammar requires | ||
137 : | typedefs to end in a semicolon --- this semicolon acts as a buffer so that a | ||
138 : | typedef will be completely processed before any use of the new type is lexed. | ||
139 : | Note that typedefs can be scoped (e.g. see the above program), and so the parser | ||
140 : | must tell the lexer to undo the effect of a typedef when the typedef's scope is | ||
141 : | exited. Another complication is the error recovery mechanism of ml-yacc. | ||
142 : | |||
143 : | The parser produces parse trees (see src/parser/parse-tree-sig.sml). | ||
144 : | This data structure is a simple "unprocessed" form that closely follows the | ||
145 : | structure of C's grammar. These parse trees are built up by the actions of the | ||
146 : | ml-yacc grammar. | ||
147 : | |||
148 : | Any language extensions is likely to involve extensions to the lexer, parser and | ||
149 : | to the parse tree datatype. When extending the lexer and parser, care must be taken to | ||
150 : | preserve the interaction between the lexer, the parser, and the use of one-token | ||
151 : | lookahead. Extensions to the parse tree datatype are supported via a collection | ||
152 : | of "Ext" constructors in the parse tree datatypes. The file | ||
153 : | extensions/c/parse-tree-ext.sml contains the default "empty extension" for | ||
154 : | standard C. | ||
155 : | |||
156 : | Files: | ||
157 : | parser/parser-tree-sig.sml, parser-tree.sml: definition of parse tree types | ||
158 : | parser/grammar/c.lex, c.grm: lex and yacc specifications | ||
159 : | parser/util/sourcemap-sig.sml, sourcemap.sml: mapping source file locations | ||
160 : | parser/util/error-sig.sml, error.sml: error reporting functions | ||
161 : | |||
162 : | |||
163 : | 3.2 ABSTRACT SYNTAX TREES (AST'S) AND BUILD-AST | ||
164 : | |||
165 : | BuildAst (src/ast/build-ast.sml) consumes parse trees and builds up abstract | ||
166 : | syntax trees (Ast's) while performing type checking. Ast's (src/ast/ast.sml) | ||
167 : | are defined so that each of the major syntactic categories (statements, | ||
168 : | expressions, and declarations) have a unique integer index associated with them. | ||
169 : | These indices are used to associate information with specific parts of the | ||
170 : | code. Care must be taken to preserve their uniqueness when performing code | ||
171 : | transformations. | ||
172 : | |||
173 : | Objects (global variables, local variables, functions, etc) and struct/union | ||
174 : | fields are assigned globally unique integers called program identifiers | ||
175 : | (pids). This simplifies treatment of scope in Ast. Similarly, types introduced | ||
176 : | by structs, unions, enums and typedefs are assigned globally unique | ||
177 : | integers called type identifiers (tids). | ||
178 : | |||
179 : | BuildAst performs the following tasks: | ||
180 : | |||
181 : | 1. Scoping: scoping of variables, structs, unions, fields and enums | ||
182 : | is resolved. | ||
183 : | |||
184 : | 2. Type Checking: Full ANSIC C type checking is performed, and appropriate | ||
185 : | errors and warnings are generated. Errors and warnings are suppressed | ||
186 : | in the case where there are parse errors. The behaviour of the type | ||
187 : | checker can be customized using a collection of flags in | ||
188 : | the TypeCheckControl structure defined in src/variants/ansic/config.sml. | ||
189 : | BuildAst incrementally constructs a mapping between expression indices and | ||
190 : | types that records the type of each expression. | ||
191 : | |||
192 : | 3. Type Sizes And Memory Layout: BuildAst computes the sizes of the objects | ||
193 : | declared in the program. It also optionally reduces sizeof expressions to | ||
194 : | integer constants (the flag BuildAst.reduce_sizeof can be used to enable | ||
195 : | this feature; the default setting does not reduce sizeof constructs). | ||
196 : | BuildAst also computes the layout and alignment properties of | ||
197 : | all objects, including the offsets for fields of structs. | ||
198 : | Type size and memory layout is architecture and compiler specific. | ||
199 : | The behaviour of this aspect of BuildAst is specified in | ||
200 : | Sizes structure defined in src/variants/ansic/config.sml. | ||
201 : | |||
202 : | 4. Initializer Normalization: The meaning of an object initializer is partly | ||
203 : | determined by the type of the object begin initialized. BuildAst | ||
204 : | normalizes initializers so that they are easier to implement. | ||
205 : | Moreover, certain aspects of the type of an object are inferred from | ||
206 : | an initializer (e.g. int x[] = {1,2,3}). | ||
207 : | |||
208 : | Files: | ||
209 : | |||
210 : | ast/ast-sig.sml, ast.sml: definition of abstract syntax datatypes. | ||
211 : | ast/build-ast.sml: translation from parse trees to abstract syntax, with type checking and other | ||
212 : | static semantics processing. | ||
213 : | extensions/ | ||
214 : | c/ -- dummy extension structures for C | ||
215 : | variants/ | ||
216 : | ansic/ | ||
217 : | config.sml: various flags controlling error checking, type checking, etc. | ||
218 : | |||
219 : | 3.3 PRETTY PRINTER FOR AST | ||
220 : | Ast comes equipped with a pretty-printer (ast/pp/pp-ast-sig.sml). | ||
221 : | Not only is this useful for debugging purposes, but it also is an integral | ||
222 : | component of source-to-source applications of the frontend. | ||
223 : | When pretty printing Ast, pids and tids can be optionally printed. | ||
224 : | The following flags control this behavior: | ||
225 : | |||
226 : | PPLib.suppressPidUnderscores: controls printing of pids | ||
227 : | PPLib.suppressPidGlobalUnderscores: controls printing of pids for global objects | ||
228 : | PPLib.suppressTidUnderscores: controls printing of tids. | ||
229 : | |||
230 : | Files: | ||
231 : | pp/pp-ast-fn.sml : the generic pretty printing code for ast. | ||
232 : | pp/pp-ast-sig.sml: pretty printing signature. | ||
233 : | pp/pp-ast.sml: default pretty printer | ||
234 : | pp/pp-ast-adornment.sml: pretty printer for printing ast interspersed with adornment info. | ||
235 : | pp/pp-lib.sml: pretty printing for identifiers; some pretty printing combinators. | ||
236 : | |||
237 : | |||
238 : | 3.4 AST-UTILS [Not distributed yet] | ||
239 : | |||
240 : | Files: | ||
241 : | ast-utils/copy/: copying ast types | ||
242 : | ast-utils/equality/: equality for ast types | ||
243 : | ast-utils/simplifier/: ast simplifier | ||
244 : | |||
245 : | |||
246 : | 5. LOCATION INFO | ||
247 : | ---------------- |
root@smlnj-gforge.cs.uchicago.edu | ViewVC Help |
Powered by ViewVC 1.0.0 |