Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /sml/releases/release-110.34/ckit/doc/overview
ViewVC logotype

Annotation of /sml/releases/release-110.34/ckit/doc/overview

Parent Directory Parent Directory | Revision Log Revision Log


Revision 877 - (view) (download)

1 : dbm 597 ckit: A FRONT END FOR C IN SML
2 :     ------------------------------
3 :    
4 :     1. GETTING STARTED
5 :     ------------------
6 :    
7 :     On unpacking the ckit sources, you should see a src directory, a doc directory
8 :     and a README file (and possibly other directories, depending on the distribution).
9 :    
10 :     The src directory contains the following subdirectories:
11 :    
12 :     parser: lexer and parser, parse trees.
13 :     ast: abstract syntax trees (Ast), type-checker, pretty-printer.
14 :     variants: flags for controlling the parser and type-checker.
15 :    
16 :     To build the system, cd to src, run SML/NJ and type
17 :    
18 :     - CM.make();
19 :    
20 :     To test the parser on "test.c", type
21 :    
22 :     - ParseToAst.fileToAst "test.c";
23 :    
24 :     This parses and typechecks "test.c" and returns an abstract syntax tree for
25 :     "test.c". Alternatively, to parse, type-check and then pretty-print "test.c",
26 :     type
27 :    
28 :     - ParseToAst.fileToC "test.c";
29 :    
30 :    
31 :     2. USING THE FRONTEND
32 :     ---------------------
33 :    
34 :     The following describes some commonly used ckit functions.
35 :    
36 :     2.1: ParseToAst.fileToAst: string -> ParseToAst.astBundle
37 :    
38 :     This is the main function to parse a file and produce abstract syntax.
39 :     When applied to a string (file name), it produces a bundle of information of
40 :     type astBundle:
41 :    
42 :     type astBundle =
43 :     {ast: Ast.ast,
44 :     tidtab: Bindings.tidBinding Tidtab.uidtab,
45 :     errorCount: int,
46 :     warningCount: int,
47 :     auxiliaryInfo: {astTypes: Tables.astIndexTab,
48 :     implicits: Tables.astIndexTab,
49 :     env: State.symtab}}
50 :    
51 :     where:
52 :     ast is the abstract syntax tree.
53 :     tidtab is the type identifier table that maps type identifiers into their meanings.
54 :     errorCount is the count of all errors encountered during parsing and type checking.
55 :     warningCount is the count of all warnings encountered during parsing and type checking.
56 :     astTypes is a table mapping ast indexes into the types of the corresponding ast expressions.
57 :     env is used to carry over global symbol information in some mult-file parsing applications.
58 :    
59 :     2.2 ParseToAst.fileToC
60 :    
61 :    
62 :     2.3 Parser.parseFile
63 :     val parseFile : string -> ParseTree.externalDecl list * Error.errorStream
64 :    
65 :     2.4 errorStream
66 :     -- not used for much
67 :     -- contains count of warnings and errors
68 :    
69 :     Top level driving functions are in file parse-to-ast.sml. Generally
70 :     use ParseToAst.fileToAst. It returns a record of type BuildAst.ProgramInfo.
71 :    
72 :     Example:
73 :    
74 :     val {ast={ast: Ast.ast,
75 :     tidtab: Bindings.tidBinding Tidtab.uidtab,
76 :     errstrm: Error.errorStream},
77 :     aidtab: Tables.aidtab,
78 :     opaidtab: Tables.aidtab,
79 :     env: State.symtab} = ParseToAst.fileToAst ("file");
80 :    
81 :     Ast.ast is the abstract syntax type for translation units (a list of top-level
82 :     C declarations). For further information, read the code.
83 :    
84 :     To get a hold of parse trees (parser/parse-tree.sml), which is the raw data
85 :     structure produced by the parser:
86 :    
87 :     val (parseTree, errorStream) = Parser.parseFile "file";
88 :    
89 :     See parse/util/error.sml for the definition of the errorStream type.
90 :    
91 :    
92 :    
93 :     3. SYSTEM STRUCTURE
94 :     -------------------
95 :    
96 :     The frontend consists of a number of phases.
97 :     The first phase consists of a lexer/parser (written using ml-lex and ml-yacc
98 :     respectively). The output of this phase is a data-structure (parse tree)
99 :     that is a simple "unprocessed" form that closely follows the structure of C's
100 :     grammar. The next phase inputs the parse tree data-structure, type checks it,
101 :     and produces a "processed" abstract syntax tree representation (Ast).
102 :    
103 :     3.1 THE LEXER AND PARSER
104 :    
105 :     These are built using ml-lex and ml-yacc. The lex and yacc files can be found
106 :     in src/parser/grammar/[c.lex,c.grm]. The parser performs only a minimal amount
107 :     of syntactic processing. Many syntactic restrictions are enforced during the
108 :     type-checking phase e.g restrictions on the number and combination of type
109 :     specifiers used in a type.
110 :    
111 :     Similarly, most scoping issues are addressed during type-checking.
112 :     One exception is typedef. This must be handled during parsing because typedefs
113 :     introduce new types and these can dramatically alter the shape of parse trees.
114 :     In principle, the scoping of typedefs could be delayed till later processing,
115 :     but in practice this is not feasible: in particular, if typedefs are not
116 :     processed during parsing, then we cannot distinguish between
117 :     declaration forms and expressions. Consider, the following program.
118 :    
119 :     char x;
120 :     f() {
121 :     typedef int x;
122 :     {
123 :     x * x;
124 :     }
125 :     }
126 :    
127 :     Here, "x * x" declares x as a pointer to an integer.
128 :     However, if the typedef is commented out, then
129 :     "x * x" is interpreted as an expression.
130 :    
131 :     The treatment of typedefs involves a subtle interaction between the parser and
132 :     lexer. When the parser recognizes a typedef for an identifier, it communicates
133 :     to the lexer that the identifier should now be treated as a "type".
134 :     Parser lookahead introduces additional complication: we cannot lex a token until
135 :     any preceding typedefs have been processed. In particular, we must limit
136 :     lookahead to one symbol. In fact, this only works because C's grammar requires
137 :     typedefs to end in a semicolon --- this semicolon acts as a buffer so that a
138 :     typedef will be completely processed before any use of the new type is lexed.
139 :     Note that typedefs can be scoped (e.g. see the above program), and so the parser
140 :     must tell the lexer to undo the effect of a typedef when the typedef's scope is
141 :     exited. Another complication is the error recovery mechanism of ml-yacc.
142 :    
143 :     The parser produces parse trees (see src/parser/parse-tree-sig.sml).
144 :     This data structure is a simple "unprocessed" form that closely follows the
145 :     structure of C's grammar. These parse trees are built up by the actions of the
146 :     ml-yacc grammar.
147 :    
148 :     Any language extensions is likely to involve extensions to the lexer, parser and
149 :     to the parse tree datatype. When extending the lexer and parser, care must be taken to
150 :     preserve the interaction between the lexer, the parser, and the use of one-token
151 :     lookahead. Extensions to the parse tree datatype are supported via a collection
152 :     of "Ext" constructors in the parse tree datatypes. The file
153 :     extensions/c/parse-tree-ext.sml contains the default "empty extension" for
154 :     standard C.
155 :    
156 :     Files:
157 :     parser/parser-tree-sig.sml, parser-tree.sml: definition of parse tree types
158 :     parser/grammar/c.lex, c.grm: lex and yacc specifications
159 :     parser/util/sourcemap-sig.sml, sourcemap.sml: mapping source file locations
160 :     parser/util/error-sig.sml, error.sml: error reporting functions
161 :    
162 :    
163 :     3.2 ABSTRACT SYNTAX TREES (AST'S) AND BUILD-AST
164 :    
165 :     BuildAst (src/ast/build-ast.sml) consumes parse trees and builds up abstract
166 :     syntax trees (Ast's) while performing type checking. Ast's (src/ast/ast.sml)
167 :     are defined so that each of the major syntactic categories (statements,
168 :     expressions, and declarations) have a unique integer index associated with them.
169 :     These indices are used to associate information with specific parts of the
170 :     code. Care must be taken to preserve their uniqueness when performing code
171 :     transformations.
172 :    
173 :     Objects (global variables, local variables, functions, etc) and struct/union
174 :     fields are assigned globally unique integers called program identifiers
175 :     (pids). This simplifies treatment of scope in Ast. Similarly, types introduced
176 :     by structs, unions, enums and typedefs are assigned globally unique
177 :     integers called type identifiers (tids).
178 :    
179 :     BuildAst performs the following tasks:
180 :    
181 :     1. Scoping: scoping of variables, structs, unions, fields and enums
182 :     is resolved.
183 :    
184 :     2. Type Checking: Full ANSIC C type checking is performed, and appropriate
185 :     errors and warnings are generated. Errors and warnings are suppressed
186 :     in the case where there are parse errors. The behaviour of the type
187 :     checker can be customized using a collection of flags in
188 :     the TypeCheckControl structure defined in src/variants/ansic/config.sml.
189 :     BuildAst incrementally constructs a mapping between expression indices and
190 :     types that records the type of each expression.
191 :    
192 :     3. Type Sizes And Memory Layout: BuildAst computes the sizes of the objects
193 :     declared in the program. It also optionally reduces sizeof expressions to
194 :     integer constants (the flag BuildAst.reduce_sizeof can be used to enable
195 :     this feature; the default setting does not reduce sizeof constructs).
196 :     BuildAst also computes the layout and alignment properties of
197 :     all objects, including the offsets for fields of structs.
198 :     Type size and memory layout is architecture and compiler specific.
199 :     The behaviour of this aspect of BuildAst is specified in
200 :     Sizes structure defined in src/variants/ansic/config.sml.
201 :    
202 :     4. Initializer Normalization: The meaning of an object initializer is partly
203 :     determined by the type of the object begin initialized. BuildAst
204 :     normalizes initializers so that they are easier to implement.
205 :     Moreover, certain aspects of the type of an object are inferred from
206 :     an initializer (e.g. int x[] = {1,2,3}).
207 :    
208 :     Files:
209 :    
210 :     ast/ast-sig.sml, ast.sml: definition of abstract syntax datatypes.
211 :     ast/build-ast.sml: translation from parse trees to abstract syntax, with type checking and other
212 :     static semantics processing.
213 :     extensions/
214 :     c/ -- dummy extension structures for C
215 :     variants/
216 :     ansic/
217 :     config.sml: various flags controlling error checking, type checking, etc.
218 :    
219 :     3.3 PRETTY PRINTER FOR AST
220 :     Ast comes equipped with a pretty-printer (ast/pp/pp-ast-sig.sml).
221 :     Not only is this useful for debugging purposes, but it also is an integral
222 :     component of source-to-source applications of the frontend.
223 :     When pretty printing Ast, pids and tids can be optionally printed.
224 :     The following flags control this behavior:
225 :    
226 :     PPLib.suppressPidUnderscores: controls printing of pids
227 :     PPLib.suppressPidGlobalUnderscores: controls printing of pids for global objects
228 :     PPLib.suppressTidUnderscores: controls printing of tids.
229 :    
230 :     Files:
231 :     pp/pp-ast-fn.sml : the generic pretty printing code for ast.
232 :     pp/pp-ast-sig.sml: pretty printing signature.
233 :     pp/pp-ast.sml: default pretty printer
234 :     pp/pp-ast-adornment.sml: pretty printer for printing ast interspersed with adornment info.
235 :     pp/pp-lib.sml: pretty printing for identifiers; some pretty printing combinators.
236 :    
237 :    
238 :     3.4 AST-UTILS [Not distributed yet]
239 :    
240 :     Files:
241 :     ast-utils/copy/: copying ast types
242 :     ast-utils/equality/: equality for ast types
243 :     ast-utils/simplifier/: ast simplifier
244 :    
245 :    
246 :     5. LOCATION INFO
247 :     ----------------

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0