Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /ml-lex/trunk/lexgen.sml
ViewVC logotype

Annotation of /ml-lex/trunk/lexgen.sml

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1862 - (view) (download)
Original Path: sml/trunk/src/ml-lex/lexgen.sml

1 : monnier 249 (* Lexical analyzer generator for Standard ML.
2 :     Version 1.7.0, June 1998
3 :    
4 :     Copyright (c) 1989-1992 by Andrew W. Appel,
5 :     David R. Tarditi, James S. Mattson
6 :    
7 :     This software comes with ABSOLUTELY NO WARRANTY.
8 :     This software is subject only to the PRINCETON STANDARD ML SOFTWARE LIBRARY
9 :     COPYRIGHT NOTICE, LICENSE AND DISCLAIMER, (in the file "COPYRIGHT",
10 :     distributed with this software). You may copy and distribute this software;
11 :     see the COPYRIGHT NOTICE for details and restrictions.
12 :    
13 :     Changes:
14 :     07/25/89 (drt): added %header declaration, code to place
15 :     user declarations at same level as makeLexer, etc.
16 :     This is needed for the parser generator.
17 :     /10/89 (appel): added %arg declaration (see lexgen.doc).
18 :     /04/90 (drt): fixed following bug: couldn't use the lexer after an
19 :     error occurred -- NextTok and inquote weren't being reset
20 :     10/22/91 (drt): disabled use of lookahead
21 :     10/23/92 (drt): disabled use of $ operator (which involves lookahead),
22 :     added handlers for dictionary lookup routine
23 :     11/02/92 (drt): changed handler for exception Reject in generated lexer
24 :     to Internal.Reject
25 :     02/01/94 (appel): Moved the exception handler for Reject in such
26 :     a way as to allow tail-recursion (improves performance
27 :     wonderfully!).
28 :     02/01/94 (appel): Fixed a bug in parsing of state names.
29 :     05/19/94 (Mikael Pettersson, mpe@ida.liu.se):
30 :     Transition tables are usually represented as strings, but
31 :     when the range is too large, int vectors constructed by
32 :     code like "Vector.vector[1,2,3,...]" are used instead.
33 :     The problem with this isn't that the vector itself takes
34 :     a lot of space, but that the code generated by SML/NJ to
35 :     construct the intermediate list at run-time is *HUGE*. My
36 :     fix is to encode an int vector as a string literal (using
37 :     two bytes per int) and emit code to decode the string to
38 :     a vector at run-time. SML/NJ compiles string literals into
39 :     substrings in the code, so this uses much less space.
40 :     06/02/94 (jhr): Modified export-lex.sml to conform to new installation
41 :     scheme. Also removed tab characters from string literals.
42 :     10/05/94 (jhr): Changed generator to produce code that uses the new
43 :     basis style strings and characters.
44 :     10/06/94 (jhr) Modified code to compile under new basis style strings
45 :     and characters.
46 :     02/08/95 (jhr) Modified to use new List module interface.
47 :     05/18/95 (jhr) changed Vector.vector to Vector.fromList
48 : blume 656
49 : monnier 249 * Revision 1.9 1998/01/06 19:23:53 appel
50 :     * added %posarg feature to permit position-within-file to be passed
51 :     * as a parameter to makeLexer
52 :     *
53 :     # Revision 1.8 1998/01/06 19:01:48 appel
54 :     # repaired error messages like "cannot have both %structure and %header"
55 :     #
56 :     # Revision 1.7 1998/01/06 18:55:49 appel
57 :     # permit %% to be unescaped within regular expressions
58 :     #
59 :     # Revision 1.6 1998/01/06 18:46:13 appel
60 :     # removed undocumented feature that permitted extra %% at end of rules
61 :     #
62 :     # Revision 1.5 1998/01/06 18:29:23 appel
63 :     # put yylineno variable inside makeLexer function
64 :     #
65 :     # Revision 1.4 1998/01/06 18:19:59 appel
66 :     # check for newline inside quoted string
67 :     #
68 :     # Revision 1.3 1997/10/04 03:52:13 dbm
69 :     # Fix to remove output file if ml-lex fails.
70 :     #
71 : jhr 1286 10/17/02 (jhr) changed bad character error message to properly
72 :     print the bad character.
73 :     10/17/02 (jhr) fixed skipws to use Char.isSpace test.
74 : jhr 1832 07/27/05 (jhr) add \r as a recognized escape sequence.
75 : monnier 249 *)
76 :    
77 :     (* Subject: lookahead in sml-lex
78 :     Reply-to: david.tarditi@CS.CMU.EDU
79 :     Date: Mon, 21 Oct 91 14:13:26 -0400
80 :    
81 :     There is a serious bug in the implementation of lookahead,
82 :     as done in sml-lex, and described in Aho, Sethi, and Ullman,
83 :     p. 134 "Implementing the Lookahead Operator"
84 :    
85 :     We have disallowed the use of lookahead for now because
86 :     of this bug.
87 :    
88 :     As a counter-example to the implementation described in
89 :     ASU, consider the following specification with the
90 :     input string "aba" (this example is taken from
91 :     a comp.compilers message from Dec. 1989, I think):
92 :    
93 :     type lexresult=unit
94 :     val linenum = ref 1
95 :     fun error x = TextIO.output(TextIO.stdErr, x ^ "\n")
96 :     val eof = fn () => ()
97 :     %%
98 :     %structure Lex
99 :     %%
100 :     (a|ab)/ba => (print yytext; print "\n"; ());
101 :    
102 :     The ASU proposal works as follows. Suppose that we are
103 :     using NFA's to represent our regular expressions. Then to
104 :     build an NFA for e1 / e2, we build an NFA n1 for e1
105 :     and an NFA n2 for e2, and add an epsilon transition
106 :     from e1 to e2.
107 :    
108 :     When lexing, when we encounter the end state of e1e2,
109 :     we take as the end of the string the position in
110 :     the string that was the last occurrence of the state of
111 :     the NFA having a transition on the epsilon introduced
112 :     for /.
113 :    
114 :     Using the example we have above, we'll have an NFA
115 :     with the following states:
116 :    
117 :    
118 :     1 -- a --> 2 -- b --> 3
119 :     | |
120 :     | epsilon | epsilon
121 :     | |
122 :     |------------> 4 -- b --> 5 -- a --> 6
123 :    
124 :     On our example, we get the following list of transitions:
125 :    
126 :     a : 2, 4 (make an epsilon transition from 2 to 4)
127 :     ab : 3, 4, 5 (make an epsilon transition from 3 to 4)
128 :     aba : 6
129 :    
130 :     If we chose the last state in which we made an epsilon transition,
131 :     we'll chose the transition from 3 to 4, and end up with "ab"
132 :     as our token, when we should have "a" as our token.
133 :    
134 :     *)
135 :    
136 :     functor RedBlack(B : sig type key
137 :     val > : key*key->bool
138 :     end):
139 :     sig type tree
140 :     type key
141 :     val empty : tree
142 :     val insert : key * tree -> tree
143 :     val lookup : key * tree -> key
144 :     exception notfound of key
145 :     end =
146 :     struct
147 :     open B
148 :     datatype color = RED | BLACK
149 :     datatype tree = empty | tree of key * color * tree * tree
150 :     exception notfound of key
151 :    
152 :     fun insert (key,t) =
153 :     let fun f empty = tree(key,RED,empty,empty)
154 :     | f (tree(k,BLACK,l,r)) =
155 :     if key>k
156 :     then case f r
157 :     of r as tree(rk,RED, rl as tree(rlk,RED,rll,rlr),rr) =>
158 :     (case l
159 :     of tree(lk,RED,ll,lr) =>
160 :     tree(k,RED,tree(lk,BLACK,ll,lr),
161 :     tree(rk,BLACK,rl,rr))
162 :     | _ => tree(rlk,BLACK,tree(k,RED,l,rll),
163 :     tree(rk,RED,rlr,rr)))
164 :     | r as tree(rk,RED,rl, rr as tree(rrk,RED,rrl,rrr)) =>
165 :     (case l
166 :     of tree(lk,RED,ll,lr) =>
167 :     tree(k,RED,tree(lk,BLACK,ll,lr),
168 :     tree(rk,BLACK,rl,rr))
169 :     | _ => tree(rk,BLACK,tree(k,RED,l,rl),rr))
170 :     | r => tree(k,BLACK,l,r)
171 :     else if k>key
172 :     then case f l
173 :     of l as tree(lk,RED,ll, lr as tree(lrk,RED,lrl,lrr)) =>
174 :     (case r
175 :     of tree(rk,RED,rl,rr) =>
176 :     tree(k,RED,tree(lk,BLACK,ll,lr),
177 :     tree(rk,BLACK,rl,rr))
178 :     | _ => tree(lrk,BLACK,tree(lk,RED,ll,lrl),
179 :     tree(k,RED,lrr,r)))
180 :     | l as tree(lk,RED, ll as tree(llk,RED,lll,llr), lr) =>
181 :     (case r
182 :     of tree(rk,RED,rl,rr) =>
183 :     tree(k,RED,tree(lk,BLACK,ll,lr),
184 :     tree(rk,BLACK,rl,rr))
185 :     | _ => tree(lk,BLACK,ll,tree(k,RED,lr,r)))
186 :     | l => tree(k,BLACK,l,r)
187 :     else tree(key,BLACK,l,r)
188 :     | f (tree(k,RED,l,r)) =
189 :     if key>k then tree(k,RED,l, f r)
190 :     else if k>key then tree(k,RED, f l, r)
191 :     else tree(key,RED,l,r)
192 :     in case f t
193 :     of tree(k,RED, l as tree(_,RED,_,_), r) => tree(k,BLACK,l,r)
194 :     | tree(k,RED, l, r as tree(_,RED,_,_)) => tree(k,BLACK,l,r)
195 :     | t => t
196 :     end
197 :    
198 :    
199 :     fun lookup (key,t) =
200 :     let fun look empty = raise (notfound key)
201 :     | look (tree(k,_,l,r)) =
202 :     if k>key then look l
203 :     else if key>k then look r
204 :     else k
205 :     in look t
206 :     end
207 :    
208 :     end
209 :    
210 :     signature LEXGEN =
211 :     sig
212 :     val lexGen: string -> unit
213 :     end
214 :    
215 :     structure LexGen: LEXGEN =
216 :     struct
217 :     open Array List
218 :     infix 9 sub
219 :    
220 :     datatype token = CHARS of bool array | QMARK | STAR | PLUS | BAR
221 :     | LP | RP | CARAT | DOLLAR | SLASH | STATE of string list
222 :     | REPS of int * int | ID of string | ACTION of string
223 :     | BOF | EOF | ASSIGN | SEMI | ARROW | LEXMARK | LEXSTATES
224 :     | COUNT | REJECT | FULLCHARSET | STRUCT | HEADER | ARG | POSARG
225 :    
226 :     datatype exp = EPS | CLASS of bool array * int | CLOSURE of exp
227 :     | ALT of exp * exp | CAT of exp * exp | TRAIL of int
228 :     | END of int
229 :    
230 :     (* flags describing input Lex spec. - unnecessary code is omitted *)
231 :     (* if possible *)
232 :    
233 :     val CharFormat = ref false;
234 :     val UsesTrailingContext = ref false;
235 :     val UsesPrevNewLine = ref false;
236 :    
237 :     (* flags for various bells & whistles that Lex has. These slow the
238 :     lexer down and should be omitted from production lexers (if you
239 :     really want speed) *)
240 :    
241 :     val CountNewLines = ref false;
242 :     val PosArg = ref false;
243 :     val HaveReject = ref false;
244 :    
245 :     (* Can increase size of character set *)
246 :    
247 :     val CharSetSize = ref 129;
248 :    
249 :     (* Can name structure or declare header code *)
250 :    
251 :     val StrName = ref "Mlex"
252 :     val HeaderCode = ref ""
253 :     val HeaderDecl = ref false
254 :     val ArgCode = ref (NONE: string option)
255 :     val StrDecl = ref false
256 :    
257 :     val ResetFlags = fn () => (CountNewLines := false; HaveReject := false;
258 :     PosArg := false;
259 :     UsesTrailingContext := false;
260 :     CharSetSize := 129; StrName := "Mlex";
261 :     HeaderCode := ""; HeaderDecl:= false;
262 :     ArgCode := NONE;
263 :     StrDecl := false)
264 :    
265 :     val LexOut = ref(TextIO.stdOut)
266 :     fun say x = TextIO.output(!LexOut, x)
267 :    
268 :     (* Union: merge two sorted lists of integers *)
269 :    
270 :     fun union(a,b) = let val rec merge = fn
271 :     (nil,nil,z) => z
272 :     | (nil,el::more,z) => merge(nil,more,el::z)
273 :     | (el::more,nil,z) => merge(more,nil,el::z)
274 :     | (x::morex,y::morey,z) => if (x:int)=(y:int)
275 :     then merge(morex,morey,x::z)
276 :     else if x>y then merge(morex,y::morey,x::z)
277 :     else merge(x::morex,morey,y::z)
278 :     in merge(rev a,rev b,nil)
279 :     end
280 :    
281 :     (* Nullable: compute if a important expression parse tree node is nullable *)
282 :    
283 :     val rec nullable = fn
284 :     EPS => true
285 :     | CLASS(_) => false
286 :     | CLOSURE(_) => true
287 :     | ALT(n1,n2) => nullable(n1) orelse nullable(n2)
288 :     | CAT(n1,n2) => nullable(n1) andalso nullable(n2)
289 :     | TRAIL(_) => true
290 :     | END(_) => false
291 :    
292 :     (* FIRSTPOS: firstpos function for parse tree expressions *)
293 :    
294 :     and firstpos = fn
295 :     EPS => nil
296 :     | CLASS(_,i) => [i]
297 :     | CLOSURE(n) => firstpos(n)
298 :     | ALT(n1,n2) => union(firstpos(n1),firstpos(n2))
299 :     | CAT(n1,n2) => if nullable(n1) then union(firstpos(n1),firstpos(n2))
300 :     else firstpos(n1)
301 :     | TRAIL(i) => [i]
302 :     | END(i) => [i]
303 :    
304 :     (* LASTPOS: Lastpos function for parse tree expressions *)
305 :    
306 :     and lastpos = fn
307 :     EPS => nil
308 :     | CLASS(_,i) => [i]
309 :     | CLOSURE(n) => lastpos(n)
310 :     | ALT(n1,n2) => union(lastpos(n1),lastpos(n2))
311 :     | CAT(n1,n2) => if nullable(n2) then union(lastpos(n1),lastpos(n2))
312 :     else lastpos(n2)
313 :     | TRAIL(i) => [i]
314 :     | END(i) => [i]
315 :     ;
316 :    
317 :     (* ++: Increment an integer reference *)
318 :    
319 :     fun ++(x) : int = (x := !x + 1; !x);
320 :    
321 :     structure dict =
322 :     struct
323 :     type 'a relation = 'a * 'a -> bool
324 :     abstype ('b,'a) dictionary = DATA of { Table : ('b * 'a) list,
325 :     Leq : 'b * 'b -> bool }
326 :     with
327 :     exception LOOKUP
328 :     fun create Leqfunc = DATA { Table = nil, Leq = Leqfunc }
329 :     fun lookup (DATA { Table = entrylist, Leq = leq }) key =
330 :     let fun search [] = raise LOOKUP
331 :     | search((k,item)::entries) =
332 :     if leq(key,k)
333 :     then if leq(k,key) then item else raise LOOKUP
334 :     else search entries
335 :     in search entrylist
336 :     end
337 :     fun enter (DATA { Table = entrylist, Leq = leq })
338 :     (newentry as (key : 'b,item :'a)) : ('b,'a) dictionary =
339 :     let val gt = fn a => fn b => not (leq(a,b))
340 :     val eq = fn k => fn k' => (leq(k,k')) andalso (leq(k',k))
341 :     fun update nil = [ newentry ]
342 :     | update ((entry as (k,_))::entries) =
343 :     if (eq key k) then newentry::entries
344 :     else if gt k key then newentry::(entry::entries)
345 :     else entry::(update entries)
346 :     in DATA { Table = update entrylist, Leq = leq }
347 :     end
348 :     fun listofdict (DATA { Table = entrylist,Leq = leq}) =
349 :     let fun f (nil,r) = rev r
350 :     | f (a::b,r) = f (b,a::r)
351 :     in f(entrylist,nil)
352 :     end
353 :     end
354 :     end
355 :    
356 :     open dict;
357 :    
358 :     (* INPUT.ML : Input w/ one character push back capability *)
359 :    
360 :     val LineNum = ref 1;
361 :    
362 :     abstype ibuf =
363 :     BUF of TextIO.instream * {b : string ref, p : int ref}
364 :     with
365 :     fun make_ibuf(s) = BUF (s, {b=ref"", p = ref 0})
366 :     fun close_ibuf (BUF (s,_)) = TextIO.closeIn(s)
367 :     exception eof
368 :     fun getch (a as (BUF(s,{b,p}))) =
369 :     if (!p = (size (!b)))
370 :     then (b := TextIO.inputN(s, 1024);
371 :     p := 0;
372 :     if (size (!b))=0
373 :     then raise eof
374 :     else getch a)
375 :     else (let val ch = String.sub(!b,!p)
376 :     in (if ch = #"\n"
377 :     then LineNum := !LineNum + 1
378 :     else ();
379 :     p := !p + 1;
380 :     ch)
381 :     end)
382 :     fun ungetch(BUF(s,{b,p})) = (
383 :     p := !p - 1;
384 :     if String.sub(!b,!p) = #"\n"
385 :     then LineNum := !LineNum - 1
386 :     else ())
387 :     end;
388 :    
389 :     exception Error
390 :    
391 :     fun prErr x = (
392 :     TextIO.output (TextIO.stdErr, String.concat [
393 :     "ml-lex: error, line ", (Int.toString (!LineNum)), ": ", x, "\n"
394 :     ]);
395 :     raise Error)
396 :     fun prSynErr x = (
397 :     TextIO.output (TextIO.stdErr, String.concat [
398 :     "ml-lex: syntax error, line ", (Int.toString (!LineNum)), ": ", x, "\n"
399 :     ]);
400 :     raise Error)
401 :    
402 :     exception SyntaxError; (* error in user's input file *)
403 :    
404 :     exception LexError; (* unexpected error in lexer *)
405 :    
406 :     val LexBuf = ref(make_ibuf(TextIO.stdIn));
407 :     val LexState = ref 0;
408 :     val NextTok = ref BOF;
409 :     val inquote = ref false;
410 :    
411 :     fun AdvanceTok () : unit = let
412 :     fun isLetter c =
413 :     ((c >= #"a") andalso (c <= #"z")) orelse
414 :     ((c >= #"A") andalso (c <= #"Z"))
415 :     fun isDigit c = (c >= #"0") andalso (c <= #"9")
416 :     (* check for valid (non-leading) identifier character (added by JHR) *)
417 :     fun isIdentChr c =
418 :     ((isLetter c) orelse (isDigit c) orelse (c = #"_") orelse (c = #"'"))
419 :     fun atoi s = let
420 :     fun num (c::r, n) = if isDigit c
421 :     then num (r, 10*n + (Char.ord c - Char.ord #"0"))
422 :     else n
423 :     | num ([], n) = n
424 :     in
425 :     num (explode s, 0)
426 :     end
427 :    
428 : jhr 1286 fun skipws () = let val ch = nextch()
429 :     in
430 :     if Char.isSpace ch
431 :     then skipws()
432 :     else ch
433 :     end
434 : monnier 249
435 :     and nextch () = getch(!LexBuf)
436 :    
437 :     and escaped () = (case nextch()
438 :     of #"b" => #"\008"
439 :     | #"n" => #"\n"
440 : jhr 1832 | #"r" => #"\r"
441 : monnier 249 | #"t" => #"\t"
442 :     | #"h" => #"\128"
443 :     | x => let
444 :     fun err t = prErr("illegal ascii escape '"^(implode(rev t))^"'")
445 :     fun cvt c = (Char.ord c - Char.ord #"0")
446 :     fun f (n, c, t) = if c=3
447 :     then if n >= (!CharSetSize)
448 :     then err t
449 :     else Char.chr n
450 :     else let val ch=nextch()
451 :     in
452 :     if isDigit ch
453 :     then f(n*10+(cvt ch), c+1, ch::t)
454 :     else err t
455 :     end
456 :     in
457 :     if isDigit x then f(cvt x, 1, [x]) else x
458 :     end
459 :     (* end case *))
460 :    
461 :     and onechar x = let val c = array(!CharSetSize, false)
462 :     in
463 :     update(c, Char.ord(x), true); CHARS(c)
464 :     end
465 :    
466 :     in case !LexState of 0 => let val makeTok = fn () =>
467 :     case skipws()
468 :     (* Lex % operators *)
469 :     of #"%" => (case nextch() of
470 :     #"%" => LEXMARK
471 :     | a => let fun f s =
472 :     let val a = nextch()
473 :     in if isLetter a then f(a::s)
474 :     else (ungetch(!LexBuf);
475 :     implode(rev s))
476 :     end
477 :     in case f [a]
478 :     of "reject" => REJECT
479 :     | "count" => COUNT
480 :     | "full" => FULLCHARSET
481 :     | "s" => LEXSTATES
482 :     | "S" => LEXSTATES
483 :     | "structure" => STRUCT
484 :     | "header" => HEADER
485 :     | "arg" => ARG
486 :     | "posarg" => POSARG
487 :     | _ => prErr "unknown % operator "
488 :     end
489 :     )
490 :     (* semicolon (for end of LEXSTATES) *)
491 :     | #";" => SEMI
492 :     (* anything else *)
493 :     | ch => if isLetter(ch) then
494 :     let fun getID matched =
495 :     let val x = nextch()
496 :     (**** fix by JHR
497 :     in if isLetter(x) orelse isDigit(x) orelse
498 :     x = "_" orelse x = "'"
499 :     ****)
500 :     in if (isIdentChr x)
501 :     then getID (x::matched)
502 :     else (ungetch(!LexBuf); implode(rev matched))
503 :     end
504 :     in ID(getID [ch])
505 :     end
506 : jhr 1286 else prSynErr (String.concat[
507 :     "bad character: \"", Char.toString ch, "\""
508 :     ])
509 : monnier 249 in NextTok := makeTok()
510 :     end
511 :     | 1 => let val rec makeTok = fn () =>
512 :     if !inquote then case nextch() of
513 :     (* inside quoted string *)
514 :     #"\\" => onechar(escaped())
515 :     | #"\"" => (inquote := false; makeTok())
516 :     | #"\n" => (prSynErr "end-of-line inside quoted string";
517 :     inquote := false; makeTok())
518 :     | x => onechar(x)
519 :     else case skipws() of
520 :     (* single character operators *)
521 :     #"?" => QMARK
522 :     | #"*" => STAR
523 :     | #"+" => PLUS
524 :     | #"|" => BAR
525 :     | #"(" => LP
526 :     | #")" => RP
527 :     | #"^" => CARAT
528 :     | #"$" => DOLLAR
529 :     | #"/" => SLASH
530 :     | #";" => SEMI
531 :     | #"." => let val c = array(!CharSetSize,true) in
532 :     update(c,10,false); CHARS(c)
533 :     end
534 :     (* assign and arrow *)
535 :     | #"=" => let val c = nextch() in
536 :     if c = #">" then ARROW else (ungetch(!LexBuf); ASSIGN)
537 :     end
538 :     (* character set *)
539 :     | #"[" => let val rec classch = fn () => let val x = skipws()
540 :     in if x = #"\\" then escaped() else x
541 :     end;
542 :     val first = classch();
543 :     val flag = (first <> #"^");
544 :     val c = array(!CharSetSize,not flag);
545 :     fun add NONE = ()
546 :     | add (SOME x) = update(c, Char.ord(x), flag)
547 :     and range (x, y) = if x>y
548 :     then (prErr "bad char. range")
549 :     else let
550 :     val i = ref(Char.ord(x)) and j = Char.ord(y)
551 :     in while !i<=j do (
552 :     add (SOME(Char.chr(!i)));
553 :     i := !i + 1)
554 :     end
555 :     and getClass last = (case classch()
556 :     of #"]" => (add(last); c)
557 :     | #"-" => (case last
558 :     of NONE => getClass(SOME #"-")
559 :     | (SOME last') => let val x = classch()
560 :     in
561 :     if x = #"]"
562 :     then (add(last); add(SOME #"-"); c)
563 :     else (range(last',x); getClass(NONE))
564 :     end
565 :     (* end case *))
566 :     | x => (add(last); getClass(SOME x))
567 :     (* end case *))
568 :     in CHARS(getClass(if first = #"^" then NONE else SOME first))
569 :     end
570 :     (* Start States specification *)
571 :     | #"<" => let val rec get_state = fn (prev,matched) =>
572 :     case nextch() of
573 :     #">" => matched::prev
574 :     | #"," => get_state(matched::prev,"")
575 :     | x => if isIdentChr(x)
576 :     then get_state(prev,matched ^ String.str x)
577 :     else (prSynErr "bad start state list")
578 :     in STATE(get_state(nil,""))
579 :     end
580 :     (* {id} or repititions *)
581 :     | #"{" => let val ch = nextch() in if isLetter(ch) then
582 :     let fun getID matched = (case nextch()
583 :     of #"}" => matched
584 :     | x => if (isIdentChr x) then
585 :     getID(matched ^ String.str x)
586 :     else (prErr "invalid char. class name")
587 :     (* end case *))
588 :     in ID(getID(String.str ch))
589 :     end
590 :     else if isDigit(ch) then
591 :     let fun get_r (matched, r1) = (case nextch()
592 :     of #"}" => let val n = atoi(matched) in
593 :     if r1 = ~1 then (n,n) else (r1,n)
594 :     end
595 :     | #"," => if r1 = ~1 then get_r("",atoi(matched))
596 :     else (prErr "invalid repetitions spec.")
597 :     | x => if isDigit(x)
598 :     then get_r(matched ^ String.str x,r1)
599 :     else (prErr "invalid char in repetitions spec")
600 :     (* end case *))
601 :     in REPS(get_r(String.str ch,~1))
602 :     end
603 :     else (prErr "bad repetitions spec")
604 :     end
605 :     (* Lex % operators *)
606 :     | #"\\" => onechar(escaped())
607 :     (* start quoted string *)
608 :     | #"\"" => (inquote := true; makeTok())
609 :     (* anything else *)
610 :     | ch => onechar(ch)
611 :     in NextTok := makeTok()
612 :     end
613 : blume 919 | 2 => NextTok :=
614 :     (case skipws() of
615 :     #"(" =>
616 :     let
617 :     fun loop_to_end (backslash, x) =
618 :     let
619 :     val c = getch (! LexBuf)
620 :     val notb = not backslash
621 :     val nstr = c :: x
622 :     in
623 :     case c of
624 :     #"\"" => if notb then nstr
625 :     else loop_to_end (false, nstr)
626 :     | _ => loop_to_end (c = #"\\" andalso notb, nstr)
627 :     end
628 :     fun GetAct (lpct, x) =
629 :     let
630 :     val c = getch (! LexBuf)
631 :     val nstr = c :: x
632 :     in
633 :     case c of
634 :     #"\"" => GetAct (lpct, loop_to_end (false, nstr))
635 :     | #"(" => GetAct (lpct + 1, nstr)
636 :     | #")" => if lpct = 0 then implode (rev x)
637 :     else GetAct(lpct - 1, nstr)
638 :     | _ => GetAct(lpct, nstr)
639 :     end
640 :     in
641 :     ACTION (GetAct (0,nil))
642 :     end
643 :     | #";" => SEMI
644 :     | c => (prSynErr ("invalid character " ^ String.str c)))
645 :     | _ => raise LexError
646 : monnier 249 end
647 :     handle eof => NextTok := EOF ;
648 :    
649 :     fun GetTok (_:unit) : token =
650 :     let val t = !NextTok in AdvanceTok(); t
651 :     end;
652 :     val SymTab = ref (create String.<=) : (string,exp) dictionary ref
653 :    
654 :     fun GetExp () : exp =
655 :    
656 :     let val rec optional = fn e => ALT(EPS,e)
657 :    
658 :     and lookup' = fn name =>
659 :     lookup(!SymTab) name
660 :     handle LOOKUP => prErr ("bad regular expression name: "^
661 :     name)
662 :    
663 :     and newline = fn () => let val c = array(!CharSetSize,false) in
664 :     update(c,10,true); c
665 :     end
666 :    
667 :     and endline = fn e => trail(e,CLASS(newline(),0))
668 :    
669 :     and trail = fn (e1,e2) => CAT(CAT(e1,TRAIL(0)),e2)
670 :    
671 :     and closure1 = fn e => CAT(e,CLOSURE(e))
672 :    
673 :     and repeat = fn (min,max,e) => let val rec rep = fn
674 :     (0,0) => EPS
675 :     | (0,1) => ALT(e,EPS)
676 :     | (0,i) => CAT(rep(0,1),rep(0,i-1))
677 :     | (i,j) => CAT(e,rep(i-1,j-1))
678 :     in rep(min,max)
679 :     end
680 :    
681 :     and exp0 = fn () => case GetTok() of
682 :     CHARS(c) => exp1(CLASS(c,0))
683 :     | LP => let val e = exp0() in
684 :     if !NextTok = RP then
685 :     (AdvanceTok(); exp1(e))
686 :     else (prSynErr "missing '('") end
687 :     | ID(name) => exp1(lookup' name)
688 :     | _ => raise SyntaxError
689 :    
690 :     and exp1 = fn (e) => case !NextTok of
691 :     SEMI => e
692 :     | ARROW => e
693 :     | EOF => e
694 :     | LP => exp2(e,exp0())
695 :     | RP => e
696 :     | t => (AdvanceTok(); case t of
697 :     QMARK => exp1(optional(e))
698 :     | STAR => exp1(CLOSURE(e))
699 :     | PLUS => exp1(closure1(e))
700 :     | CHARS(c) => exp2(e,CLASS(c,0))
701 :     | BAR => ALT(e,exp0())
702 :     | DOLLAR => (UsesTrailingContext := true; endline(e))
703 :     | SLASH => (UsesTrailingContext := true;
704 :     trail(e,exp0()))
705 :     | REPS(i,j) => exp1(repeat(i,j,e))
706 :     | ID(name) => exp2(e,lookup' name)
707 :     | _ => raise SyntaxError)
708 :    
709 :     and exp2 = fn (e1,e2) => case !NextTok of
710 :     SEMI => CAT(e1,e2)
711 :     | ARROW => CAT(e1,e2)
712 :     | EOF => CAT(e1,e2)
713 :     | LP => exp2(CAT(e1,e2),exp0())
714 :     | RP => CAT(e1,e2)
715 :     | t => (AdvanceTok(); case t of
716 :     QMARK => exp1(CAT(e1,optional(e2)))
717 :     | STAR => exp1(CAT(e1,CLOSURE(e2)))
718 :     | PLUS => exp1(CAT(e1,closure1(e2)))
719 :     | CHARS(c) => exp2(CAT(e1,e2),CLASS(c,0))
720 :     | BAR => ALT(CAT(e1,e2),exp0())
721 :     | DOLLAR => (UsesTrailingContext := true;
722 :     endline(CAT(e1,e2)))
723 :     | SLASH => (UsesTrailingContext := true;
724 :     trail(CAT(e1,e2),exp0()))
725 :     | REPS(i,j) => exp1(CAT(e1,repeat(i,j,e2)))
726 :     | ID(name) => exp2(CAT(e1,e2),lookup' name)
727 :     | _ => raise SyntaxError)
728 :     in exp0()
729 :     end;
730 :     val StateTab = ref(create(String.<=)) : (string,int) dictionary ref
731 :    
732 :     val StateNum = ref 0;
733 :    
734 :     fun GetStates () : int list =
735 :    
736 :     let fun add nil sl = sl
737 :     | add (x::y) sl = add y (union ([lookup (!StateTab)(x)
738 :     handle LOOKUP =>
739 :     prErr ("bad state name: "^x)
740 :     ],sl))
741 :    
742 :     fun addall i sl =
743 :     if i <= !StateNum then addall (i+2) (union ([i],sl))
744 :     else sl
745 :    
746 :     fun incall (x::y) = (x+1)::incall y
747 :     | incall nil = nil
748 :    
749 :     fun addincs nil = nil
750 :     | addincs (x::y) = x::(x+1)::addincs y
751 :    
752 :     val state_list =
753 :     case !NextTok of
754 :     STATE s => (AdvanceTok(); LexState := 1; add s nil)
755 :     | _ => addall 1 nil
756 :    
757 :     in case !NextTok
758 :     of CARAT => (LexState := 1; AdvanceTok(); UsesPrevNewLine := true;
759 :     incall state_list)
760 :     | _ => addincs state_list
761 :     end
762 :    
763 :     val LeafNum = ref ~1;
764 :    
765 :     fun renum(e : exp) : exp =
766 :     let val rec label = fn
767 :     EPS => EPS
768 :     | CLASS(x,_) => CLASS(x,++LeafNum)
769 :     | CLOSURE(e) => CLOSURE(label(e))
770 :     | ALT(e1,e2) => ALT(label(e1),label(e2))
771 :     | CAT(e1,e2) => CAT(label(e1),label(e2))
772 :     | TRAIL(i) => TRAIL(++LeafNum)
773 :     | END(i) => END(++LeafNum)
774 :     in label(e)
775 :     end;
776 :    
777 :     exception ParseError;
778 :    
779 :     fun parse() : (string * (int list * exp) list * ((string,string) dictionary)) =
780 :     let val Accept = ref (create String.<=) : (string,string) dictionary ref
781 :     val rec ParseRtns = fn l => case getch(!LexBuf) of
782 :     #"%" => let val c = getch(!LexBuf) in
783 :     if c = #"%" then (implode (rev l))
784 :     else ParseRtns(c :: #"%" :: l)
785 :     end
786 :     | c => ParseRtns(c::l)
787 :     and ParseDefs = fn () =>
788 :     (LexState:=0; AdvanceTok(); case !NextTok of
789 :     LEXMARK => ()
790 :     | LEXSTATES =>
791 :     let fun f () = (case !NextTok of (ID i) =>
792 :     (StateTab := enter(!StateTab)(i,++StateNum);
793 :     ++StateNum; AdvanceTok(); f())
794 :     | _ => ())
795 :     in AdvanceTok(); f ();
796 :     if !NextTok=SEMI then ParseDefs() else
797 :     (prSynErr "expected ';'")
798 :     end
799 :     | ID x => (LexState:=1; AdvanceTok(); if GetTok() = ASSIGN
800 :     then (SymTab := enter(!SymTab)(x,GetExp());
801 :     if !NextTok = SEMI then ParseDefs()
802 :     else (prSynErr "expected ';'"))
803 :     else raise SyntaxError)
804 :     | REJECT => (HaveReject := true; ParseDefs())
805 :     | COUNT => (CountNewLines := true; ParseDefs())
806 :     | FULLCHARSET => (CharSetSize := 256; ParseDefs())
807 :     | HEADER => (LexState := 2; AdvanceTok();
808 :     case GetTok()
809 :     of ACTION s =>
810 :     if (!StrDecl) then
811 :     (prErr "cannot have both %structure and %header \
812 :     \declarations")
813 :     else if (!HeaderDecl) then
814 :     (prErr "duplicate %header declarations")
815 :     else
816 :     (HeaderCode := s; LexState := 0;
817 :     HeaderDecl := true; ParseDefs())
818 :     | _ => raise SyntaxError)
819 :     | POSARG => (PosArg := true; ParseDefs())
820 :     | ARG => (LexState := 2; AdvanceTok();
821 :     case GetTok()
822 :     of ACTION s =>
823 :     (case !ArgCode
824 :     of SOME _ => prErr "duplicate %arg declarations"
825 :     | NONE => ArgCode := SOME s;
826 :     LexState := 0;
827 :     ParseDefs())
828 :     | _ => raise SyntaxError)
829 :     | STRUCT => (AdvanceTok();
830 :     case !NextTok of
831 :     (ID i) =>
832 :     if (!HeaderDecl) then
833 :     (prErr "cannot have both %structure and %header \
834 :     \declarations")
835 :     else if (!StrDecl) then
836 :     (prErr "duplicate %structure declarations")
837 :     else (StrName := i; StrDecl := true)
838 :     | _ => (prErr "expected ID");
839 :     ParseDefs())
840 :     | _ => raise SyntaxError)
841 :     and ParseRules =
842 :     fn rules => (LexState:=1; AdvanceTok(); case !NextTok of
843 :     EOF => rules
844 :     | _ =>
845 :     let val s = GetStates()
846 :     val e = renum(CAT(GetExp(),END(0)))
847 :     in
848 :     if !NextTok = ARROW then
849 :     (LexState:=2; AdvanceTok();
850 :     case GetTok() of ACTION(act) =>
851 :     if !NextTok=SEMI then
852 :     (Accept:=enter(!Accept) (Int.toString (!LeafNum),act);
853 :     ParseRules((s,e)::rules))
854 :     else (prSynErr "expected ';'")
855 :     | _ => raise SyntaxError)
856 :     else (prSynErr "expected '=>'")
857 :     end)
858 :     in let val usercode = ParseRtns nil
859 :     in (ParseDefs(); (usercode,ParseRules(nil),!Accept))
860 :     end
861 :     end handle SyntaxError => (prSynErr "")
862 :    
863 :     fun makebegin () : unit =
864 :     let fun make nil = ()
865 :     | make ((x,n:int)::y)=(say "val "; say x; say " = " ;
866 :     say "STARTSTATE ";
867 :     say (Int.toString n); say ";\n"; make y)
868 :     in say "\n(* start state definitions *)\n\n"; make(listofdict(!StateTab))
869 :     end
870 :    
871 :     structure L =
872 :     struct
873 :     nonfix >
874 :     type key = int list * string
875 :     fun > ((key,item:string),(key',item')) =
876 :     let fun f ((a:int)::a') (b::b') = if Int.> (a,b) then true
877 :     else if a=b then f a' b'
878 :     else false
879 :     | f _ _ = false
880 :     in f key key'
881 :     end
882 :     end
883 :    
884 :     structure RB = RedBlack(L)
885 :    
886 :     fun maketable (fins:(int * (int list)) list,
887 :     tcs :(int * (int list)) list,
888 :     tcpairs: (int * int) list,
889 :     trans : (int*(int list)) list) : unit =
890 :    
891 :     (* Fins = (state #, list of final leaves for the state) list
892 :     tcs = (state #, list of trailing context leaves which begin in this state)
893 :     list
894 :     tcpairs = (trailing context leaf, end leaf) list
895 :     trans = (state #,list of transitions for state) list *)
896 :    
897 :     let datatype elem = N of int | T of int | D of int
898 :     val count = ref 0
899 :     val _ = (if length(trans)<256 then CharFormat := true
900 :     else CharFormat := false;
901 :     if !UsesTrailingContext then
902 :     (say "\ndatatype yyfinstate = N of int | \
903 :     \ T of int | D of int\n")
904 :     else say "\ndatatype yyfinstate = N of int";
905 :     say "\ntype statedata = {fin : yyfinstate list, trans: ";
906 :     case !CharFormat of
907 :     true => say "string}"
908 :     | false => say "int Vector.vector}";
909 :     say "\n(* transition & final state table *)\nval tab = let\n";
910 :     case !CharFormat of
911 :     true => ()
912 :     | false =>
913 :     (say "fun decode s k =\n";
914 :     say " let val k' = k + k\n";
915 :     say " val hi = Char.ord(String.sub(s, k'))\n";
916 :     say " val lo = Char.ord(String.sub(s, k' + 1))\n";
917 :     say " in hi * 256 + lo end\n"))
918 :    
919 :     val newfins =
920 :     let fun IsEndLeaf t =
921 :     let fun f ((l,e)::r) = if (e=t) then true else f r
922 :     | f nil = false in f tcpairs end
923 :    
924 : blume 515 fun GetEndLeaf t =
925 : monnier 249 let fun f ((tl,el)::r) = if (tl=t) then el else f r
926 : blume 515 | f _ = raise Match
927 :     in f tcpairs
928 : monnier 249 end
929 :     fun GetTrConLeaves s =
930 :     let fun f ((s',l)::r) = if (s = s') then l else f r
931 :     | f nil = nil
932 :     in f tcs
933 :     end
934 :     fun sort_leaves s =
935 :     let fun insert (x:int) (a::b) =
936 :     if (x <= a) then x::(a::b)
937 :     else a::(insert x b)
938 :     | insert x nil = [x]
939 :     in List.foldr (fn (x,r) => insert x r) [] s
940 :     end
941 :     fun conv a = if (IsEndLeaf a) then (D a) else (N a)
942 :     fun merge (a::a',b::b') =
943 :     if (a <= b) then (conv a)::merge(a',b::b')
944 :     else (T b)::(merge(a::a',b'))
945 :     | merge (a::a',nil) = (conv a)::(merge (a',nil))
946 :     | merge (nil,b::b') = (T b)::(merge (b',nil))
947 :     | merge (nil,nil) = nil
948 :    
949 :     in map (fn (x,l) =>
950 :     rev (merge (l,
951 :     sort_leaves (map (fn x => GetEndLeaf x) (GetTrConLeaves x)))))
952 :     fins
953 :     end
954 :    
955 :     val rs =
956 :     let open RB
957 :     fun makeItems x =
958 :     let fun emit8(x, pos) =
959 :     let val s = StringCvt.padLeft #"0" 3 (Int.toString x)
960 :     in
961 :     case pos
962 :     of 16 => (say "\\\n\\\\"; say s; 1)
963 :     | _ => (say "\\"; say s; pos+1)
964 :     end
965 :     fun emit16(x, pos) =
966 :     let val hi8 = x div 256
967 :     val lo8 = x - hi8 * 256 (* x rem 256 *)
968 :     in
969 :     emit8(lo8, emit8(hi8, pos))
970 :     end
971 :     fun MakeString([], _, _) = ()
972 :     | MakeString(x::xs, emitter, pos) =
973 :     MakeString(xs, emitter, emitter(x, pos))
974 :     in case !CharFormat of
975 :     true => (say " \n\""; MakeString(x,emit8,0); say "\"\n")
976 :     | false => (say (Int.toString(length x));
977 :     say ", \n\""; MakeString(x,emit16,0); say "\"\n")
978 :     end
979 :    
980 :     fun makeEntry(nil,rs,t) = rev rs
981 :     | makeEntry(((l:int,x)::y),rs,t) =
982 :     let val name = (Int.toString l)
983 :     in let val (r,n) = lookup ((x,name),t)
984 :     in makeEntry(y,(n::rs),t)
985 :     end handle notfound _ =>
986 :     (count := !count+1;
987 :     say " ("; say name; say ",";
988 :     makeItems x; say "),\n";
989 :     makeEntry(y,(name::rs),(insert ((x,name),t))))
990 :     end
991 :    
992 :     val _ = say "val s = [ \n"
993 :     val res = makeEntry(trans,nil,empty)
994 :     val _ =
995 :     case !CharFormat
996 :     of true => (say "(0, \"\")]\n"; say "fun f x = x \n")
997 :     | false => (say "(0, 0, \"\")]\n";
998 :     say "fun f(n, i, x) = (n, Vector.tabulate(i, decode x)) \n")
999 :    
1000 :     val _ = say "val s = map f (rev (tl (rev s))) \n"
1001 :     val _ = say "exception LexHackingError \n"
1002 : blume 515 val _ = say "fun look ((j,x)::r, i: int) = if i = j then x else look(r, i) \n"
1003 : monnier 249 val _ = say " | look ([], i) = raise LexHackingError\n"
1004 :    
1005 :     val _ = say "fun g {fin=x, trans=i} = {fin=x, trans=look(s,i)} \n"
1006 :     in res
1007 :     end
1008 :    
1009 : blume 515 fun makeTable args = let
1010 :     fun makeOne (a, b) = let
1011 :     fun item (N i) = ("N", i)
1012 :     | item (T i) = ("T", i)
1013 :     | item (D i) = ("D", i)
1014 :     fun makeItem x = let
1015 :     val (t, n) = item x
1016 :     in
1017 :     app say ["(", t, " ", Int.toString n, ")"]
1018 :     end
1019 :     fun makeItems [] = ()
1020 :     | makeItems [x] = makeItem x
1021 :     | makeItems (hd :: tl) =
1022 :     (makeItem hd; say ","; makeItems tl)
1023 :     in
1024 :     say "{fin = [";
1025 :     makeItems b;
1026 :     app say ["], trans = ", a, "}"]
1027 :     end
1028 :     fun mt ([], []) = ()
1029 :     | mt ([a], [b]) = makeOne (a, b)
1030 :     | mt (a :: a', b :: b') =
1031 :     (makeOne (a, b); say ",\n"; mt (a', b'))
1032 :     | mt _ = raise Match
1033 :     in
1034 :     mt args
1035 :     end
1036 :    
1037 :     (*
1038 : monnier 249 fun makeTable(nil,nil) = ()
1039 :     | makeTable(a::a',b::b') =
1040 :     let fun makeItems nil = ()
1041 :     | makeItems (hd::tl) =
1042 :     let val (t,n) =
1043 :     case hd of
1044 :     (N i) => ("(N ",i)
1045 :     | (T i) => ("(T ",i)
1046 :     | (D i) => ("(D ",i)
1047 :     in (say t; say (Int.toString n); say ")";
1048 :     if null tl
1049 :     then ()
1050 :     else (say ","; makeItems tl))
1051 :     end
1052 :     in (say "{fin = ["; makeItems b;
1053 :     say "], trans = "; say a; say "}";
1054 :     if null a'
1055 :     then ()
1056 :     else (say ",\n"; makeTable(a',b')))
1057 :     end
1058 : blume 515 *)
1059 : monnier 249
1060 :     fun msg x = TextIO.output(TextIO.stdOut, x)
1061 :    
1062 :     in (say "in Vector.fromList(map g \n["; makeTable(rs,newfins);
1063 :     say "])\nend\n";
1064 :     msg ("\nNumber of states = " ^ (Int.toString (length trans)));
1065 :     msg ("\nNumber of distinct rows = " ^ (Int.toString (!count)));
1066 :     msg ("\nApprox. memory size of trans. table = " ^
1067 :     (Int.toString (!count*(!CharSetSize)*(if !CharFormat then 1 else 8))));
1068 :     msg " bytes\n")
1069 :     end
1070 :    
1071 :     (* makeaccept: Takes a (string,string) dictionary, prints case statement for
1072 :     accepting leaf actions. The key strings are the leaf #'s, the data strings
1073 :     are the actions *)
1074 :    
1075 :     fun makeaccept ends =
1076 :     let fun startline f = if f then say " " else say "| "
1077 :     fun make(nil,f) = (startline f; say "_ => raise Internal.LexerError\n")
1078 :     | make((x,a)::y,f) = (startline f; say x; say " => ";
1079 : mblume 1862 if Substring.size(#2 (Substring.position "yytext" (Substring.full a))) = 0
1080 : monnier 249 then
1081 :     (say "("; say a; say ")")
1082 :     else (say "let val yytext=yymktext() in ";
1083 :     say a; say " end");
1084 :     say "\n"; make(y,false))
1085 :     in make (listofdict(ends),true)
1086 :     end
1087 :    
1088 :     fun leafdata(e:(int list * exp) list) =
1089 :     let val fp = array(!LeafNum + 1,nil)
1090 :     and leaf = array(!LeafNum + 1,EPS)
1091 :     and tcpairs = ref nil
1092 :     and trailmark = ref ~1;
1093 :     val rec add = fn
1094 :     (nil,x) => ()
1095 :     | (hd::tl,x) => (update(fp,hd,union(fp sub hd,x));
1096 :     add(tl,x))
1097 :     and moredata = fn
1098 :     CLOSURE(e1) =>
1099 :     (moredata(e1); add(lastpos(e1),firstpos(e1)))
1100 :     | ALT(e1,e2) => (moredata(e1); moredata(e2))
1101 :     | CAT(e1,e2) => (moredata(e1); moredata(e2);
1102 :     add(lastpos(e1),firstpos(e2)))
1103 :     | CLASS(x,i) => update(leaf,i,CLASS(x,i))
1104 :     | TRAIL(i) => (update(leaf,i,TRAIL(i)); if !trailmark = ~1
1105 :     then trailmark := i else ())
1106 :     | END(i) => (update(leaf,i,END(i)); if !trailmark <> ~1
1107 :     then (tcpairs := (!trailmark,i)::(!tcpairs);
1108 :     trailmark := ~1) else ())
1109 :     | _ => ()
1110 :     and makedata = fn
1111 :     nil => ()
1112 :     | (_,x)::tl => (moredata(x);makedata(tl))
1113 :     in trailmark := ~1; makedata(e); (fp,leaf,!tcpairs)
1114 :     end;
1115 :    
1116 :     fun makedfa(rules) =
1117 :     let val StateTab = ref (create(String.<=)) : (string,int) dictionary ref
1118 :     val fintab = ref (create(Int.<=)) : (int,(int list)) dictionary ref
1119 :     val transtab = ref (create(Int.<=)) : (int,int list) dictionary ref
1120 :     val tctab = ref (create(Int.<=)) : (int,(int list)) dictionary ref
1121 :     val (fp, leaf, tcpairs) = leafdata(rules);
1122 :    
1123 :     fun visit (state,statenum) =
1124 :     let val transitions = gettrans(state) in
1125 :     fintab := enter(!fintab)(statenum,getfin(state));
1126 :     tctab := enter(!tctab)(statenum,gettc(state));
1127 :     transtab := enter(!transtab)(statenum,transitions)
1128 :     end
1129 :    
1130 :     and visitstarts (states) =
1131 :     let fun vs nil i = ()
1132 :     | vs (hd::tl) i = (visit (hd,i); vs tl (i+1))
1133 :     in vs states 0
1134 :     end
1135 :    
1136 :     and hashstate(s: int list) =
1137 :     let val rec hs =
1138 :     fn (nil,z) => z
1139 :     | ((x:int)::y,z) => hs(y,z ^ " " ^ (Int.toString x))
1140 :     in hs(s,"")
1141 :     end
1142 :    
1143 :     and find(s) = lookup(!StateTab)(hashstate(s))
1144 :    
1145 :     and add(s,n) = StateTab := enter(!StateTab)(hashstate(s),n)
1146 :    
1147 :     and getstate (state) =
1148 :     find(state)
1149 :     handle LOOKUP => let val n = ++StateNum in
1150 :     add(state,n); visit(state,n); n
1151 :     end
1152 :    
1153 :     and getfin state =
1154 :     let fun f nil fins = fins
1155 :     | f (hd::tl) fins =
1156 :     case (leaf sub hd)
1157 :     of END _ => f tl (hd::fins)
1158 :     | _ => f tl fins
1159 :     in f state nil
1160 :     end
1161 :    
1162 :     and gettc state =
1163 :     let fun f nil fins = fins
1164 :     | f (hd::tl) fins =
1165 :     case (leaf sub hd)
1166 :     of TRAIL _ => f tl (hd::fins)
1167 :     | _ => f tl fins
1168 :     in f state nil
1169 :     end
1170 :    
1171 :     and gettrans (state) =
1172 :     let fun loop c tlist =
1173 :     let fun cktrans nil r = r
1174 :     | cktrans (hd::tl) r =
1175 :     case (leaf sub hd) of
1176 :     CLASS(i,_)=>
1177 :     (if (i sub c) then cktrans tl (union(r,fp sub hd))
1178 :     else cktrans tl r handle Subscript =>
1179 :     cktrans tl r
1180 :     )
1181 :     | _ => cktrans tl r
1182 :     in if c >= 0 then
1183 :     let val v=cktrans state nil
1184 :     in loop (c-1) (if v=nil then 0::tlist else (getstate v)::tlist)
1185 :     end
1186 :     else tlist
1187 :     end
1188 :     in loop ((!CharSetSize) - 1) nil
1189 :     end
1190 :    
1191 :     and startstates() =
1192 :     let val startarray = array(!StateNum + 1, nil);
1193 :     fun listofarray(a,n) =
1194 :     let fun f i l = if i >= 0 then f (i-1) ((a sub i)::l) else l
1195 :     in f (n-1) nil end
1196 :     val rec makess = fn
1197 :     nil => ()
1198 :     | (startlist,e)::tl => (fix(startlist,firstpos(e));makess(tl))
1199 :     and fix = fn
1200 :     (nil,_) => ()
1201 :     | (s::tl,firsts) => (update(startarray,s,
1202 :     union(firsts,startarray sub s));
1203 :     fix(tl,firsts))
1204 :     in makess(rules);listofarray(startarray, !StateNum + 1)
1205 :     end
1206 :    
1207 :     in visitstarts(startstates());
1208 :     (listofdict(!fintab),listofdict(!transtab),listofdict(!tctab),tcpairs)
1209 :     end
1210 :    
1211 :     val skel_hd =
1212 :     " struct\n\
1213 :     \ structure UserDeclarations =\n\
1214 :     \ struct\n\
1215 :     \"
1216 :    
1217 :     val skel_mid2 =
1218 :     " | Internal.D k => action (i,(acts::l),k::rs)\n\
1219 :     \ | Internal.T k =>\n\
1220 :     \ let fun f (a::b,r) =\n\
1221 :     \ if a=k\n\
1222 :     \ then action(i,(((Internal.N a)::acts)::l),(b@r))\n\
1223 :     \ else f (b,a::r)\n\
1224 :     \ | f (nil,r) = action(i,(acts::l),rs)\n\
1225 :     \ in f (rs,nil)\n\
1226 :     \ end\n\
1227 :     \"
1228 :    
1229 :     fun lexGen(infile) =
1230 :     let val outfile = infile ^ ".sml"
1231 :     fun PrintLexer (ends) =
1232 :     let val sayln = fn x => (say x; say "\n")
1233 :     in case !ArgCode
1234 :     of NONE => (sayln "fun lex () : Internal.result =";
1235 :     sayln "let fun continue() = lex() in")
1236 :     | SOME s => (say "fun lex "; say "(yyarg as ("; say s; sayln ")) =";
1237 :     sayln "let fun continue() : Internal.result = ");
1238 :     say " let fun scan (s,AcceptingLeaves : Internal.yyfinstate";
1239 :     sayln " list list,l,i0) =";
1240 :     if !UsesTrailingContext
1241 :     then say "\tlet fun action (i,nil,rs)"
1242 :     else say "\tlet fun action (i,nil)";
1243 :     sayln " = raise LexError";
1244 :     if !UsesTrailingContext
1245 :     then sayln "\t| action (i,nil::l,rs) = action(i-1,l,rs)"
1246 :     else sayln "\t| action (i,nil::l) = action (i-1,l)";
1247 :     if !UsesTrailingContext
1248 :     then sayln "\t| action (i,(node::acts)::l,rs) ="
1249 :     else sayln "\t| action (i,(node::acts)::l) =";
1250 :     sayln "\t\tcase node of";
1251 :     sayln "\t\t Internal.N yyk => ";
1252 :     sayln "\t\t\t(let fun yymktext() = substring(!yyb,i0,i-i0)\n\
1253 :     \\t\t\t val yypos = i0+ !yygone";
1254 :     if !CountNewLines
1255 : mblume 1350 then (sayln "\t\t\tval _ = yylineno := CharVectorSlice.foldli";
1256 :     sayln "\t\t\t\t(fn (_,#\"\\n\", n) => n+1 | (_,_, n) => n) (!yylineno) (CharVectorSlice.slice (!yyb,i0,SOME(i-i0)))")
1257 : monnier 249 else ();
1258 :     if !HaveReject
1259 :     then (say "\t\t\tfun REJECT() = action(i,acts::l";
1260 :     if !UsesTrailingContext
1261 :     then sayln ",rs)" else sayln ")")
1262 :     else ();
1263 :     sayln "\t\t\topen UserDeclarations Internal.StartStates";
1264 :     sayln " in (yybufpos := i; case yyk of ";
1265 :     sayln "";
1266 :     sayln "\t\t\t(* Application actions *)\n";
1267 :     makeaccept(ends);
1268 :     say "\n\t\t) end ";
1269 :     say ")\n\n";
1270 :     if (!UsesTrailingContext) then say skel_mid2 else ();
1271 :     sayln "\tval {fin,trans} = Unsafe.Vector.sub(Internal.tab, s)";
1272 :     sayln "\tval NewAcceptingLeaves = fin::AcceptingLeaves";
1273 :     sayln "\tin if l = !yybl then";
1274 :     sayln "\t if trans = #trans(Vector.sub(Internal.tab,0))";
1275 :     sayln "\t then action(l,NewAcceptingLeaves";
1276 :     if !UsesTrailingContext then say ",nil" else ();
1277 :     say ") else";
1278 :     sayln "\t let val newchars= if !yydone then \"\" else yyinput 1024";
1279 :     sayln "\t in if (size newchars)=0";
1280 :     sayln "\t\t then (yydone := true;";
1281 :     say "\t\t if (l=i0) then UserDeclarations.eof ";
1282 :     sayln (case !ArgCode of NONE => "()" | SOME _ => "yyarg");
1283 :     say "\t\t else action(l,NewAcceptingLeaves";
1284 :     if !UsesTrailingContext then
1285 :     sayln ",nil))" else sayln "))";
1286 :     sayln "\t\t else (if i0=l then yyb := newchars";
1287 :     sayln "\t\t else yyb := substring(!yyb,i0,l-i0)^newchars;";
1288 :     sayln "\t\t yygone := !yygone+i0;";
1289 :     sayln "\t\t yybl := size (!yyb);";
1290 :     sayln "\t\t scan (s,AcceptingLeaves,l-i0,0))";
1291 :     sayln "\t end";
1292 :     sayln "\t else let val NewChar = Char.ord(Unsafe.CharVector.sub(!yyb,l))";
1293 :     if !CharSetSize=129
1294 :     then sayln "\t\tval NewChar = if NewChar<128 then NewChar else 128"
1295 :     else ();
1296 :     say "\t\tval NewState = ";
1297 :     sayln (if !CharFormat
1298 :     then "Char.ord(Unsafe.CharVector.sub(trans,NewChar))"
1299 :     else "Unsafe.Vector.sub(trans, NewChar)");
1300 :     say "\t\tin if NewState=0 then action(l,NewAcceptingLeaves";
1301 :     if !UsesTrailingContext then sayln ",nil)" else sayln ")";
1302 :     sayln "\t\telse scan(NewState,NewAcceptingLeaves,l+1,i0)";
1303 :     sayln "\tend";
1304 :     sayln "\tend";
1305 :     if !UsesPrevNewLine then () else sayln "(*";
1306 :     sayln "\tval start= if substring(!yyb,!yybufpos-1,1)=\"\\n\"";
1307 :     sayln "then !yybegin+1 else !yybegin";
1308 :     if !UsesPrevNewLine then () else sayln "*)";
1309 :     say "\tin scan(";
1310 :     if !UsesPrevNewLine then say "start"
1311 :     else say "!yybegin (* start *)";
1312 :     sayln ",nil,!yybufpos,!yybufpos)";
1313 :     sayln " end";
1314 :     sayln (case !ArgCode of NONE => "end" | SOME _ => "in continue end");
1315 :     sayln " in lex";
1316 :     sayln " end";
1317 :     sayln "end"
1318 :     end
1319 :    
1320 :     in (UsesPrevNewLine := false;
1321 :     ResetFlags();
1322 :     LexBuf := make_ibuf(TextIO.openIn infile);
1323 :     NextTok := BOF;
1324 :     inquote := false;
1325 :     LexOut := TextIO.openOut(outfile);
1326 :     StateNum := 2;
1327 :     LineNum := 1;
1328 :     StateTab := enter(create(String.<=))("INITIAL",1);
1329 :     LeafNum := ~1;
1330 :     let
1331 :     val (user_code,rules,ends) =
1332 :     parse() handle x =>
1333 :     (close_ibuf(!LexBuf);
1334 :     TextIO.closeOut(!LexOut);
1335 :     OS.FileSys.remove outfile;
1336 :     raise x)
1337 :     val (fins,trans,tctab,tcpairs) = makedfa(rules)
1338 :     val _ = if !UsesTrailingContext then
1339 :     (close_ibuf(!LexBuf);
1340 :     TextIO.closeOut(!LexOut);
1341 :     OS.FileSys.remove outfile;
1342 :     prErr "lookahead is unimplemented")
1343 :     else ()
1344 :     in
1345 :     if (!HeaderDecl)
1346 :     then say (!HeaderCode)
1347 :     else say ("structure " ^ (!StrName));
1348 :     say "=\n";
1349 :     say skel_hd;
1350 :     say user_code;
1351 :     say "end (* end of user routines *)\n";
1352 :     say "exception LexError (* raised if illegal leaf ";
1353 :     say "action tried *)\n";
1354 :     say "structure Internal =\n\tstruct\n";
1355 :     maketable(fins,tctab,tcpairs,trans);
1356 :     say "structure StartStates =\n\tstruct\n";
1357 :     say "\tdatatype yystartstate = STARTSTATE of int\n";
1358 :     makebegin();
1359 :     say "\nend\n";
1360 :     say "type result = UserDeclarations.lexresult\n";
1361 :     say "\texception LexerError (* raised if illegal leaf ";
1362 :     say "action tried *)\n";
1363 :     say "end\n\n";
1364 :     say (if (!PosArg) then "fun makeLexer (yyinput,yygone0:int) =\nlet\n"
1365 :     else "fun makeLexer yyinput =\nlet\tval yygone0=1\n");
1366 :     if !CountNewLines then say "\tval yylineno = ref 0\n\n" else ();
1367 :     say "\tval yyb = ref \"\\n\" \t\t(* buffer *)\n\
1368 :     \\tval yybl = ref 1\t\t(*buffer length *)\n\
1369 :     \\tval yybufpos = ref 1\t\t(* location of next character to use *)\n\
1370 :     \\tval yygone = ref yygone0\t(* position in file of beginning of buffer *)\n\
1371 :     \\tval yydone = ref false\t\t(* eof found yet? *)\n\
1372 :     \\tval yybegin = ref 1\t\t(*Current 'start state' for lexer *)\n\
1373 :     \\n\tval YYBEGIN = fn (Internal.StartStates.STARTSTATE x) =>\n\
1374 :     \\t\t yybegin := x\n\n";
1375 :     PrintLexer(ends);
1376 :     close_ibuf(!LexBuf);
1377 :     TextIO.closeOut(!LexOut)
1378 :     end)
1379 :     end
1380 :     end

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0