Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /sml/trunk/src/smlnj-lib/RegExp/README
ViewVC logotype

Annotation of /sml/trunk/src/smlnj-lib/RegExp/README

Parent Directory Parent Directory | Revision Log Revision Log


Revision 104 - (view) (download)

1 : monnier 104 This is a regular expressions library. It is based on a decoupling
2 :     of the surface syntax used to specify regular expressions and the
3 :     backend engine that implements the matcher. An abstract syntax is used
4 :     to communicate between the front end and the back end of the system,
5 :    
6 :    
7 :     USING REGULAR EXPRESSIONS
8 :    
9 :     Given a structure S1 describing a surface syntax and a structure S2
10 :     describing a matching engine, a regular expression package can be
11 :     defined by applying the functor:
12 :    
13 :     RegExpFn (structure P=S1 structure E=S2) : REGEXP
14 :    
15 :     To match a regular expression, one first needs to compile a
16 :     representation in the surface syntax. The type of a compiled regular
17 :     expression is given in the REGEXP signature as:
18 :    
19 :     type regexp
20 :    
21 :     Two functions are provided in
22 :     the REGEXP signature:
23 :    
24 :     val compile : (char,'a) StringCvt.reader -> (regexp, 'a) StringCvt.reader
25 :     val compileString : string -> regexp
26 :    
27 :     The function compile is a regexp reader, while compileString is
28 :     specialized to strings.
29 :    
30 :     Once a regular expression has been compiled, three functions are
31 :     provided to perform the matching:
32 :    
33 :     val find :
34 :     val prefix : [[ See types in Glue/regexp-sig.sml ]]
35 :     val match :
36 :    
37 :     The function find returns a reader that searches a stream and attempts
38 :     to match the given regular expression. The function prefix returns a
39 :     reader that attempts to match the regular expression at the current
40 :     position in the stream. The function match takes a list of regular
41 :     expressions and functions and returns a reader that attempts to match
42 :     one of the regular expressions at the current position in the
43 :     stream. The function corresponding to the matched regular expression
44 :     is invoked on the matching information.
45 :    
46 :    
47 :     MATCHING INFORMATION
48 :    
49 :     Once a match is found, it is returned as a match_tree datatype
50 :     (defined in Glue/match-tree.sml). This is a hierarchical structure
51 :     describing the matches of the various subexpressions appearing in the
52 :     matched regular expression. A match for an expression is a record
53 :     containing the position of the match and its length. The root of the
54 :     structure always described the outermost match (the whole string
55 :     matched by the regular expression). See the file Glue/match-tree.sml
56 :     for more details.
57 :    
58 :    
59 :     ROADMAP
60 :    
61 :     FrontEnd/ Implementation of various surface syntaxes
62 :     BackEnd/ Implementation of various matching engines
63 :     Glue/ Glue code
64 :    
65 :    
66 :     FRONT ENDS
67 :    
68 :     A single surface syntax is currently implemented, providing an AWK
69 :     syntax to describe regular expressions. See the file
70 :     FrontEnd/awk-syntax.sml for a description of the actual syntax.
71 :    
72 :    
73 :     BACK ENDS
74 :    
75 :     Two matching engines are implemented:
76 :     1) A backtracking engine (BackEnd/bt-engine.sml), that provides full
77 :     submatching information. Slow, low memory footprint, low startup
78 :     cost.
79 :     2) An automaton-based engine (BackEnd/dfa-engine.sml), that provides
80 :     only top-level matching information (the string matched). Fast,
81 :     but memory-intensive and high startup cost (the cost of
82 :     constructing the automaton in the first place)
83 :    

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0