SCM Repository
Annotation of /sml/trunk/src/smlnj-lib/RegExp/README
Parent Directory
|
Revision Log
Revision 104 - (view) (download)
1 : | monnier | 104 | This is a regular expressions library. It is based on a decoupling |
2 : | of the surface syntax used to specify regular expressions and the | ||
3 : | backend engine that implements the matcher. An abstract syntax is used | ||
4 : | to communicate between the front end and the back end of the system, | ||
5 : | |||
6 : | |||
7 : | USING REGULAR EXPRESSIONS | ||
8 : | |||
9 : | Given a structure S1 describing a surface syntax and a structure S2 | ||
10 : | describing a matching engine, a regular expression package can be | ||
11 : | defined by applying the functor: | ||
12 : | |||
13 : | RegExpFn (structure P=S1 structure E=S2) : REGEXP | ||
14 : | |||
15 : | To match a regular expression, one first needs to compile a | ||
16 : | representation in the surface syntax. The type of a compiled regular | ||
17 : | expression is given in the REGEXP signature as: | ||
18 : | |||
19 : | type regexp | ||
20 : | |||
21 : | Two functions are provided in | ||
22 : | the REGEXP signature: | ||
23 : | |||
24 : | val compile : (char,'a) StringCvt.reader -> (regexp, 'a) StringCvt.reader | ||
25 : | val compileString : string -> regexp | ||
26 : | |||
27 : | The function compile is a regexp reader, while compileString is | ||
28 : | specialized to strings. | ||
29 : | |||
30 : | Once a regular expression has been compiled, three functions are | ||
31 : | provided to perform the matching: | ||
32 : | |||
33 : | val find : | ||
34 : | val prefix : [[ See types in Glue/regexp-sig.sml ]] | ||
35 : | val match : | ||
36 : | |||
37 : | The function find returns a reader that searches a stream and attempts | ||
38 : | to match the given regular expression. The function prefix returns a | ||
39 : | reader that attempts to match the regular expression at the current | ||
40 : | position in the stream. The function match takes a list of regular | ||
41 : | expressions and functions and returns a reader that attempts to match | ||
42 : | one of the regular expressions at the current position in the | ||
43 : | stream. The function corresponding to the matched regular expression | ||
44 : | is invoked on the matching information. | ||
45 : | |||
46 : | |||
47 : | MATCHING INFORMATION | ||
48 : | |||
49 : | Once a match is found, it is returned as a match_tree datatype | ||
50 : | (defined in Glue/match-tree.sml). This is a hierarchical structure | ||
51 : | describing the matches of the various subexpressions appearing in the | ||
52 : | matched regular expression. A match for an expression is a record | ||
53 : | containing the position of the match and its length. The root of the | ||
54 : | structure always described the outermost match (the whole string | ||
55 : | matched by the regular expression). See the file Glue/match-tree.sml | ||
56 : | for more details. | ||
57 : | |||
58 : | |||
59 : | ROADMAP | ||
60 : | |||
61 : | FrontEnd/ Implementation of various surface syntaxes | ||
62 : | BackEnd/ Implementation of various matching engines | ||
63 : | Glue/ Glue code | ||
64 : | |||
65 : | |||
66 : | FRONT ENDS | ||
67 : | |||
68 : | A single surface syntax is currently implemented, providing an AWK | ||
69 : | syntax to describe regular expressions. See the file | ||
70 : | FrontEnd/awk-syntax.sml for a description of the actual syntax. | ||
71 : | |||
72 : | |||
73 : | BACK ENDS | ||
74 : | |||
75 : | Two matching engines are implemented: | ||
76 : | 1) A backtracking engine (BackEnd/bt-engine.sml), that provides full | ||
77 : | submatching information. Slow, low memory footprint, low startup | ||
78 : | cost. | ||
79 : | 2) An automaton-based engine (BackEnd/dfa-engine.sml), that provides | ||
80 : | only top-level matching information (the string matched). Fast, | ||
81 : | but memory-intensive and high startup cost (the cost of | ||
82 : | constructing the automaton in the first place) | ||
83 : |
root@smlnj-gforge.cs.uchicago.edu | ViewVC Help |
Powered by ViewVC 1.0.0 |