SML/NJ Library Manual


This is a regular expression library. It is based on a decoupling of the surface syntax used to specify regular expressions (the front end) from the engine that implements the matcher (the back end). An abstract syntax is used to communicate between the front end and the back end of the system.

Given a structure S1 describing a surface syntax and a structure S2 describing a matching engine, a regular expression package can be defined by applying the functor RegExpFn:

RegExpFn (structure P=S1  structure E=S2) : REGEXP

To match a regular expression, one first needs to compile its representation in the surface syntax. The type of a compiled regular expression is given in the REGEXP signature as:

type regexp
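As a minimal sketch of the two steps above, the following instantiates the functor and compiles one expression. It assumes the AwkSyntax and BackTrackEngine structures and the compileString function found in current SML/NJ distributions, and it assumes the library is available under the "$/regexp-lib.cm" anchor; none of these names are stated on this page.

```sml
(* Load the regexp library. The anchor name is an assumption that holds
   for current SML/NJ installations. *)
val _ = CM.make "$/regexp-lib.cm";

(* Pair an AWK-style surface syntax (front end) with a backtracking
   matcher (back end). Both structure names are assumptions. *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)

(* Compile once; the resulting value can be reused for many matches. *)
val identifier : RE.regexp = RE.compileString "[a-zA-Z_][a-zA-Z0-9_]*"
```

Swapping in a different engine structure for E should leave code that uses RE unchanged, which is the point of the decoupling.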

Once a regular expression has been compiled, three functions are provided to perform the matching: find, prefix and match. These functions operate on readers as defined in the StringCvt structure of the Basis Library. A reader of type ('a,'b) reader is a function 'b -> ('a * 'b) option that takes a stream of type 'b and returns an element of type 'a together with the remainder of the stream, or NONE if the end of the stream has been reached.
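To make the reader type concrete, here is a small sketch that uses only the Basis Library. The names listReader and count are hypothetical helpers introduced for illustration; Substring.getc is the standard character reader over substrings.

```sml
(* A character reader over substrings, taken directly from the Basis. *)
val getc : (char, substring) StringCvt.reader = Substring.getc

(* A hand-written reader over character lists, showing the shape of the
   type: return the next element and the rest, or NONE at the end. *)
fun listReader ([] : char list) = NONE
  | listReader (c :: rest) = SOME (c, rest)

(* Count how many elements a reader yields before the stream ends. *)
fun count (rdr : ('a, 'b) StringCvt.reader) (strm : 'b) : int =
  let
    fun loop (s, n) =
      case rdr s of
          NONE => n
        | SOME (_, s') => loop (s', n + 1)
  in
    loop (strm, 0)
  end

val n1 = count getc (Substring.full "abc")
val n2 = count listReader (String.explode "hello")
```

Because the matching functions are written against readers, the same compiled expression can be run over substrings, character lists, or any other stream for which such a function exists.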

The function find returns a reader that searches a stream and attempts to match the given regular expression. The function prefix returns a reader that attempts to match the regular expression at the current position in the stream. The function match takes a list of regular expressions and functions and returns a reader that attempts to match one of the regular expressions at the current position in the stream. The function corresponding to the matched regular expression is invoked on the matching information.
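The difference between find and prefix can be sketched as follows. This assumes the signatures shipped with current SML/NJ, where both functions take a compiled expression and a character reader and yield a match tree whose entries are optional {pos, len} records; the structures AwkSyntax, BackTrackEngine and MatchTree, the compileString function, and the "$/regexp-lib.cm" anchor are likewise assumptions not stated on this page. The helpers rootText and run are hypothetical.

```sml
(* Load the regexp library; the anchor name is an assumption. *)
val _ = CM.make "$/regexp-lib.cm";

structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = BackTrackEngine)

val digits = RE.compileString "[0-9]+"

(* Text covered by the root of a match tree, i.e. the whole match.
   pos is the stream at the start of the match, here a substring. *)
fun rootText tree =
  case MatchTree.root tree of
      SOME {pos, len} => Substring.string (Substring.slice (pos, 0, SOME len))
    | NONE => ""

(* Apply a matching function to a string and return the matched text. *)
fun run matcher (s : string) =
  case matcher digits Substring.getc (Substring.full s) of
      SOME (tree, _) => SOME (rootText tree)
    | NONE => NONE

val found   = run RE.find   "abc 123 def"  (* searches forward in the stream *)
val atStart = run RE.prefix "abc 123 def"  (* only tries the current position *)
val atDigit = run RE.prefix "123 def"
```

The match function is left out of this sketch because the exact type of its argument list is not given on this page.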

Once a match is found, it is returned as a value of the match_tree datatype. This is a hierarchical structure describing the matches of the various subexpressions appearing in the matched regular expression. A match for an expression is a record containing the position of the match and its length. The root of the structure always describes the outermost match (the whole string matched by the regular expression).
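The hierarchy can be illustrated with a self-contained sketch. The datatype below is a hypothetical local stand-in for the library's match tree, not its actual definition, and positions are simplified to integer offsets into a known string rather than stream values. The tree is built by hand to show what matching (a+)(b+) against "aabbb" could look like.

```sml
(* Stand-in for a match tree: a match plus the trees of its
   subexpressions. Hypothetical definition, for illustration only. *)
datatype 'a match_tree = Match of 'a * 'a match_tree list

(* In this sketch a match is a start offset and a length. *)
type span = {pos : int, len : int}

(* The root always describes the outermost match. *)
fun root (Match (m, _)) = m

(* List every match in pre-order: the whole match first, then each
   subexpression from left to right. *)
fun flatten (Match (m, kids)) = m :: List.concat (List.map flatten kids)

(* Recover the matched text for a span. *)
fun text (s : string) ({pos, len} : span) = String.substring (s, pos, len)

val input = "aabbb"
val tree : span match_tree =
  Match ({pos = 0, len = 5},
         [Match ({pos = 0, len = 2}, []),
          Match ({pos = 2, len = 3}, [])])

val whole = text input (root tree)
val parts = List.map (text input) (flatten tree)
```

Here whole is the full matched string and parts lists it followed by the text of each parenthesized subexpression.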


Last Modified June 1, 1998
Comments to Riccardo Pucella.
Copyright © 1998 Bell Labs, Lucent Technologies
