# View of /sml/trunk/src/cm/Doc/12-moretools.tex

Wed Nov 21 21:03:17 2001 UTC (18 years, 4 months ago) by blume
File size: 26791 byte(s)
Release 110.37 -- see HISTORY

% -*- latex -*-

\section{Extending the tool set}
\label{sec:moretools}

CM's tool set is extensible: new tools can be added by writing a few
lines of ML code.  The necessary hooks for this are provided by a
structure {\tt Tools} which is exported by the {\tt
\$smlnj/cm/tools.cm} library.

\subsection{Adding simple shell-command tools}
\label{sec:addshellclass}

If the tool is implemented as a ``typical'' shell command, then all
that needs to be done is a single call of:

\begin{verbatim}
  Tools.registerStdShellCmdTool
\end{verbatim}

For example, suppose you have made a new, improved version of ML-Yacc
(``New-ML-Yacc'') and want to register it under a class called {\tt
nmlyacc}.  Here is what you write:

\begin{verbatim}
  val _ = Tools.registerStdShellCmdTool
        { tool = "New-ML-Yacc",
          class = "nmlyacc",
          suffixes = ["ngrm", "ny"],
          cmdStdPath = "new-ml-yacc",
          template = NONE,
          extensionStyle =
              Tools.EXTEND [("sig", SOME "sml", fn _ => NONE),
                            ("sml", SOME "sml", fn x => x)],
          dflopts = [] }
\end{verbatim}

This code can be packaged as a CM library and loaded via {\tt CM.make}
or {\tt CM.load\_plugin}.  ({\tt CM.autoload} is not enough because
its lazy nature prevents the required side effects from occurring.)
Alternatively, the code could also be entered at the interactive top
level after loading library {\tt \$smlnj/cm/tools.cm}.

In our example, the shell command name for our tool is {\tt
new-ml-yacc}.  When looking for this command in the filesystem, CM
first tries to treat it as a path anchor (see
section~\ref{sec:anchor:env}).  For example, suppose {\tt new-ml-yacc} is
mapped to {\tt /bin}.  In this case the command to be
invoked would be {\tt /bin/new-ml-yacc}.  If path anchor resolution
fails, then the command name will be used as-is.  Normally this
causes the shell's path search mechanism to be used as a fallback.

{\tt Tools.registerStdShellCmdTool} creates the class and installs the
tool for it.  The arguments must be specified as follows:

\begin{description}
\item[tool] a descriptive name of the tool (used in error messages);
type: {\tt string}
\item[class] the name of the class; the string must not contain
upper-case letters; type: {\tt string}
\item[suffixes] a list of file name suffixes that let CM automatically
recognize files of the class; type: {\tt string list}
\item[cmdStdPath] the command string from above; type: {\tt string}
\item[template] an optional string that describes how the command line
is to be constructed from pieces; \\
The string is taken verbatim except for embedded \% format specifiers:
\begin{description}\setlength{\itemsep}{0pt}
\item[\%c] the command name (i.e., the elaboration of {\tt cmdStdPath})
\item[\%s] the source file name in native pathname syntax
\item[\%$n$t] the $n$-th target file in native pathname syntax; \\
($n$ is specified as a decimal number, counting starts at $1$, and
each target file name is constructed from the corresponding {\tt
extensionStyle} entry; if $n$ is $0$ (or missing), then all
targets---separated by single spaces---are inserted;
if $n$ is not in the range between $0$ and the number of available
targets, then {\bf \%$n$t} expands into itself)
\item[\%$n$o] the $n$-th tool parameter; \\
(named sub-option parameters are ignored;
$n$ is specified as a decimal number, counting starts at $1$;
if $n$ is $0$ (or missing), then all options---separated by single
spaces---are inserted;
if $n$ is not in the range between $0$ and the number of available
options, then {\bf \%$n$o} expands into itself)
\item[\%$x$] the character $x$ (where $x$ is neither {\bf c}, nor
{\bf s}, {\bf t}, or {\bf o})
\end{description}
If no template string is given, then it defaults to {\tt "\%c \%s"}.
\item[extensionStyle] a specification of how the names of files
generated by the tool relate to the name of the tool input file;
type: {\tt Tools.extensionStyle}. \\
Currently, there are two possible cases:
\begin{enumerate}
\item ``{\tt Tools.EXTEND} $l$'' says that if the tool source file is
{\it file} then for each suffix {\it sfx} in {\tt (map \#1 $l$)} there
will be one tool output file named {\it file}{\tt .}{\it sfx}.  The
list $l$ consists of triplets where the first component specifies the
suffix string, the second component optionally specifies the
member class name of the corresponding derived file, and the
third component is a function to calculate tool options for the
target from those of the source. (Argument and result type of these
functions is {\tt Tools.toolopts option}.)
\item ``{\tt Tools.REPLACE }$(l_1, l_2)$'' says that given the
base name {\it base} there will be one tool output file {\it base}{\tt
.}{\it sfx} for each suffix {\it sfx} in {\tt (map \#1 $l_2$)}.  Here,
{\it base} is determined by the following rule: If the name of the
tool input file has a suffix that occurs in $l_1$, then {\it base} is
the name without that suffix.  Otherwise the whole file name is taken
as {\it base} (just like in the case of {\tt Tools.EXTEND}).  As with
{\tt Tools.EXTEND}, the second components of the elements of $l_2$ can
optionally specify the member class name of the corresponding derived
file, and the third component maps source options to target options.
\end{enumerate}
\item[dflopts] a list of tool options which is used for
substituting {\bf \%$n$o} fields in {\tt template} (see above) if no
options were specified.  (Note that the value of {\tt dflopts} is never
passed to the option mappers in {\tt Tools.EXTEND} or {\tt
Tools.REPLACE}.)  Type: {\tt Tools.toolopts}.
\end{description}

Examples for the {\tt EXTEND} expansion style are tools such as
ML-Yacc and ML-Lex, while others, e.g., ML-Burg, use the {\tt REPLACE}
style (see section~\ref{sec:builtin-tools}).
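
As a further illustration, the following sketch registers a
hypothetical tool that uses the {\tt REPLACE} style together with an
explicit template.  The tool name {\tt My-Gen}, the class {\tt mygen},
the suffix {\tt gen}, the command {\tt my-gen}, and its {\tt -o} flag
are all invented for this example; only the record fields are taken
from the description above.

\begin{verbatim}
  val _ = Tools.registerStdShellCmdTool
        { tool = "My-Gen",              (* hypothetical tool name *)
          class = "mygen",              (* no upper-case letters *)
          suffixes = ["gen"],
          cmdStdPath = "my-gen",        (* tried as a path anchor first *)
          (* %c = command, %1t = first target, %s = source *)
          template = SOME "%c -o %1t %s",
          (* foo.gen yields the single target foo.sml of class sml *)
          extensionStyle =
              Tools.REPLACE (["gen"], [("sml", SOME "sml", fn x => x)]),
          dflopts = [] }
\end{verbatim}

According to the rules given above, and assuming that {\tt my-gen} is
not defined as a path anchor, a member {\tt foo.gen} should then give
rise to a command line of the form {\tt my-gen -o foo.sml foo.gen}.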

\subsection{Adding other classes}

Adding a new class whose behavior is not covered by the mechanism
described in section~\ref{sec:addshellclass} is not complicated
either, but it requires a bit more code.

\subsubsection{Filename abstractions}

CM represents filenames as something that could be called a {\em
filename closure}.  Essentially, what this means is that not only a
string is being remembered but also the context in which to interpret
the string.  For a relative path, context information is the directory
in which it is to be interpreted; for an anchored path, the context
takes care of the anchoring.

The {\tt Tools} module provides two abstract types related to this
filename abstraction:

\begin{verbatim}
type presrcpath
type srcpath
\end{verbatim}

Since many tools invoke external shell commands or perform other
operations on physical files, it is often necessary to obtain an actual
native filename string from an abstract path:

\begin{verbatim}
val nativeSpec : srcpath -> string
val nativePreSpec : presrcpath -> string
\end{verbatim}

It is important to remember that these two functions frequently return
relative filenames, and such relative names must be interpreted from
within the right directory.  This ``right'' directory is the directory
that contains the CM description file on whose behalf the tool's rule
was invoked.  Rules that perform physical operations on files whose
names result from {\tt nativeSpec} or {\tt nativePreSpec} must
therefore first switch to that directory.  See the discussion of the
{\tt context} argument to rule functions below.

Strings that can potentially be used as pathnames are passed
around as records containing a {\tt name} field and a {\tt mkpath}
field.  The {\tt name} field contains the string itself while {\tt
mkpath} is a ``suspended'' abstract version of the path.

A fresh pathmaker can be constructed from a native string using {\tt
native2pathmaker}, which, like {\tt context}, is an argument to
rule functions.  This function takes care of interpreting relative
strings within the correct context.  (For this, it is not even
necessary to first switch the current working directory.)

Recall that filenames that appear in description files can be written
using either ``standard'' or ``native'' syntax.  The field {\tt name}
will, thus, contain a string that can be in either of these two forms.
However, CM will pass an {\tt mkpath} function that accounts for these
syntactic differences and which also takes care of interpreting the
string in the correct context.

When new name specifications are constructed by the tool, the
appropriate {\tt mkpath} function must be provided.  For most tools
this will be a value constructed by applying {\tt native2pathmaker} to
some native filename string (because most tools internally operate on
native paths).

As shown above, there are two abstract path types: {\tt presrcpath}
and {\tt srcpath}.  The former can represent both directory names and
source file names, the latter can represent only source file names.
To convert from {\tt presrcpath} to {\tt srcpath}, use function {\tt
Tools.srcpath}:
\begin{verbatim}
val srcpath : presrcpath -> srcpath
\end{verbatim}
This function enforces CM's rule that there has to be at least one arc
in every such name (i.e., that it cannot be just an anchor).

One can also construct a new abstract path from an existing path by
adding arcs at the end.  The constructed path will share its internal
context with the old one.

\begin{verbatim}
val augment : presrcpath -> string list -> presrcpath
\end{verbatim}

The list of strings must contain simple pathname arcs.
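
The conversions described in this subsection can be combined as in the
following minimal sketch.  It assumes that {\tt Tools} exports the
types and functions exactly as listed above; the helper names {\tt
nativeName} and {\tt generatedFile} as well as the arc names are made
up for illustration.

\begin{verbatim}
  (* Native file name of the member that a pathmaker stands for.
   * The result may be relative to the directory that contains
   * the CM description file. *)
  fun nativeName (mkpath : Tools.pathmaker) : string =
      Tools.nativeSpec (Tools.srcpath (mkpath ()))

  (* A file two arcs below an existing directory path.  The result
   * shares its internal context with dir. *)
  fun generatedFile (dir : Tools.presrcpath) : Tools.srcpath =
      Tools.srcpath (Tools.augment dir ["gen", "out.sml"])
\end{verbatim}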

\subsubsection{Adding a class and its rule}

The interface to add arbitrary classes is the routine {\tt
Tools.registerClass}:

\begin{verbatim}
val registerClass : class * rule -> unit
\end{verbatim}

Here, type {\tt class} is simply a synonym for {\tt string}; a class
string is the name of the class to be registered.  It must not contain
upper-case letters:

\begin{verbatim}
type class = string
\end{verbatim}

Type {\tt rule} is a function type.  It describes the rule function
that CM will invoke for every member of the new class. The rule
function is responsible for invoking the auxiliary mechanism necessary
to bring its targets up-to-date.  The function result of the rule
function describes to CM what the targets are.  Thus, the function
maps the {\em specification} of the given member to its (partial) {\em
expansion}:

\begin{verbatim}
  type rule =
      { spec: spec,
        native2pathmaker: string -> pathmaker,
        context: rulecontext,
        defaultClassOf: fnspec -> class option } ->
      partial_expansion
\end{verbatim}

The specification {\tt spec} consists of the name of the member
together with a function to produce its corresponding abstract
path (should that be necessary), the member's optional class (in case
it had been given explicitly), its tool options, and a boolean flag
that tells whether this member was the result of another tool:

\begin{verbatim}
  type spec = { name: string,
                mkpath: pathmaker,
                class: class option,
                opts: toolopts option,
                derived: bool }
\end{verbatim}

\begin{description}
\item[name:] The name is the verbatim member string from the
description file.  Be sure not to use this string directly as a file
name (although some tools might use it directly for purposes other
than file names).  Instead, first convert it to an abstract path (see
{\tt mkpath} below) and then convert back to a {\em native} file name
string using one of {\tt nativeSpec} or {\tt nativePreSpec}.
\item[mkpath:] This is a function of type {\tt pathmaker} that produces
an abstract pathname corresponding to {\tt name}.  CM will pass in different
functions here depending on whether {\tt name} was given in CM's
standard pathname syntax or in the underlying operating system's
native syntax.
\begin{verbatim}
type pathmaker = unit -> presrcpath
\end{verbatim}
\item[class:] This argument carries the class name if such a class
name was explicitly specified.  If the class was inferred from the
member name, then it will be set to {\tt NONE}.
\item[opts:] Tool options are represented by a data structure
resembling Lisp lists:
\begin{verbatim}
  type fnspec = { name: string, mkpath: pathmaker }
  datatype toolopt =
      STRING of fnspec
    | SUBOPTS of { name: string, opts: toolopts }
  withtype toolopts = toolopt list
\end{verbatim}
The nesting of {\tt SUBOPTS} reflects the nesting of sub-option lists
in the member's tool option specification.
Again, names which are potentially to be interpreted as file names are
represented by their original specification string and a function {\tt
mkpath} to get the corresponding abstract path, thereby taking
care of interpreting the name according to its respective syntactic
rules and its context. (Type {\tt fnspec} is a slimmed-down version of
type {\tt spec}.  It also appears as the argument type of function
{\tt defaultClassOf}.  See below.)
\item[derived:] This flag is set to {\tt true} if the source file
represented by the specification is the result of another, earlier
tool invocation.
\end{description}

The other three arguments of a rule function are {\tt native2pathmaker},
{\tt context}, and {\tt defaultClassOf}:

\begin{description}
\item[native2pathmaker:] This function takes a string and produces a
function of the same type as {\tt mkpath} above.  When the rule
constructs the specifications for its result files, it must provide
the corresponding {\tt mkpath} functions for those.  Since most tools
internally operate on native pathnames, these {\tt mkpath} functions
will have to be constructed using {\tt native2pathmaker}.
\item[context:] The context argument of a rule represents the
directory that contains the CM description file on whose behalf the
rule was invoked.  It is represented as a higher-order function that
invokes its function argument after temporarily setting the working
directory to the context directory and returns the result of this
invocation after restoring the original working directory.  Not all
rules require such a temporary change of directories, but those that
do should encapsulate all their work into a local function and then
pass this function to the context.
\begin{verbatim}
type rulefn = unit -> partial_expansion
type rulecontext = rulefn -> partial_expansion
\end{verbatim}
\item[defaultClassOf:] This function can be used to directly invoke
CM's internal classification mechanism, taking advantage of any
registered classifiers.  The argument to be passed is of type {\tt
fnspec}, i.e., a record consisting of a name string and a function to
convert the string to its corresponding abstract path.
\end{description}

A (full) {\em expansion} consists of three lists: a list of ML files,
a list of CM files, and a list of {\em sources}.  A partial expansion
is a full expansion together with a list of specifications that still
need to be expanded further.

\begin{verbatim}
  type expansion =
      { smlfiles: (srcpath * Sharing.request * setup) list,
        cmfiles: (srcpath * Version.t option * rebindings) list,
        sources: (srcpath * { class: class, derived: bool}) list }

type partial_expansion = expansion * spec list
\end{verbatim}

A rule always returns a partial expansion.  CM will derive a full
expansion by repeatedly applying rules until the list of pending
specifications becomes empty.

Most rules (except those for classes {\tt sml} and {\tt cm}) leave the
lists {\tt smlfiles} and {\tt cmfiles} empty.  A tool that produces an
ML source file or a CM description file as output should put a
specification for this file into the specification list of a partial
expansion, letting the rules for classes {\tt sml} and {\tt cm} take
care of the rest.  At this point we will therefore not dwell on
explanations for the types of these two fields.

The {\tt sources} field is used to implement {\tt CM.sources} (see
section~\ref{sec:makedepend:support}).  Therefore, the rule should
include here every file that it consumes if its implementer wishes to
have it reported by {\tt CM.sources}.  (Do not include source files
that are {\em produced} by the rule because those will be reported by
subsequent rules.)
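
To make the pieces above concrete, here is a minimal sketch of a
complete rule.  It assumes the types shown in this section and uses
{\tt Tools.outdated} and {\tt Tools.openTextOut}, which are described
under ``Miscellaneous'' below.  The class {\tt txt2sml}, the tool name
{\tt Txt2Sml}, and the generated binding {\tt text} are invented for
illustration.

\begin{verbatim}
  (* Hypothetical class "txt2sml": turns f.txt into f.txt.sml,
   * which binds the contents of f.txt to an ML string value. *)
  fun rule { spec : Tools.spec, native2pathmaker,
             context : Tools.rulecontext, defaultClassOf } =
      let val { mkpath, derived, ... } = spec
          fun work () =
              let val p = Tools.srcpath (mkpath ())
                  val src = Tools.nativeSpec p   (* may be relative *)
                  val tgt = src ^ ".sml"
                  fun generate () =
                      let val ins = TextIO.openIn src
                          val text = TextIO.inputAll ins
                          val _ = TextIO.closeIn ins
                          val outs = Tools.openTextOut tgt
                      in TextIO.output
                           (outs, concat ["val text = \"",
                                          String.toString text, "\"\n"]);
                         TextIO.closeOut outs
                      end
              in if Tools.outdated "Txt2Sml" ([tgt], src)
                 then generate () else ();
                 (* report the consumed source; hand the generated
                  * file on to the rule for class sml *)
                 ({ smlfiles = [], cmfiles = [],
                    sources = [(p, { class = "txt2sml",
                                     derived = derived })] },
                  [{ name = tgt, mkpath = native2pathmaker tgt,
                     class = SOME "sml", opts = NONE, derived = true }])
              end
      in context work   (* run work inside the context directory *)
      end

  val _ = Tools.registerClass ("txt2sml", rule)
\end{verbatim}

All file operations happen inside {\tt work}, which is passed to {\tt
context}, so the possibly relative names {\tt src} and {\tt tgt} are
interpreted in the directory of the description file.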

\subsubsection{Reporting errors from tools}

When a rule encounters an error, it should raise the following
exception, setting {\tt tool} to a string describing the current tool
and {\tt msg} to a diagnostic string describing the nature of the
error:

\begin{verbatim}
exception ToolError of { tool: string, msg: string }
\end{verbatim}
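
For instance, a rule for a tool that accepts no options might check
its {\tt opts} field as sketched here.  The function name {\tt
rejectOptions} and the message text are invented for this example.

\begin{verbatim}
  (* Reject any tool options for a tool that takes none. *)
  fun rejectOptions (tool : string,
                     opts : Tools.toolopts option) : unit =
      case opts of
          NONE => ()
        | SOME [] => ()
        | SOME _ =>
            raise Tools.ToolError
                  { tool = tool,
                    msg = "this tool does not accept options" }
\end{verbatim}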

\subsubsection{Adding a classifier}

A classifier is a mechanism that enables CM to infer a member's class
from its name.  Classifiers are invoked if no explicit class was
given.  CM supports two kinds of classifiers: suffix classifiers and
general classifiers.

\begin{verbatim}
  datatype classifier =
      SFX_CLASSIFIER of string -> class option
    | GEN_CLASSIFIER of { name: string, mkfname: unit -> string } ->
                        class option
\end{verbatim}

Most of the time classifiers look at the file name suffix as their only
clue.  This idea is captured by {\tt SFX\_CLASSIFIER} which carries a
partial function from suffixes to class names.  The function should
return {\tt NONE} if it does not know about the given argument suffix.

The {\tt GEN\_CLASSIFIER} constructor carries a similar function---the
difference being that the entire member name is passed to it.
Moreover, the function can also invoke the {\tt mkfname} argument to
obtain a native filename string.  This string can at this point be
used to perform actual filesystem operations.

Invocation of {\tt mkfname} may raise exceptions, usually due to
syntax errors in {\tt name} that prevent it from being interpreted as
a filename.  Tools that use {\tt mkfname} should therefore be prepared
to handle such exceptions.

Moreover, it is advisable not to over-use this feature, and not to
perform extensive filesystem processing in order to perform
classification.  Otherwise the presence of this classifier might cause
CM to slow down noticeably.

By the way, suffix classifiers could be implemented as general
classifiers, but using {\tt SFX\_CLASSIFIER} for them is slightly more
efficient.  CM extracts the suffix from the name only once and applies
all suffix classifiers before ever considering any generic classifier.
If some suffix classifier succeeds, there will be no overhead caused
by any generic classifier.

Function {\tt Tools.stdSfxClassifier} is a simple wrapper around {\tt
SFX\_CLASSIFIER} and produces a classifier that looks for precisely
one suffix string.

\begin{verbatim}
val stdSfxClassifier : { sfx: string , class: class } -> classifier
\end{verbatim}

Classifiers are registered with CM by invoking {\tt
Tools.registerClassifier}:

\begin{verbatim}
val registerClassifier : classifier -> unit
\end{verbatim}
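
The following sketch shows one classifier of each kind.  It assumes
that suffixes are passed to suffix classifiers without a leading dot,
in the same form as in the {\tt suffixes} field of
section~\ref{sec:addshellclass}.  The file name {\tt Grammar} and the
function name {\tt byFileName} are made up for illustration.

\begin{verbatim}
  (* Suffix classifier for a single suffix. *)
  val _ = Tools.registerClassifier
            (Tools.stdSfxClassifier { sfx = "ngrm", class = "nmlyacc" })

  (* Suffix classifier covering several suffixes at once. *)
  val _ = Tools.registerClassifier
            (Tools.SFX_CLASSIFIER
               (fn "ngrm" => SOME "nmlyacc"
                 | "ny" => SOME "nmlyacc"
                 | _ => NONE))

  (* General classifier: inspects the last arc of the native file
   * name.  mkfname may raise an exception; this is treated as
   * "class unknown". *)
  fun byFileName { name, mkfname } =
      (if OS.Path.file (mkfname ()) = "Grammar"
       then SOME "nmlyacc" else NONE)
      handle _ => NONE

  val _ = Tools.registerClassifier (Tools.GEN_CLASSIFIER byFileName)
\end{verbatim}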

\subsubsection{Miscellaneous}

Structure {\tt Tools} also provides a number of other types and
functions with the purpose of making it easier to write rule
functions.

\noindent{\bf Filename extension:} Many tools derive the names of
their targets from the name of their source.  As discussed in
section~\ref{sec:addshellclass}, CM provides some support for this via
values of type {\tt extensionStyle}:

\begin{verbatim}
  type tooloptcvt = toolopts option -> toolopts option
  datatype extensionStyle =
      EXTEND of (string * class option * tooloptcvt) list
    | REPLACE of string list * (string * class option * tooloptcvt) list
\end{verbatim}

These values can not only be passed to {\tt
Tools.registerStdShellCmdTool} but also be used to let CM perform name
extension directly.  To do so, one must invoke function {\tt
Tools.extend}:

\begin{verbatim}
  val extend : extensionStyle ->
               (string * toolopts option) ->
               (string * class option * toolopts option) list
\end{verbatim}
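
The naming rule behind {\tt EXTEND} and {\tt REPLACE}, as described in
section~\ref{sec:addshellclass}, can be modelled in a few lines of
stand-alone ML.  This is only a sketch of the documented behavior, not
CM's implementation: the types are re-declared locally, {\tt toolopts}
is simplified to a list of strings, and the name {\tt extendModel} is
invented.

\begin{verbatim}
  type class = string
  type toolopts = string list        (* simplification for this model *)
  type tooloptcvt = toolopts option -> toolopts option
  datatype extensionStyle =
      EXTEND of (string * class option * tooloptcvt) list
    | REPLACE of string list * (string * class option * tooloptcvt) list

  fun extendModel style (file, opts) =
      let (* one derived file: name, class, converted options *)
          fun derive base (sfx, cls, cvt) =
              (base ^ "." ^ sfx, cls, cvt opts)
      in case style of
             EXTEND l => map (derive file) l
           | REPLACE (olds, news) =>
               let val { base, ext } = OS.Path.splitBaseExt file
                   (* strip the suffix only if it is listed in olds *)
                   val stem =
                       case ext of
                           SOME e =>
                             if List.exists (fn s => s = e) olds
                             then base else file
                         | NONE => file
               in map (derive stem) news
               end
      end
\end{verbatim}

In this model, the style {\tt EXTEND [("sig", SOME "sml", fn \_ =>
NONE), ("sml", SOME "sml", fn x => x)]} applied to {\tt ("foo.grm",
NONE)} yields entries for {\tt foo.grm.sig} and {\tt foo.grm.sml},
while {\tt REPLACE (["burg"], [("sml", SOME "sml", fn x => x)])}
applied to {\tt ("foo.burg", NONE)} yields a single entry for {\tt
foo.sml}.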

\noindent{\bf Checking time stamps:} A tool can check whether a given
source file is older than all of its corresponding target files.

\begin{verbatim}
val outdated : string -> string list * string -> bool
\end{verbatim}

In a call {\tt (Tools.outdated $t$ ($l$, $s$))}, $t$ is the name of
the tool, $l$ is the list of targets (as native file names),
and $s$ is the source (also as a native file name).

An alternative way of checking for outdated sources (in the style of
the Noweb-tool; see section~\ref{sec:builtin-tools:noweb}) is the
following:

\begin{verbatim}
  val outdated' : string ->
                  { src: string, wtn: string, tgt: string } -> bool
\end{verbatim}

The idea here is that if both {\tt tgt} (``target'') and {\tt wtn}
(``witness'') exist, then {\tt tgt} is considered outdated if {\tt
wtn} is older than {\tt src}.  Otherwise, if {\tt tgt} exists but {\tt
wtn} does not, then {\tt tgt} is considered outdated if it is older
than {\tt src}.  If {\tt tgt} does not exist, then it is always
considered outdated.
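
The three cases can be summarized by the following stand-alone model.
It is a sketch of the decision logic only, under the assumption that
modification times are represented as integers and that {\tt NONE}
stands for a file that does not exist; the name {\tt outdatedModel} is
invented, and the real function works on file names instead.

\begin{verbatim}
  fun outdatedModel { src : int, wtn : int option,
                      tgt : int option } : bool =
      case (tgt, wtn) of
          (NONE, _) => true             (* no target: always outdated *)
        | (SOME _, SOME w) => w < src   (* the witness decides *)
        | (SOME t, NONE) => t < src     (* fall back to the target *)
\end{verbatim}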

\noindent{\bf File- and directory-creation:}  To open a text file for
output in such a way that all directories leading up to it are created
when they do not already exist, use {\tt Tools.openTextOut}:

\begin{verbatim}
val openTextOut : string -> TextIO.outstream
\end{verbatim}

To create the same directories without opening the file (and without
even creating it if it does not exist), use function {\tt
Tools.makeDirs}:

\begin{verbatim}
val makeDirs : string -> unit
\end{verbatim}

Note that the string passed to {\tt makeDirs} is still the name of a
file!

\noindent{\bf Option processing:}  For simple tools, the following
function for ``parsing'' tool options can be useful:

\begin{verbatim}
  val parseOptions :
      { tool : string, keywords : string list, options : toolopts } ->
      { matches : string -> toolopts option, restoptions : string list }
\end{verbatim}

Given a list of accepted keywords, this function scans the tool
options and collects occurrences of sub-option lists labelled by one
of these keywords.  Any sub-option list that is not recognized and any
keyword that occurs more than once will be rejected as an error.  The
result consists of a function {\tt matches} that can be used to query
each of the keywords.  The function also collects and returns all the
{\tt STRING} options.
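
A possible use is sketched below, assuming the signature shown above.
The tool name {\tt My-Gen}, the keyword {\tt flags}, and the function
name {\tt splitOptions} are invented for illustration.

\begin{verbatim}
  (* Separate a "flags" sub-option list from the plain options. *)
  fun splitOptions (opts : Tools.toolopts) =
      let val { matches, restoptions } =
              Tools.parseOptions
                { tool = "My-Gen",
                  keywords = ["flags"],
                  options = opts }
      in { flags = matches "flags",   (* toolopts option *)
           plain = restoptions }      (* the STRING options *)
      end
\end{verbatim}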

\noindent{\bf Issuing diagnostics:} Functions {\tt Tools.say} and {\tt
Tools.vsay} both take a list of strings and output the concatenation
of these strings to the compiler's standard control output stream
(i.e., usually {\tt TextIO.stdOut}).  The difference between {\tt say}
and {\tt vsay} is that the former works unconditionally while the
latter is controlled by {\tt CM.Control.verbose} (see
section~\ref{sec:registers}).

\noindent{\bf Anchor-configurable strings:} Mainly for the purpose of
implementing anchor-configurable names for auxiliary shell commands
(such as {\tt ml-yacc}), one can invoke {\tt Tools.mkCmdName}:

\begin{verbatim}
val mkCmdName : string -> string
\end{verbatim}

If $m$ is a path anchor that points to $d$, then {\tt (mkCmdName $m$)}
returns $d${\tt /}$m$; otherwise it returns $m$.
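
A rule that runs an external command could use this function as in
the following sketch.  The function name {\tt runTool} and the shape
of the command line are assumptions made for this example; {\tt
OS.Process.isSuccess} is taken from the current Basis Library and may
not be available in older releases.

\begin{verbatim}
  (* Run new-ml-yacc on src, honoring a path anchor of that name. *)
  fun runTool (src : string) : unit =
      let val cmd = Tools.mkCmdName "new-ml-yacc"
          val line = concat [cmd, " ", src]
          val _ = Tools.vsay ["[", line, "]\n"]
      in if OS.Process.isSuccess (OS.Process.system line) then ()
         else raise Tools.ToolError
                    { tool = "New-ML-Yacc",
                      msg = "command failed: " ^ line }
      end
\end{verbatim}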

\subsection{Plug-in Tools}
\label{sec:plugintools}

If CM encounters a member class name $c$ that it does not know about,
then it tries to load a plugin module named {\tt \$/}$c${\tt -tool.cm}.
If it sees a file whose name ends in suffix $s$ for which no explicit
member class has been specified in the CM description file and for
which automatic member classification fails, it will try to load a
plugin module named {\tt \$/}$s${\tt -ext.cm}.  The so-loaded
module can then register the required tool, thereby enabling CM to
successfully deal with the previously unknown member.

This mechanism makes it possible for new tools to be added by simply
placing appropriately named plug-in libraries in some convenient place
and making the corresponding adjustments to the anchor environment.
In other words, description files {\tt \$/}$c${\tt -tool.cm} and
{\tt \$/}$s${\tt -ext.cm} that correspond to general-purpose tools should
be registered either by modifying the global or the local path
configuration file or by directly invoking function {\tt
CM.Anchor.anchor} (see section~\ref{sec:api:anchors}).  Actual
description files for the tools' implementations can then be placed in
arbitrary locations.

\subsubsection{Explicitly declared local tools}
\label{sec:localtools}

Some projects might want to use their own special-purpose tools for
which a global installation is not convenient or not appropriate.  In
such a case, the project's description file can explicitly demand the
tool to be registered temporarily.  This is the purpose of the special
tool class {\tt tool}.  Example:

\begin{verbatim}
  Library
      structure Foo
  is
      bar-tool.cm : tool
      foo.b : bar
\end{verbatim}

Here, the member whose class is {\tt tool} (i.e., {\tt bar-tool.cm})
must be the CM description file of the tool's implementation.  The
difference from class {\tt cm} is that the so-specified library does not
become part of the current library; instead, it is loaded
immediately via {\tt CM.load\_plugin}, causing one or more new classes
and their classifiers to be registered.

If we assume that loading {\tt bar-tool.cm} causes a class {\tt bar}
to be registered with its associated rule (e.g., by invoking {\tt
Tools.registerStdShellCmdTool}), the class name {\tt bar} will be
available for all subsequent members of the current description file.
Likewise, classifiers (e.g., filename suffixes) registered by {\tt
bar-tool.cm} will also be available.

The effect of registering classes and classifiers using class {\tt
tool} lasts until the end of the current description file and is
restricted to that file.  This means that other description files that
also want to use class {\tt bar} will have to have their own {\tt
tool} entry.\footnote{Note that CM cannot enforce that the tool
library actually register a class or a classifier.  Any side-effects
other than registering classes or classifiers are beyond CM's control
and will not be undone once processing the current description file is
complete.}

Local tool classes and suffixes temporarily override any equally-named
global classes or suffixes, respectively.

\subsubsection{Locally declared suffixes}
\label{sec:localsuffixes}

It is sometimes convenient to locally add another recognized filename
suffix to an already registered class.  This is the purpose of the
special tool class {\tt suffix}.  For example, a programmer who has
named all ML files in such a way that file names end in {\tt .ml}
could write near the beginning of the description file:

\begin{verbatim}
ml : suffix (sml)
\end{verbatim}

For the remainder of the current description file, all such {\tt
.ml}-files will now be treated as members of class {\tt sml}.