% -*- latex -*-
% File: sml/trunk/src/cm/Doc/10-moretools.tex, revision 743
% (Thu Dec 7 15:31:24 2000 UTC, by blume; log: merging changes from private branch)

\section{Extending the tool set}

CM's tool set is extensible: new tools can be added by writing a few
lines of ML code.  The necessary hooks for this are provided by a
structure {\tt Tools}, which is exported by the library {\tt \$smlnj/cm/tools.cm}.

\subsection{Adding simple shell-command tools}
\label{sec:addshellclass}

If the tool is implemented as a ``typical'' shell command, then all
that needs to be done is a single call of {\tt Tools.registerStdShellCmdTool}.

For example, suppose you have made a new, improved version of ML-Yacc
(``New-ML-Yacc'') and want to register it under a class called {\tt
nmlyacc}.  Here is what you need to write:
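\begin{verbatim}
  val _ = Tools.registerStdShellCmdTool
    { tool = "New-ML-Yacc",          (* name used in error messages *)
      class = "nmlyacc",             (* must not contain upper case *)
      suffixes = ["ngrm", "ny"],     (* suffixes that imply the class *)
      cmdStdPath = "new-ml-yacc",    (* subject to anchor resolution *)
      template = NONE,               (* defaults to "%c %s" *)
      extensionStyle =
          Tools.EXTEND [("sig", SOME "sml", fn _ => NONE),
                        ("sml", SOME "sml", fn x => x)],
      dflopts = [] }
\end{verbatim}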

This code can be packaged as a CM library and loaded via {\tt CM.make}
or {\tt CM.load\_plugin}.  ({\tt CM.autoload} is not enough: because it
is lazy, it would prevent the required side effects from taking place.)
Alternatively, the code can also be entered at the interactive top
level after loading library {\tt \$smlnj/cm/tools.cm}.

In our example, the shell command name for our tool is {\tt
new-ml-yacc}.  When looking for this command in the file system, CM
first tries to treat it as a path anchor (see
section~\ref{sec:anchor:env}).  For example, suppose {\tt new-ml-yacc} is
mapped to {\tt /bin}.  In this case the command to be
invoked would be {\tt /bin/new-ml-yacc}.  If path anchor resolution
fails, then the command name will be used as-is.  Normally this
causes the shell's path search mechanism to be used as a fallback.

{\tt Tools.registerStdShellCmdTool} creates the class and installs the
tool for it.  The arguments must be specified as follows:

\begin{description}
\item[tool] a descriptive name of the tool (used in error messages);
type: {\tt string}
\item[class] the name of the class; the string must not contain
upper-case letters; type: {\tt string}
\item[suffixes] a list of file name suffixes that let CM automatically
recognize files of the class; type: {\tt string list}
\item[cmdStdPath] the command string from above; type: {\tt string}
\item[template] an optional string that describes how the command line
is to be constructed from pieces; type: {\tt string option} \\
The string is taken verbatim except for embedded \% format specifiers:
\begin{description}
  \item[\%c] the command name (i.e., the elaboration of {\tt cmdStdPath})
  \item[\%s] the source file name in native pathname syntax
  \item[\%$n$t] the $n$-th target file in native pathname syntax; \\
    ($n$ is specified as a decimal number, counting starts at $1$, and
    each target file name is constructed from the corresponding {\tt
    extensionStyle} entry; if $n$ is $0$ (or missing), then all
    targets, separated by single spaces, are inserted;
    if $n$ is not in the range between $0$ and the number of available
    targets, then {\bf \%$n$t} expands into itself)
  \item[\%$n$o] the $n$-th tool parameter; \\
    (named sub-option parameters are ignored;
     $n$ is specified as a decimal number, counting starts at $1$;
     if $n$ is $0$ (or missing), then all options, separated by single
     spaces, are inserted;
     if $n$ is not in the range between $0$ and the number of available
     options, then {\bf \%$n$o} expands into itself)
  \item[\%$x$] the character $x$ (where $x$ is none of {\bf c}, {\bf s},
    {\bf t}, or {\bf o})
\end{description}
If no template string is given, then it defaults to {\tt "\%c \%s"}.
\item[extensionStyle] a specification of how the names of files
generated by the tool relate to the name of the tool input file;
type: {\tt Tools.extensionStyle}. \\
Currently, there are two possible cases:
\begin{itemize}
\item ``{\tt Tools.EXTEND} $l$'' says that if the tool source file is
{\it file} then for each suffix {\it sfx} in {\tt (map \#1 $l$)} there
will be one tool output file named {\it file}{\tt .}{\it sfx}.  The
list $l$ consists of triplets where the first component specifies the
suffix string, the second component optionally specifies the
member class name of the corresponding derived file, and the
third component is a function to calculate tool options for the
target from those of the source.  (The argument and result types of
these functions are both {\tt Tools.toolopts option}.)
\item ``{\tt Tools.REPLACE} $(l_1, l_2)$'' says that given the
base name {\it base} there will be one tool output file {\it base}{\tt
.}{\it sfx} for each suffix {\it sfx} in {\tt (map \#1 $l_2$)}.  Here,
{\it base} is determined by the following rule: If the name of the
tool input file has a suffix that occurs in $l_1$, then {\it base} is
the name without that suffix.  Otherwise the whole file name is taken
as {\it base} (just like in the case of {\tt Tools.EXTEND}).  As with
{\tt Tools.EXTEND}, the second components of the elements of $l_2$ can
optionally specify the member class name of the corresponding derived
file, and the third component maps source options to target options.
\end{itemize}
\item[dflopts] a list of tool options which is used for
substituting {\bf \%$n$o} fields in {\tt template} (see above) if no
options were specified.  (Note that the value of {\tt dflopts} is never
passed to the option mappers in {\tt Tools.EXTEND} or {\tt
Tools.REPLACE}.)  Type: {\tt Tools.toolopts}.
\end{description}
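
Since these substitution rules are easy to misread, the following
standalone sketch restates them as an SML function that uses only the
Basis Library.  It is a model, not CM's implementation: the name {\tt
expandTemplate} and its record argument are hypothetical, and the
treatment of a trailing \% and of digits followed by a character other
than {\bf t} or {\bf o} are our own guesses, because the rules above do
not cover these cases.
\begin{verbatim}
(* Standalone model of the template rules above; NOT CM's code.
 * Hypothetical name and argument record; edge cases are guesses. *)
fun expandTemplate { cmd : string, src : string,
                     targets : string list, opts : string list }
                   (template : string) : string =
    let
      (* n = 0 selects all items, space-separated; NONE = out of range *)
      fun select (xs, 0) = SOME (String.concatWith " " xs)
        | select (xs, n) =
            if n <= length xs then SOME (List.nth (xs, n - 1)) else NONE
      (* split off the leading decimal digits of a character list *)
      fun digits (c :: cs, ds) =
            if Char.isDigit c then digits (cs, c :: ds)
            else (rev ds, c :: cs)
        | digits ([], ds) = (rev ds, [])
      fun loop ([], acc) = String.concat (rev acc)
        | loop (#"%" :: cs, acc) = percent (cs, acc)
        | loop (c :: cs, acc) = loop (cs, str c :: acc)
      and percent (#"c" :: cs, acc) = loop (cs, cmd :: acc)
        | percent (#"s" :: cs, acc) = loop (cs, src :: acc)
        | percent (cs, acc) =
            (case digits (cs, []) of
                 (ds, k :: rest) =>
                   if k = #"t" orelse k = #"o" then
                     let
                       val n = if null ds then 0
                               else valOf (Int.fromString (implode ds))
                       val items = if k = #"t" then targets else opts
                       val self = "%" ^ implode ds ^ str k
                     in
                       (* out-of-range specifiers expand into themselves *)
                       loop (rest, getOpt (select (items, n), self) :: acc)
                     end
                   else literal (cs, acc)
               | (_, []) => literal (cs, acc))
      (* "%x" stands for the character x; a trailing "%" is kept (guess) *)
      and literal (x :: cs, acc) = loop (cs, str x :: acc)
        | literal ([], acc) = loop ([], "%" :: acc)
    in
      loop (explode template, [])
    end
\end{verbatim}
For example, with {\tt cmd} set to {\tt new-ml-yacc}, {\tt src} to {\tt
calc.ngrm}, two targets, and a single option {\tt -v}, the template
{\tt \%c \%o \%s} yields {\tt new-ml-yacc -v calc.ngrm}, while {\tt
\%3t} is out of range and therefore stays {\tt \%3t}.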

Tools such as ML-Yacc and ML-Lex use the {\tt EXTEND} extension style,
while others, e.g., ML-Burg, use the {\tt REPLACE} style (see
section~\ref{sec:builtin-tools}).
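
The two naming schemes differ only in how the base name is computed.
The following standalone sketch makes this concrete for target names
alone, ignoring member classes and option conversion.  The datatype
{\tt style} and the function {\tt targetNames} are hypothetical
simplifications of {\tt Tools.extensionStyle}, the suffixes in the
example are made up, and the sketch assumes that a file's suffix is the
part after its last dot, as computed by {\tt OS.Path.splitBaseExt}.
\begin{verbatim}
(* Hypothetical simplification of Tools.extensionStyle: suffixes only *)
datatype style =
    EXTEND of string list                 (* suffixes to append *)
  | REPLACE of string list * string list  (* suffixes to strip, to add *)

fun targetNames (EXTEND sfxs) (file : string) =
      map (fn sfx => file ^ "." ^ sfx) sfxs
  | targetNames (REPLACE (old, new)) file =
      let
        (* strip the suffix only if it is one of the listed ones *)
        val base =
            case OS.Path.splitBaseExt file of
                { base = b, ext = SOME e } =>
                  if List.exists (fn s => s = e) old then b else file
              | _ => file
      in
        map (fn sfx => base ^ "." ^ sfx) new
      end
\end{verbatim}
Thus {\tt REPLACE (["burg"], ["sml"])} maps {\tt lang.burg} to {\tt
lang.sml}, but maps {\tt lang.txt} (whose suffix is not listed) to {\tt
lang.txt.sml}, exactly as {\tt EXTEND ["sml"]} would.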

\subsection{Adding other classes}

Adding a new class whose behavior is not covered by the mechanism
described in section~\ref{sec:addshellclass} is not complicated
either, but it requires a bit more code.

\subsubsection{Adding a class and its rule}

The interface to add arbitrary classes is the routine {\tt
Tools.registerClass}:
\begin{verbatim}
  val registerClass : class * rule -> unit
\end{verbatim}

Here, type {\tt class} is simply synonymous with {\tt string}; a class
string is the name of the class to be registered.  It must not contain
upper-case letters:
\begin{verbatim}
  type class = string
\end{verbatim}

Type {\tt rule} is a function type.  It describes the rule function
that CM will invoke for every member of the new class.  The rule
function is responsible for invoking the auxiliary mechanism necessary
to bring its targets up-to-date.  The function result of the rule
function describes to CM what the targets are.  Thus, the function
maps the {\em specification} of the given member to its (partial) {\em
expansion}:
\begin{verbatim}
  type rule =
    { spec: spec, mkNativePath: pathmaker, context: rulecontext } ->
    partial_expansion
\end{verbatim}

The specification {\tt spec} consists of the name of the member
together with a function to convert it to an abstract path (should
that be necessary), the member's optional class (in case it had been
given explicitly), its tool options, and a boolean flag that tells
whether this member was the result of another tool:

\begin{verbatim}
  type spec = { name: string,
                mkpath: pathmaker,
                class: class option,
                opts: toolopts option,
                derived: bool }
\end{verbatim}

\begin{description}
\item[name:] The name is the verbatim member string from the
description file.  Be sure not to use this string directly as a file
name (although some tools might use it directly for purposes other
than file names).  Instead, first convert it to an abstract path (see
{\tt mkpath} below) and then convert that back to a {\em native} file
name string using one of:
\begin{verbatim}
  val nativeSpec : srcpath -> string
  val nativePre : presrcpath -> string
\end{verbatim}
\item[mkpath:] This is a function of type {\tt pathmaker} that converts
the name string to an abstract path name.  CM will pass in different
functions here depending on whether {\tt name} was given in CM's
standard path name syntax or in the underlying operating system's
native syntax.
\begin{verbatim}
  type pathmaker = string -> presrcpath
\end{verbatim}
There are two abstract path types: {\tt presrcpath} and {\tt srcpath}.
The former can represent both directory names and source file names;
the latter can represent only source file names.  To convert from {\tt
presrcpath} to {\tt srcpath}, use function {\tt Tools.srcpath}:
\begin{verbatim}
  val srcpath : presrcpath -> srcpath
\end{verbatim}
This function enforces CM's rule that there has to be at least one arc
in every such name (i.e., that it cannot be just an anchor).
\item[class:] This argument carries the class name if such a class
name was explicitly specified.  If the class was inferred from the
member name, then it will be set to {\tt NONE}.
\item[opts:] Tool options are represented by a data structure
resembling Lisp lists:
\begin{verbatim}
  datatype toolopt =
      STRING of { name: string, mkpath: pathmaker }
    | SUBOPTS of { name: string, opts: toolopts }
  withtype toolopts = toolopt list
\end{verbatim}
The nesting of {\tt SUBOPTS} reflects the nesting of sub-option lists
in the member's tool option specification.
\item[derived:] This flag is set to {\tt true} if the source file
represented by the specification is the result of another, earlier
tool invocation.
\end{description}

The other two arguments of a rule function are {\tt mkNativePath} and
{\tt context}:

\begin{description}
\item[mkNativePath:] This is a function of the same type as {\tt
mkpath} above.  When the rule constructs the specifications for its
result files, it must provide the corresponding {\tt mkpath} function
for those.  Since most tools internally operate on native path names
(for example, because they pass these names to other, external
programs), this {\tt mkpath} function will have to be {\tt
mkNativePath}.
\item[context:] The context argument of a rule represents the
directory that contains the CM description file on whose behalf the
rule was invoked.  It is represented as a higher-order function that
invokes its function argument after temporarily setting the working
directory to the context directory and returns the result of this
invocation after restoring the original working directory.  Not all
rules require such a temporary change of directories, but those that
do should encapsulate all their work into a local function and then
pass this function to the context.
\begin{verbatim}
  type rulefn = unit -> partial_expansion
  type rulecontext = rulefn -> partial_expansion
\end{verbatim}
\end{description}
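
The behavior of {\tt context} can be pictured with the following
standalone sketch, which uses {\tt OS.FileSys} from the Basis Library
and is generic in the result type rather than fixed to {\tt
partial\_expansion}.  The name {\tt withDir} is hypothetical; CM
constructs the real context itself, and a rule only ever calls it.
The sketch restores the original directory even if the function
argument raises an exception.
\begin{verbatim}
(* Model of a rule context: run f with dir as the working directory,
 * then restore the previous one (also when f raises). *)
fun withDir (dir : string) (f : unit -> 'a) : 'a =
    let
      val old = OS.FileSys.getDir ()
      val () = OS.FileSys.chDir dir
      val result = f () handle e => (OS.FileSys.chDir old; raise e)
    in
      OS.FileSys.chDir old;
      result
    end
\end{verbatim}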

A (full) {\em expansion} consists of three lists: a list of ML files,
a list of CM files, and a list of {\em sources}.  A partial expansion
is a full expansion together with a list of specifications that still
need to be expanded further.

\begin{verbatim}
  type expansion =
    { smlfiles: (srcpath * Sharing.request * setup) list,
      cmfiles: (srcpath * Version.t option * rebindings) list,
      sources: (srcpath * { class: class, derived: bool}) list }

  type partial_expansion = expansion * spec list
\end{verbatim}

A rule always returns a partial expansion.  CM will derive a full
expansion by repeatedly applying rules until the list of pending
specifications becomes empty.
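
This fixed-point process can be modeled by a short standalone sketch
under strong simplifications: a specification is just a pair of member
name and class, and a rule returns the names it consumed together with
new pending specifications.  The types and the names {\tt expandAll}
and {\tt ruleFor} are hypothetical, and the depth-first processing
order is an assumption of the sketch; this section does not say in
which order CM processes pending specifications.
\begin{verbatim}
(* Simplified stand-ins for CM's spec and partial_expansion types *)
type spec = string * string              (* (member name, class) *)
type partial = string list * spec list   (* (consumed, pending) *)

(* Apply rules until no specifications are pending.  This loops
 * forever if the rules keep producing new specifications. *)
fun expandAll (ruleFor : string -> spec -> partial) (specs : spec list)
    : string list =
    let
      fun loop ([], acc) = rev acc
        | loop ((s as (_, class)) :: rest, acc) =
            let val (consumed, pending) = ruleFor class s
            in loop (pending @ rest, List.revAppend (consumed, acc)) end
    in
      loop (specs, [])
    end
\end{verbatim}
With a rule for {\tt nmlyacc} that consumes {\tt calc.ngrm} and produces
pending {\tt sml} specifications for {\tt calc.ngrm.sig} and {\tt
calc.ngrm.sml}, the model ends with all three names handled and nothing
left pending.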

Most rules (except those for classes {\tt sml} and {\tt cm}) leave the
lists {\tt smlfiles} and {\tt cmfiles} empty.  A tool that produces an
ML source file or a CM description file as output should put a
specification for this file into the specification list of a partial
expansion, letting the rules for classes {\tt sml} and {\tt cm} take
care of the rest.  At this point we will therefore not dwell on
explanations for the types of these two fields.

The {\tt sources} field is used to implement {\tt CM.sources} (see
section~\ref{sec:makedepend:support}).  Therefore, the rule should
include here every file that it consumes if its implementer wishes to
have it reported by {\tt CM.sources}.  (Do not include source files
that are {\em produced} by the rule because those will be reported by
subsequent rules.)

\subsubsection{Reporting errors from tools}

When a rule encounters an error, it should raise the following
exception, setting {\tt tool} to a string describing the current tool
and {\tt msg} to a diagnostic string describing the nature of the
error:
\begin{verbatim}
  exception ToolError of { tool: string, msg: string }
\end{verbatim}
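
Putting the pieces of this section together, here is a sketch of a
complete rule for a hypothetical class {\tt txt2sml}, whose members are
text files that are turned into ML source files binding the file's
contents to a string.  It is written against the {\tt Tools} interface
exactly as described above and needs {\tt \$smlnj/cm/tools.cm}; other
versions of CM may differ in detail.  The class name, the tool name,
the {\tt .sml} target naming, and the generated structure {\tt
TextData} are all illustrative assumptions.
\begin{verbatim}
(* Sketch of a rule for a hypothetical class "txt2sml"; requires
 * $smlnj/cm/tools.cm.  foo.txt is turned into foo.txt.sml. *)
val txt2smlTool = "Text-to-SML"

fun txt2smlRule { spec : Tools.spec, mkNativePath, context } =
    let
      val { name, mkpath, derived, ... } = spec
      val srcpath = Tools.srcpath (mkpath name)
      val src = Tools.nativeSpec srcpath
      val tgt = src ^ ".sml"
      (* do all file work inside the context, in case the native
       * names are relative to the description file's directory *)
      fun work () =
          (if Tools.outdated txt2smlTool ([tgt], src) then
             let
               val ins = TextIO.openIn src
               val text = TextIO.inputAll ins before TextIO.closeIn ins
               val outs = Tools.openTextOut tgt
             in
               (* ML member files hold module-level definitions, hence
                * the structure wrapper (fixed name, for brevity) *)
               TextIO.output (outs, "structure TextData = struct val text = \""
                                    ^ String.toString text ^ "\" end\n");
               TextIO.closeOut outs
             end
           else ();
           (* report the consumed source; hand the new ML file on to
            * the rule for class sml as a pending specification *)
           ({ smlfiles = [], cmfiles = [],
              sources = [(srcpath, { class = "txt2sml",
                                     derived = derived })] },
            [{ name = tgt, mkpath = mkNativePath, class = SOME "sml",
               opts = NONE, derived = true }]))
          handle IO.Io { name = f, ... } =>
            raise Tools.ToolError { tool = txt2smlTool,
                                    msg = "I/O error on " ^ f }
    in
      context work
    end

val () = Tools.registerClass ("txt2sml", txt2smlRule)
\end{verbatim}
The rule reports only its input in {\tt sources}; the generated file is
passed on as a pending specification of class {\tt sml}, so that the
rule for that class takes care of it.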

\subsubsection{Adding a classifier}

A classifier is a mechanism that enables CM to infer a member's class
from its name.  Classifiers are invoked if no explicit class was
given.  CM supports two kinds of classifiers: suffix classifiers and
general classifiers.
\begin{verbatim}
  datatype classifier =
      SFX_CLASSIFIER of string -> class option
    | GEN_CLASSIFIER of string -> class option
\end{verbatim}

Most of the time, classifiers look at the file name suffix as the only
clue.  This idea is captured by {\tt SFX\_CLASSIFIER} which carries a
partial function from suffixes to class names.  The function should
return {\tt NONE} if it does not know about the given argument suffix.

The {\tt GEN\_CLASSIFIER} constructor carries a similar function---the
only difference being that the entire member name is passed to it.
(Suffix classifiers could be implemented as general classifiers, but
using {\tt SFX\_CLASSIFIER} for them is slightly more efficient
because CM will extract the suffix from the name only once.)

Function {\tt Tools.stdSfxClassifier} is a simple wrapper around {\tt
SFX\_CLASSIFIER} and produces a classifier that looks for precisely
one suffix string.

\begin{verbatim}
  val stdSfxClassifier : { sfx: string, class: class } -> classifier
\end{verbatim}

Classifiers are registered with CM by invoking {\tt
Tools.registerClassifier}:
\begin{verbatim}
  val registerClassifier : classifier -> unit
\end{verbatim}
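
As an example, the following lines register two classifiers for the
class {\tt nmlyacc} from the New-ML-Yacc example: a suffix classifier
for an additional suffix {\tt nyacc}, and a general classifier for
members whose file name is exactly {\tt Grammar}, which a suffix
classifier could not recognize.  Both the suffix and the naming
convention are hypothetical, and the code again assumes that {\tt
\$smlnj/cm/tools.cm} has been loaded.
\begin{verbatim}
(* Sketch; requires $smlnj/cm/tools.cm.  The suffix "nyacc" and the
 * "Grammar" naming convention are hypothetical. *)
val () = Tools.registerClassifier
           (Tools.stdSfxClassifier { sfx = "nyacc", class = "nmlyacc" })

(* a general classifier sees the whole member name *)
val () = Tools.registerClassifier
           (Tools.GEN_CLASSIFIER
              (fn name => if OS.Path.file name = "Grammar"
                          then SOME "nmlyacc" else NONE))
\end{verbatim}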


Structure {\tt Tools} also provides a number of other types and
functions whose purpose is to make it easier to write rule functions.

\noindent{\bf Filename extension:} Many tools derive the names of
their targets from the name of their source.  As discussed in
section~\ref{sec:addshellclass}, CM provides some support for this via
values of type {\tt extensionStyle}:

\begin{verbatim}
  type tooloptcvt = toolopts option -> toolopts option
  datatype extensionStyle =
      EXTEND of (string * class option * tooloptcvt) list
    | REPLACE of string list * (string * class option * tooloptcvt) list
\end{verbatim}

These values can not only be passed to {\tt
Tools.registerStdShellCmdTool} but can also be used to let CM perform
name extension directly.  To do so, one must invoke function {\tt
Tools.extend}:
\begin{verbatim}
  val extend : extensionStyle ->
               (string * toolopts option) ->
               (string * class option * toolopts option) list
\end{verbatim}

\noindent{\bf Checking time stamps:} A tool can check whether a given
source file is older than all of its corresponding target files.

\begin{verbatim}
  val outdated : string -> string list * string -> bool
\end{verbatim}

In a call {\tt (Tools.outdated $t$ ($l$, $s$))}, $t$ is the name of
the tool, $l$ is the list of targets (as native file names), and $s$
is the source (also as a native file name).

An alternative way of checking for outdated sources (in the style of
the Noweb-tool; see section~\ref{sec:builtin-tools:noweb}) is the
function {\tt Tools.outdated'}:
\begin{verbatim}
  val outdated' : string ->
                  { src: string, wtn: string, tgt: string } -> bool
\end{verbatim}

The idea here is that if both {\tt tgt} (``target'') and {\tt wtn}
(``witness'') exist, then {\tt tgt} is considered outdated if {\tt
wtn} is older than {\tt src}.  Otherwise, if {\tt tgt} exists but {\tt
wtn} does not, then {\tt tgt} is considered outdated if it is older
than {\tt src}.  If {\tt tgt} does not exist, then it is always
considered outdated.
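
The decision procedure of {\tt outdated'} can be restated as a small
standalone function over modification times, with {\tt NONE} standing
for a file that does not exist.  This sketch only restates the rules
given above; the names {\tt outdatedModel} and {\tt mtime} are
hypothetical, and how the real function treats a missing {\tt src} is
not covered.
\begin{verbatim}
(* Model of the outdated' rule; times are modification times and
 * NONE means "file does not exist".  Hypothetical names. *)
fun outdatedModel { src : Time.time, wtn : Time.time option,
                    tgt : Time.time option } : bool =
    case (tgt, wtn) of
        (NONE, _) => true                     (* no target: outdated *)
      | (SOME _, SOME w) => Time.< (w, src)   (* witness older than src *)
      | (SOME t, NONE) => Time.< (t, src)     (* target older than src *)

(* Look up a file's modification time, if the file exists. *)
fun mtime (f : string) : Time.time option =
    if OS.FileSys.access (f, []) then SOME (OS.FileSys.modTime f) else NONE
\end{verbatim}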

\noindent{\bf File- and directory-creation:}  To open a text file for
output in such a way that all directories leading up to it are created
when they do not already exist, use {\tt Tools.openTextOut}:

\begin{verbatim}
  val openTextOut : string -> TextIO.outstream
\end{verbatim}

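To create the same directories without opening the file (and without
even creating it if it does not exist), use function {\tt
Tools.makeDirs}:
\begin{verbatim}
  val makeDirs : string -> unit
\end{verbatim}
Note that the string passed to {\tt makeDirs} is still the name of a
file: only the directories leading up to that file are created.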
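
Both functions can be approximated with the Basis Library alone, as in
the following sketch.  The names {\tt makeDirsModel} and {\tt
openTextOutModel} are hypothetical, and the sketch simply lets any
{\tt OS.SysErr} raised by {\tt OS.FileSys.mkDir} or {\tt IO.Io} raised
by {\tt TextIO.openOut} propagate.
\begin{verbatim}
(* Create every missing directory leading up to file (not the file). *)
fun makeDirsModel (file : string) : unit =
    let
      fun mk dir =
          if dir = "" orelse dir = OS.Path.dir dir   (* "" or a root *)
             orelse OS.FileSys.access (dir, []) then ()
          else (mk (OS.Path.dir dir); OS.FileSys.mkDir dir)
    in
      mk (OS.Path.dir file)
    end

(* Open file for writing after creating its directories. *)
fun openTextOutModel (file : string) : TextIO.outstream =
    (makeDirsModel file; TextIO.openOut file)
\end{verbatim}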

\noindent{\bf Option processing:}  For simple tools, the following
function for ``parsing'' tool options can be useful:

\begin{verbatim}
  val parseOptions :
      { tool : string, keywords : string list, options : toolopts } ->
      { matches : string -> toolopts option, restoptions : string list }
\end{verbatim}

Given a list of accepted keywords, this function scans the tool
options and collects occurrences of sub-option lists labelled by one
of these keywords.  Any sub-option list that is not recognized and any
keyword that occurs more than once will be rejected as an error.  The
result consists of a function {\tt matches} that can be used to query
each of the keywords, together with the list {\tt restoptions} of the
names of all {\tt STRING} options.
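
The scanning just described can be modeled as in the following
standalone sketch.  It replaces {\tt toolopt} with a simplified
datatype whose {\tt STRING} case carries no path maker, and it reports
problems through a local exception {\tt BadOptions}; the datatype, the
exception, and the function are hypothetical stand-ins, and this
section does not say how the real {\tt parseOptions} reports errors.
\begin{verbatim}
(* Simplified toolopt: no pathmaker in STRING.  Hypothetical model. *)
datatype toolopt =
    STRING of string
  | SUBOPTS of { name : string, opts : toolopt list }

exception BadOptions of string

fun parseOptions { tool : string, keywords : string list,
                   options : toolopt list } =
    let
      fun known k = List.exists (fn k' => k' = k) keywords
      (* found: keyword sub-option lists seen so far; rest: STRING names *)
      fun scan ([], found, rest) = (found, rev rest)
        | scan (STRING s :: more, found, rest) =
            scan (more, found, s :: rest)
        | scan (SUBOPTS { name, opts } :: more, found, rest) =
            if not (known name) then
              raise BadOptions (tool ^ ": unknown sub-option list " ^ name)
            else if List.exists (fn (k, _) => k = name) found then
              raise BadOptions (tool ^ ": duplicate keyword " ^ name)
            else scan (more, (name, opts) :: found, rest)
      val (found, restoptions) = scan (options, [], [])
      fun matches k =
          case List.find (fn (k', _) => k' = k) found of
              SOME (_, opts) => SOME opts
            | NONE => NONE
    in
      { matches = matches, restoptions = restoptions }
    end
\end{verbatim}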

\noindent{\bf Issuing diagnostics:} Functions {\tt Tools.say} and {\tt
Tools.vsay} both take a list of strings and output the concatenation
of these strings to the compiler's standard control output stream
(i.e., usually {\tt TextIO.stdOut}).  The difference between {\tt say}
and {\tt vsay} is that the former works unconditionally while the
latter is controlled by {\tt CM.Control.verbose} (see

\noindent{\bf Anchor-configurable strings:} Mainly for the purpose of
implementing anchor-configurable names for auxiliary shell commands
(such as {\tt ml-yacc}), one can invoke {\tt Tools.mkCmdName}:

\begin{verbatim}
  val mkCmdName : string -> string
\end{verbatim}

If $m$ is a path anchor that points to $d$, then {\tt (mkCmdName $m$)}
returns $d${\tt /}$m$; otherwise it returns $m$.
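
The rule for {\tt mkCmdName} amounts to the following standalone
sketch, in which the anchor environment is passed in as a lookup
function; {\tt mkCmdNameModel} and its {\tt anchorDir} parameter are
hypothetical stand-ins for CM's internal anchor environment.
\begin{verbatim}
(* Anchor environment modeled as a lookup function. *)
fun mkCmdNameModel (anchorDir : string -> string option) (m : string) =
    case anchorDir m of
        SOME d => d ^ "/" ^ m     (* anchored: d/m *)
      | NONE => m                 (* unanchored: left to the shell *)
\end{verbatim}
With {\tt new-ml-yacc} mapped to {\tt /bin}, the model yields {\tt
/bin/new-ml-yacc}, matching the command-name example for New-ML-Yacc;
an unanchored name comes back unchanged.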

\noindent{\bf Querying the default class of a member:} One can
directly invoke CM's internal classification mechanism, taking
advantage of any registered classifiers:

\begin{verbatim}
  val defaultClassOf : string -> class option
\end{verbatim}

\subsection{Plug-in Tools}

\subsubsection{Automatically loaded, global plug-in tools}

If CM encounters a member class name $c$ that it does not know about,
then it tries to load a plugin module named {\tt \$/}$c${\tt
-tool.cm}.  If it sees a file whose name ends in suffix $s$ for which
no explicit member class has been specified in the CM description file
and for which automatic member classification fails, it will try to
load a plugin module named {\tt \$/}$s${\tt -ext.cm}.  The so-loaded
module can then register the required tool, thereby enabling CM to
successfully deal with the previously unknown member.

This mechanism makes it possible for new tools to be added by simply
placing appropriately named plug-in libraries in some convenient place
and making the corresponding adjustments to the anchor environment.
In other words, description files {\tt \$/}$c${\tt -tool.cm} and {\tt
\$/}$s${\tt -ext.cm} that correspond to general-purpose tools should
be registered by modifying either the global or the local path
configuration file (or by directly invoking function {\tt
CM.Anchor.anchor}; see section~\ref{sec:api:anchors}).  If this is
done, actual description files for the tools' implementations can be
placed in arbitrary locations.

\subsubsection{Explicitly loaded, local plug-in tools}

Some projects might want to use their own special-purpose tools for
which a global installation is not convenient or not appropriate.  In
such a case, the project's description file can explicitly demand the
tool to be registered temporarily.  This is the purpose of the special
tool class {\tt tool}.  Example:

\begin{verbatim}
    structure Foo
    bar-tool.cm : tool
    foo.b : bar
\end{verbatim}

Here, the member whose class is {\tt tool} (i.e., {\tt bar-tool.cm})
must be the CM description file of the tool's implementation.  The
difference from class {\tt cm} is that the so-specified library does not
become part of the current project but is loaded and linked
immediately via {\tt CM.load\_plugin}, causing one or more new classes
and their classifiers to be registered.

If we assume that loading {\tt bar-tool.cm} causes a class {\tt bar}
to be registered with its associated rule (e.g., by invoking {\tt
Tools.registerStdShellCmdTool}), the class name {\tt bar} will be
available for all subsequent members of the current description file.
Likewise, classifiers (e.g., filename suffixes) registered by {\tt
bar-tool.cm} will also be available.

The effect of registering classes and classifiers using class {\tt
tool} lasts until the end of the current description file and is
restricted to that file.  This means that other description files that
also want to use class {\tt bar} will have to have their own {\tt
tool} entry.\footnote{Note that CM cannot enforce that the tool
library actually register a class or a classifier.  Any side effects
other than registering classes or classifiers are beyond CM's control
and will not be undone once processing of the current description file
is complete.}
Local tool classes and suffixes temporarily override any equally-named
global classes or suffixes, respectively.

\subsubsection{Locally declared suffixes}

It is sometimes convenient to locally add another recognized filename
suffix to an already registered class.  This is the purpose of the
special tool class {\tt suffix}.  For example, a programmer who has
named all ML files in such a way that file names end in {\tt .ml}
could write near the beginning of the description file:

\begin{verbatim}
    ml : suffix (sml)
\end{verbatim}

For the remainder of the current description file, all such {\tt
.ml}-files will now be treated as members of class {\tt sml}.
