% File: /sml/trunk/src/cm/Doc/10-moretools.tex
% Revision 742, Thu Nov 30 14:09:32 2000 UTC, by blume:
% merging changes from private branch
% -*- latex -*-

\section{Extending the tool set}
\label{sec:moretools}

CM's tool set is extensible: new tools can be added by writing a few
lines of ML code.  The necessary hooks for this are provided by a
structure {\tt Tools} which is exported by the {\tt \$smlnj/cm/tools.cm}
library.

\subsection{Adding simple shell-command tools}
\label{sec:addshellclass}

If the tool is implemented as a ``typical'' shell command, then all
that needs to be done is a single call of:

\begin{verbatim}
  Tools.registerStdShellCmdTool
\end{verbatim}

For example, suppose you have made a
new, improved version of ML-Yacc (``New-ML-Yacc'') and want to
register it under a class called {\tt nmlyacc}.  Here is what you
write:

\begin{verbatim}
  val _ = Tools.registerStdShellCmdTool
    { tool = "New-ML-Yacc",
      class = "nmlyacc",
      suffixes = ["ngrm", "ny"],
      cmdStdPath = "new-ml-yacc",
      template = NONE,
      extensionStyle =
          Tools.EXTEND [("sig", SOME "sml", fn _ => NONE),
                        ("sml", SOME "sml", fn x => x)],
      dflopts = [] }
\end{verbatim}

This code can be packaged as a CM library and loaded via {\tt CM.make}
or {\tt CM.load\_plugin}.  ({\tt CM.autoload} is not enough because its
lazy nature prevents the required side effects from occurring.)
Alternatively, the code could also be entered at the interactive top
level after loading library {\tt \$smlnj/cm/tools.cm}.

In our example, the shell command name for our tool is {\tt
new-ml-yacc}.  When looking for this command in the file system, CM
first tries to treat it as a path anchor (see
section~\ref{sec:anchor:env}).  For example, suppose {\tt new-ml-yacc} is
mapped to {\tt /bin}.  In this case the command to be
invoked would be {\tt /bin/new-ml-yacc}.  If path anchor resolution
fails, then the command name will be used as-is.  Normally this
causes the shell's path search mechanism to be used as a fallback.

{\tt Tools.registerStdShellCmdTool} creates the class and installs the
tool for it.  The arguments must be specified as follows:

\begin{description}
\item[tool] a descriptive name of the tool (used in error messages);
type: {\tt string}
\item[class] the name of the class; the string must not contain
upper-case letters; type: {\tt string}
\item[suffixes] a list of file name suffixes that let CM automatically
recognize files of the class; type: {\tt string list}
\item[cmdStdPath] the command string from above; type: {\tt string}
\item[template] an optional string that describes how the command line
is to be constructed from pieces; \\
The string is taken verbatim except for embedded \% format specifiers:
  \begin{description}\setlength{\itemsep}{0pt}
  \item[\%c] the command name (i.e., the elaboration of {\tt cmdStdPath})
  \item[\%s] the source file name in native pathname syntax
  \item[\%$n$t] the $n$-th target file in native pathname syntax; \\
    ($n$ is specified as a decimal number, counting starts at $1$, and
    each target file name is constructed from the corresponding {\tt
    extensionStyle} entry; if $n$ is $0$ (or missing), then all
    targets---separated by single spaces---are inserted;
    if $n$ is not in the range between $0$ and the number of available
    targets, then {\bf \%$n$t} expands into itself) 
  \item[\%$n$o] the $n$-th tool parameter; \\
    (named sub-option parameters are ignored;
     $n$ is specified as a decimal number, counting starts at $1$;
     if $n$ is $0$ (or missing), then all options---separated by single
     spaces---are inserted;
     if $n$ is not in the range between $0$ and the number of available
     options, then {\bf \%$n$o} expands into itself) 
  \item[\%$x$] the character $x$ (where $x$ is none of {\bf c},
    {\bf s}, {\bf t}, or {\bf o})
  \end{description}
If no template string is given, then it defaults to {\tt "\%c \%s"}.
\item[extensionStyle] a specification of how the names of files
generated by the tool relate to the name of the tool input file;
type: {\tt Tools.extensionStyle}. \\
Currently, there are two possible cases:
\begin{enumerate}
\item ``{\tt Tools.EXTEND} $l$'' says that if the tool source file is
{\it file} then for each suffix {\it sfx} in {\tt (map \#1 $l$)} there
will be one tool output file named {\it file}{\tt .}{\it sfx}.  The
list $l$ consists of triplets where the first component specifies the
suffix string, the second component optionally specifies the
member class name of the corresponding derived file, and the
third component is a function that calculates the tool options for the
target from those of the source.  (The argument and result types of
these functions are both {\tt Tools.toolopts option}.)
\item ``{\tt Tools.REPLACE }$(l_1, l_2)$'' says that given the
base name {\it base} there will be one tool output file {\it base}{\tt
.}{\it sfx} for each suffix {\it sfx} in {\tt (map \#1 $l_2$)}.  Here,
{\it base} is determined by the following rule: If the name of the
tool input file has a suffix that occurs in $l_1$, then {\it base} is
the name without that suffix.  Otherwise the whole file name is taken
as {\it base} (just like in the case of {\tt Tools.EXTEND}).  As with
{\tt Tools.EXTEND}, the second components of the elements of $l_2$ can
optionally specify the member class name of the corresponding derived
file, and the third component maps source options to target options.
\end{enumerate}
\item[dflopts] a list of tool options which is used for
substituting {\bf \%$n$o} fields in {\tt template} (see above) if no
options were specified.  (Note that the value of {\tt dflopts} is never
passed to the option mappers in {\tt Tools.EXTEND} or {\tt
Tools.REPLACE}.)  Type: {\tt Tools.toolopts}.
\end{description}

Tools such as ML-Yacc and ML-Lex use the {\tt EXTEND} extension style,
while others, e.g., ML-Burg, use the {\tt REPLACE}
style (see section~\ref{sec:builtin-tools}).
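
As a further illustration, the following registration is a sketch for a
hypothetical tool ``My-Gen'' that combines an explicit template with
the {\tt REPLACE} style.  The tool name, the class {\tt mygen}, the
suffix {\tt gen}, the command {\tt my-gen}, and its {\tt -o} flag are
all assumptions made up for this example:

\begin{verbatim}
  (* Sketch only: "My-Gen", "mygen", and "my-gen -o" are hypothetical. *)
  val _ = Tools.registerStdShellCmdTool
    { tool = "My-Gen",
      class = "mygen",
      suffixes = ["gen"],
      cmdStdPath = "my-gen",
      (* %c = command, %1t = first target, %s = source *)
      template = SOME "%c -o %1t %s",
      extensionStyle =
          Tools.REPLACE (["gen"], [("sml", SOME "sml", fn x => x)]),
      dflopts = [] }
\end{verbatim}

Under the rules described above, a member {\tt foo.gen} would have the
single target {\tt foo.sml}, and (assuming that {\tt my-gen} is not
mapped as a path anchor) the command line would expand to {\tt my-gen
-o foo.sml foo.gen}.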

\subsection{Adding other classes}

Adding a new class whose behavior is not covered by the mechanism
described in section~\ref{sec:addshellclass} is also not complicated,
but it requires a bit more code.

\subsubsection{Adding a class and its rule}

The interface to add arbitrary classes is the routine {\tt
Tools.registerClass}:

\begin{verbatim}
  val registerClass : class * rule -> unit
\end{verbatim}

Here, the type {\tt class} is simply a synonym for {\tt string}; a
class string is the name of the class to be registered.  It may not
contain upper-case letters:

\begin{verbatim}
  type class = string
\end{verbatim}

Type {\tt rule} is a function type.  It describes the rule function
that CM will invoke for every member of the new class.  In essence,
this function maps the {\em specification} of the given member to its
(partial) {\em expansion}:

\begin{verbatim}
  type rule =
    { spec: spec,
      mkNativePath: pathmaker,
      context: rulecontext } ->
    partial_expansion
\end{verbatim}

The specification {\tt spec} is the name of the member together with a
function to convert the name to an abstract path, the member's
optional class (in case it had been given explicitly), its tool
options, and a boolean flag that tells whether this member was the
result of another tool:

\begin{verbatim}
  type spec = { name: string,
                mkpath: pathmaker,
                class: class option,
                opts: toolopts option,
                derived: bool }
\end{verbatim}

\begin{description}
\item[name:] The name is the verbatim member string from the
description file.  Be sure not to use this string directly as a file
name.  Instead, first convert it to an abstract path (see {\tt mkpath}
below) and convert back to a {\em native} file name string using one
of:
\begin{verbatim}
  val nativeSpec : srcpath -> string
  val nativePre : presrcpath -> string
\end{verbatim}
\item[mkpath:] This is a function that converts the name string to an
abstract path name.  (The same string can denote different paths
depending on whether it was specified using CM's standard path name
syntax or the underlying operating system's native syntax.)
\begin{verbatim}
  type pathmaker = string -> presrcpath
\end{verbatim}
There are two abstract path types: {\tt presrcpath} and {\tt srcpath}.
The former can represent both directory names and source file names,
the latter can represent only source file names.  To convert from {\tt
presrcpath} to {\tt srcpath}, use function {\tt Tools.srcpath}:
\begin{verbatim}
  val srcpath : presrcpath -> srcpath
\end{verbatim}
This function enforces CM's rule that there has to be at least one arc
in every such name (i.e., that it cannot be just an anchor).
\item[class:] This argument carries the class name if such a class
name was explicitly specified.  If the class was inferred from the
member name, then this argument is {\tt NONE}.
\item[opts:] Tool options are represented by a data structure
resembling Lisp lists:
\begin{verbatim}
  datatype toolopt =
      STRING of { name: string, mkpath: pathmaker }
    | SUBOPTS of { name: string, opts: toolopts }
  withtype toolopts = toolopt list
\end{verbatim}
\item[derived:] This flag is set to {\tt true} if the source file
represented by the specification is the result of another, earlier
tool invocation.
\end{description}

The other two arguments ({\tt mkNativePath} and {\tt context}) of a
rule function are:

\begin{description}
\item[mkNativePath:] This is a function like {\tt mkpath} above.  When
the rule constructs the specification for result files, it must also
provide the corresponding {\tt mkpath} function.  Since most tools
internally operate on native path names (for example, because they
pass these names to other, external programs), this {\tt mkpath}
function will have to be {\tt mkNativePath}.
\item[context:] The context represents the directory that contains the
CM description file on whose behalf the rule was invoked.  It is
represented as a higher-order function that invokes its function
argument after temporarily setting the working directory to the
context directory and returns the result of this function invocation
after restoring the original working directory.  Not all rules
require such a temporary change of directories, but those that do
should encapsulate all their work into a local function and then pass
this function to the context.
\begin{verbatim}
  type rulefn = unit -> partial_expansion
  type rulecontext = rulefn -> partial_expansion
\end{verbatim}
\end{description}

A (full) {\em expansion} consists of three lists: a list of ML files,
a list of CM files, and a list of {\em sources}.  A partial expansion
is a full expansion together with a list of specifications that still
need to be expanded further.

\begin{verbatim}
  type expansion =
    { smlfiles: (srcpath * Sharing.request * setup) list,
      cmfiles: (srcpath * Version.t option * rebindings) list,
      sources: (srcpath * { class: class, derived: bool}) list }

  type partial_expansion = expansion * spec list
\end{verbatim}

A rule always returns a partial expansion.  CM will derive a full
expansion by repeatedly applying rules until the list of pending
specifications becomes empty.

Most rules (except those for classes {\tt sml} and {\tt cm}) leave the
lists {\tt smlfiles} and {\tt cmfiles} empty.  A tool that produces an
ML source file or a CM description file as output should put a
specification for it into the specification list of a partial
expansion and let the rules for {\tt sml} and {\tt cm} take care of
the rest.  At this point we will therefore not dwell on explanations
for the types of these two values.

The {\tt sources} list is used to implement {\tt CM.sources} (see
section~\ref{sec:makedepend:support}).  Therefore, the rule should
include here every file that it consumes and that its implementer
wishes to be reported by {\tt CM.sources}. However, do not include
source files that are {\em produced} by the rule because those will be
reported by subsequent rules.

When a rule encounters an error, it should raise the following
exception, setting {\tt tool} to a string describing the current tool
and {\tt msg} to a diagnostic string describing the nature of the
error:

\begin{verbatim}
  exception ToolError of { tool: string, msg: string }
\end{verbatim}
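
To show how these pieces fit together, here is a minimal sketch of a
complete rule.  It has not been tested and is written only against the
signatures quoted in this section.  The class {\tt copysml} and its
behavior (copying the member to a file with an added {\tt .sml} suffix
and handing that file to class {\tt sml}) are hypothetical.  The
functions {\tt Tools.outdated} and {\tt Tools.openTextOut} are
described later in this section.

\begin{verbatim}
  (* Sketch only: class "copysml" and its behavior are hypothetical. *)
  local
      val tool = "Copy-SML"
      val class = "copysml"

      fun rule { spec : Tools.spec, mkNativePath, context } = let
          val { name, mkpath, derived, ... } = spec
          (* never use name directly as a file name *)
          val p = Tools.srcpath (mkpath name)
          val src = Tools.nativeSpec p
          val tgt = src ^ ".sml"
          fun copy () = let
              val ins = TextIO.openIn src
              val outs = Tools.openTextOut tgt
          in
              TextIO.output (outs, TextIO.inputAll ins);
              TextIO.closeIn ins;
              TextIO.closeOut outs
          end handle IO.Io _ =>
              raise Tools.ToolError { tool = tool,
                                      msg = "cannot copy " ^ src }
          fun doit () =
              (if Tools.outdated tool ([tgt], src) then copy () else ();
               (* report the consumed source and leave the
                * generated file to the rule for class sml *)
               ({ smlfiles = [], cmfiles = [],
                  sources = [(p, { class = class, derived = derived })] },
                [{ name = tgt, mkpath = mkNativePath,
                   class = SOME "sml", opts = NONE, derived = true }]))
      in
          (* run in the directory of the description file *)
          context doit
      end
  in
      val _ = Tools.registerClass (class, rule)
  end
\end{verbatim}

Note that the generated file appears only in the list of pending
specifications and not in {\tt sources}, and that its {\tt mkpath}
component is {\tt mkNativePath} because {\tt tgt} is a native file
name.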

\subsubsection{Adding a classifier}

A classifier is a mechanism that enables CM to infer a member's class
from its name.  CM supports two kinds of classifiers: suffix
classifiers and general classifiers.

\begin{verbatim}
  datatype classifier =
      SFX_CLASSIFIER of string -> class option
    | GEN_CLASSIFIER of string -> class option
\end{verbatim}

Most of the time, classifiers look at the file name suffix as the only
clue.  This is captured by {\tt SFX\_CLASSIFIER} which carries a
partial function from suffixes to class names.  The function should
return {\tt NONE} if it does not know about the given argument suffix.

The {\tt GEN\_CLASSIFIER} carries a similar function---the only
difference being that the entire member name is passed to it.  (Suffix
classifiers could be implemented as general classifiers, but using
{\tt SFX\_CLASSIFIER} for them is slightly more efficient.)

Function {\tt Tools.stdSfxClassifier} is a simple wrapper around {\tt
SFX\_CLASSIFIER} and produces a classifier that looks for precisely
one suffix string.

\begin{verbatim}
  val stdSfxClassifier : { sfx: string , class: class } -> classifier
\end{verbatim}

Classifiers are registered with CM by invoking {\tt
Tools.registerClassifier}:

\begin{verbatim}
  val registerClassifier : classifier -> unit
\end{verbatim}
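
The following sketch registers one classifier of each kind for a
hypothetical class {\tt copysml} (assumed to have been registered
already).  The suffixes and the {\tt gen-} prefix convention are made
up for this example, and the sketch assumes that suffixes are passed
without the leading dot, as in the {\tt suffixes} list of
section~\ref{sec:addshellclass}:

\begin{verbatim}
  (* Sketch only: class "copysml" and all suffixes and
   * prefixes below are hypothetical. *)

  (* one suffix, via the standard wrapper *)
  val _ = Tools.registerClassifier
            (Tools.stdSfxClassifier { sfx = "cpy", class = "copysml" })

  (* several suffixes handled by one suffix classifier *)
  val _ = Tools.registerClassifier
            (Tools.SFX_CLASSIFIER
               (fn "cp" => SOME "copysml"
                 | "copy" => SOME "copysml"
                 | _ => NONE))

  (* general classifier: inspects the entire member name *)
  val _ = Tools.registerClassifier
            (Tools.GEN_CLASSIFIER
               (fn name =>
                   if String.isPrefix "gen-" (OS.Path.file name)
                   then SOME "copysml"
                   else NONE))
\end{verbatim}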

\subsubsection{Miscellaneous}

Structure {\tt Tools} also provides a number of other types and
functions with the purpose of making it easier to write rule
functions.

\noindent{\bf Filename extension:} Many tools derive the names of
their targets from the name of their source.  As discussed in
section~\ref{sec:addshellclass}, CM provides some support for this via
values of type {\tt extensionStyle}:

\begin{verbatim}
  type tooloptcvt = toolopts option -> toolopts option
  datatype extensionStyle =
      EXTEND of (string * class option * tooloptcvt) list
    | REPLACE of string list * (string * class option * tooloptcvt) list
\end{verbatim}

These values can not only be passed to {\tt
Tools.registerStdShellCmdTool} but also be used to let CM perform name
extension directly.  This is done by invoking function {\tt
Tools.extend}:

\begin{verbatim}
  val extend : extensionStyle ->
               (string * toolopts option) ->
               (string * class option * toolopts option) list
\end{verbatim}
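
As an illustration, the calls below apply {\tt Tools.extend} to two
styles.  The names {\tt yaccStyle} and {\tt burgStyle} and the suffix
{\tt burg} are chosen for this example only, and the results shown in
the comments are those implied by the description of {\tt EXTEND} and
{\tt REPLACE} in section~\ref{sec:addshellclass}, not recorded output:

\begin{verbatim}
  val yaccStyle =
      Tools.EXTEND [("sig", SOME "sml", fn _ => NONE),
                    ("sml", SOME "sml", fn x => x)]
  val burgStyle =
      Tools.REPLACE (["burg"], [("sml", SOME "sml", fn x => x)])

  val t1 = Tools.extend yaccStyle ("foo.grm", NONE)
  (* implied by the EXTEND rule:
       [("foo.grm.sig", SOME "sml", NONE),
        ("foo.grm.sml", SOME "sml", NONE)] *)

  val t2 = Tools.extend burgStyle ("foo.burg", NONE)
  (* implied by the REPLACE rule:
       [("foo.sml", SOME "sml", NONE)] *)

  val t3 = Tools.extend burgStyle ("foo.txt", NONE)
  (* suffix not in the first list, so the whole name is the base:
       [("foo.txt.sml", SOME "sml", NONE)] *)
\end{verbatim}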

\noindent{\bf Checking time stamps:} A tool can check whether a given
source file is older than all of its corresponding target files.

\begin{verbatim}
  val outdated : string -> string list * string -> bool
\end{verbatim}

Here, the first (curried) argument is the name of the tool, the string
list is the list of targets (as native file names), and the other string
is the source (also as a native file name).

An alternative way of checking for outdated sources (in the style of
the Noweb-tool; see section~\ref{sec:builtin-tools:noweb}) is the
following:

\begin{verbatim}
  val outdated' : string ->
                  { src: string, wtn: string, tgt: string } -> bool
\end{verbatim}

The idea here is that if both {\tt tgt} (``target'') and {\tt wtn}
(``witness'') exist, then {\tt tgt} is considered outdated if {\tt
wtn} is older than {\tt src}.  Otherwise, if {\tt tgt} exists but {\tt
wtn} does not, then {\tt tgt} is considered outdated if it is older
than {\tt src}.  If {\tt tgt} does not exist, then it is always
considered outdated.
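
The decision rule just described can be restated as a small pure
function.  This is only a model and not CM's implementation: integers
stand in for time stamps, {\tt NONE} stands for a file that does not
exist, the source is assumed to exist, and the name {\tt
outdatedModel} is made up for this example.

\begin{verbatim}
  (* Model only: larger integers mean more recent time stamps. *)
  fun outdatedModel { src : int, wtn : int option, tgt : int option } =
      case (tgt, wtn) of
          (NONE, _) => true             (* no target: always outdated *)
        | (SOME _, SOME w) => w < src   (* witness older than source *)
        | (SOME t, NONE) => t < src     (* target older than source *)
\end{verbatim}

For instance, with a witness at time 3 and a source at time 5, the
model reports the target as outdated regardless of the target's own
time stamp.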

\noindent{\bf File- and directory-creation:}  To open a text file for
output in such a way that all directories leading up to it are created
when they do not already exist, use {\tt Tools.openTextOut}:

\begin{verbatim}
  val openTextOut : string -> TextIO.outstream
\end{verbatim}

To create the directories without opening the file (and without even
creating it if it does not exist), one can use function {\tt
Tools.makeDirs}:

\begin{verbatim}
  val makeDirs : string -> unit
\end{verbatim}

Note that the string passed to {\tt makeDirs} is still the name of a
file!
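
A short sketch of both functions follows.  The path names are
hypothetical and given in Unix-style native syntax:

\begin{verbatim}
  (* Sketch only: the path names are hypothetical. *)

  (* creates gen/ and gen/out/ if necessary, then opens foo.sml *)
  val outs = Tools.openTextOut "gen/out/foo.sml"
  val () = TextIO.output (outs, "structure Foo = struct end\n")
  val () = TextIO.closeOut outs

  (* creates the directories leading up to bar.sml,
   * but not bar.sml itself *)
  val () = Tools.makeDirs "gen/out/bar.sml"
\end{verbatim}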

\noindent{\bf Option processing:}  For simple tools, the following
function for ``parsing'' tool options can come in handy:

\begin{verbatim}
  val parseOptions :
      { tool : string, keywords : string list, options : toolopts } ->
      { matches : string -> toolopts option, restoptions : string list }
\end{verbatim}

Given a list of accepted keywords, this function scans the tool
options and collects occurrences of sub-option lists labelled by one
of these keywords.  Any sub-option list that is not recognized
and any keyword that occurs more than once will be rejected with an
error.  The result consists of a function {\tt matches} that can be
used to query each of the keywords and a list of simple rest options.
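
A rule might use this function roughly as follows.  The sketch is
written against the signature above and has not been tested.  The tool
name {\tt My-Gen}, the keyword {\tt flags}, and the helper {\tt
splitOptions} are hypothetical, and the sketch assumes that {\tt
matches} returns {\tt NONE} for a keyword that did not occur.

\begin{verbatim}
  (* Sketch only: "My-Gen", "flags", and splitOptions
   * are hypothetical. *)
  fun splitOptions (opts : Tools.toolopts) = let
      val { matches, restoptions } =
          Tools.parseOptions
              { tool = "My-Gen", keywords = ["flags"], options = opts }
      (* keep only the plain strings of a sub-option list *)
      fun plain (Tools.STRING { name, ... }) = SOME name
        | plain (Tools.SUBOPTS _) = NONE
      val flags =
          case matches "flags" of
              NONE => []
            | SOME subopts => List.mapPartial plain subopts
  in
      { flags = flags, rest = restoptions }
  end
\end{verbatim}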

\noindent{\bf Issuing diagnostics:}  Functions {\tt Tools.say} and
{\tt Tools.vsay} both take a list of strings and output the
concatenation of these strings to {\tt TextIO.stdOut}.  The difference
between {\tt say} and {\tt vsay} is that the former works
unconditionally while the latter is controlled by {\tt
CM.Control.verbose} (see section~\ref{sec:registers}).

\noindent{\bf Anchor-configurable strings:} Mainly for the purpose of
implementing anchor-configurable names for auxiliary shell commands
(such as {\tt ml-yacc}), one can invoke {\tt Tools.mkCmdName}:

\begin{verbatim}
  val mkCmdName : string -> string
\end{verbatim}

If $m$ is a path anchor that points to $d$, then {\tt (mkCmdName $m$)}
returns $d${\tt /}$m$; otherwise it returns $m$.
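
For example, a rule that runs an external command could resolve the
command name first and then report what it is about to do.  The
command name and its argument are hypothetical here:

\begin{verbatim}
  (* Sketch only: the command and its argument are hypothetical. *)
  val cmd = Tools.mkCmdName "new-ml-yacc"
  val () = Tools.vsay ["[running ", cmd, " foo.ngrm]\n"]
  val status = OS.Process.system (cmd ^ " foo.ngrm")
\end{verbatim}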

\noindent{\bf Querying the default class of a member:} One can
directly invoke CM's internal classification mechanism using {\tt
Tools.defaultClassOf}:

\begin{verbatim}
  val defaultClassOf : string -> class option
\end{verbatim}

\subsection{Plug-in Tools}
\label{sec:plugintools}

\subsubsection{Automatically-loaded, global plug-in tools}

If CM comes across a member class name $c$ that it does not know
about, then it tries to load a plugin module named {\tt \$/}$c${\tt
-tool.cm}.  If it sees a file whose name ends in suffix $s$ for which
no explicit member class has been specified in the CM description file
and for which automatic member classification fails, then it tries to
load a plugin module named {\tt \$/}$s${\tt -ext.cm}.  The so-loaded
module can then register the required tool which enables CM to
successfully deal with the previously unknown member.

This mechanism makes it possible for new tools to be added by simply
placing appropriately-named plug-in libraries in some convenient place
and making the corresponding adjustments to the anchor environment.
In other words, description files {\tt \$/}$c${\tt -tool.cm} and {\tt
\$/}$s${\tt -ext.cm} that correspond to general-purpose tools should
be registered using the path anchor mechanism.  If this is done,
actual description files for the tools' implementations can be placed
in arbitrary locations.

\subsubsection{Explicitly-loaded, local plug-in tools}
\label{sec:localtools}

Some projects might want to use their own special-purpose tools for
which a global installation is not convenient or not appropriate.  In
such a case, the project's description file can explicitly request that
the tool be registered temporarily.  This is the purpose of the special
tool class {\tt tool}.  Example:

\begin{verbatim}
Library
    structure Foo
is
    bar-tool.cm : tool
    foo.b : bar
\end{verbatim}

Here, the member whose class is {\tt tool} (i.e., {\tt bar-tool.cm})
must be the CM description file of the tool's implementation.  The
difference from class {\tt cm} is that the so-specified library does not
become part of the current project but is loaded and linked
immediately via {\tt CM.load\_plugin}, causing one or more new classes
and their classifiers to be registered.

If we assume that loading {\tt bar-tool.cm} causes a class {\tt bar}
to be registered with its associated rule (e.g., by invoking {\tt
Tools.registerStdShellCmdTool}), the class name {\tt bar} will be
available for all subsequent members of the current description file.
Likewise, classifiers (e.g., filename suffixes) registered by {\tt
bar-tool.cm} will also be available.
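
A tool library of this kind could look roughly like the following
sketch.  The file {\tt bar-tool.sml}, the structure {\tt BarTool}, and
the command {\tt bar} are hypothetical, and the name under which the
Basis library is available (here {\tt \$/basis.cm}) is an assumption
that depends on the installation.  First the description file {\tt
bar-tool.cm}:

\begin{verbatim}
Library
    structure BarTool
is
    $smlnj/cm/tools.cm
    $/basis.cm
    bar-tool.sml
\end{verbatim}

and then its single source file {\tt bar-tool.sml}:

\begin{verbatim}
  (* Sketch only: tool "Bar" and command "bar" are hypothetical. *)
  structure BarTool = struct
      val _ = Tools.registerStdShellCmdTool
        { tool = "Bar",
          class = "bar",
          suffixes = ["b"],
          cmdStdPath = "bar",
          template = NONE,
          extensionStyle =
              Tools.EXTEND [("sml", SOME "sml", fn x => x)],
          dflopts = [] }
  end
\end{verbatim}

With this registration, the member {\tt foo.b} from the example above
would be expected to produce {\tt foo.b.sml}, which is then handled by
class {\tt sml}.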

The effect of registering classes and classifiers using class {\tt
tool} lasts until the end of the current description file and is
restricted to that file.  This means that other description files that
also want to use class {\tt bar} will have to have their own {\tt
tool} entry.

Local tool classes and suffixes temporarily override any equally-named
global classes or suffixes, respectively.

\subsubsection{Locally declared suffixes}
\label{sec:localsuffixes}

It is sometimes convenient to locally add another recognized filename
suffix to an already registered class.  This is done by using the
special tool class {\tt suffix}.  For example, a programmer who has
named all her ML files in such a way that they end in {\tt .ml}
could write near the beginning of her description file:

\begin{verbatim}
    ml : suffix (sml)
\end{verbatim}

For the remainder of the current description file, all such {\tt
.ml}-files will now be classified under {\tt sml}.
