Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] View of /sml/trunk/src/cm/Doc/05-classes.tex
ViewVC logotype

View of /sml/trunk/src/cm/Doc/05-classes.tex

Parent Directory Parent Directory | Revision Log Revision Log


Revision 742 - (download) (as text) (annotate)
Thu Nov 30 14:09:32 2000 UTC (18 years, 9 months ago) by blume
File size: 19501 byte(s)
merging changes from private branch
% -*- latex -*-

\section{Member classes and tools}
\label{sec:classes}

Most members of groups and libraries are either plain ML files or
other description files.  However, it is possible to incorporate other
types of files---as long as their contents can in some way be expanded
into ML code or CM descriptions.  The expansion is carried out by CM's
{\it tools} facility.

CM maintains an internal registry of {\em classes} and associated {\em
rules}.  Each class represents the set of source files that its
corresponding rule is applicable to.  For example, the class {\tt
mlyacc} is responsible for files that contain input for the parser
generator ML-Yacc~\cite{tarditi90:yacc}.  The rule for {\tt mlyacc}
takes care of expanding an ML-Yacc specifications {\tt foo.grm} by
invoking the auxiliary program {\tt ml-yacc}.  The resulting ML files
{\tt foo.grm.sig} and {\tt foo.grm.sml} are then used as if their
names had directly been specified in place of {\tt foo.grm}.

CM knows a small number of built-in classes.  In many situations these
classes will be sufficient, but in more complicated cases it may be
worthwhile to add a new class.  Since class rules are programmed in
ML, adding a class is not as simple a matter as writing a rule for
{\sc Unix}' {\tt make} program~\cite{feldman79}.  Of course,
using ML has also advantages because it keeps CM extremely flexible in
what rules can do.  Moreover, it is not necessary to learn yet another
``little language'' in order to be able to program CM's tool facility.

When looking at the member of a description file, CM determines which
tool to use by looking at clues like the file name suffix.  However,
it is also possible to specify the class of a member explicitly.  For
this, the member name is followed by a colon {\bf :} and the name of
the member class.  All class names are case-insensitive.

In addition to genuine tool classes, there are four member classes
that refer to facilities internal to CM:
\begin{description}
\item[{\tt sml}] is the class of ordinary ML source files.
\item[{\tt cm}] is the class of CM library or group description files.
\item[{\tt tool}] is the class of {\em plugin tools}.  Its purpose is
to trigger the loading of an auxiliary plugin module---usually with the
purpose of extending the set of tool classes that CM understands.
See section~\ref{sec:plugintools} for more information.
\item[{\tt suffix}] is a class similar to {\tt tool}.  Its purpose is
to declare additional filename suffixes and their associated classes.
See section~\ref{sec:plugintools}.
\end{description}

By default, CM automatically classifies files with a {\tt .sml}
suffix, a {\tt .sig} suffix, or a {\tt .fun} suffix as ML-source, file
names ending in {\tt .cm} as CM descriptions.\footnote{Suffixes that
are not known and for which no plugin module can be found are treated
as ML source code.  However, as new tools are added there is no
guarantee that this behavior will be preserved in future versions of
CM.}

\subsection{Tool parameters}
\label{sec:toolparam}

In many cases the name of the member that caused a rule to be invoked
is the only input to that rule.  However, rules can be written in such
a way that they take additional parameters.  Those parameters, if
present, must be specified in the CM description file between
parentheses following the name of the member and the optional member
class.

CM's core mechanism parses these tool options and breaks them up into
a list of items, where each item is either a filename (i.e., {\em
looks} like a filename) or a named list of sub-options.  However, CM
itself does not interpret the result but passes it on to the tool's
rule function.  It is in each rule's own responsibility to assign
meaning to its options.

All named sub-option lists (for any class) are specified by a name
string followed by a colon {\bf :} and a parenthesized list of other
tool options.  If the list contains precisely one element, then its
parentheses may be omitted.

\subsubsection{Parameters for class {\tt sml}}

The {\tt sml} class accepts two optional parameters.  One is the {\em
sharing annotation} that was explained earlier (see
Section~\ref{sec:sharing}).  The sharing annotation must be one of the
two strings {\tt shared} and {\tt private}.  If {\tt shared} is
specified, then dynamic state created by the compilation unit at
link-time must be shared across invocations of {\tt CM.make} or {\tt
CM.autoload}.  The {\tt private} annotation, on the other hand, means
that dynamic state cannot be shared across such calls to {\tt CM.make}
or {\tt CM.autoload}.

The other possible parameter for class {\tt sml} is a sub-option
list labeled {\tt setup} and can be used to specify code that will be
executed just before and just after the compiler is invoked for the
ML source file.  Code to be executed before compilation is labeled
{\tt pre}, code to be executed after compilation is complete is
labeled {\tt post}; either part is optional.  Executable code itself
is specified using strings that contain ML source text.

For example, if one wishes to disable warning messages for a specific
source file {\tt poorlywritten.sml} (but not for others), then one
could write:

\begin{verbatim}
  poorlywritten.sml (setup:(pre: "local open Compiler.Control\n\
                                 \   in val w = !printWarnings before\n\
                                 \              printWarnings := false\n\
                                 \  end;"
                            post:"Compiler.Control.printWarnings := w;"))
\end{verbatim}

\noindent Note that neither the pre- nor the post-section will be
executed if the ML file does not need to be compiled.

The pre-section is compiled and executed in the current
toplevel-environment while the post-section uses the
toplevel-environment augmented with definitions from the pre-section.
After the ML file has been compiled and the post-section (if present)
has completed execution, definitions made by either section will be
erased.  This means that setup code for other files {\em cannot} refer
to them, and neither can code that in the future might be entered at
top level.

\subsubsection{Parameters for class {\tt cm}}
\label{sec:toolparam:cm}

The {\tt cm} class understands two kinds of parameters.  The first is
a named parameter labeled by the string {\tt version}.  It must have
the format of a version number.  CM will interpret this as a version
request, thereby insuring that the imported library is not too old or
too new. (See section~\ref{sec:versions} for more on this topic.)

Example:

\begin{verbatim}
  euler.cm (version:2.71828)
  pi.cm    (version:3.14159)
\end{verbatim}

Normally, CM looks for stable library files in directory
{\tt CM/}{\it arch}{\tt -}{\it os} (see section~\ref{sec:files}).
However, if an explicit version has been requested, it will first try
directory {\tt CM/}{\it version}{\tt /}{\it arch}{\tt -}{\it os}
before looking at the default location.  This way it is possible to
keep several versions of the same library in the file system.

However, CM normally does {\em not} permit the simultaneous use of
multiple versions of the same library in one session.  The
disambiguating rule is that the version that gets loaded first
``wins''; subsequent attempts to load different versions result in
warnings or errors.  (See the discussion of {\tt CM.unshare} in
section~\ref{sec:libreg} for how to to circumvent this restriction.)

The second kind of parameter understood by {\tt cm} is a named
parameter labeled by the string {\tt bind} (see
Section~\ref{sec:anchor:env}).  It can occur arbitrarily many times
and each occurence must be a suboption-list of the form {\tt
(anchor:$a$ value:$v$)}.  The set of {\tt bind}-parameters augments
the current anchor environment to form the environment that is used
while processing the contents of the named CM description file.

\subsubsection{Parameters for classes {\tt tool} and {\tt suffix}}

Class {\tt tool} (see the discussion is section~\ref{sec:localtools})
does not accept any parameters.

Class {\tt suffix} (see section~\ref{sec:localsuffixes}) takes one
mandatory parameter which is either simply a class name or the same
class name labeled by {\tt class}.  Thus, the following two lines are
equivalent:

\begin{verbatim}
ml : suffix (sml)
ml : suffix (class:sml)
\end{verbatim}

There are no recognized filename suffixes for these two classes.

\subsection{Built-in tools}
\label{sec:builtin-tools}

\subsubsection{ML-Yacc}

The ML-Yacc tool is responsible for files that are input to the
ML-Yacc parser generator.  Its class name is {\tt mlyacc}.  Recognized
file name suffixes are {\tt .grm} and {\tt .y}.  For a source file
$f$, the tool produces two targets $f${\tt .sig} and $f${\tt .sml},
both of which are always treated as ML source files.  The {\tt mlyacc}
class accepts two optional tool parameters labeled {\tt sigoptions}
and {\tt smloptions}.  They specify tool options to be passed on to
the generated {\tt .sig}- and {\tt .sml}-files, respectively.
Example\footnote{Since the generated {\tt .sig}-file contains nothing
more than an ML signature definition, it is typically not very useful
to pass any options to it.}:

\begin{verbatim}
  lang.grm (sigoptions:(setup:(pre:"print \"compiling lang.grm.sig\\n\";"))
            smloptions:(private))
\end{verbatim}

The tool invokes the {\tt ml-yacc} command if the targets are
``outdated''.  A target is outdated if it is missing or older than the
source.  Unless anchored using the path anchor mechanism (see
Section~\ref{sec:anchor:env}), the command {\tt ml-yacc} will be located
using the operating system's path search mechanism (e.g., the {\tt
\$PATH} environment variable).

\subsubsection{ML-Lex}

The ML-Lex tool governs files that are input to the ML-Lex lexical
analyzer generator~\cite{appel89:lex}.  Its class name is {\tt mllex}.
Recognized file name suffixes are {\tt .lex} and {\tt .l}.  For a
source file $f$, the tool produces one targets $f${\tt .sml} which
will always be treated as ML source code.  Tool parameters are passed
on without change to that file.

The tool invokes the {\tt ml-lex} command if the target is outdated
(just like in the case of ML-Yacc).  Unless anchored using the path
anchor mechanism (see Section~\ref{sec:anchor:env}), the command {\tt
ml-lex} will be located using the operating system's path search
mechanism (e.g., the {\tt \$PATH} environment variable).

\subsubsection{ML-Burg}

The ML-Burg tool deals with files that are input to the ML-Burg
code-generater generator~\cite{mlburg93}.  Its class name is {\tt
mlburg}.  The only recognized file name suffix is {\tt .burg}.  For a
source file $f${\tt .burg}, the tool produces one targets $f${\tt
.sml} which will always be treated as ML source code.  Any tool
parameters are passed on without change to the target.

The tool invokes the {\tt ml-burg} command if the target is outdated.
Unless anchored using the path anchor mechanism (see
Section~\ref{sec:anchor:env}), the command {\tt ml-lex} will be located
using the operating system's path search mechanism (e.g., the {\tt
\$PATH} environment variable).

\subsubsection{Shell}

The Shell tool can be used to specify arbitrary shell commands to be
invoked on behalf of a given file.  The name of the class is {\tt
shell}.  There are no recognized file name suffixes.  This means that
in order to use the shell tool one must always specify the {\tt shell}
member class explicitly.

The rule for the {\tt shell} class relies on tool parameters.  The
parameter list must be given in parentheses and follow the {\tt shell}
class specification.

Consider the following example:

\begin{verbatim}
  foo.pp : shell (target:foo.sml options:(shared)
                        /lib/cpp -P -Dbar=baz %s %t)
\end{verbatim}

This member specification says that file {\tt foo.sml} can be obtained
from {\tt foo.pp} by running it through the C preprocessor {\tt cpp}.
The fact that the target file is given as a tool parameter implies
that the member itself is the source.  The named parameter {\tt
options} lists the tool parameters to be used for that target. (In the
example, the parentheses around {\tt shared} are optional because it
is the only element of the list.) The command line itself is given by
the remaining non-keyword parameters.  Here, a single {\bf \%s} is
replaced by the source file name, and a single {\bf \%t} is replaced
by the target file name; any other string beginning with {\bf \%} is
shortened by its first character.

In the specification one can swap the positions of source and target
(i.e., let the member name be the target) by using a {\tt source}
parameter:

\begin{verbatim}
  foo.sml : shell (source:foo.pp options:shared
                         /lib/cpp -P -Dbar=baz %s %t)
\end{verbatim}

Exactly one of the {\tt source} and {\tt target} parameters must be
specified; the other one is taken to be the member name itself.  The
target class can be given by writing a {\tt class} parameter whose
single sub-option must be the desired class name.

The usual distinction between native and standard filename syntax
applies to any given {\tt source} or {\tt target} parameter.

For example, if one were working on a Win32 system and the target file
is supposed to be in the root directory on volume {\tt D:},
then one must use native syntax to write it.  One way of doing this
would be:

\begin{verbatim}
  "D:\\foo.sml" : shell (source : foo.pp options : shared
                               cpp -P -Dbar=baz %s %t)
\end{verbatim}

\noindent As a result, {\tt foo.sml} is interpreted using native
syntax while {\tt foo.pp} uses standard conventions (although in this
case it does not make a difference).  Had we used the {\tt target}
version from above, one would have to write:

\begin{verbatim}
  foo.pp : shell (target : "D:\\foo.sml" options : shared
                                 cpp -P -Dbar=baz %s %t)
\end{verbatim}

The shell tool invokes its command whenever the target is outdated
with respect to the source.

\subsubsection{Make}

The Make tool (class {\tt make}) can (almost) be seen as a specialized
version of the Shell tool.  It has no source and one target (the
member itself) which is always considered outdated.  As with the Shell
tool, it is possible to specify target class and parameters using the
{\tt class} and {\tt options} keyword parameters.

The tool invokes the shell command {\tt make} on the target.  Unless
anchored using the path anchor mechanism~\ref{sec:anchor:env}, the
command will be located using the operating system's path search
mechanism (e.g., the {\tt \$PATH} environment variable).

Any parameters other than the {\tt class} and {\tt options}
specifications must be plain strings and are given as additional
command line arguments to {\tt make}.  The target name is always the
last command line argument.

Example:

\begin{verbatim}
  bar-grm : make (class:mlyacc -f bar-grm.mk)
\end{verbatim}

Here, file {\tt bar-grm} is generated (and kept up-to-date) by
invoking the command:
\begin{verbatim}
  make -f bar-grm.mk bar-grm
\end{verbatim}
\noindent The target file is then treated as input for {\tt ml-yacc}.

Cascading Shell- and Make-tools is easily possible.  Here is an
example that first uses Make to build {\tt bar.pp} and then filters
the contens of {\tt bar.pp} through the C preprocessor to arrive at
{\tt bar.sml}:

\begin{verbatim}
  bar.pp : make (class:shell
                     options:(target:bar.sml cpp -Dbar=baz %s %t)
                 -f bar-pp.mk)
\end{verbatim}

\subsubsection{Noweb}
\label{sec:builtin-tools:noweb}

The {\tt noweb} class handles sources written for Ramsey's {\it noweb}
literate programming facility~\cite{ramsey:simplified}.  Files ending
with suffix {\tt .nw} are automatically recognized as belonging to
this class.

The list of targets that are to be extracted from a noweb file must be
specified using tool options.  A target can then have a variety of its
own options.  Each target is specified by a separate tool option
labelled {\tt target}.  The option usually has the form of a
sub-option list.  Recognized sub-options are:

\begin{description}
\item[name] the name of the target
\item[root] the (optional) root tag for the target (given to the {\tt
-R} command line switch for the {\tt notangle} command); if {\tt root}
is missing, {\tt name} is used instead
\item[class] the (optional) class of the target
\item[options] (optional) options for the tool that handles the
target's class
\item[lineformat] a string that will be passed to the {\tt -L} command
line option of {\tt notangle}
\end{description}

Example:

\begin{verbatim}
  project.nw (target:(name:main.sml options:(private))
              target:(name:grammar class:mlyacc)
              target:(name:parse.sml))
\end{verbatim}

In place of the sub-option list there can be a single string option
which will be used for {\tt name} or even an unnamed parameter (i.e.,
without the {\tt target} label).  If no targets are specified, the
tool will assume two default targets by stripping the {\tt .nw}
suffix (if present) from the source name and adding {\tt .sig} as well
as {\tt .sml}.

The following four examples are all equivalent:

\begin{verbatim}
  foo.nw (target:(name:foo.sig) target:(name:foo.sml))
  foo.nw (target:foo.sig target:foo.sml)
  foo.nw (foo.sig foo.sml)
  foo.nw
\end{verbatim}

If {\tt lineformat} is missing, then a default based on the target
class is used.  Currently only the {\tt sml} and {\tt cm} classes are
known to CM; other classes can be added or removed by using the {\tt
NowebTool.lineNumbering} controller function exported from library
{\tt \$/noweb-tool.cm}:

\begin{verbatim}
  val lineNumbering: string -> { get: unit -> string option,
                                 set: string option -> unit }
\end{verbatim}

The {\tt noweb} class accepts two other parameter besides {\tt
target}:

\begin{description}
\item[subdir] specifies a sub-option that is used to specify a
directory where derived files (i.e., target files and witness files as
far as they have been specified using relative path names) are
created.  If the {\tt subdir} option is missing, its value defaults to
{\tt NW}.
\item[witness] specifies an auxiliary derived file whose time stamp is
used by CM to avoid recompiling extracted files whose contents have
not changed.  If {\tt witness} has not been specified, then CM uses
time stamps on extracted files directly to determine whether {\tt
notangle} needs to be run.  Thus, with no witness, any change to the
master file causes time stamps on all extracted files to be updated as
well.  If a witness was specified, then CM will write over extracted
files, causing their time stamps to change, only if their contents
have also changed.  The {\tt subdir} specification also applies to the
name of the witness file.
\end{description}

Example:

\begin{verbatim}
  foo.nw (subdir:NOWEBFILES
          witness:foo.wtn
          target:(name:main.sml))
\end{verbatim}

Here, the files named {\tt main.sml} and {\tt foo.wtn} will be
created as
\begin{verbatim}
  NOWEBFILES/main.sml
  NOWEBFILES/foo.wtn
\end{verbatim}
\noindent while without the {\tt subdir}-option it would have been
\begin{verbatim}
  NW/main.sml
  NW/foo.wtn
\end{verbatim}
\noindent To avoid the creation of such a sub-directory, one can use
the {\em current arc} ``{\bf .}'' and write:
\begin{verbatim}
  foo.nw (subdir:.
          witness:foo.wtn
          target:(name:main.sml))
\end{verbatim}

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0