% Revision 1568, Wed Jul 28 23:23:57 2004 UTC, by mblume
% slowly fleshing out the nlffi manual
% -*- latex -*-
\documentclass[titlepage,letterpaper]{article}
\usepackage{latexsym}
\usepackage{times}
\usepackage{hyperref}

\newcommand{\gentool}{{\tt ml-nlffigen}}

\marginparwidth0pt\oddsidemargin0pt\evensidemargin0pt\marginparsep0pt
\topmargin0pt\advance\topmargin by-\headheight\advance\topmargin by-\headsep
\textwidth6.7in\textheight9.1in
\columnsep0.25in

\newcommand{\smlmj}{110}
\newcommand{\smlmn}{46}

\author{Matthias Blume \\
Toyota Technological Institute at Chicago}

\title{{\bf NLFFI}\\
A new SML/NJ Foreign-Function Interface \\
{\it\small (for SML/NJ version \smlmj.\smlmn~and later)} \\
User Manual}

\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 3pt minus 2pt}

\newcommand{\nt}[1]{{\it #1}}
\newcommand{\tl}[1]{{\underline{\bf #1}}}
\newcommand{\ttl}[1]{{\underline{\tt #1}}}
\newcommand{\ar}{$\rightarrow$\ }
\newcommand{\vb}{~$|$~}

\begin{document}

\bibliographystyle{alpha}

\maketitle

\pagebreak

\tableofcontents

\pagebreak

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

Introduce...

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The C Library}

The C library...

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Translation conventions}

The {\gentool} tool generates one ML structure for each
exported C definition.  In particular, there is one structure per
external variable, function, {\tt typedef}, {\tt struct}, {\tt union},
and {\tt enum}.
Each generated ML structure contains the ML type and values necessary
to manipulate the corresponding C item.

%-------------------------------------------------------------------------
\subsection{External variables}

An external C variable $v$ of type $t_C$ is represented by an ML
structure {\tt G\_}$v$.  This structure always contains a type {\tt t}
encoding $t_C$ and a value {\tt obj'} providing (``light-weight'')
access to the memory location that $v$ stands for in C.  If $t_C$ is
{\em complete}, then {\tt G\_}$v$ will also contain a value {\tt obj}
(the ``heavy-weight'' equivalent of {\tt obj'}) as well as a value {\tt
  typ} holding run-time type information corresponding to $t_C$ (and
{\tt t}).

\paragraph*{Details}

\begin{description}\setlength{\itemsep}{0pt}
\item[{\tt type t}] is the type to be substituted for $\tau$ in {\tt
    ($\tau$, $\zeta$) C.obj} to yield the correct type for ML values
  representing C memory objects of type $t_C$ (i.e., $v$'s type).
  (This assumes a properly instantiated $\zeta$ based on whether or
  not the corresponding object was declared {\tt const}.)
\item[!{\tt val typ}] is the run-time type information corresponding
  to type {\tt t}.  The ML type of {\tt typ} is {\tt t C.T.typ}.  This
  value is not present if $t_C$ is {\em incomplete}.
\item[!{\tt val obj}] is a function that returns the ML-side
  representative of the C object (i.e., the memory location) referred
  to by $v$.  Depending on whether or not $v$ was declared {\tt
    const}, the type of {\tt obj} is either {\tt unit -> (t, C.ro)
    C.obj} or {\tt unit -> (t, C.rw) C.obj}.  The result of {\tt
    obj()} is ``heavy-weight,'' i.e., it implicitly carries run-time
  type information.  This value is not present if $t_C$ is {\em
    incomplete}.
\item[{\tt val obj'}] is analogous to {\tt val obj}, the only
  difference being that its result is ``light-weight,'' i.e., without
  run-time type information.  The type of {\tt val obj'} is
  either {\tt unit -> (t, C.ro) C.obj'} or {\tt unit -> (t, C.rw) C.obj'}.
\end{description}

\subsubsection*{Examples}

\begin{small}
\begin{center}
\begin{tabular}{c|c}
C declaration & signature of ML-side representation \\ \hline\hline
\begin{minipage}{2in}
\begin{verbatim}
extern int i;
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure G_i : sig
    type t   = C.sint
    val typ  : t C.T.typ
    val obj  : unit -> (t, C.rw) C.obj
    val obj' : unit -> (t, C.rw) C.obj'
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
extern const double d;
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure G_d : sig
    type t   = C.double
    val typ  : t C.T.typ
    val obj  : unit -> (t, C.ro) C.obj
    val obj' : unit -> (t, C.ro) C.obj'
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
extern struct str s1;
/* str complete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure G_s1 : sig
    type t   = S_str.tag C.su
    val typ  : t C.T.typ
    val obj  : unit -> (t, C.rw) C.obj
    val obj' : unit -> (t, C.rw) C.obj'
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
extern struct istr s2;
/* istr incomplete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure G_s2 : sig
    type t   = ST_istr.tag C.su
    val obj' : unit -> (t, C.rw) C.obj'
end

\end{verbatim}
\end{minipage}
\end{tabular}
\end{center}
\end{small}
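
\noindent As a usage sketch (not part of the generated code), the
following shows how a client might read and update the variables from
the first two rows of the table.  It assumes that {\gentool} has
produced {\tt G\_i} and {\tt G\_d} as shown, and it assumes the
accessor functions {\tt C.Get.sint}, {\tt C.Set.sint}, and {\tt
  C.Get.double} from the library's {\tt C} structure, which this
section does not describe.  The names {\tt bump} and {\tt readD} are
made up for illustration.

\begin{small}
\begin{verbatim}
(* Assumption: G_i and G_d are the generated structures from the table.
 * Assumption: C.Get.sint, C.Set.sint and C.Get.double are provided
 * by the library. *)

(* Increment the C variable i and return its previous value. *)
fun bump () =
    let val i   = G_i.obj ()        (* : (C.sint, C.rw) C.obj *)
        val old = C.Get.sint i      (* : MLRep.Signed.int *)
    in
        C.Set.sint (i, old + 1);
        old
    end

(* Read the const variable d.  A corresponding C.Set.double call
 * would be rejected by the type checker because G_d.obj yields a
 * C.ro object. *)
fun readD () = C.Get.double (G_d.obj ())
\end{verbatim}
\end{small}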

%-------------------------------------------------------------------------
\subsection{Functions}

An external C function $f$ is represented by an ML structure {\tt
  F\_}$f$.  Each such structure always contains at least three values:
{\tt typ}, {\tt fptr}, and {\tt f'}.  Variable {\tt typ} holds
run-time type information regarding function pointers that share $f$'s
prototype.  The most important part of this information is the code
that implements native C calling conventions for these functions.
Variable {\tt fptr} provides access to a C pointer to $f$.  And {\tt
  f'} is an ML function that dispatches a call of $f$ (through {\tt
  fptr}), using ``light-weight'' types for arguments and results.  If
the result type of $f$ is {\em complete}, then {\tt F\_}$f$ will also
contain a function {\tt f}, using ``heavy-weight'' argument- and
result-types.

\paragraph*{Details}

\begin{description}\setlength{\itemsep}{0pt}
\item[{\tt val typ}] holds run-time type information for pointers to
  functions of the same prototype.  The ML type of {\tt typ} is {\tt
    ($A$ -> $B$) C.fptr C.T.typ} where $A$ and $B$ are types encoding
  $f$'s argument list and result type, respectively.  A
  description of $A$ and $B$ is given below.
\item[{\tt val fptr}] is a function that returns the (heavy-weight)
  function pointer to $f$. The type of {\tt fptr} is {\tt unit -> ($A$
    -> $B$) C.fptr}.  The encoding of argument and result types in
  $A$ and $B$ is the same as the one used for {\tt typ} (see below).
  Notice that although {\tt fptr} is a heavy-weight value carrying
  run-time type information, pointer arguments within $A$ or $B$ still
  use the light-weight version!
\item[!{\tt val f}] is an ML function that dispatches a call to $f$
  via {\tt fptr}.  For convenience, {\tt f} has built-in conversions
  for arguments (from ML to C) and the result (from C to ML).  For
  example, if $f$ has an argument of type {\tt double}, then {\tt f}
  will take an argument of type {\tt MLRep.Real.real} in its place and
  implicitly convert it to its C equivalent using {\tt
    C.Cvt.c\_double}.  Similarly, if $f$ returns an {\tt unsigned
    int}, then {\tt f} has a result type of {\tt MLRep.Unsigned.word}.
  This is done for all types that have a conversion function in
  {\tt C.Cvt}.
  Pointer values (as well as the object argument used for {\tt
    struct}- or {\tt union}-return values) are taken and returned in
  their heavy-weight versions.  Function {\tt f} will not be generated
  if the return type of $f$ is incomplete.
\item[{\tt val f'}] is the light-weight equivalent of {\tt f}.  The
  main difference is that pointer- and object-values are passed and
  returned in their light-weight versions.
\end{description}

\subsubsection*{Type encoding rules for {\tt ($A$ -> $B$) C.fptr}}

A C function $f$'s prototype is encoded as an ML type {\tt $A$ ->
  $B$}.  Calls of $f$ from ML take an argument of type $A$ and
produce a result of type $B$.

\begin{itemize}
\item Type $A$ is constructed from a sequence $\langle T_1, \ldots,
  T_k \rangle$ of types.  If that sequence is empty, then {\tt $A =$
    unit}; if the sequence has only one element $T_1$, then $A = T_1$.
  Otherwise $A$ is a tuple type {\tt $T_1$ * $\ldots$ * $T_k$}.
\item If $f$'s result is neither a {\tt struct} nor a {\tt union},
  then $T_1$ encodes the type of $f$'s first argument, $T_2$ that of
  the second, $T_3$ that of the third, and so on.
\item If $f$'s result is some {\tt struct} or some {\tt union}, then
  $T_1$ will be {\tt ($\tau$, C.rw) C.su\_obj'} with $\tau$
  instantiated to the appropriate {\tt struct}- or {\tt union}-tag
  type.  Moreover, we then also have $B = T_1$. $T_2$ encodes the type
  of $f$'s {\em first} argument, $T_3$ that of the second.  (In
  general, $T_{i+1}$ will encode the type of the $i$th argument of
  $f$ in this case.)
\item The encoding of the $i$th argument of $f$ ($T_i$ or $T_{i+1}$
  depending on $f$'s return type) is the light-weight ML equivalent of
  the C type of that argument.
\item An argument of C {\tt struct}- or {\tt union}-type corresponds
  to {\tt ($\tau$, C.ro) C.su\_obj'} with $\tau$ instantiated to the
  appropriate tag type.
\item If $f$'s result type is {\tt void}, then {\tt $B =$ unit}.  If
  the result type is not a {\tt struct}- or {\tt union}-type, then $B$
  is the light-weight ML encoding of that type.  Otherwise $B = T_1$
  (see above).
\end{itemize}
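
To make the rules concrete, consider a hypothetical prototype {\tt
  struct t g (int, struct u);} where both {\tt struct t} and {\tt
  struct u} are complete.  The names {\tt g}, {\tt t}, and {\tt u} as
well as the type abbreviations below are illustrative only; they are
not produced by {\gentool}.  Applying the rules step by step would
give roughly the following:

\begin{small}
\begin{verbatim}
(* hypothetical:  struct t g (int, struct u);    t and u complete *)
type T1 = (S_t.tag, C.rw) C.su_obj'   (* slot for the struct result *)
type T2 = C.sint                      (* first C argument:  int      *)
type T3 = (S_u.tag, C.ro) C.su_obj'   (* second C argument: struct u *)
type A  = T1 * T2 * T3                (* more than one element: tuple *)
type B  = T1                          (* struct result, so B = T1     *)
type g_fptr = (A -> B) C.fptr
\end{verbatim}
\end{small}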

\subsubsection*{Examples}

\begin{small}
\begin{center}
\begin{tabular}{c|c}
C declaration & signature of ML-side representation \\ \hline\hline
{\tt void f1 (void);}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f1 : sig
    val typ  : (unit -> unit) C.fptr C.T.typ
    val fptr : unit -> (unit -> unit) C.fptr
    val f    : unit -> unit
    val f'   : unit -> unit
end

\end{verbatim}
\end{minipage}
\\ \hline
{\tt int f2 (void);}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f2 : sig
    val typ  : (unit -> C.sint) C.fptr C.T.typ
    val fptr : unit -> (unit -> C.sint) C.fptr
    val f    : unit -> MLRep.Signed.int
    val f'   : unit -> MLRep.Signed.int
end

\end{verbatim}
\end{minipage}
\\ \hline
{\tt void f3 (int);}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f3 : sig
    val typ  : (C.sint -> unit) C.fptr C.T.typ
    val fptr : unit -> (C.sint -> unit) C.fptr
    val f    : MLRep.Signed.int -> unit
    val f'   : MLRep.Signed.int -> unit
end

\end{verbatim}
\end{minipage}
\\ \hline
{\tt void f4 (double, struct s*);}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f4 : sig
    val typ  : (C.double *
                (ST_s.tag, C.rw) C.su_obj C.ptr'
                -> unit)
                    C.fptr C.T.typ
    val fptr : unit -> (C.double *
                        (ST_s.tag, C.rw) C.su_obj C.ptr'
                        -> unit) C.fptr
    val f    : MLRep.Real.real *
               (ST_s.tag, C.rw) C.su_obj C.ptr
               -> unit
    val f'   : MLRep.Real.real *
               (ST_s.tag, C.rw) C.su_obj C.ptr'
               -> unit
end

\end{verbatim}
\end{minipage}
\end{tabular}
\end{center}
\end{small}

\begin{small}
\begin{center}
\begin{tabular}{c|c}
C declaration & signature of ML-side representation \\ \hline\hline
\begin{minipage}{2in}
\begin{verbatim}
struct s *f5 (float);
/* s incomplete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f5 : sig
    val typ  : (C.float
                -> (ST_s.tag, C.rw) C.su_obj C.ptr')
                    C.fptr C.T.typ
    val fptr : unit -> (C.float
                       -> (ST_s.tag, C.rw) C.su_obj C.ptr')
                           C.fptr
    val f'   : MLRep.Real.real ->
               (ST_s.tag, C.rw) C.su_obj C.ptr'
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
struct t *f6 (float);
/* t complete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f6 : sig
    val typ  : (C.float
                -> (S_t.tag, C.rw) C.su_obj C.ptr')
                    C.fptr C.T.typ
    val fptr : unit -> (C.float
                       -> (S_t.tag, C.rw) C.su_obj C.ptr')
                           C.fptr
    val f    : MLRep.Real.real ->
               (S_t.tag, C.rw) C.su_obj C.ptr
    val f'   : MLRep.Real.real ->
               (S_t.tag, C.rw) C.su_obj C.ptr'
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
struct t f7 (int, double);
/* t complete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure F_f7 : sig
    val typ  : ((S_t.tag, C.rw) C.su_obj' *
                C.sint * C.double
                -> (S_t.tag, C.rw) C.su_obj')
                    C.fptr C.T.typ
    val fptr : unit -> ((S_t.tag, C.rw) C.su_obj' *
                        C.sint * C.double
                        -> (S_t.tag, C.rw) C.su_obj')
                            C.fptr
    val f    : (S_t.tag, C.rw) C.su_obj *
               MLRep.Signed.int *
               MLRep.Real.real
               -> (S_t.tag, C.rw) C.su_obj
    val f'   : (S_t.tag, C.rw) C.su_obj' *
               MLRep.Signed.int *
               MLRep.Real.real
               -> (S_t.tag, C.rw) C.su_obj'
end

\end{verbatim}
\end{minipage}
\end{tabular}
\end{center}
\end{small}
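
\noindent The following sketch shows how client code might call some
of the functions from the tables.  It assumes that {\tt F\_f1}, {\tt
  F\_f6}, {\tt F\_f7}, and {\tt S\_t} have been generated as shown, and
it assumes the library functions {\tt C.new} and {\tt C.discard} for
allocating and freeing C objects, which this section does not
describe.  The name {\tt callF7} is made up for illustration.

\begin{small}
\begin{verbatim}
(* Assumption: F_f1, F_f6, F_f7 and S_t are generated as in the tables.
 * Assumption: C.new and C.discard are the library's allocation and
 * deallocation functions. *)

val () = F_f1.f ()            (* void f1 (void) *)

val p  = F_f6.f  2.5          (* heavy-weight pointer to struct t *)
val p' = F_f6.f' 2.5          (* light-weight pointer to struct t *)

(* struct t f7 (int, double): the object used for the struct result
 * is passed as an extra first argument. *)
fun callF7 () =
    let val buf : (S_t.tag, C.rw) C.su_obj = C.new S_t.typ
        val res = F_f7.f (buf, 3, 0.5)
    in
        (* ... work with res here ... *)
        C.discard buf
    end
\end{verbatim}
\end{small}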

\subsection{Type definitions ({\tt typedef})}

In C a {\tt typedef} declaration associates a type name $t$ with a
type $t_C$.  On the ML side, $t$ is represented by an ML structure
{\tt T\_$t$}.  This structure contains a type abbreviation {\tt t} for
the ML encoding of $t_C$ and, provided $t_C$ is not {\em incomplete},
a value {\tt typ} of type {\tt t C.T.typ} with run-time type
information regarding $t_C$.

\subsubsection*{Examples}

\begin{small}
\begin{center}
\begin{tabular}{c|c}
C declaration & signature of ML-side representation \\ \hline\hline
\begin{minipage}{2in}
\begin{verbatim}
typedef int t1;
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure T_t1 : sig
    type t   = C.sint
    val typ  : t C.T.typ
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
typedef struct s t2;
/* s incomplete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure T_t2 : sig
    type t  = ST_s.tag C.su
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
typedef struct s *t3;
/* s incomplete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure T_t3 : sig
    type t  = (ST_s.tag, C.rw) C.su_obj C.ptr
end

\end{verbatim}
\end{minipage}
\\ \hline
\begin{minipage}{2in}
\begin{verbatim}
typedef struct t t4;
/* t complete */
\end{verbatim}
\end{minipage}
&
\begin{minipage}{4in}
\begin{verbatim}

structure T_t4 : sig
    type t  = S_t.tag C.su
    val typ : t C.T.typ
end

\end{verbatim}
\end{minipage}
\end{tabular}
\end{center}
\end{small}
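
\noindent Because {\tt T\_t1.t} is only an abbreviation, it can be used
interchangeably with the type it stands for.  A minimal sketch,
assuming {\tt T\_t1} has been generated as in the first row of the
table (the names {\tt asSint} and {\tt sintTyp} are made up):

\begin{small}
\begin{verbatim}
(* Assumption: T_t1 is generated as in the first row of the table. *)

(* The identity function type-checks at both types because
 * T_t1.t = C.sint. *)
fun asSint (x : (T_t1.t, 'c) C.obj) : (C.sint, 'c) C.obj = x

(* The run-time type information can be used at either type, too. *)
val sintTyp : C.sint C.T.typ = T_t1.typ
\end{verbatim}
\end{small}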

\subsection{{\tt struct} and {\tt union}}
 
The type identity of a named C {\tt struct} (or {\tt union}) is
provided by a unique ML {\em tag} type.  There is a 1-1 correspondence
between C tag names $t$ for {\tt struct}s on one side and ML tag types
$s_t$ on the other.  An analogous correspondence exists between C tag
names $t$ for {\tt union}s and ML tag types $u_t$.  Notice that these
correspondences are {\em independent of the actual declaration} of the
C {\tt struct} or {\tt union} in question.

A C type of the form {\tt struct $t$} is represented in ML as {\tt
  $s_t$ C.su}, a type of the form {\tt union $t$} as {\tt $u_t$ C.su}.
For example, this means that a heavy-weight non-constant memory object
of C type {\tt struct $t$} has ML type {\tt ($s_t$ C.su, C.rw) C.obj}
which can be abbreviated to {\tt ($s_t$, C.rw) C.su\_obj}.

All ML types {\tt ($\tau$ C.su, $\zeta$) C.obj} are originally
completely abstract: they do not come with any operations that could
be applied to their values.  In C, the operation to be applied to a
{\tt struct}- or {\tt union}-value is field selection.  Field
selection {\em does} depend on the actual C declaration, so it is
{\gentool}'s job to generate a set of ML-side field-accessors that
correspond to field-access operations in C.

Each field is represented by a function mapping a memory object of the
{\tt struct}- or {\tt union}-type to an object of the respective field
type.  Let {\tt int i;} and {\tt const double d;} be fields of some
{\tt struct t} and let {\tt tag} be the ML tag type corresponding to
{\tt t}.  Here are the types of the (heavy-weight) access functions
for {\tt i} and {\tt d}:

\begin{small}
\begin{center}
\begin{tabular}{l@{~~~~$\leadsto$~~~~}l}
{\tt int i;} &
  {\tt val f\_i : (tag C.su, 'c) C.obj -> (C.sint, 'c) C.obj} \\
{\tt const double d;} &
  {\tt val f\_d : (tag C.su, 'c) C.obj -> (C.double, C.ro) C.obj}
\end{tabular}
\end{center}
\end{small}

\noindent Notice how each field access function is polymorphic in the
{\tt const} property of the argument object.  For fields declared {\tt
  const}, the result always uses {\tt C.ro} while for ordinary fields
the argument's type is used---reflecting the idea that a field is
considered writable if it has not been declared {\tt const} and, at
the same time, the enclosing {\tt struct} or {\tt union} is writable.
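
The effect of this typing discipline can be tried out with a small
self-contained model.  The structure {\tt Model} below is {\em not}
the real library; it is a hypothetical stand-in that mimics {\tt
  C.ro}, {\tt C.rw}, and {\tt C.obj} with phantom types, and all of
its names ({\tt s}, {\tt new}, {\tt freeze}, {\tt get}, {\tt set})
are assumptions made for illustration.

\begin{small}
\begin{verbatim}
(* Self-contained model of the const discipline; NOT the real library. *)
structure Model :> sig
    type ro
    type rw
    type ('t, 'c) obj
    type s                               (* plays the role of "tag C.su" *)
    val new    : int * real -> (s, rw) obj
    val freeze : (s, 'c) obj -> (s, ro) obj      (* view as const *)
    val f_i    : (s, 'c) obj -> (int, 'c) obj    (* int i;          *)
    val f_d    : (s, 'c) obj -> (real, ro) obj   (* const double d; *)
    val get    : ('t, 'c) obj -> 't
    val set    : ('t, rw) obj * 't -> unit
end = struct
    datatype ro = RO
    datatype rw = RW
    type ('t, 'c) obj = 't ref           (* 'c is a phantom parameter *)
    type s = {i : int ref, d : real ref}
    fun new (i, d) = ref {i = ref i, d = ref d}
    fun freeze p = p
    fun f_i (p : s ref) = #i (!p)
    fun f_d (p : s ref) = #d (!p)
    fun get r = !r
    fun set (r, v) = r := v
end

val p  = Model.new (1, 2.0)
val () = Model.set (Model.f_i p, 42)             (* accepted *)
val i  = Model.get (Model.f_i (Model.freeze p))  (* i = 42   *)

(* Rejected by the type checker:
 *   Model.set (Model.f_d p, 3.0)                d is const
 *   Model.set (Model.f_i (Model.freeze p), 0)   struct is const *)
\end{verbatim}
\end{small}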

\subsubsection*{Incomplete declarations}

If the {\tt struct} or {\tt union} is incomplete (i.e., if only its
tag $t$ is known), then {\gentool} will merely generate an ML structure
(called {\tt ST\_$t$} for {\tt struct} and {\tt UT\_$t$} for {\tt
  union}) with a single type {\tt tag} that is an abbreviation for the
library-defined type that corresponds to tag $t$.

\subsubsection*{Complete declarations}

If the {\tt struct} or {\tt union} with tag $t$ is complete, then
{\gentool} will generate an ML structure (called {\tt S\_$t$} for {\tt
  struct} and {\tt U\_$t$} for {\tt union}) which contains at least:
\begin{description}\setlength{\itemsep}{0pt}
\item[{\tt type tag}] --- an abbreviation for the library-defined type
  that corresponds to $t$
\item[{\tt val size}] --- a value representing information about the
  size of memory objects of this {\tt struct}- or {\tt union}-type.
  The ML type of {\tt size} is {\tt tag C.su C.S.size}.
\item[{\tt val typ}] --- a value representing run-time type
  information corresponding to this {\tt struct}- or {\tt union}-type.
  The ML type of {\tt typ} is {\tt tag C.su C.T.typ}.
\end{description}
In addition to this, there will be a light-weight access function {\tt
  f\_$f$'} for each field or bitfield $f$ of the {\tt struct} or {\tt
  union}.

If $f$ is a regular field, then {\tt f\_$f$'} maps a value of type
{\tt (tag C.su, $\zeta$) C.obj'} to a value of type {\tt (${\tau}_f$,
  ${\zeta}_f$) C.obj'} where ${\tau}_f$ is the ML type encoding the C
type of field $f$ and where {\tt ${\zeta}_f =$ C.ro} when $f$ was
declared {\tt const} or ${\zeta}_f = \zeta$ otherwise.

If $f$ is a bitfield, then the result type of {\tt f\_$f$'} is either
{\tt ${\zeta}_f$ C.sbf} or {\tt ${\zeta}_f$ C.ubf}, depending on
whether the bitfield's C type is {\tt signed} or {\tt unsigned}.

For every field $f$ that is either a regular field of complete type or
a bitfield there is also a heavy-weight access function {\tt f\_$f$}
which maps {\tt (tag C.su, $\zeta$) C.obj} to {\tt (${\tau}_f$,
  ${\zeta}_f$) C.obj}, {\tt ${\zeta}_f$ C.sbf}, or {\tt ${\zeta}_f$
  C.ubf}.
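
As an illustration, the following sketch applies these rules to an
incomplete {\tt struct s} and to a hypothetical complete declaration
{\tt struct pt \{ int x; const double y; unsigned int flag : 1; \};}.
The tag and field names are made up, and the signatures are derived
by hand from the description in this section (with {\tt const} fields
yielding {\tt C.ro}, as for the heavy-weight access functions shown
earlier); the actual output of {\gentool} may differ in detail.

\begin{small}
\begin{verbatim}
(* struct s;   -- incomplete: only the tag type is available *)
structure ST_s : sig
    type tag
end

(* hypothetical complete declaration:
 *   struct pt { int x; const double y; unsigned int flag : 1; }; *)
structure S_pt : sig
    type tag
    val size : tag C.su C.S.size
    val typ  : tag C.su C.T.typ

    (* light-weight access functions *)
    val f_x'    : (tag C.su, 'c) C.obj' -> (C.sint, 'c) C.obj'
    val f_y'    : (tag C.su, 'c) C.obj' -> (C.double, C.ro) C.obj'
    val f_flag' : (tag C.su, 'c) C.obj' -> 'c C.ubf

    (* heavy-weight access functions *)
    val f_x     : (tag C.su, 'c) C.obj -> (C.sint, 'c) C.obj
    val f_y     : (tag C.su, 'c) C.obj -> (C.double, C.ro) C.obj
    val f_flag  : (tag C.su, 'c) C.obj -> 'c C.ubf
end
\end{verbatim}
\end{small}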

\subsection{Enumerations ({\tt enum})}

...

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\appendix
%\input{A-syntax}

\bibliography{blume,appel,ml}

\end{document}
