\section{The MLRISC Machine Description Language}
\subsection{Overview}
\newdef{MDGen} is a machine description language
designed to automate
various mundane and error-prone tasks in developing a backend for
MLRISC. Currently, to target a new
architecture, the programmer must provide the following set of modules
written in Standard ML:
\begin{itemize}
\item \codehref{instructions/cells.sig}{CELLS} --
the properties of the register set and (some part of) the memory hierarchy.
\item \codehref{instructions/instructions.sig}{INSTRUCTIONS} --
the concrete instruction set representation.
\item \codehref{instructions/insnProps.sig}{INSNS_PROPERTIES} --
properties of the instructions.
\item \codehref{instructions/shuffle.sig}{SHUFFLE} --
methods to emit linearized code from parallel copies.
\item \codehref{emit/instruction-emitter.sig}{ASSEMBLER} --
the assembler.
\item \codehref{emit/instruction-emitter.sig}{MC} --
the machine code emitter.
\item \codehref{../backpatch/sdi-jumps.sig}{ SDI_JUMPS } --
methods for resolving span-dependent instructions.
\item DELAY_SLOTS_PROPERTIES
-- machine properties for delay slot filling, if a machine
architecture contains branch delay slots or load delay slots.
\item \codehref{../SSA/ssaProps.sig}{ SSA_PROPERTIES } --
semantics properties for performing optimizations in Static Single
Assignment form.
\end{itemize}
In general, writing a backend is tedious even with
SML's abstraction capabilities.
Furthermore, the machine description is procedural in nature
and must be checked by hand.
\subsection{What is in MDGen?}
The MDGen tool simplifies the process of developing a new MLRISC backend.
MDGen provides the following:
\begin{itemize}
\item A representation description language for specifying the
machine encoding of the instruction set,
using an extension of ML's algebraic datatype facility.
\item A semantics description language for specifying the abstract semantics
of the instructions.
\end{itemize}
Both sub-languages are based on ML's syntax and semantics, so
they should be readily familiar to all MLRISC users.
A backend developer can specify a new machine architecture using the MDGen
language, and in turn, the MDGen tool generates ML modules that are
required by the MLRISC system.
The basic concepts of MDGen are inspired largely by
Norman Ramsey's
New Jersey Machine Code Tool Kit and
Ramsey and Davidson's
Lambda RTL.
\subsection{A Sample Description}
Here we present a sample MDGen description, using the Alpha as an example.
Keywords of the MDGen language are highlighted.
A typical machine description
is structured as follows:
\begin{SML}
architecture Alpha =
struct
name "Alpha"
superscalar
little endian
lowercase assembly
\href{#cells}{Storage cells and locations}
\href{#encoding}{Instruction encoding formats specification}
\href{#instruction}{Instruction definition}
end
\end{SML}
Here, we declare that the Alpha is a superscalar machine using
little-endian encoding. Furthermore, assembly output should be displayed
in lowercase; this is for personal aesthetic reasons only, since most assemblers
are case-insensitive.
\subsubsection{Specifying Storage Cells and Locations}
A cell is an abstract resource location
for holding data values. On typical machines, the types of
cells include general purpose registers, floating point registers,
and condition code registers.
The \sml{storage} declaration defines the different
cellkinds. MLRISC requires the
cellkinds \sml{GP}, \sml{FP}, and \sml{CC} to be defined.
These are the cellkinds for general purpose registers, floating point
registers, and condition code registers, respectively.
In the following sequence of declarations, a few things are defined:
\begin{itemize}
\item The cellkinds \sml{GP, FP, CC} are defined.
Furthermore, the cellkinds \sml{MEM, CTRL}, which stand
for memory and control (dependence), are also implicitly defined.
\item The \sml{assembly as} clauses specify how a specific cell type is
to be displayed. Here, we specify that register 30, the
stack pointer, should be displayed specially as \sml{$sp}.
\item The \sml{in cellset} clause, when attached, tells MDGen that
the associated cellkind should be part of the
\href{cellset.html}{cellset}. The clause \sml{in cellset GP}
tells MDGen that a cell of type \sml{CC} should be treated
the same as a \sml{GP} cell.
\item The \sml{locations} declarations define a few abbreviations:
\sml{stackptrR} is the stack pointer, \sml{asmTmpR} is
the assembly temporary, \sml{fasmTmp} is the floating point
assembly temporary, and so on.
\end{itemize}
\begin{SML}
storage
GP = 32 cells of 64 bits in cellset called "register"
assembly as (fn 30 => "$sp"
| r => "$"^Int.toString r)
| FP = 32 cells of 64 bits in cellset called "floating point register"
assembly as (fn f => "f"^Int.toString f)
| CC = cells of 64 bits in cellset GP called "condition code register"
assembly as "cc"
locations
stackptrR = $GP[30]
and asmTmpR = $GP[28]
and fasmTmp = $FP[30]
and GPReg r = $GP[r]
and FPReg f = $FP[f]
\end{SML}
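To make the meaning of these declarations more concrete, the following is a
minimal sketch in plain Standard ML of how one might model them by hand.
It is not the code that MDGen generates, and the names \sml{cellkind},
\sml{cell}, \sml{Cell} and \sml{showCell} are hypothetical, introduced only
for illustration. The display strings for \sml{MEM} and \sml{CTRL} are
likewise assumptions, since the description above does not give them.
\begin{SML}
(* Hypothetical model, not MDGen output: one constructor per cellkind.
 * GP, FP and CC are declared by the storage declaration;
 * MEM and CTRL are the implicitly defined cellkinds. *)
datatype cellkind = GP | FP | CC | MEM | CTRL

(* A cell is a cellkind together with an index within that kind. *)
datatype cell = Cell of cellkind * int

(* Counterparts of the "locations" abbreviations. *)
val stackptrR = Cell (GP, 30)     (* stack pointer *)
val asmTmpR   = Cell (GP, 28)     (* assembly temporary *)
val fasmTmp   = Cell (FP, 30)     (* floating point assembly temporary *)
fun GPReg r   = Cell (GP, r)      (* general purpose register r *)
fun FPReg f   = Cell (FP, f)      (* floating point register f *)

(* Counterparts of the "assembly as" clauses: how each cell is displayed. *)
fun showCell (Cell (GP, 30))  = "$sp"
  | showCell (Cell (GP, r))   = "$" ^ Int.toString r
  | showCell (Cell (FP, f))   = "f" ^ Int.toString f
  | showCell (Cell (CC, _))   = "cc"
  | showCell (Cell (MEM, _))  = "mem"    (* assumed display name *)
  | showCell (Cell (CTRL, _)) = "ctrl"   (* assumed display name *)

(* shown = ["$sp", "$28", "f30", "$0"] *)
val shown = map showCell [stackptrR, asmTmpR, fasmTmp, GPReg 0]
\end{SML}
The first clause of \sml{showCell} mirrors the special case for register 30
in the \sml{assembly as} function for \sml{GP}; clause order matters here,
just as it does in the \sml{fn} expression of the description.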