# SCM Repository

# View of /sml/trunk/src/MLRISC/Doc/latex/mltree.tex

Parent Directory | Revision Log

Revision

File size: 38077 byte(s)

**547**- (**download**) (**as text**) (**annotate**)*Fri Feb 25 23:54:34 2000 UTC*(19 years, 7 months ago) by*leunga*File size: 38077 byte(s)

New documentation files for MLRISC. This version includes MLTREE extensions.

\section{The MLTREE Language} \newdef{MLTree} is the register transfer language used in the MLRISC system. It serves two important purposes: \image{MLTree}{pictures/png/mlrisc-ir.png}{align=right} \begin{enumerate} \item As an intermediate representation for a compiler front-end to talk to the MLRISC system, \item As specifications for instruction semantics \end{enumerate} The latter is needed for optimizations which require precise knowledge of such; for example, algebraic simplification and constant folding. MLTree is a low-level \newdef{typed} language: all operations are typed by its width or precision. Operations on floating point, integer, and condition code are also segregated, to prevent accidental misuse. MLTree is also \emph{tree-oriented} so that it is possible to write efficient MLTree transformation routines that uses SML pattern matching. Here are a few examples of MLTree statements. \begin{SML} MV(32,t, ADDT(32, MULT(32,REG(32,b),REG(32,b)), MULT(32, MULT(32,LI(4),REG(32,a)),REG(32,c)))) \end{SML} computes \sml{t := b*b + 4*a*c}, all in 32-bit precision and overflow trap enabled; while \begin{SML} MV(32,t, ADD(32, CVTI2I(32,SIGN_EXTEND,8, LOAD(8, ADD(32,REG(32,a),REG(32,i)))))) \end{SML} loads the byte in address \sml{a+i} and sign extend it to a 32-bit value. The statement \begin{SML} IF([],CMP(64,GE,REG(64,a),LI 0), MV(64, t, REG(64, a)), MV(64, t, NEG(64, REG(64, a))) ) \end{SML} in more traditional form means: \begin{verbatim} if a >= 0 then t := a else t := -a \end{verbatim} This example can be also expressed in a few different ways: \begin{enumerate} \item With the conditional move construct described in Section~\ref{sec:cond-move}: \begin{SML} MV(64, t, COND(CMP(64, GE, REG(64, a)), REG(64, a), NEG(64, REG(64, a)))) \end{SML} \item With explicit branching using the conditional branch construct \verb|BCC|: \begin{SML} MV(64, t, REG(64, a)); BCC([], CMP(64, GE, REG(64, a)), L1); MV(64, t, NEG(64, REG(64, a))); DEFINE L1; \end{SML} \end{enumerate} \subsection{The Definitions} MLTree is defined in the signature \mlrischref{mltree/mltree.sig}{\sml{MLTREE}} and the functor \mlrischref{mltree/mltree.sml}{\sml{MLTreeF}} The functor \sml{MLTreeF} is parameterized in terms of the label expression type, the client supplied region datatype, the instruction stream type, and the client defined MLTree extensions. \begin{SML} functor MLTreeF (structure LabelExp : \href{labelexp.html}{LABELEXP} structure Region : \href{regions.html}{REGION} structure Stream : \href{streams.html}{INSTRUCTION_STREAM} structure Extension : \mlrischref{mltree/mltree-extension.sig}{MLTREE_EXTENSION} ) : MLTREE \end{SML} \subsubsection{Basic Types} The basic types in MLTree are statements (\newtype{stm}) integer expressions (\newtype{rexp}), floating point expression (\newtype{fexp}), and conditional expressions (\newtype{ccexp}). Statements are evaluated for their effects, while expressions are evaluated for their value. (Some expressions could also have trapping effects. The semantics of traps are unspecified.) These types are parameterized by an extension type, which we can use to extend the set of MLTree operators. How this is used is described in Section~\ref{sec:mltree-extension}. References to registers are represented internally as integers, and are denoted as the type \sml{reg}. In addition, we use the types \sml{src} and \sml{dst} as abbreviations for source and destination registers. \begin{SML} type reg = int type src = reg type dst = reg \end{SML} All operators on MLTree are \emph{typed} by the number of bits that they work on. For example, 32-bit addition between \sml{a} and \sml{b} is written as \sml{ADD(32,a,b)}, while 64-bit addition between the same is written as \sml{ADD(64,a,b)}. Floating point operations are denoted in the same manner. For example, IEEE single-precision floating point add is written as \sml{FADD(32,a,b)}, while the same in double-precision is written as \sml{FADD(64,a,b)} Note that these types are low level. Higher level distinctions such as signed and unsigned integer value, are not distinguished by the type. Instead, operators are usually partitioned into signed and unsigned versions, and it is legal (and often useful!) to mix signed and unsigned operators in an expression. Currently, we don't provide a direct way to specify non-IEEE floating point together with IEEE floating point arithmetic. If this distinction is needed then it can be encoded using the extension mechanism described in Section~\ref{sec:mltree-extension}. We use the types \sml{ty} and \sml{fty} to stand for the number of bits in integer and floating point operations. \begin{SML} type ty = int type fty = int \end{SML} \subsubsection{The Basis} The signature \mlrischref{mltree/mltree-basis.sig}{MLTREE\_BASIS} defines the basic helper types used in the MLTREE signature. \begin{SML} signature MLTREE_BASIS = sig datatype cond = LT | LTU | LE | LEU | EQ | NE | GE | GEU | GT | GTU datatype fcond = ? | !<=> | == | ?= | !<> | !?>= | < | ?< | !>= | !?> | <= | ?<= | !> | !?<= | > | ?> | !<= | !?< | >= | ?>= | !< | !?= | <> | != | !? | <=> | ?<> datatype ext = SIGN_EXTEND | ZERO_EXTEND datatype rounding_mode = TO_NEAREST | TO_NEGINF | TO_POSINF | TO_ZERO type ty = int type fty = int end \end{SML} The most important of these are the types \newtype{cond} and \newtype{fcond}, which represent the set of integer and floating point comparisions. These types can be combined with the comparison constructors \verb|CMP| and \verb|FCMP| to form integer and floating point comparisions. \begin{Table}{|c|c|}{align=left} \hline Operator & Comparison \\ \hline \sml{LT} & Signed less than \\ \sml{LTU} & Unsigned less than \\ \sml{LE} & Signed less than or equal \\ \sml{LEU} & Unsigned less than or equal \\ \sml{EQ} & Equal \\ \sml{NE} & Not equal \\ \sml{GE} & Signed greater than or equal \\ \sml{GEU} & Unsigned greater than or equal \\ \sml{GT} & Signed greater than \\ \sml{GTU} & Unsigned greater than \\ \hline \end{Table} Floating point comparisons can be ``decoded'' as follows. In IEEE floating point, there are four different basic comparisons tests that we can performed given two numbers $a$ and $y$: \begin{description} \item[$a < b$] Is $a$ less than $b$? \item[$a = b$] Is $a$ equal to $b$? \item[$a > b$] Is $a$ greater than to $b$? \item[$a ? b$] Are $a$ and $b$ unordered (incomparable)? \end{description} Comparisons can be joined together. For example, given two double-precision floating point expressions $a$ and $b$, the expression \verb|FCMP(64,<=>,a,b)| asks whether $a$ is less than, equal to or greater than $b$, i.e.~whether $a$ and $b$ are comparable. The special symbol \verb|!| negates the meaning the of comparison. For example, \verb|FCMP(64,!>=,a,b)| means testing whether $a$ is less than or incomparable with $b$. \subsection{Integer Expressions} A reference to the $i$th integer register with an $n$-bit value is written as \sml{REG(}$n$,$i$\sml{)}. The operators \sml{LI}, \sml{LI32}, and \sml{LABEL}, \sml{CONST} are used to represent constant expressions of various forms. The sizes of these constants are inferred from context. \begin{SML} REG : ty * reg -> rexp LI : int -> rexp LI32 : Word32.word -> rexp LABEL : LabelExp.labexp -> rexp CONST : Constant.const -> rexp \end{SML} The following figure lists all the basic integer operators and their intuitive meanings. All operators except \sml{NOTB, NEG, NEGT} are binary and have the type \begin{SML} ty * rexp * rexp -> rexp \end{SML} The operators \sml{NOTB, NEG, NEGT} have the type \begin{SML} ty * rexp -> rexp \end{SML} \begin{tabular}{|l|l|} \hline \sml{ADD} & Twos complement addition \\ \sml{NEG} & negation \\ \sml{SUB} & Twos complement subtraction \\ \sml{MULS} & Signed multiplication \\ \sml{DIVS} & Signed division, round to zero (nontrapping) \\ \sml{QUOTS} & Signed division, round to negative infinity (nontrapping) \\ \sml{REMS} & Signed remainder (???) \\ \sml{MULU} & Unsigned multiplication \\ \sml{DIVU} & Unsigned division \\ \sml{REMU} & Unsigned remainder \\ \sml{NEGT} & signed negation, trap on overflow \\ \sml{ADDT} & Signed addition, trap on overflow \\ \sml{SUBT} & Signed subtraction, trap on overflow \\ \sml{MULT} & Signed multiplication, trap on overflow \\ \sml{DIVT} & Signed division, round to zero, trap on overflow or division by zero \\ \sml{QUOTT} & Signed division, round to negative infinity, trap on overflow or division by zero \\ \sml{REMT} & Signed remainder, trap on division by zero \\ \sml{ANDB} & bitwise and \\ \sml{ORB} & bitwise or \\ \sml{XORB} & bitwise exclusive or \\ \sml{NOTB} & ones complement \\ \sml{SRA} & arithmetic right shift \\ \sml{SRL} & logical right shift \\ \sml{SLL} & logical left shift \\ \hline\end{tabular} \subsubsection{Sign and Zero Extension} Sign extension and zero extension are written using the operator \sml{CVTI2I}. \sml{CVTI2I(}$m$,\sml{SIGN_EXTEND},$n$,$e$\sml{)} sign extends the $n$-bit value $e$ to an $m$-bit value, i.e. the $n-1$th bit is of $e$ is treated as the sign bit. Similarly, \sml{CVTI2I(}$m$,\sml{ZERO_EXTEND},$n$,$e$\sml{)} zero extends an $n$-bit value to an $m$-bit value. If $m \le n$, then \sml{CVTI2I(}$m$,\sml{SIGN_EXTEND},$n$,$e$\sml{)} = \sml{CVTI2I}($m$,\sml{ZERO_EXTEND},$n$,$e$\sml{)}. \begin{SML} datatype ext = SIGN_EXTEND | ZERO_EXTEND CVTI2I : ty * ext * ty * rexp -> rexp \end{SML} \subsubsection{Conditional Move} \label{sec:cond-move} Most new superscalar architectures incorporate conditional move instructions in their ISAs. Modern VLIW architectures also directly support full predication. Since branching (especially with data dependent branches) can introduce extra latencies in highly pipelined architectures, condtional moves should be used in place of short branch sequences. MLTree provide a conditional move instruction \sml{COND}, to make it possible to directly express conditional moves without using branches. \begin{SML} COND : ty * ccexp * rexp * rexp -> rexp \end{SML} Semantically, \sml{COND(}\emph{ty},\emph{cc},$a$,$b$\sml{)} means to evaluate \emph{cc}, and if \emph{cc} evaluates to true then the value of the entire expression is $a$; otherwise the value is $b$. Note that $a$ and $b$ are allowed to be \emph{eagerly} evaluated. In fact, we are allowed to evaluate to \emph{both} branches, one branch, or neither~\footnote{When possible.}. Various idioms of the \sml{COND} form are useful for expressing common constructs in many programming languages. For example, MLTree does not provide a primitive construct for converting an integer value \sml{x} to a boolean value (0 or 1). But using \sml{COND}, this is expressible as \sml{COND(32,CMP(32,NE,x,LI 0),LI 1,LI 0)}. SML/NJ represents the boolean values true and false as machine integers 3 and 1 respectively. To convert a boolean condition $e$ into an ML boolean value, we can use \begin{SML} COND(32,e,LI 3,LI 1) \end{SML} Common C idioms can be easily mapped into the \sml{COND} form. For example, \begin{itemize} \item \verb|if (e1) x = y| translates into \sml{MV(32,x,COND(32,e1,REG(32,y),REG(32,x)))} \item \begin{verbatim} x = e1; if (e2) x = y \end{verbatim} translates into \sml{MV(32,x,COND(32,e2,REG(32,y),e1))} \item \verb|x = e1 == e2| translates into \sml{MV(32,x,COND(32,CMP(32,EQ,e1,e2),LI 1,LI 0)} \item \verb|x = ! e| translates into \sml{MV(32,x,COND(32,CMP(32,NE,e,LI 0),LI 1,LI 0)} \item \verb|x = e ? y : z| translates into \sml{MV(32,x,COND(32,e,REG(32,y),REG(32,z)))}, and \item \verb|x = y < z ? y : z| translates into \begin{alltt} MV(32,x, COND(32, CMP(32,LT,REG(32,y),REG(32,z)), REG(32,y),REG(32,z))) \end{alltt} \end{itemize} In general, the \sml{COND} form should be used in place of MLTree's branching constructs whenever possible, since the former is usually highly optimized in various MLRISC backends. \subsubsection{Integer Loads} Integer loads are written using the constructor \verb|LOAD|. \begin{SML} LOAD : ty * rexp * Region.region -> rexp \end{SML} The client is required to specify a \href{regions.html}{region} that serves as aliasing information for the load. \subsubsection{Miscellaneous Integer Operators} An expression of the \sml{LET}($s$,$e$) evaluates the statement $s$ for its effect, and then return the value of expression $e$. \begin{SML} LET : stm * rexp -> rexp \end{SML} Since the order of evaluation is MLTree operators are \emph{unspecified} the use of this operator should be severely restricted to only \emph{side-effect}-free forms. \subsection{Floating Point Expressions} Floating registers are referenced using the term \sml{FREG}. The $i$th floating point register with type $n$ is written as \sml{FREG(}$n$,$i$\sml{)}. \begin{SML} FREG : fty * src -> fexp \end{SML} Built-in floating point operations include addition (\sml{FADD}), subtraction (\sml{FSUB}), multiplication (\sml{FMUL}), division (\sml{FDIV}), absolute value (\sml{FABS}), negation (\sml{FNEG}) and square root (\sml{FSQRT}). \begin{SML} FADD : fty * fexp * fexp -> fexp FSUB : fty * fexp * fexp -> fexp FMUL : fty * fexp * fexp -> fexp FDIV : fty * fexp * fexp -> fexp FABS : fty * fexp -> fexp FNEG : fty * fexp -> fexp FSQRT : fty * fexp -> fexp \end{SML} A special operator is provided for manipulating signs. To combine the sign of $a$ with the magnitude of $b$, we can write \sml{FCOPYSIGN(}$a$,$b$\sml{)}\footnote{What should happen if $a$ or $b$ is nan?}. \begin{SML} FCOPYSIGN : fty * fexp * fexp -> fexp \end{SML} To convert an $n$-bit signed integer $e$ into an $m$-bit floating point value, we can write \sml{CVTI2F(}$m$,$n$,$e$\sml{)}\footnote{What happen to unsigned integers?}. \begin{SML} CVTI2F : fty * ty * rexp -> fexp \end{SML} Similarly, to convert an $n$-bit floating point value $e$ to an $m$-bit floating point value, we can write \sml{CVTF2F(}$m$,$n$,$e$\sml{)}\footnote{ What is the rounding semantics?}. \begin{SML} CVTF2F : fty * fty * -> fexp \end{SML} \begin{SML} datatype rounding_mode = TO_NEAREST | TO_NEGINF | TO_POSINF | TO_ZERO CVTF2I : ty * rounding_mode * fty * fexp -> rexp \end{SML} \begin{SML} FLOAD : fty * rexp * Region.region -> fexp \end{SML} \subsection{Condition Expressions} Unlike languages like C, MLTree makes the distinction between condition expressions and integer expressions. This distinction is necessary for two purposes: \begin{itemize} \item It clarifies the proper meaning intended in a program, and \item It makes to possible for a MLRISC backend to map condition expressions efficiently onto various machine architectures with different condition code models. For example, architectures like the Intel x86, Sparc V8, and PowerPC contains dedicated condition code registers, which are read from and written to by branching and comparison instructions. On the other hand, architectures such as the Texas Instrument C6, PA RISC, Sparc V9, and Alpha does not include dedicated condition code registers. Conditional code registers in these architectures can be simulated by integer registers. \end{itemize} A conditional code register bit can be referenced using the constructors \sml{CC} and \sml{FCC}. Note that the \emph{condition} must be specified together with the condition code register. \begin{SML} CC : Basis.cond * src -> ccexp FCC : Basis.fcond * src -> ccexp \end{SML} For example, to test the \verb|Z| bit of the \verb|%psr| register on the Sparc architecture, we can used \sml{CC(EQ,SparcCells.psr)}. The comparison operators \sml{CMP} and \sml{FCMP} performs integer and floating point tests. Both of these are \emph{typed} by the precision in which the test must be performed under. \begin{SML} CMP : ty * Basis.cond * rexp * rexp -> ccexp FCMP : fty * Basis.fcond * fexp * fexp -> ccexp \end{SML} Condition code expressions may be combined with the following logical connectives, which have the obvious meanings. \begin{SML} TRUE : ccexp FALSE : ccexp NOT : ccexp -> ccexp AND : ccexp * ccexp -> ccexp OR : ccexp * ccexp -> ccexp XOR : ccexp * ccexp -> ccexp \end{SML} \subsection{Statements} Statement forms in MLTree includes assignments, parallel copies, jumps and condition branches, calls and returns, stores, sequencing, and annotation. \subsubsection{Assignments} Assignments are segregated among the integer, floating point and conditional code types. In addition, all assignments are \emph{typed} by the precision of destination register. \begin{SML} MV : ty * dst * rexp -> stm FMV : fty * dst * fexp -> stm CCMV : dst * ccexp -> stm \end{SML} \subsubsection{Parallel Copies} Special forms are provided for parallel copies for integer and floating point registers. It is important to emphasize that the semantics is that all assignments are performed in parallel. \begin{SML} COPY : ty * dst list * src list -> stm FCOPY : fty * dst list * src list -> stm \end{SML} \subsubsection{Jumps and Conditional Branches} Jumps and conditional branches in MLTree take two additional set of annotations. The first represents the \newdef{control flow} and is denoted by the type \sml{controlflow}. The second represent \newdef{control-dependence} and \newdef{anti-control-dependence} and is denoted by the type \sml{ctrl}. \begin{SML} type controlflow = Label.label list type ctrl = reg list \end{SML} Control flow annotation is simply a list of labels, which represents the set of possible targets of the associated jump. Control dependence annotations attached to a branch or jump instruction represents the new definition of \newdef{pseudo control dependence predicates}. These predicates have no associated dynamic semantics; rather they are used to constraint the set of potential code motion in an optimizer (more on this later). The primitive jumps and conditional branch forms are represented by the constructors \sml{JMP}, \sml{BCC}. \begin{SML} JMP : ctrl * rexp * controlflow -> stm BCC : ctrl * ccexp * Label.label -> stm \end{SML} In addition to \sml{JMP} and \sml{BCC}, there is a \emph{structured} if/then/else statement. \begin{SML} IF : ctrl * ccexp * stm * stm -> stm \end{SML} Semantically, \sml{IF}($c,x,y,z$) is identical to \begin{SML} BCC(\(c\), \(x\), L1) \(z\) JMP([], L2) DEFINE L1 \(y\) DEFINE L2 \end{SML} where \verb|L1| and \verb|L2| are new labels, as expected. Here's an example of how control dependence predicates are used. Consider the following MLTree statement: \begin{SML} IF([p], CMP(32, NE, REG(32, a), LI 0), MV(32, b, PRED(LOAD(32, m, ...)), p), MV(32, b, LOAD(32, n, ...))) \end{SML} In the first alternative of the \verb|IF|, the \verb|LOAD| expression is constrainted by the control dependence predicate \verb|p| defined in the \verb|IF|, using the predicate constructor \verb|PRED|. These states that the load is \emph{control dependent} on the test of the branch, and thus it may not be legally hoisted above the branch without potentially violating the semantics of the program. For example, semantics violation may happen if the value of \verb|m| and \verb|a| is corrolated, and whenever \verb|a| = 0, the address in \verb|m| is not a legal address. Note that on architectures with speculative loads, the control dependence information can be used to guide the transformation of control dependent loads into speculative loads. Now in constrast, the \verb|LOAD| in the second alternative is not control dependent on the control dependent predicate \verb|p|, and thus it is safe and legal to hoist the load above the test, as in \begin{SML} MV(32, b, LOAD(32, n, ...)); IF([p], CMP(32, NE, REG(32, a), LI 0), MV(32, b, PRED(LOAD(32, m, ...)), p), SEQ [] ) \end{SML} Of course, such transformation is only performed if the optimizer phases think that it can benefit performance. Thus the control dependence information does \emph{not} directly specify any transformations, but it is rather used to indicate when aggressive code motions are legal and safe. \subsubsection{Calls and Returns} Calls and returns in MLTree are specified using the constructors \verb|CALL| and \verb|RET|, which have the following types. \begin{SML} CALL : rexp * controlflow * mlrisc * mlrisc * ctrl * ctrl * Region.region -> stm RET : ctrl * controlflow -> stm \end{SML} The \verb|CALL| form is particularly complex, and require some explanation. Basically the seven parameters are, in order: \begin{description} \item[address] of the called routine. \item[control flow] annotation for this call. This information specifies the potential targets of this call instruction. Currently this information is ignored but will be useful for interprocedural optimizations in the future. \item[definition and use] These lists specify the list of potential definition and uses during the execution of the call. Definitions and uses are represented as the type \newtype{mlrisc} list. The contructors for this type is: \begin{SML} CCR : ccexp -> mlrisc GPR : rexp -> mlrisc FPR : fexp -> mlrisc \end{SML} \item[definition of control and anti-control dependence] These two lists specifies definitions of control and anti-control dependence. \item[region] annotation for the call, which summarizes the set of potential memory references during execution of the call. \end{description} The matching return statement constructor \verb|RET| has two arguments. These are: \begin{description} \item[anti-control dependence] This parameter represents the set of anti-control dependence predicates defined by the return statement. \item[control flow] This parameter specifies the set of matching procedure entry points of this return. For example, suppose we have a procedure with entry points \verb|f| and \verb|f'|. Then the MLTree statements \begin{verbatim} f: ... JMP L1 f': ... L1: ... RET ([], [f, f']) \end{verbatim} \noindent can be used to specify that the return is either from the entries \verb|f| or \verb|f'|. \end{description} \subsubsection{Stores} Stores to integer and floating points are specified using the constructors \verb|STORE| and \verb|FSTORE|. \begin{SML} STORE : ty * rexp * rexp * Region.region -> stm FSTORE : fty * rexp * fexp * Region.region -> stm \end{SML} The general form is \begin{SML} STORE(\(width\), \(address\), \(data\), \(region\)) \end{SML} Stores for condition codes are not provided. \subsubsection{Miscelleneous Statements} Other useful statement forms of MLTree are for sequencing (\verb|SEQ|), defining a local label (\verb|DEFINE|). \begin{SML} SEQ : stm list -> stm DEFINE : Label.label -> stm \end{SML} The constructor \sml{DEFINE L} has the same meaning as executing the method \sml{defineLabel L} in the \href{stream.html}{stream interface}. \subsection{Annotations} \href{annotations.html}{Annotations} are used as the generic mechanism for exchanging information between different phases of the MLRISC system, and between a compiler front end and the MLRISC back end. The following constructors can be used to annotate a MLTree term with an annotation: \begin{SML} MARK : rexp * Annotations.annotation -> rexp FMARK : fexp * Annotations.annotation -> fexp CCMARK : ccexp * Annotations.annotation -> ccexp ANNOTATION : stm * Annotations.annotation -> stm \end{SML} \subsection{Extending the MLTree Language} \label{sec:mltree-extension} The following constructors are available for extending the MLTree language with client defined types: \begin{SML} EXT : sext -> stm REXT : ty * rext -> rexp FEXT : fty * fext -> fexp CCEXT : ty * ccext -> ccexp \end{SML} The extension types \sml{sext, rext, fext, ccext} are defined in terms of the client supplied extension types: \begin{SML} type sext = (stm, rexp, fexp, ccexp) Extension.sx type rext = (stm, rexp, fexp, ccexp) Extension.rx type fext = (stm, rexp, fexp, ccexp) Extension.fx type ccext = (stm, rexp, fexp, ccexp) Extension.ccx \end{SML} Here is an example of how these extension constructors may be used. Suppose you are in the process of writing a compiler for a digital signal processing(\newdef{DSP}) programming language using the MLRISC framework. This wonderful language that you are developing allows the programmer to specify high level looping and iteration, and aggregation constructs that are common in DSP applications. Furthermore, since saturated and fixed point arithmetic are common constructs in DSP applications, the language and consequently the compiler should directly support these operators. For simplicity, we would like to have a unified intermediate representation that can be used to directly represent high level constructs in our language, and low level constructs that are already present in MLTree. Since, MLTree does not directly support these constructs, it seems that it is not possible to use MLRISC for such a compiler infrastructure without substantial rewrite of the core components. This is where the extension mechanism comes in. Let us suppose that for illustration that we would like to implement high level looping and aggregation constructs such as \begin{verbatim} for i := lower bound ... upper bound body x := sum{i := lower bound ... upper bound} expression \end{verbatim} together with saturated arithmetic mentioned above. Here is a first attempt: \begin{SML} structure DSPMLTreeExtension struct structure Basis = MLTreeBasis datatype ('s,'r,'f,'c) sx = FOR of Basis.var * 'r * 'r * 's and ('s,'r,'f,'c) rx = SUM of Basis.var * 'r * 'r * 'r | SADD of 'r * 'r | SSUB of 'r * 'r | SMUL of 'r * 'r | SDIV of 'r * 'r type ('s,'r,'f,'c) fx = unit type ('s,'r,'f,'c) ccx = unit end structure DSPMLTree : MLTreeF (structure Extension = DSPMLTreeExtension ... ) \end{SML} In the above signature, we have defined two new datatypes \newtype{sx} and \newtype{rx} that are used for representing the DSP statement and integer expression extensions. Integer expression extensions include the high level sum construct, and the low levels saturated arithmetic operators. The recursive type definition is necessary to ``inject'' these new constructors into the basic MLTree definition. The following is an example of how these new constructors that we have defined can be used. Suppose the source program in our DSP language is: \begin{verbatim} for i := a ... b { s := sadd(s, table[i]); } \end{verbatim} \noindent where \verb|sadd| is the saturated add operator. For simplication, let us also assume that all operations and addresses are in 32-bits. Then the translation of the above into our extended DSP-MLTree could be: \begin{SML} EXT(FOR(\(i\), REG(32, \(a\)), REG(32, \(b\)), MV(32, \(s\), REXT(32, SADD(REG(32, \(s\)), LOAD(32, ADD(32, REG(32, \(table\)), SLL(32, REG(32, \(i\)), LI 2)), \(region\))))) )) \end{SML} One potential short coming of our DSP extension to MLTree is that the extension does not allow any further extensions. This restriction may be entirely satisfactory if DSP-MLTree is only used in your compiler applications and no where else. However, if DSP-MLTree is intended to be an extension library for MLRISC, then we must build in the flexibility for extension. This can be done in the same way as in the base MLTree definition, like this: \begin{SML} functor ExtensibleDSPMLTreeExtension (Extension : \mlrischref{mltree/mltree-extension.sig}{MLTREE_EXTENSION}) = struct structure Basis = MLTreeBasis structure Extension = Extension datatype ('s,'r,'f,'c) sx = FOR of Basis.var * 'r * 'r * 's | EXT of ('s,'r,'f,'c) Extension.sx and ('s,'r,'f,'c) rx = SUM of Basis.var * 'r * 'r * 'r | SADD of 'r * 'r | SSUB of 'r * 'r | SMUL of 'r * 'r | SDIV of 'r * 'r | REXT of ('s,'r,'f,'c) Extension.rx withtype ('s,'r,'f,'c) fx = ('s,'r,'f,'c) Extension.fx and ('s,'r,'f,'c) ccx = ('s,'r,'f,'c) Extension.ccx end \end{SML} As in MLTREE, we provide two new extension constructors \verb|EXT| and \verb|REXT| in the definition of \sml{DSP_MLTREE}, which can be used to further enhance the extended MLTREE language. \subsubsection{Compilation for MLTree Extensions} Of course, after extending the MLTree language, the client must provide an \emph{extension} module describing how to compile the extensions. How this is done is described in Section~\ref{sec:instrsel}. \subsection{MLTree Utilities} The \MLRISC{} system contains numerous utilities for working with MLTree datatypes. Some of the following utilizes are also useful for clients use: \begin{description} \item[MLTreeUtils] implements basic hashing, equality and pretty printing functions, \item[MLTreeFold] implements a fold function over the MLTree datatypes, \item[MLTreeRewrite] implements a generic rewriting engine, \item[MLTreeSimplify] implements a simplifier that performs algebraic simplification and constant folding. \end{description} \subsubsection{Hashing, Equality, Pretty Printing} The functor \mlrischref{mltree/mltree-utils.sml}{MLTreeUtils} provides the basic utilities for hashing an MLTree term, comparing two MLTree terms for equality and pretty printing. The hashing and comparision functions are useful for building hash tables using MLTree datatype as keys. The signature of the functor is: \begin{SML} signature \mlrischref{mltree/mltree-utils.sig}{MLTREE_UTILS} = sig structure T : MLTREE (* * Hashing *) val hashStm : T.stm -> word val hashRexp : T.rexp -> word val hashFexp : T.fexp -> word val hashCCexp : T.ccexp -> word (* * Equality *) val eqStm : T.stm * T.stm -> bool val eqRexp : T.rexp * T.rexp -> bool val eqFexp : T.fexp * T.fexp -> bool val eqCCexp : T.ccexp * T.ccexp -> bool val eqMlriscs : T.mlrisc list * T.mlrisc list -> bool (* * Pretty printing *) val show : (string list * string list) -> T.printer val stmToString : T.stm -> string val rexpToString : T.rexp -> string val fexpToString : T.fexp -> string val ccexpToString : T.ccexp -> string end functor \mlrischref{mltree/mltree-utils.sml}{MLTreeUtils} (structure T : MLTREE (* Hashing extensions *) val hashSext : T.hasher -> T.sext -> word val hashRext : T.hasher -> T.rext -> word val hashFext : T.hasher -> T.fext -> word val hashCCext : T.hasher -> T.ccext -> word (* Equality extensions *) val eqSext : T.equality -> T.sext * T.sext -> bool val eqRext : T.equality -> T.rext * T.rext -> bool val eqFext : T.equality -> T.fext * T.fext -> bool val eqCCext : T.equality -> T.ccext * T.ccext -> bool (* Pretty printing extensions *) val showSext : T.printer -> T.sext -> string val showRext : T.printer -> T.ty * T.rext -> string val showFext : T.printer -> T.fty * T.fext -> string val showCCext : T.printer -> T.ty * T.ccext -> string ) : MLTREE_UTILS = \end{SML} The types \sml{hasher}, \sml{equality}, and \sml{printer} represent functions for hashing, equality and pretty printing. These are defined as: \begin{SML} type hasher = \{stm : T.stm -> word, rexp : T.rexp -> word, fexp : T.fexp -> word, ccexp : T.ccexp -> word \} type equality = \{ stm : T.stm * T.stm -> bool, rexp : T.rexp * T.rexp -> bool, fexp : T.fexp * T.fexp -> bool, ccexp : T.ccexp * T.ccexp -> bool \} type printer = \{ stm : T.stm -> string, rexp : T.rexp -> string, fexp : T.fexp -> string, ccexp : T.ccexp -> string, dstReg : T.ty * T.var -> string, srcReg : T.ty * T.var -> string \} \end{SML} For example, to instantiate a \sml{Utils} module for our \sml{DSPMLTree}, we can write: \begin{SML} structure U = MLTreeUtils (structure T = DSPMLTree fun hashSext \{stm, rexp, fexp, ccexp\} (FOR(i, a, b, s)) = Word.fromIntX i + rexp a + rexp b + stm s and hashRext \{stm, rexp, fexp, ccexp\} e = (case e of SUM(i,a,b,c) => Word.fromIntX i + rexp a + rexp b + rexp c | SADD(a,b) => rexp a + rexp b | SSUB(a,b) => 0w12 + rexp a + rexp b | SMUL(a,b) => 0w123 + rexp a + rexp b | SDIV(a,b) => 0w1245 + rexp a + rexp b ) fun hashFext _ _ = 0w0 fun hashCCext _ _ = 0w0 fun eqSext \{stm, rexp, fexp, ccexp\} (FOR(i, a, b, s), FOR(i', a', b', s')) = i=i' andalso rexp(a,a') andalso rexp(b,b') andalso stm(s,s') fun eqRext \{stm, rexp, fexp, ccexp\} (e,e') = (case (e,e') of (SUM(i,a,b,c),SUM(i',a',b',c')) => i=i' andalso rexp(a,a') andalso rexp(b,b') andalso stm(c,c') | (SADD(a,b),SADD(a',b')) => rexp(a,a') andalso rexp(b,b') | (SSUB(a,b),SSUB(a',b')) => rexp(a,a') andalso rexp(b,b') | (SMUL(a,b),SMUL(a',b')) => rexp(a,a') andalso rexp(b,b') | (SDIV(a,b),SDIV(a',b')) => rexp(a,a') andalso rexp(b,b') | _ => false ) fun eqFext _ _ = true fun eqCCext _ _ = true fun showSext \{stm, rexp, fexp, ccexp, dstReg, srcReg\} (FOR(i, a, b, s)) = "for("^dstReg i^":="^rexp a^".."^rexp b^")"^stm s fun ty t = "."^Int.toString t fun showRext \{stm, rexp, fexp, ccexp, dstReg, srcReg\} e = (case (t,e) of SUM(i,a,b,c) => "sum"^ty t^"("^dstReg i^":="^rexp a^".."^rexp b^")"^rexp c | SADD(a,b) => "sadd"^ty t^"("rexp a^","^rexp b^")" | SSUB(a,b) => "ssub"^ty t^"("rexp a^","^rexp b^")" | SMUL(a,b) => "smul"^ty t^"("rexp a^","^rexp b^")" | SDIV(a,b) => "sdiv"^ty t^"("rexp a^","^rexp b^")" ) fun showFext _ _ = "" fun showCCext _ _ = "" ) \end{SML} \subsubsection{MLTree Fold} The functor \mlrischref{mltree/mltree-fold.sml}{MLTreeFold} provides the basic functionality for implementing various forms of aggregation function over the MLTree datatypes. Its signature is \begin{SML} signature \mlrischref{mltree/mltree-fold.sig}{MLTREE_FOLD} = sig structure T : MLTREE val fold : 'b folder -> 'b folder end functor \mlrischref{mltree/mltree-fold.sml}{MLTreeFold} (structure T : MLTREE (* Extension mechnism *) val sext : 'b T.folder -> T.sext * 'b -> 'b val rext : 'b T.folder -> T.ty * T.rext * 'b -> 'b val fext : 'b T.folder -> T.fty * T.fext * 'b -> 'b val ccext : 'b T.folder -> T.ty * T.ccext * 'b -> 'b ) : MLTREE_FOLD = \end{SML} The type \newtype{folder} is defined as: \begin{SML} type 'b folder = \{ stm : T.stm * 'b -> 'b, rexp : T.rexp * 'b -> 'b, fexp : T.fexp * 'b -> 'b, ccexp : T.ccexp * 'b -> 'b \} \end{SML} \subsubsection{MLTree Rewriting} The functor \mlrischref{mltree/mltree-rewrite.sml}{MLTreeRewrite} implements a generic term rewriting engine which is useful for performing various transformations on MLTree terms. Its signature is \begin{SML} signature \mlrischref{mltree/mltree-rewrite.sig}{MLTREE_REWRITE} = sig structure T : MLTREE val rewrite : (* User supplied transformations *) \{ rexp : (T.rexp -> T.rexp) -> (T.rexp -> T.rexp), fexp : (T.fexp -> T.fexp) -> (T.fexp -> T.fexp), ccexp : (T.ccexp -> T.ccexp) -> (T.ccexp -> T.ccexp), stm : (T.stm -> T.stm) -> (T.stm -> T.stm) \} -> T.rewriters end functor \mlrischref{mltre/mltree-rewrite.sml}{MLTreeRewrite} (structure T : MLTREE (* Extension *) val sext : T.rewriter -> T.sext -> T.sext val rext : T.rewriter -> T.rext -> T.rext val fext : T.rewriter -> T.fext -> T.fext val ccext : T.rewriter -> T.ccext -> T.ccext ) : MLTREE_REWRITE = \end{SML} The type \newtype{rewriter} is defined in signature \mlrischref{mltree/mltree.sig}{MLTREE} as: \begin{SML} type rewriter = \{ stm : T.stm -> T.stm, rexp : T.rexp -> T.rexp, fexp : T.fexp -> T.fexp, ccexp : T.ccexp -> T.ccexp \} \end{SML} \subsubsection{MLTree Simplifier} The functor \mlrischref{mltree/mltree-simplify.sml}{MLTreeSimplify} implements algebraic simplification and constant folding for MLTree. Its signature is: \begin{SML} signature \mlrischref{mltree/mltree-simplify.sig}{MLTREE_SIMPLIFIER} = sig structure T : MLTREE val simplify : { addressWidth : int } -> T.simplifier end functor \mlrischref{mltree/mltree-simplify.sml}{MLTreeSimplifier} (structure T : MLTREE (* Extension *) val sext : T.rewriter -> T.sext -> T.sext val rext : T.rewriter -> T.rext -> T.rext val fext : T.rewriter -> T.fext -> T.fext val ccext : T.rewriter -> T.ccext -> T.ccext ) : MLTREE_SIMPLIFIER = \end{SML} Where type \newdef{simplifier} is defined in signature \mlrischref{mltree/mltree.sig}{MLTREE} as: \begin{SML} type simplifier = \{ stm : T.stm -> T.stm, rexp : T.rexp -> T.rexp, fexp : T.fexp -> T.fexp, ccexp : T.ccexp -> T.ccexp \} \end{SML}

root@smlnj-gforge.cs.uchicago.edu | ViewVC Help |

Powered by ViewVC 1.0.0 |