The MLRISC IR

Introduction

In this section we will describe the MLRISC intermediate representation.

Control Flow Graph

The control flow graph is the main view of the IR. A control flow graph satisfies the following signature:
 signature CONTROL_FLOW_GRAPH = sig
   structure I : INSTRUCTIONS
   structure B : BLOCK_NAMES
   structure P : PSEUDO_OPS
   structure C : CELLS
   structure W : FIXED_POINT 
      sharing I.C = C

   {\it definition of type} weight
   {\it definition of type} block_kind
   {\it definition of type} data
   {\it definition of type} block
   {\it definition of type} edge_kind
   {\it definition of type} edge_info
   {\it definition of type} cfg
   {\it operations on a cfg} 
 end
The following structures nested within a CFG: The type weight below is used in execution frequency annotations:
   type weight = W.fixed_point
There are a few different kinds of basic blocks, described by the type block_kind below:
   datatype block_kind = 
       START          
     | STOP           
     | FUNCTION_ENTRY 
     | NORMAL         
     | HYPERBLOCK     
A basic block is defined as the datatype block, defined below:
   and data = LABEL  of Label.label
            | PSEUDO of P.pseudo_op

   and block = 
      BLOCK of
      {  id          : int,                       
         kind        : block_kind,                 {\it(* block kind *)}
         name        : B.name,                     {\it(* block name *)}
         freq        : weight ref,                 {\it(* execution frequency *) }
         data        : data list ref,              {\it(* data preceding block *) }
         labels      : Label.label list ref,       {\it(* labels on blocks *) }
         insns       : I.instruction list ref,     {\it(* in rev order *)}
         annotations : Annotations.annotations ref {\it(* annotations *)}
      }
Edges in a CFG are annotated with the type edge_info, defined below:
   and edge_kind = ENTRY           
                 | EXIT            
                 | JUMP            
                 | FALLSTHRU       
                 | BRANCH of bool  
                 | SWITCH of int   
                 | SIDEEXIT of int 

   and edge_info = 
       EDGE of { k : edge_kind,                  {\it(* edge kind *)}
                 w : weight ref,                 {\it(* edge freq *)}
                 a : Annotations.annotations ref {\it(* annotations *)}
               }
Type cfg below defines a control flow graph:
   type edge = edge_info edge
   type node = block node

   datatype info = 
       INFO of { regmap      : C.regmap,
                 annotations : Annotations.annotations ref,
                 firstBlock  : int ref,
                 reorder     : bool ref
               }
   type cfg = (block,edge_info,info) graph
\paragraph{Low-level Interface} The following subsection describes the low-level interface to a CFG. These functions should be used with care since they do not always maintain high-level structural invariants imposed on the representation. In general, higher level interfaces exist so knowledge of this interface is usually not necessary for customizing MLRISC. Various kinds of annotations on basic blocks are defined below:
   exception LIVEOUT of C.cellset    
   exception CHANGED of unit -> unit
   exception CHANGEDONCE of unit -> unit
The annotation LIVEOUT is used record live-out information on an escaping block. The annotations CHANGED and CHANGEDONCE are used internally for maintaining views on a CFG. These should not be used directly. The following are low-level functions for building new basic blocks. The functions newXXX build empty basic blocks of a specific type. The function defineLabel returns a label to a basic block; and if one does not exist then a new label will be generated automatically. The functions emit and show_block are low-level routines for displaying a basic block.
   val newBlock          : int * B.name -> block      
   val newStart          : int -> block               
   val newStop           : int -> block               
   val newFunctionEntry  : int -> block               
   val copyBlock         : int * block -> block       
   val defineLabel       : block -> Label.label       
   val emit              : C.regmap -> block -> unit  
   val show_block        : C.regmap -> block -> string 
Methods for building a CFG are listed as follows:
   val cfg      : info -> cfg    
   val new      : C.regmap -> cfg
   val subgraph : cfg -> cfg     
   val init     : cfg -> unit    
   val changed  : cfg -> unit   
   val removeEdge : cfg -> edge -> unit
Again, these methods should be used only with care. The following functions allow the user to extract low-level information from a flowgraph. Function regmap returns the current register map. Function regmap returns a function that lookups the current register map. Function liveOut returns liveOut information from a block; it returns the empty cellset if the block is not an escaping block. Function fallsThruFrom takes a node id v and locates the block u (if any) that flows into v without going through a branch instruction. Similarly, the function fallsThruTo takes a node id u and locates the block (if any) that u flows into with going through a branch instruction. If u falls through to v in any feasible code layout u must preceed v.
   val regmap    : cfg -> C.regmap
   val reglookup : cfg -> C.register -> C.register
   val liveOut   : block -> C.cellset
   val fallsThruFrom : cfg * node_id -> node_id option
   val fallsThruTo   : cfg * node_id -> node_id option
To support graph viewing of a CFG, the following low-level primitives are provided:
   val viewStyle      : cfg -> (block,edge_info,info) GraphLayout.style
   val viewLayout     : cfg -> GraphLayout.layout
   val headerText     : block -> string
   val footerText     : block -> string
   val subgraphLayout : { cfg : cfg, subgraph : cfg } -> GraphLayout.layout
Finally, a miscellany function for control dependence graph building.
 
   val cdgEdge : edge_info -> bool 

IR

The MLRISC intermediate representation is a composite view of various compiler data structures, including the control flow graph, (post-)dominator trees, control dependence graph, and loop nesting tree. Basic compiler optimizations in MLRISC operate on this data structure; advance optimizations operate on more complex representations which use this representation as the base layer. \begin{wrapfigure}{r}{4.5in} \begin{Boxit} \psfig{figure=Figures/mlrisc-ir.eps,width=4.5in} \end{Boxit} \caption{The MLRISC IR} \end{wrapfigure} This IR provides a few additional functionalities: The signature of the IR is listed in Figure~\ref{fig:mlrisc-ir}. \begin{Figure}
 signature MLRISC_IR = sig
   structure I    : INSTRUCTIONS
   structure CFG  : CONTROL_FLOW_GRAPH
   structure Dom  : DOMINATOR_TREE
   structure CDG  : CONTROL_DEPENDENCE_GRAPH
   structure Loop : LOOP_STRUCTURE
   structure Util : CFG_UTIL
      sharing Util.CFG = CFG
      sharing CFG.I = I 
      sharing Loop.Dom = CDG.Dom = Dom
  
   type cfg  = CFG.cfg  
   type IR   = CFG.cfg  
   type dom  = (CFG.block,CFG.edge_info,CFG.info) Dom.dominator_tree
   type pdom = (CFG.block,CFG.edge_info,CFG.info) Dom.postdominator_tree
   type cdg  = (CFG.block,CFG.edge_info,CFG.info) CDG.cdg
   type loop = (CFG.block,CFG.edge_info,CFG.info) Loop.loop_structure
 
   val dom   : IR -> dom
   val pdom  : IR -> pdom
   val cdg   : IR -> cdg
   val loop  : IR -> loop

   val changed : IR -> unit  
   val memo : (IR -> 'facet) -> IR -> 'facet
   val addLayout : string -> (IR -> GraphLayout.layout) -> unit
   val view : string -> IR -> unit      
   val views : string list -> IR -> unit      
   val viewSubgraph : IR -> cfg -> unit 
 end
\caption{\label{fig:mlrisc-ir}MLRISC IR} \end{Figure} The following facets are predefined: dominator, post-dominator tree, control dependence graph and loop nesting structure. The functions dom, pdom, cdg, loop are facet extraction methods that compute up-to-date views of these facets. The following protocol is used for facets: To view an IR, the functions view, views or viewSubgraph can be used. They have the following interpretation:

Building a CFG

There are two basic methods of building a CFG: The first method requires you to perform instruction selection from a compiler front-end, but allows you to bypass all other \MLRISC{} phases if desired. The second method allows you to take advantage of various \MLRISC's instruction selection modules currently available. We describe these methods in this section. \paragraph{Directly from Instructions} Signature CODE_EMITTER below describes an abstract emitter interface for accepting a linear stream of instructions from a source and perform a sequence of actions based on this stream\footnote{Unlike the signature {\tt EMITTER\_NEW} or {\tt FLOWGRAPH\_GEN}, it has the advantage that it is not tied into any form of specific flowgraph representation.}.
 signature CODE_EMITTER = sig 
   structure I : INSTRUCTIONS
   structure C : CELLS
   structure P : PSEUDO_OPS
   structure B : BLOCK_NAMES
      sharing I.C = C

   type emitter =
   {  defineLabel : Label.label -> unit,   
      entryLabel  : Label.label -> unit,   
      exitBlock   : C.cellset -> unit,    
      pseudoOp    : P.pseudo\_op -> unit,  
      emitInstr   : I.instruction -> unit, 
      blockName   : B.name -> unit,       
      comment     : string -> unit,        
      init        : int -> unit,           
      finish      : unit -> unit   
   } 
 end
The code emitter interface has the following informal protocol. \begin{methods} init(n) & Initializes the emitter and signals that the back-end should allocate space for n bytes of machine code. The number is ignored for non-machine code back-ends. \\ defineLabel(l) & Defines a new label l at the current position.\\ entryLabel(l) & Defines a new entry label l at the current position. An entry label defines an entry point into the current flow graph. Note that multiple entry points are allowed\\ exitBlock(c) & Defines an exit at the current position. The cellset c represents the live-out information \\ pseudOp(p) & Emits an pseudo op p at the current position \\ emitInstr(i) & Emits an instruction i at the current position \\ blockName(b) & Changes the block name to b \\ comment(msg) & Emits a comment msg at the current position \\ finish & Signals that the use of the emitter is finished. The emitter is free to perform its post-processing functions. When this is finished the CFG is built. \end{methods} The functor ControlFlowGraphGenFn below can be used to create a CFG builder that uses the CODE_EMITTER interface.
 signature CONTROL_FLOW_GRAPH_GEN = sig
   structure CFG     : CONTROL_FLOW_GRAPH
   structure Emitter : CODE_EMITTER
       sharing Emitter.I = CFG.I
       sharing Emitter.P = CFG.P
   val emitter : CFG.cfg -> Emitter.emitter
 end
 functor ControlFlowGraphGenFn
    (structure CFG     : CONTROL_FLOW_GRAPH
     structure Emitter : CODE_EMITTER
     structure P       : INSN_PROPERTIES
         sharing CFG.I = Emitter.I = P.I
         sharing CFG.P = Emitter.P
         sharing CFG.B = Emitter.B
    ) : CONTROL_FLOW_GRAPH_GEN
\paragraph{Cluster to CFG} The core \MLRISC{} system implements many instruction selection front-ends. The result of an instruction selection module is a linear code layout block called a cluster. The functor Cluster2CFGFn below generates a translator that translates a cluster into a CFG:
 signature CLUSTER2CFG = sig
   structure CFG : CONTROL_FLOW_GRAPH
   structure F   : FLOWGRAPH
      sharing CFG.I = F.I
      sharing CFG.P = F.P
      sharing CFG.B = F.B
   val cluster2cfg : F.cluster -> CFG.cfg
 end 
 functor Cluster2CFGFn
   (structure CFG : CONTROL_FLOW_GRAPH 
    structure F   : FLOWGRAPH
    structure P   : INSN_PROPERTIES
       sharing CFG.I = F.I = P.I 
       sharing CFG.P = F.P
       sharing CFG.B = F.B
   ) : CLUSTER2CFG 
\paragraph{CFG to Cluster} The basic \MLRISC{} system also implements many back-end functions such as register allocation, assembly output and machine code output. These modules all utilize the cluster representation. The functor CFG2ClusterFn below generates a translator that converts a CFG into a cluster. With the previous functor, the CFG and the cluster presentation can be freely inter-converted.
 signature CFG2CLUSTER = sig
   structure CFG : CONTROL_FLOW_GRAPH
   structure F   : FLOWGRAPH
      sharing CFG.I = F.I
      sharing CFG.P = F.P
      sharing CFG.B = F.B
   val cfg2cluster : { cfg : CFG.cfg, relayout : bool } -> F.cluster
 end 
 functor CFG2ClusterFn
   (structure CFG  : CONTROL_FLOW_GRAPH
    structure F    : FLOWGRAPH
       sharing CFG.I = F.I
       sharing CFG.P = F.P
       sharing CFG.B = F.B
    val patchBranch : {instr:CFG.I.instruction, backwards:bool} -> 
                         CFG.I.instruction list
   ) : CFG2CLUSTER
When a CFG originates from a cluster, we try to preserve the same code layout through out all optimizations when possible. The function cfg2cluster takes an optional flag that specifies we should force the recomputation of the code layout of a control flow graph when translating a CFG back into a cluster.

Basic CFG Transformations

Basic CFG transformations are implemented in the functor CFGUtilFn. These transformations include splitting edges, merging edges, removing unreachable code and tail duplication.
   functor CFGUtilFn
      (structure CFG : CONTROL_FLOW_GRAPH
       structure P   : INSN_PROPERTIES
          sharing P.I = CFG.I
      ) : CFG_UTIL
The signature of CFGUtilFn is defined below:
 signature CFG_UTIL = sig
    structure CFG : CONTROL_FLOW_GRAPH
    val updateJumpLabel : CFG.cfg -> node_id -> unit
    val mergeEdge       : CFG.cfg -> CFG.edge -> bool
    val eliminateJump   : CFG.cfg -> node_id -> bool
    val insertJump      : CFG.cfg -> node_id -> bool
    val splitEdge  : CFG.cfg -> { edge : CFG.edge, jump : bool }
                      -> { edge : CFG.edge, node : CFG.node }
    val isMerge        : CFG.cfg -> node_id -> bool
    val isSplit        : CFG.cfg -> node_id -> bool
    val hasSideExits   : CFG.cfg -> node_id -> bool
    val isCriticalEdge : CFG.cfg -> CFG.edge -> bool
    val splitAllCriticalEdges : CFG.cfg -> unit
    val ceed : CFG.cfg -> node_id * node_id -> bool
    val tailDuplicate : CFG.cfg -> { subgraph : CFG.cfg, root : node_id } 
                                -> { nodes : CFG.node list, 
                                     edges : CFG.edge list } 
    val removeUnreachableCode : CFG.cfg -> unit
    val mergeAllEdges : CFG.cfg -> unit
 end
These functions perform the following functions:

Dataflow Analysis

MLRISC provides a simple customizable module for performing iterative dataflow analysis. A dataflow analyzer has the following signature:
 signature DATAFLOW_ANALYZER = sig
   structure CFG : CONTROL_FLOW_GRAPH
   type dataflow_info
   val analyze : CFG.cfg * dataflow_info -> dataflow_info
 end
A dataflow problem is described by the signature DATAFLOW_PROBLEM, described below:
 signature DATAFLOW_PROBLEM = sig
   structure CFG : CONTROL_FLOW_GRAPH
   type domain
   type dataflow_info
   val forward   : bool
   val bot       : domain
   val ==        : domain * domain -> bool
   val join      : domain list -> domain
   val prologue  : CFG.cfg * dataflow_info ->
                       CFG.block node ->
                           { input    : domain,
                             output   : domain,
                             transfer : domain -> domain
                           }
   val epilogue  : CFG.cfg * dataflow_info ->
                       { node   : CFG.block node,
                         input  : domain,
                         output : domain
                       } -> unit
 end
This description contains the following items To generate a new dataflow analyzer from a dataflow problem, the functor DataflowFn can be used:
 functor DataflowFn(P : DATAFLOW_PROBLEM) : DATAFLOW_ANALYZER =

Static Branch Prediction

Branch Optimizations


Allen Leung

Click to toggle
does not end with </html> tag
does not end with </body> tag
The output has ended thus: E="-2"> <ADDRESS> <A HREF="mailto:leunga@cs.nyu.edu">Allen Leung</A></ADDRESS> <BR> </BODY> </HTML>