Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Tracker SCM

SCM Repository

[smlnj] View of /sml/branches/SMLNJ/src/MLRISC/alpha32/alpha32.sml
ViewVC logotype

View of /sml/branches/SMLNJ/src/MLRISC/alpha32/alpha32.sml

Parent Directory Parent Directory | Revision Log Revision Log

Revision 17 - (download) (annotate)
Wed Mar 11 21:00:18 1998 UTC (24 years, 6 months ago)
File size: 27621 byte(s)
This commit was manufactured by cvs2svn to create branch 'SMLNJ'.
(* alpha32.sml 
 * COPYRIGHT (c) 1995 AT&T Bell Laboratories.
 * generates machine code from the mltree.

(** NOTE: f29 and f30 are used as temporaries.
 **       r31 is always zero.
functor Alpha32
  (structure Alpha32Instr : ALPHA32INSTR
   structure Alpha32MLTree : MLTREE
   structure Flowgen : FLOWGRAPH_GEN
   structure PseudoInstrs : ALPHA32_PSEUDO_INSTR 
     sharing Alpha32Instr.Region = Alpha32MLTree.Region
     sharing Flowgen.I = PseudoInstrs.I =  Alpha32Instr
     sharing Flowgen.T=Alpha32MLTree 
     sharing Alpha32MLTree.Constant = Alpha32Instr.Constant) : MLTREECOMP = 
  structure F = Flowgen
  structure T = Alpha32MLTree
  structure R = Alpha32MLTree.Region
  structure I = Alpha32Instr
  structure C = Alpha32Instr.C
  structure LE = LabelExp
  structure W32 = Word32

       Trap Shadows, Floating Exceptions, and Denormalized
	Numbers on the DEC Alpha

		Andrew W. Appel and Lal George
		  Nov 28, 1995

  See section of the Alpha Architecture Reference Manual.

  The Alpha has imprecise exceptions, meaning that if a floating
  point instruction raises an IEEE exception, the exception may
  not interrupt the processor until several successive instructions have
  completed.  ML, on the other hand, may want a "precise" model
  of floating point exceptions.

  Furthermore, the Alpha hardware does not support denormalized numbers
  (for "gradual underflow").  Instead, underflow always rounds to zero.
  However, each floating operation (add, mult, etc.) has a trapping
  variant that will raise an exception (imprecisely, of course) on
  underflow; in that case, the instruction will produce a zero result
  AND an exception will occur.  In fact, there are several variants
  of each instruction; three variants of MULT are:

  MULT  s1,s2,d       truncate denormalized result to zero; no exception
  MULT/U  s1,s2,d     truncate denormalized result to zero; raise UNDERFLOW
  MULT/SU  s1,s2,d    software completion, producing denormalized result

  The hardware treats the MULT/U and MULT/SU instructions identically,
  truncating a denormalized result to zero and raising the UNDERFLOW
  exception.  But the operating system, on an UNDERFLOW exception,
  examines the faulting instruction to see if it's an /SU form, and if so,
  recalculates s1*s2, puts the right answer in d, and continues,
  all without invoking the user's signal handler.

  Because most machines compute with denormalized numbers in hardware,
  to maximize portability of SML programs, we use the MULT/SU form.
  (and ADD/SU, SUB/SU, etc.)  But to use this form successfully,
  certain rules have to be followed.  Basically, d cannot be the same
  register as s1 or s2, because the opsys needs to be able to 
  recalculate the operation using the original contents of s1 and s2,
  and the MULT/SU instruction will overwrite d even if it traps.

  More generally, we may want to have a sequence of floating-point
  instructions.  The rules for such a sequence are:

  1. The sequence should end with a TRAPB (trap barrier) instruction.
     (This could be relaxed somewhat, but certainly a TRAPB would
      be a good idea sometime before the next branch instruction or
      update of an ML reference variable, or any other ML side effect.)
  2. No instruction in the sequence should destroy any operand of itself
     or of any previous instruction in the sequence.
  3. No two instructions in the sequence should write the same destination

  We can achieve these conditions by the following trick in the
  Alpha code generator.  Each instruction in the sequence will write
  to a different temporary; this is guaranteed by the translation from
  ML-RISC.  At the beginning of the sequence, we will put a special
  pseudo-instruction (we call it DEFFREG) that "defines" the destination
  register of the arithmetic instruction.  If there are K arithmetic
  instructions in the sequence, then we'll insert K DEFFREG instructions
  all at the beginning of the sequence.
  Then, each arithop will not only "define" its destination temporary
  but will "use" it as well.  When all these instructions are fed to
  the liveness analyzer, the resulting interference graph will then
  have inteference edges satisfying conditions 2 and 3 above.

  Of course, DEFFREG doesn't actually generate any code.  In our model
  of the Alpha, every instruction generates exactly 4 bytes of code
  except the "span-dependent" ones.  Therefore, we'll specify DEFFREG
  as a span-dependent instruction whose minimum and maximum sizes are zero.

  At the moment, we do not group arithmetic operations into sequences;
  that is, each arithop will be preceded by a single DEFFREG and
  followed by a TRAPB.  To avoid the cost of all those TRAPB's, we
  should improve this when we have time.  Warning:  Don't put more 
  than 31 instructions in the sequence, because they're all required
  to write to different destination registers!  

  What about multiple traps?  For example, suppose a sequence of
  instructions produces an Overflow and  a Divide-by-Zero exception?
  ML would like to know only about the earliest trap, but the hardware
  will report BOTH traps to the operating system.  However, as long
  as the rules above are followed (and the software-completion versions
  of the arithmetic instructions are used), the operating system will
  have enough information to know which instruction produced the
  trap.  It is very probable that the operating system will report ONLY
  the earlier trap to the user process, but I'm not sure.

  For a hint about what the operating system is doing in its own
  trap-handler (with software completion), see section 6.3.2 of
  "OpenVMS Alpha Software" (Part II of the Alpha Architecture
  Manual).  This stuff should apply to Unix (OSF1) as well as VMS.


  fun error msg = MLRiscErrorMsg.impossible ("Alpha32." ^ msg)

  val itow = Word.fromInt
  val wtoi = Word.toIntX

  val emit = F.emitInstr

  fun newReg () = C.newReg()
  fun newFreg() = C.newFreg()

  val zeroR = 31
  val zeroOp = I.REGop zeroR
  val zeroEA = I.Direct zeroR
  val argReg0 = 16

  val mem = R.memory

  fun cond T.LT  = I.CC_LT    | cond T.LTU = I.CC_LTU
    | cond T.LE  = I.CC_LE    | cond T.LEU = I.CC_LEU
    | cond T.EQ  = I.CC_EQ    | cond T.NEQ = I.CC_NEQ
    | cond T.GE  = I.CC_GE    | cond T.GEU = I.CC_GEU
    | cond T.GT  = I.CC_GT    | cond T.GTU = I.CC_GTU

  fun swapcc I.CC_LT  = I.CC_GT    | swapcc I.CC_LTU = I.CC_GTU
    | swapcc I.CC_LE  = I.CC_GE    | swapcc I.CC_LEU = I.CC_GEU
    | swapcc I.CC_EQ  = I.CC_EQ    | swapcc I.CC_NEQ = I.CC_NEQ   
    | swapcc I.CC_GT  = I.CC_LT    | swapcc I.CC_GEU = I.CC_LEU
    | swapcc I.CC_GE  = I.CC_LE    | swapcc _ = error "swapcc"	

  (* NOTE: stack allocation must be multiples of 16 *)
  fun stackAllocate(n) = 
    I.OPERATE{oper=I.SUBL, ra=C.stackptrR, rb=I.IMMop n, rc=C.stackptrR}

  fun stackDeallocate(n) =
    I.OPERATE{oper=I.ADDL, ra=C.stackptrR, rb=I.IMMop n, rc=C.stackptrR}

  fun loadImmed(n, base, rd) = 
    if ~32768 <= n andalso n < 32768 then
      emit(I.LDA{r=rd, b=base, d=I.IMMop n})
    else let
        val w = itow n
	val hi = Word.~>>(w, 0w16)
	val lo = Word.andb(w, 0w65535)
	val (hi', lo') =  
	  if lo < 0w32768 then (hi, lo) else (hi+0w1, lo-0w65536)
	emit(I.LDA{r=rd, b=base, d=I.IMMop(wtoi lo')});
	emit(I.LDAH{r=rd, b=rd, d=I.IMMop(wtoi hi')})

  (* loadImmed32 is used to load int32 and word32 constants.
   * In either case we sign extend the 32-bit value. This is compatible 
   * with LDL which sign extends a 32-bit valued memory location.
  fun loadImmed32 (n, base, rd) = let
    val low = W32.andb(n, 0w65535)		(* unsigned low 16 bits *)
    val high = W32.div(n, 0w65536)			(* Sign-extend *)
    val (lowsgn, highsgn) =			
      if W32.<=(low, 0w32767) then (low, high) 
      else (W32.-(low,0w65536), W32.+(high,0w1))
    val highsgn' = W32.andb(highsgn, 0w65535)
    emit(I.LDA{r=rd, b=base, d=I.IMMop(W32.toIntX lowsgn)});
    if (highsgn' = 0w0) then ()
    else if highsgn' < 0w32768 then 
      emit(I.LDAH{r=rd, b=rd, d=I.IMMop(W32.toIntX highsgn)})
      emit(I.LDAH{r=rd, b=rd, 
		  d=I.IMMop(W32.toIntX (W32.-(highsgn, 0w65536)))})

  fun orderedFArith (exp1, exp2, T.LR) = (fregAction exp1, fregAction exp2)
    | orderedFArith (exp1, exp2, T.RL) = let
        val f2 = fregAction exp2
      in (fregAction exp1, f2)

  and stmAction exp = let
    fun fbranch(_, T.FCMP(cc, exp1, exp2, order), lab) = let
      val (f1, f2) = orderedFArith(exp1, exp2, order)
      fun bcc(cmp, br) = let
	val tmpR = newFreg()
	emit(I.FOPERATE{oper=cmp, fa=f1, fb=f2, fc=tmpR});
 	emit(I.FBRANCH(br, tmpR, lab))

      fun fall(cmp1, br1, cmp2, br2) = let
	val tmpR1 = newFreg()
	val tmpR2 = newFreg()
	val fallLab = Label.newLabel ""
	emit(I.FOPERATE{oper=cmp1, fa=f1, fb=f2, fc=tmpR1});
	emit(I.FBRANCH(br1, tmpR1, fallLab));
	emit(I.FOPERATE{oper=cmp2, fa=f1, fb=f2, fc=tmpR2});
	emit(I.FBRANCH(br2, tmpR2, lab));
	F.defineLabel fallLab

      fun bcc2(cmp1, br1, cmp2, br2) = (bcc(cmp1, br1); bcc(cmp2, br2))
      case cc
      of T.==  => bcc(I.CMPTEQ, I.BNE)
       | T.?<> => bcc(I.CMPTEQ, I.BEQ)
       | T.?   => bcc(I.CMPTUN, I.BNE)
       | T.<=> => bcc(I.CMPTUN, I.BEQ)
       | T.>   => fall(I.CMPTLE, I.BNE, I.CMPTUN, I.BEQ)
       | T.>=  => fall(I.CMPTLT, I.BNE, I.CMPTUN, I.BEQ)
       | T.?>  => bcc(I.CMPTLE, I.BEQ)
       | T.?>= => bcc(I.CMPTLT, I.BEQ)
       | T.<   => bcc(I.CMPTLT, I.BNE)
       | T.<=  => bcc(I.CMPTLE, I.BNE)
       | T.?<  => bcc2(I.CMPTLT, I.BNE, I.CMPTUN, I.BNE)
       | T.?<=  => bcc2(I.CMPTLE, I.BNE, I.CMPTUN, I.BNE)
       | T.<> => fall(I.CMPTEQ, I.BNE, I.CMPTUN, I.BEQ)
       | T.?= => bcc2(I.CMPTEQ, I.BNE, I.CMPTUN, I.BNE)

    fun branch(cond, exp1, exp2, lab, order) = let
	fun zapHi r = emit(I.OPERATE{oper=I.ZAP, ra=r, rb=I.IMMop 0xf0, rc=r})
	val tmpR = newReg()
	val (r1, o2) = 
	  case order 
	   of T.LR => (regAction exp1, opndAction exp2)
	    | T.RL => let val o2' = opndAction exp2
		  (regAction(exp1), o2')
	fun emitBr(cmp, br) = 
	  (emit(I.OPERATE{oper=cmp, ra=r1, rb=o2, rc=tmpR});
	   emit(I.BRANCH(br, tmpR, lab)))
	fun emitUnsignedBr(cmp, br) = 
	   (case (r1, o2)
	     of (r1, I.REGop r2) => (zapHi r1; zapHi r2; emitBr(cmp, br))
	      | (r1, o2) => (zapHi r1; emitBr(cmp, br))
	case cond 
	 of I.CC_LTU => emitUnsignedBr(I.CMPULT, I.BNE)
	  | I.CC_LEU => emitUnsignedBr(I.CMPULE, I.BNE)
	  | I.CC_GTU => emitUnsignedBr(I.CMPULE, I.BEQ)
	  | I.CC_GEU => emitUnsignedBr(I.CMPULT, I.BEQ)
	  | I.CC_LT  => emitBr(I.CMPLT,  I.BNE)
	  | I.CC_LE  => emitBr(I.CMPLE,  I.BNE)
	  | I.CC_GT  => emitBr(I.CMPLE,  I.BEQ)
	  | I.CC_GE  => emitBr(I.CMPLT,  I.BEQ)
	  | I.CC_EQ  => emitBr(I.CMPEQ,  I.BNE)
	  | I.CC_NEQ => emitBr(I.CMPEQ,  I.BEQ)
    fun copyTmp() = SOME(I.Direct(newReg()))
    fun fcopyTmp() = SOME(I.FDirect(newFreg()))
    case exp
     of T.JMP(T.LABEL(LE.LABEL lab), _) => emit(I.BRANCH(I.BR, zeroR, lab))
      | T.JMP(T.LABEL _, _) => error "JMP(T.LABEL _, _)"
      | T.JMP(exp, labs) => 
          emit(I.JMPL({r=C.asmTmpR, b=regAction exp, d=0}, labs))
      | T.BCC(_, T.CMP(T.NEQ, T.ANDB(exp, T.LI 1), T.LI 0, _), lab) => 
	  emit(I.BRANCH(I.BLBS, regAction exp, lab))
      | T.BCC(_, T.CMP(cc, exp, T.LI n, ord), lab) => 
          branch(cond cc, exp, T.LI n, lab, ord)
      | T.BCC(_, T.CMP(cc, T.LI n, exp, ord), lab) => 
          branch(swapcc(cond cc), exp, T.LI n, lab, ord)
      | T.BCC(_, T.CMP(cc, e1, e2, ord), lab) => 
	  branch(cond cc, e1, e2, lab, ord)
      | T.BCC(_, e, lab) => emit(I.BRANCH(I.BNE, ccAction e, lab))
      | T.FBCC arg  => fbranch arg
      | T.CALL(exp, def, use) => let
	  val pv = regAction exp
	  val returnPtrR = 26
	  fun live([],acc) = acc
	    | live(T.GPR(T.REG r)::regs,acc) = live(regs, C.addReg(r,acc))
	    | live(T.CCR(T.CC cc)::regs,acc) = live(regs, C.addReg(cc,acc))
	    | live(T.FPR(T.FREG f)::regs,acc) = live(regs, C.addFreg(f,acc))
	    | live(_::regs, acc) = live(regs, acc)
	  emit(I.JSR({r=returnPtrR, b=pv, d=0},
		     live(def, C.addReg(returnPtrR, C.empty)),
		     live(use, C.addReg(pv, C.empty))))
      | T.RET  => emit(I.JMPL({r=zeroR, b=26, d=0}, []))
      | T.STORE8(ea, r, region) => let
	  val rs = regAction r
	  val (rd, disp) = eaAction ea
	  val t1 = newReg()
	  val t2 = newReg()
	  val t3 = newReg()
	  app emit
	     [I.LOAD{ldOp=I.LDQ_U, r=t1, b=rd, d=disp,mem=mem},
	      I.LDA{r=t2, b=rd, d=disp},
	      I.OPERATE{oper=I.INSBL, ra=rs, rb=I.REGop(t2), rc=t3},
	      I.OPERATE{oper=I.MSKBL, ra=t1, rb=I.REGop(t2), rc=t1},
	      I.OPERATE{oper=I.BIS, ra=t1, rb=I.REGop(t3), rc=t1},
	      I.STORE{stOp=I.STQ_U, r=t1, b=rd, d=disp, mem=mem}]
      | T.STORE32(ea, r, region) => let
	  val (b, d) = eaAction ea
        in emit(I.STORE{stOp=I.STL, r=regAction r, b=b, d=d, mem=region})
      | T.STORED(ea, f, region) => let
	  val (b, d) = eaAction ea
        in emit(I.FSTORE{stOp=I.STT, r=fregAction f, b=b, d=d, mem=region})
      | T.STORECC(ea, cc, region) => error "stmAction.STORECC"
      | T.MV(rd, exp) => let
	  fun move(dst, src) = I.OPERATE{oper=I.BIS, ra=src, rb=zeroOp, rc=dst}
	  case exp
	  of T.REG(rs) => if rs = rd then () else emit(move(rd, rs))
	   | T.LI n => loadImmed(n, zeroR, rd)
	   | T.LI32 w => loadImmed32(w, zeroR, rd)
	   | _ => let val rs = regActionRd(exp, rd)
	      in if rs = rd then () else emit(move(rd,rs))
      | T.CCMV(cd, exp) => let
	  val cs = case exp of T.CC(r) => r | _ => ccActionCd(exp, cd)
	  if cs = cd then () 
	  else emit(I.OPERATE{oper=I.BIS, ra=cs, rb=zeroOp, rc=cd})
      | T.FMV(fd, exp) => let
	  fun fmove(dst, src) = I.FOPERATE{oper=I.CPYS, fa=src, fb=src, fc=dst}
	 case exp 
	 of T.FREG(fs) => 
	     if fs = fd then () else emit(fmove(fd, fs))
	  | _ => let 
	      val fs = fregActionFd(exp, fd)
	    in if fs = fd then () else emit(fmove(fd, fs))
      | T.COPY(rds as [_], rss) => 
	  emit(I.COPY{dst=rds, src=rss, impl=ref NONE, tmp=NONE})
      | T.COPY(rds, rss) => 
	  emit(I.COPY{dst=rds, src=rss, impl=ref NONE, tmp=copyTmp()})
      | T.FCOPY(fds as [_], fss)=>  
	  emit(I.FCOPY{dst=fds, src=fss, impl=ref NONE, tmp=NONE})
      | T.FCOPY(fds, fss) =>  
	  emit(I.FCOPY{dst=fds, src=fss, impl=ref NONE, tmp=fcopyTmp()})
  and ccAction(T.CC r) = r
    | ccAction(e) = ccActionCd(e, C.newCCreg())

  and ccActionCd(T.CC r, _) = r
      (* enough for now ... *)
    | ccActionCd(T.CMP(T.GTU, e1, e2, ord), cd) = let
        val (opnd, reg) = 
	   case ord
	     of T.LR => (opndAction e1, regAction e2)
	      | T.RL => let 
	            val r = regAction e2
		  in (opndAction e1, r)
	emit(I.OPERATE{oper=I.CMPULT, ra=reg, rb=opnd, rc=cd}); 
    | ccActionCd(T.LOADCC _, _) = error  "ccAction"

  and opndAction (T.LI value) =				
      if value <= 255 andalso value >= 0 then I.IMMop value
      else let
	  val tmpR = newReg()
	  loadImmed (value, zeroR, tmpR);
	  I.REGop tmpR
    | opndAction(T.LI32 value) =
      if Word32.<=(value, 0w255) then I.IMMop (Word32.toInt value)
      else let 
	  val tmpR = newReg () 
	  loadImmed32 (value, zeroR, tmpR);
	  I.REGop tmpR
    | opndAction(T.CONST const) = I.CONSTop (const)
    | opndAction exp = I.REGop (regAction exp)

  and reduceOpnd(I.IMMop i) = let
        val rd = newReg()
      in loadImmed(i, zeroR, rd); rd
    | reduceOpnd(I.REGop rd) = rd
    | reduceOpnd(I.CONSTop c) = regAction(T.CONST c)
    | reduceOpnd(I.LABop le) = regAction(T.LABEL le)
    | reduceOpnd _ = error "reduceOpnd"

  and orderedRR(e1, e2, T.LR) = (regAction e1, regAction e2)
    | orderedRR(e1, e2, T.RL) = let
	val r2 = regAction e2
      in (regAction e1, r2)

  and regAction (T.REG r) = r
    | regAction exp = regActionRd(exp, newReg())

  and arithOperands(e1, e2, T.LR) = (regAction e1, opndAction e2)
    | arithOperands(e1, e2, T.RL) = let
        val opnd = opndAction e2
      in (regAction e1, opnd)
  and regActionRd(exp, rd) = let

    fun orderedArith(oper, arith, e1, e2, ord) = let
      val (reg, opnd) = arithOperands(e1, e2, ord)
      emit(oper{oper=arith, ra=reg, rb=opnd, rc=rd});

    fun commOrderedArith(oper, arith, e1 ,e2, ord) = let
      fun f(e1 as T.LI _, e2) = orderedArith(oper, arith, e2, e1, ord)
	| f(e1 as T.LI32 _, e2) = orderedArith(oper, arith, e2, e1, ord)
	| f(e1, e2) = orderedArith(oper, arith, e1, e2, ord)
      f(e1, e2)

    fun orderedArithTrap arg = orderedArith arg before emit(I.TRAPB)
    fun commOrderedArithTrap arg = commOrderedArith arg before emit(I.TRAPB)

    fun orderedMullTrap (e1, e2, ord, rd) = let
      val (reg, opnd) = case ord
	of T.LR => (regAction e1, opndAction e2)
         | T.RL => let
	     val opnd = opndAction e2
	     (regAction e1, opnd)

      fun emitMulvImmed (reg, 0, rd) = 
	    emit(I.LDA{r=rd, b=zeroR, d=I.IMMop 0})
	| emitMulvImmed (reg, 1, rd) = 
	    emit(I.OPERATE{oper=I.ADDL, ra=reg, rb=zeroOp, rc=rd})
	| emitMulvImmed (reg, multiplier, rd) = let
	    fun log2 0w1 = 0 | log2 n = 1 + (log2 (Word.>> (n, 0w1)))

	    fun exp2 n = Word.<<(0w1, n)

	    fun bitIsSet (x,n) = Word.andb(x,exp2 n) <> 0w0

	    fun loop (~1) = ()
	      | loop n =
	        (if bitIsSet(itow multiplier, itow n) then
		   emit(I.OPERATEV{oper=I.ADDLV, ra=reg, rb=I.REGop rd, rc=rd})
		 else ();
		 if n>0 then 
		   emit(I.OPERATEV{oper=I.ADDLV, ra=rd, rb=I.REGop rd, rc=rd})
		 else ();
	         loop (n-1))
	    emit(I.OPERATEV{oper=I.ADDLV, ra=reg, rb=I.REGop reg, rc=rd});
	    loop ((log2 (itow multiplier)) - 1)
      case opnd 
       of (I.IMMop multiplier) => emitMulvImmed (reg, multiplier, rd)
	| _ => emit (I.OPERATEV{oper=I.MULLV , ra=reg, rb=opnd, rc=rd})

    fun opndToWord (T.LI i) = SOME (Word32.fromInt i)
      | opndToWord (T.LI32 w) = SOME w
      | opndToWord _ = NONE
    case exp
     of T.LI n              => (loadImmed(n, zeroR, rd); rd)
      | T.LI32 w            => (loadImmed32(w, zeroR, rd); rd)
      | T.LABEL le	    => (emit(I.LDA{r=rd, b=zeroR, d=I.LABop le}); rd)
      | T.CONST c	    => (emit(I.LDA{r=rd, b=zeroR, d=I.CONSTop(c)}); rd)
      | T.ADD(e, T.LABEL le)=> 
         (emit(I.LDA{r=rd, b=regAction(e), d=I.LABop(le)}); rd)
      | T.ADD(T.LI i, e)    => (loadImmed(i, regAction e, rd); rd)
      | T.ADD(e, T.LI i)    => (loadImmed(i, regAction e, rd); rd)
      | T.ADD(T.LI32 i, e)  => (loadImmed32(i, regAction e, rd); rd)
      | T.ADD(e, T.LI32 i)  => (loadImmed32(i, regAction e, rd); rd)
      | T.ADD(T.CONST c, e) => 
	 (emit(I.LDA{r=rd, b=regAction e, d=I.CONSTop c}); rd)
      | T.ADD(e, T.CONST c) => 
	 (emit(I.LDA{r=rd, b=regAction e, d=I.CONSTop c}); rd)
      | T.ADD(e1, e2)       => commOrderedArith(I.OPERATE, I.ADDL, e1, e2, T.LR)
      | T.SUB(e1, e2, ord)  => orderedArith(I.OPERATE, I.SUBL, e1, e2, ord)
      | T.MULU(e1, e2)	    => commOrderedArith(I.OPERATE, I.MULL, e1, e2, T.LR)
      | T.ADDT(e1, e2)	    =>
	  commOrderedArithTrap(I.OPERATEV, I.ADDLV, e1, e2, T.LR)
      | T.SUBT(e1, e2, ord) => orderedArithTrap(I.OPERATEV, I.SUBLV, e1, e2, ord)
      | T.MULT(e1, e2) 	    => orderedMullTrap(e1, e2, T.LR, rd)
      | T.ANDB(e1, e2)	    => let
          fun opndToByteMask (SOME (0wx0:Word32.word)) = SOME 0xf
	    | opndToByteMask (SOME 0wx000000ff) = SOME 0xe
	    | opndToByteMask (SOME 0wx0000ff00) = SOME 0xd
	    | opndToByteMask (SOME 0wx0000ffff) = SOME 0xc
	    | opndToByteMask (SOME 0wx00ff0000) = SOME 0xb
	    | opndToByteMask (SOME 0wx00ff00ff) = SOME 0xa
	    | opndToByteMask (SOME 0wx00ffff00) = SOME 0x9
	    | opndToByteMask (SOME 0wx00ffffff) = SOME 0x8
            | opndToByteMask (SOME 0wxff000000) = SOME 0x7
	    | opndToByteMask (SOME 0wxff0000ff) = SOME 0x6
	    | opndToByteMask (SOME 0wxff00ff00) = SOME 0x5
	    | opndToByteMask (SOME 0wxff00ffff) = SOME 0x4
	    | opndToByteMask (SOME 0wxffff0000) = SOME 0x3
	    | opndToByteMask (SOME 0wxffff00ff) = SOME 0x2
	    | opndToByteMask (SOME 0wxffffff00) = SOME 0x1
	    | opndToByteMask (SOME 0wxffffffff) = SOME 0x0
	    | opndToByteMask _ = NONE

	  val opndToMask = opndToByteMask o opndToWord
	  case (opndToMask e1, opndToMask e2) of
	    (SOME mask, _) =>
	      orderedArith(I.OPERATE, I.ZAP, e2, T.LI mask, T.LR)
	  | (_, SOME mask) => 
	      orderedArith(I.OPERATE, I.ZAP, e1, T.LI mask, T.LR)
	  | _ => commOrderedArith(I.OPERATE, I.AND, e1, e2, T.LR)
      | T.ORB(e1, e2) 	    => commOrderedArith(I.OPERATE, I.BIS, e1, e2, T.LR)
      | T.XORB(e1, e2)	    => commOrderedArith(I.OPERATE, I.XOR, e1, e2, T.LR)
      | T.SLL(e1, e2, ord)  => 
	  (case opndToWord e2 of
	     SOME 0w1 => let val r = T.REG (regAction e1)
               in orderedArith(I.OPERATE, I.ADDL, r, r, T.LR) end
	   | SOME 0w2 => orderedArith(I.OPERATE, I.S4ADDL, e1, T.LI 0, T.LR)
	   | SOME 0w3 => orderedArith(I.OPERATE, I.S8ADDL, e1, T.LI 0, T.LR)
	   | _ => (orderedArith(I.OPERATE, I.SLL, e1, e2, T.LR);
		   emit(I.OPERATE{oper=I.SGNXL, ra=rd, rb=zeroOp, rc=rd});
      | T.SRA(e1, e2, ord)  => let
	    val (reg, opnd) = (regAction e1, opndAction e2)
	    (* sign extend longword argument *)
	    emit(I.OPERATE{oper=I.SGNXL, ra=reg, rb=zeroOp, rc=reg});
	    emit(I.OPERATE{oper=I.SRA, ra=reg, rb=opnd, rc=rd});
      | T.SRL(e1, e2, ord)  => let
	  val (reg, opnd) = (regAction e1, opndAction e2)
	  emit(I.OPERATE{oper=I.ZAP, ra=reg, rb=I.IMMop 0xf0, rc=reg});
	  emit(I.OPERATE{oper=I.SRL, ra=reg, rb=opnd, rc=rd});
      | T.DIVT arg => let
	  val (reg, opnd) = arithOperands arg
	  app emit (PseudoInstrs.divl({ra=reg, rb=opnd, rc=rd}, reduceOpnd));
      | T.DIVU arg => let
	  val (reg, opnd) = arithOperands arg
	  app emit (PseudoInstrs.divlu({ra=reg, rb=opnd, rc=rd}, reduceOpnd));
      | T.LOAD32(exp, region) => let
	  val (b, d) = eaAction exp
	in emit(I.LOAD{ldOp=I.LDL, r=rd, b=b, d=d, mem=region}); rd
	(* Load and sign-extend a byte from a  non-aligned address  *)
      | T.LOAD8(exp, region) => let
	  val tmpR0 = newReg()
	  val tmpR1 = newReg()
	  val (rt, disp) = eaAction exp
	  emit(I.LOAD{ldOp=I.LDQ_U, r=tmpR0, b=rt, d=disp, mem=mem});
	  emit(I.LDA{r=tmpR1, b=rt, d=disp});
	  emit(I.OPERATE{oper=I.EXTBL, ra=tmpR0, rb=I.REGop tmpR1, rc=rd});
      | T.SEQ(e1, e2) => (stmAction e1; regAction e2)
      | _ => error "regAction"
  end (* regActionRd *)

  and eaAction exp = let
    fun makeEA(r, n) = 
      if ~32768 <= n andalso n <= 32767 then (r, I.IMMop n)
      else let
	  val tmpR = newReg()
	  val low = wtoi(Word.andb(itow n, 0w65535))(* unsigned low 16 bits *)
	  val high = n div 65536
	  val (lowsgn, highsgn) =			 (* Sign-extend *)
	    if low <= 32767 then (low, high) else (low -65536, high+1)
	  (emit(I.LDAH{r=tmpR, b=r, d=I.IMMop highsgn});
	   (tmpR, I.IMMop lowsgn))
    case exp
     of T.ADD(exp, T.LI n)    => makeEA(regAction exp,  n)
      | T.ADD(T.LI n, exp)    => makeEA(regAction exp,  n)
      | T.ADD(T.CONST c, exp) => (regAction exp, I.CONSTop(c))
      | T.ADD(exp, T.CONST c) => (regAction exp, I.CONSTop(c))
      | T.SUB(exp, T.LI n, _) => makeEA(regAction exp, ~n)
      | exp                   => makeEA(regAction exp,  0)
  end (* eaAction *)

  and fregAction (T.FREG f) = f
    | fregAction exp = fregActionFd(exp, newFreg())

  and fregActionFd(exp, fd) = let
    (* macho comment goes here *)
    fun doFloatArith(farith, e1, e2, fd, order) = let
      val (f1, f2) = orderedFArith(e1, e2, order)
      emit(I.DEFFREG fd);
      emit(I.FOPERATEV{oper=farith, fa=f1, fb=f2, fc=fd});
    case exp 
     of T.FREG f	     => f
      | T.FADDD(e1, e2)      => doFloatArith(I.ADDT, e1, e2, fd, T.LR)
      | T.FMULD(e1, e2)      => doFloatArith(I.MULT, e1, e2, fd, T.LR)
      | T.FSUBD(e1, e2, ord) => doFloatArith(I.SUBT, e1, e2, fd, ord)
      | T.FDIVD(e1, e2, ord) => doFloatArith(I.DIVT, e1, e2, fd, ord)
      | T.FABSD exp          => 
         (emit(I.FOPERATE{oper=I.CPYS, fa=zeroR, fb=fregAction exp, fc=fd}); fd)
      | T.FNEGD exp          => let
          val fs = fregAction exp 
	  emit(I.FOPERATE{oper=I.CPYSN, fa=fs, fb=fs, fc=fd});  fd
      | T.CVTI2D exp         => let
	  val opnd  = opndAction exp
	  app emit (PseudoInstrs.cvti2d({opnd=opnd, fd=fd},reduceOpnd));
      | T.LOADD(exp, region) => let
	  val (b, d) = eaAction exp
        in emit(I.FLOAD{ldOp=I.LDT, r=fd, b=b, d=d, mem=region}); fd
      | T.FSEQ(e1, e2)        => (stmAction e1; fregAction e2)

  fun mltreeComp mltree = let
    (* condition code registers are mapped onto general registers *)
    fun cc (T.CCR(T.CC cc)) = T.GPR(T.REG cc)
      | cc x = x

    fun mltc(T.PSEUDO_OP pOp)    = F.pseudoOp pOp
      | mltc(T.DEFINELABEL lab)  = F.defineLabel lab
      | mltc(T.ENTRYLABEL lab)   = F.entryLabel lab
      | mltc(T.ORDERED mlts)     = F.ordered mlts
      | mltc(T.BEGINCLUSTER)     = F.beginCluster()
      | mltc(T.CODE stms)        = app stmAction stms
      | mltc(T.ENDCLUSTER regmap)= F.endCluster regmap
      | mltc(T.ESCAPEBLOCK regs) = F.exitBlock (map cc regs)
  in mltc mltree

  val mlriscComp  = stmAction

 * $Log: alpha32.sml,v $
 * Revision 1.12  1998/02/16 13:57:41  george
 *   A register allocated temp is now associated with parallel COPYs
 *   instead of a dedicated register. The temp is used to break cycles.
 * Revision 1.11  1997/09/17 20:00:54  george
 *   The AND operation with certain bit patterns is implemented using ZAP,
 *   and shifting by 1, 2, and 3, is implemented using ADDL, S4ADDL, and S8ADDL.
 * 							-- ken cline
# Revision 1.10  1997/09/17  17:10:10  george
#   added support for MLRisc ENTRYLABEL
# Revision 1.9  1997/09/12  10:12:54  george
#   provided support for unsigned comparisons in sml/nj checklimit
# Revision 1.8  1997/08/30  02:09:35  george
#   Changed cvti2d to take an I.operand as argument instead of a register
# Revision 1.7  1997/08/29  11:19:43  george
#   removed code to generate div-by-zero traps
# Revision 1.6  1997/08/29  10:57:42  george
#   div, divlu, and cvti2d operations are now implemented using pseudo-
#   instructions instead of floating point registers. The latter approach
#   allocated space on the stack which could conflict with spills.
# Revision 1.5  1997/08/27  17:49:22  george
#    Addition of immediate constants is performed by LDO instead of ADD.
#    THis allows for larger constants.
# Revision 1.4  1997/07/28  20:03:42  george
#   Added support for regions
# Revision 1.3  1997/07/17  12:24:41  george
#   The regmap is now represented as an int map rather than using arrays.
# Revision 1.2  1997/07/10  03:59:43  george
#   Added translation for MLTree.ORDERED.
# Revision  1997/04/19  18:14:21  george
#   Version 109.27

ViewVC Help
Powered by ViewVC 1.0.0