31 |
|
|
32 |
for further information about these changes. |
for further information about these changes. |
33 |
|
|
34 |
MLRISC, and particularly the x86 back end have been modiefied extensively. |
MLRISC, and particularly the x86 back end have been modified extensively. |
35 |
|
|
36 |
There are a few updates to the SML/NJ Library |
There are a few updates to the SML/NJ Library |
37 |
|
|
77 |
MLRISC: |
MLRISC: |
78 |
====================================================================== |
====================================================================== |
79 |
|
|
80 |
Name: Allen Leung |
1. Register Allocator |
81 |
Date: 2000/03/10 02:20:00 |
|
82 |
Tag: leunga-20000310-fix_x86_asm_ra |
a. The interface and implementation of the register allocator have been |
83 |
Description: |
changed slightly to accommodate the possibility of skipping |
84 |
|
the register allocation phases completely and go directly to |
85 |
|
memory allocation. This is needed for C-- use. |
86 |
|
|
87 |
|
b. I've improved the spill propagation algorithm, using an approximation |
88 |
|
of maximal weighted independent sets. This affects only the x86 |
89 |
|
platform. |
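The note does not spell out the spill propagation algorithm itself. As a generic illustration only of what approximating a maximal weighted independent set can look like, here is a minimal sketch of a common greedy heuristic. The function name, the graph encoding, and the weight/(degree+1) scoring rule are all assumptions made for this example and are not taken from MLRISC.

```python
def greedy_weighted_independent_set(weights, edges):
    """Pick mutually non-adjacent nodes with a large total weight.

    weights: dict mapping node -> non-negative weight (hypothetical encoding)
    edges:   iterable of undirected (u, v) pairs
    """
    neighbours = {n: set() for n in weights}
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)

    remaining = set(weights)
    chosen = []
    while remaining:
        # Favour heavy nodes that rule out few still-available neighbours.
        # Sorting first makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda n: weights[n] / (len(neighbours[n] & remaining) + 1),
        )
        chosen.append(best)
        # The chosen node and everything adjacent to it are no longer candidates.
        remaining -= neighbours[best] | {best}
    return chosen


if __name__ == "__main__":
    w = {"a": 3, "b": 2, "c": 3}
    print(greedy_weighted_independent_set(w, [("a", "b"), ("b", "c")]))
```

The heuristic is not guaranteed to find an optimal set; it only trades accuracy for a simple, fast pass over the graph.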
90 |
|
|
91 |
|
2. MLTREE |
92 |
|
|
93 |
|
a. Renamed the constructor CALL in MLTREE by popular demand. |
94 |
|
|
95 |
More assembly output problems involving the indexed addressing mode |
3. X86 |
96 |
|
|
97 |
|
a. More assembly output problems involving the indexed addressing mode |
98 |
on the x86 have been found and corrected. Thanks to Fermin Reig for the |
on the x86 have been found and corrected. Thanks to Fermin Reig for the |
99 |
fix. |
fix. |
100 |
|
|
101 |
The interface and implementation of the register allocator have been changed |
b. x86Rewrite bug with MUL3 (found by Lal) |
|
slightly to accommodate the possibility to skip the register allocation |
|
|
phases completely and go directly to memory allocation. This is needed |
|
|
for C-- use. |
|
102 |
|
|
103 |
---------------------------------------------------------------------- |
c. Added the instructions FSTS, FSTL |
|
Name: Allen Leung |
|
|
Date: 2000/03/22 01:23:00 |
|
|
Tag: leunga-20000322-fix_x86_hppa_ra |
|
|
Description: |
|
104 |
|
|
105 |
1. X86 fixes/changes |
d. The old code generated for SETcc was completely wrong. |
106 |
|
The Intel optimization guide is VERY misleading. |
107 |
|
|
108 |
a. x86Rewrite bug with MUL3 (found by Lal) |
e. Various fixes related to floating point, and extensions. |
109 |
b. Added the instructions FSTS, FSTL |
|
110 |
|
f. Things like |
111 |
|
|
112 |
|
jmp %eax |
113 |
|
jmp (%eax) |
114 |
|
|
115 |
|
are now output as |
116 |
|
|
117 |
|
jmp *%eax |
118 |
|
jmp *(%eax) |
119 |
|
|
120 |
2. PA-RISC fixes/changes |
g. Yet another fix for x86 assembly for idivl, imull, mull and friends. |
121 |
|
|
122 |
|
h. I've changed andl to testl in the floating point test sequence |
123 |
|
whenever appropriate. The Intel optimization guide states that |
124 |
|
testl is preferable to andl. |
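The jmp change in item f follows the AT&T assembler convention that an indirect jump operand (register or memory) carries a leading `*`, while a direct label does not. A minimal sketch of that formatting rule, assuming a hypothetical helper `format_jmp` that is not part of MLRISC:

```python
def format_jmp(operand, indirect):
    """Return an AT&T-syntax jmp line.

    operand:  the operand text, e.g. "%eax", "(%eax)" or a label name
    indirect: True when the target is read from a register or memory
    """
    # Indirect targets are prefixed with '*'; direct labels are emitted as-is.
    return "jmp\t" + ("*" + operand if indirect else operand)


if __name__ == "__main__":
    print(format_jmp("%eax", True))
    print(format_jmp("(%eax)", True))
    print(format_jmp("L1", False))
```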
125 |
|
|
126 |
|
4. Alpha |
127 |
|
|
128 |
|
a. Some extra patterns related to loads with signed/zero extension |
129 |
|
provided by Fermin. |
130 |
|
b. Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion. |
131 |
|
c. Added a new mode byteWordLoadStores to the functor parameter to Alpha() |
132 |
|
d. Added reassociation code for address computation. |
133 |
|
|
134 |
|
5. PA-RISC |
135 |
|
|
136 |
a. B label should not be a delay slot candidate! Why did this work? |
a. B label should not be a delay slot candidate! Why did this work? |
137 |
b. ADDT(32, REG(32, r), LI n) now generates one instruction instead of two, |
b. ADDT(32, REG(32, r), LI n) now generates one instruction instead of two, |
139 |
c. The assembly syntax for fstds and fstdd was wrong. |
c. The assembly syntax for fstds and fstdd was wrong. |
140 |
d. Added the composite instruction COMICLR/LDO, which is the immediate |
d. Added the composite instruction COMICLR/LDO, which is the immediate |
141 |
operand variant of COMCLR/LDO. |
operand variant of COMCLR/LDO. |
142 |
|
e. Long jumps in span dependence resolution used to depend on the existence |
143 |
|
of the base pointer in the SML/NJ runtime. |
144 |
|
|
145 |
|
A jump to a long label L was expanded into the following sequence: |
146 |
|
|
147 |
|
LDIL %hi(L-8192), %r29 |
148 |
|
LDO %lo(L-8192)(%r29), %r29 |
149 |
|
ADD %r29, baseptr, %r29 |
150 |
|
BV,n %r0(%r29) |
151 |
|
|
152 |
3. Generic MLRISC |
I've changed it so that the following sequence of instructions |
153 |
|
is generated, which doesn't mention the base pointer at all: |
154 |
|
|
155 |
|
BL,n L', %r29 /* branch and link, L' + 4 -> %r29 */ |
156 |
|
L': ADDIL L-(L'+4), %r29 /* Compute address of L */ |
157 |
|
BV,n %r0(%r29) /* Jump */ |
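The arithmetic behind the new sequence can be checked with a small model. This sketch takes the link value from the comment above (L' + 4 lands in %r29) and treats L-(L'+4) as a constant fixed at assembly time. The function and parameter names are hypothetical, and the model ignores instruction encoding details.

```python
def long_jump_target(load_base, offset_l_prime, offset_l):
    """Model the three-instruction long jump with plain integers.

    load_base:      address where the code happens to be loaded (hypothetical)
    offset_l_prime: offset of label L' within the code
    offset_l:       offset of the far label L within the code
    """
    addr_l_prime = load_base + offset_l_prime
    # BL,n L', %r29: the link register receives L' + 4 (per the note's comment).
    r29 = addr_l_prime + 4
    # ADDIL L-(L'+4), %r29: the displacement depends only on the two offsets,
    # so neither the load address nor a base pointer is needed.
    r29 += offset_l - (offset_l_prime + 4)
    # BV,n %r0(%r29): jump to the address now held in %r29.
    return r29


if __name__ == "__main__":
    print(hex(long_jump_target(0x10000, 0x20, 0x5000)))
    print(hex(long_jump_target(0x40000, 0x20, 0x5000)))
```

Whatever the load address is, the result equals the run-time address of L, which is why the sequence no longer needs the base pointer.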
158 |
|
|
159 |
|
6. Generic MLRISC |
160 |
|
|
161 |
a. shuffle.sml rewritten to be slightly more efficient |
a. shuffle.sml rewritten to be slightly more efficient |
162 |
b. DIV bug in mltree-simplify fixed (found by Fermin) |
b. DIV bug in mltree-simplify fixed (found by Fermin) |
163 |
|
|
164 |
4. Register Allocator |
7. Assembly Output |
165 |
|
|
166 |
a. I now release the interference graph earlier during spilling. |
a. When generating assembly, resolve the value of client defined constants, |
167 |
May improve memory usage. |
instead of generating symbolic values. This is controlled by the |
168 |
|
new flag "asm-resolve-constants", which defaults to true. |
169 |
|
|
170 |
---------------------------------------------------------------------- |
b. Added a new flag |
|
Name: Allen Leung |
|
|
Date: 2000/03/23 16:25:00 |
|
|
Tag: leunga-20000323-fix_x86_alpha |
|
|
Description: |
|
171 |
|
|
172 |
1. X86 fixes/changes |
"asm-indent-copies" (default to false) |
173 |
|
|
174 |
a. The old code generated for SETcc was completely wrong. |
When this flag is on, parallel copies will be indented an extra level. |
|
The Intel optimization guide is VERY misleading. |
|
175 |
|
|
|
2. ALPHA fixes/changes |
|
176 |
|
|
177 |
a. Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion. |
8. Machine Descriptions/Generation |
|
b. Added a new mode byteWordLoadStores to the functor parameter to Alpha() |
|
|
c. Added reassociation code for address computation. |
|
178 |
|
|
179 |
---------------------------------------------------------------------- |
a. The precedence parser was slightly broken when parsing infixr symbols. |
180 |
Name: Allen Leung |
b. The type generalizing code had the bound variables reversed, resulting |
181 |
Date: 2000/03/29 18:00:00 |
in a problem during arity raising. |
182 |
Tag: leunga-20000327-mlriscGen_hppa_alpha_x86 |
c. Various fixes in machine descriptions. |
|
Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz |
|
|
Description: |
|
183 |
|
|
184 |
This update contains *MAJOR* changes to the way code is generated from CPS |
====================================================================== |
185 |
in the module mlriscGen, and in various backend modules. |
CPS->MLRISC Code Generation |
186 |
|
====================================================================== |
187 |
|
|
188 |
CHANGES |
This release contains *MAJOR* changes to the way code is generated from CPS |
189 |
======= |
in the module mlriscGen, and in various backend modules. |
190 |
|
|
191 |
1. MLRiscGen: forward propagation fix. |
1. Forward propagation fix. |
192 |
|
|
193 |
There was a bug in forward propagation introduced at about the same time |
There was a bug in forward propagation introduced at about the same time |
194 |
as the MLRISC x86 backend, which prohibits coalescing to be |
as the MLRISC x86 backend, which prohibits coalescing to be |
197 |
Effect: speed up of loops in RISC architectures. |
Effect: speed up of loops in RISC architectures. |
198 |
By itself, this actually slowed down certain benchmarks on the x86. |
By itself, this actually slowed down certain benchmarks on the x86. |
199 |
|
|
200 |
2. MLRiscGen: forward propagating addresses from consing. |
2. Forward propagating addresses from consing. |
201 |
|
|
202 |
I've changed the way consing code is generated. Basically I separated |
I've changed the way consing code is generated. Basically I separated |
203 |
out the initialization part: |
out the initialization part: |
220 |
|
|
221 |
[(0,0), (0,0), .... (0,0)] |
[(0,0), (0,0), .... (0,0)] |
222 |
|
|
223 |
3. MLRiscGen: base pointer elimination. |
3. Base pointer elimination. |
224 |
|
|
225 |
As part of the linkage mechanism, we generate the sequence: |
As part of the linkage mechanism, we generate the sequence: |
226 |
|
|
238 |
|
|
239 |
Effect: Smaller code size. Speed up of most programs. |
Effect: Smaller code size. Speed up of most programs. |
240 |
|
|
|
4. Hppa back end |
|
|
|
|
|
Long jumps in span dependence resolution used to depend on the existence |
|
|
of the base pointer. |
|
|
|
|
|
A jump to a long label L was expanded into the following sequence: |
|
|
|
|
|
LDIL %hi(L-8192), %r29 |
|
|
LDO %lo(L-8192)(%r29), %r29 |
|
|
ADD %r29, baseptr, %r29 |
|
|
BV,n %r0(%r29) |
|
241 |
|
|
242 |
In the presence of change (3) above, this will not work. I've changed |
4. Frequency annotations |
|
it so that the following sequence of instructions are generated, which |
|
|
doesn't mention the base pointer at all: |
|
|
|
|
|
BL,n L', %r29 /* branch and link, L' + 4 -> %r29 */ |
|
|
L': ADDIL L-(L'+4), %r29 /* Compute address of L */ |
|
|
BV,n %r0(%r29) /* Jump */ |
|
|
|
|
|
5. Alpha back end |
|
|
|
|
|
New alpha instructions LDB/LDW have been added, as per Fermin's |
|
|
suggestions. This is unrelated to all other changes. |
|
|
|
|
|
6. X86 back end |
|
|
|
|
|
I've changed andl to testl in the floating point test sequence |
|
|
whenever appropriate. The Intel optimization guide states that |
|
|
testl is preferable to andl. |
|
|
|
|
|
7. RA (x86 only) |
|
|
|
|
|
I've improved the spill propagation algorithm, using an approximation |
|
|
of maximal weighted independent sets. This seems to be necessary to |
|
|
alleviate the negative effect in light of the slow down in (1). |
|
|
|
|
|
I'll write down the algorithm one of these days. |
|
|
|
|
|
8. MLRiscGen: frequencies |
|
243 |
|
|
244 |
I've added an annotation that states that all call gc blocks have zero |
I've added an annotation that states that all call gc blocks have zero |
245 |
execution frequencies. This improves register allocation on the x86. |
execution frequencies. This improves register allocation on the x86. |
356 |
---------------------------------------------------------------------------- |
---------------------------------------------------------------------------- |
357 |
Average 1.22% 3.36% |
Average 1.22% 3.36% |
358 |
|
|
|
---------------------------------------------------------------------- |
|
|
Name: Allen Leung |
|
|
Date: 2000/03/31 21:15:00 EST |
|
|
Tag: leunga-20000331-aliasing |
|
|
Description: |
|
359 |
|
|
360 |
|
|
361 |
|
Aliasing |
362 |
|
--------- |
363 |
This update contains a rewritten (and hopefully more correct) module |
This update contains a rewritten (and hopefully more correct) module |
364 |
for extracting aliasing information from CPS. |
for extracting aliasing information from CPS. |
365 |
|
|
378 |
The default of n is 3. |
The default of n is 3. |
379 |
|
|
380 |
---------------------------------------------------------------------- |
---------------------------------------------------------------------- |
|
Name: Allen Leung |
|
|
Date: 2000/04/02 21:17:00 EST |
|
|
Tag: leunga-20000402-mltree |
|
|
Description: |
|
|
|
|
|
1. Renamed the constructor CALL in MLTREE by popular demand. |
|
|
2. Added a bunch of files from my repository. These are currently |
|
|
used by other non-SMLNJ backends. |
|
|
|
|
|
---------------------------------------------------------------------- |
|
|
Name: Allen Leung |
|
|
Date: 2000/04/04 03:18:00 EST |
|
|
Tag: leunga-20000404-C--Moby |
|
|
Description: |
|
|
|
|
|
All of these fixes are related to C--, Moby, and my own optimization |
|
|
stuff; so they shouldn't affect SML/NJ. |
|
|
|
|
|
1. X86 |
|
|
|
|
|
Various fixes related to floating point, and extensions. |
|
|
|
|
|
2. Alpha |
|
|
|
|
|
Some extra patterns related to loads with signed/zero extension |
|
|
provided by Fermin. |
|
|
|
|
|
3. Assembly |
|
|
|
|
|
When generating assembly, resolve the value of client defined constants, |
|
|
instead of generating symbolic values. This is controlled by the |
|
|
new flag "asm-resolve-constants", which defaults to true. |
|
|
|
|
|
4. Machine Descriptions |
|
|
|
|
|
a. The precedence parser was slightly broken when parsing infixr symbols. |
|
|
b. The type generalizing code had the bound variables reversed, resulting |
|
|
in a problem during arity raising. |
|
|
c. Various fixes in machine descriptions. |
|
|
|
|
|
---------------------------------------------------------------------- |
|
|
Name: Allen Leung |
|
|
Date: 2000/04/04 19:39:00 EST |
|
|
Tag: leunga-20000404-x86-asm |
|
|
Description: |
|
|
|
|
|
1. Fixed a problem in X86 assembly. |
|
|
|
|
|
Things like |
|
|
|
|
|
jmp %eax |
|
|
jmp (%eax) |
|
|
|
|
|
should be output as |
|
|
|
|
|
jmp *%eax |
|
|
jmp *(%eax) |
|
|
|
|
|
2. Assembly output |
|
|
|
|
|
Added a new flag |
|
|
|
|
|
"asm-indent-copies" (default to false) |
|
|
|
|
|
When this flag is on, parallel copies will be indented an extra level. |
|
|
|
|
|
---------------------------------------------------------------------- |
|
|
Name: Allen Leung |
|
|
Date: 2000/04/06 00:36:00 EST |
|
|
Tag: leunga-20000406-peephole-x86-SSA |
|
|
Description: |
|
|
|
|
|
1. New Peephole code |
|
|
|
|
|
2. Minor improvement to X86 instruction selection |
|
|
|
|
|
3. Various fixes to SSA and machine description -> code translator |
|
|
|
|
|
---------------------------------------------------------------------- |
|
|
Name: Allen Leung |
|
|
Date: 2000/04/09 19:09:00 EST |
|
|
Tag: leunga-20000409-misc |
|
|
Description: |
|
|
|
|
|
1. Yet another fix for x86 assembly for idivl, imull, mull and friends. |
|
|
|
|
|
2. Miscellaneous improvements to MLRISC (unused in sml/nj) |
|
|
|
|
381 |
|
|
382 |
====================================================================== |
====================================================================== |
383 |
CM |
CM |