Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/NOTES/HISTORY
ViewVC logotype

Diff of /sml/trunk/NOTES/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 583, Thu Mar 23 21:52:30 2000 UTC revision 585, Wed Mar 29 23:55:35 2000 UTC
# Line 11  Line 11 
11  Date:  Date:
12  Tag: <post-commit CVS tag>  Tag: <post-commit CVS tag>
13  Description:  Description:
14    
15    ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2000/03/29 18:00:00
18    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
19    Description:
20    
21       This update contains *MAJOR* changes to the way code is generated from CPS
22    in the module mlriscGen, and in various backend modules.
23    
24    CHANGES
25    =======
26    
27    1. MLRiscGen: forward propagation fix.
28    
29       There was a bug in forward propagation introduced at about the same time
30       as the MLRISC x86 backend, which prohibits coalescing to be
31       performed effectively in loops.
32    
33       Effect: speed up of loops in RISC architectures.
34               By itself, this actually slowed down certain benchmarks on the x86.
35    
36    2. MLRiscGen:  forward propagating addresses from consing.
37    
38       I've changed the way consing code is generated.  Basically I separated
39       out the initialization part:
40    
41            store tag,   offset(allocptr)
42            store elem1, offset+4(allocptr)
43            store elem2, offset+8(allocptr)
44            ...
45            store elemn, offset+4n(allocptr)
46    
47       and the address computation part:
48    
49            celladdr <- offset+4+alloctpr
50    
51       and move the address computation part
52    
53       Effect:  register pressure is generally lower as a result.  This
54                makes compilation of certain expressions much faster, such as
55                long lists with non-trivial elements.
56    
57                 [(0,0), (0,0), .... (0,0)]
58    
59    3. MLRiscGen: base pointer elimination.
60    
61        As part of the linkage mechanism, we generate the sequence:
62    
63         L:  ...  <- start of the code fragment
64    
65         L1:
66             base pointer <- linkreg - L1 + L
67    
68         The base pointer was then used for computing relocatable addresses
69       in the code fragment.  Frequently (such as in lots of continuations)
70       this is not needed.  We now eliminate this sequence whenever possible.
71    
72         For compile time efficiency, I'm using a very stupid local heuristic.
73       But in general, this should be done as a control flow analysis.
74    
75       Effect:  Smaller code size.  Speed up of most programs.
76    
77    4. Hppa back end
78    
79         Long jumps in span dependence resolution used to depend on the existence
80      of the base pointer.
81    
82         A jump to a long label L was expanded into the following sequence:
83    
84          LDIL %hi(L-8192), %r29
85          LDO  %lo(L-8192)(%r29), %r29
86          ADD  %r29, baseptr, %r29
87          BV,n %r0(%r29)
88    
89         In the presence of change (3) above, this will not work.  I've changed
90       it so that the following sequence of instructions are generated, which
91       doesn't mention the base pointer at all:
92    
93             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
94        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
95             BV,n  %r0(%r29)          /* Jump */
96    
97    5. Alpha back end
98    
99          New alpha instructions LDB/LDW have been added, as per Fermin's
100       suggestions.   This is unrelated to all other changes.
101    
102    6. X86 back end
103    
104         I've changed andl to testl in the floating point test sequence
105         whenever appropriate.  The Intel optimization guide states that
106         testl is perferable to andl.
107    
108    7. RA (x86 only)
109    
110         I've improved the spill propagation algorithm, using an approximation
111       of maximal weighted independent sets.   This seems to be necessary to
112       alleviate the negative effect in light of the slow down in (1).
113    
114         I'll write down the algorithm one of these days.
115    
116    8. MLRiscGen: frequencies
117    
118         I've added an annotation that states that all call gc blocks have zero
119       execution frequencies.  This improves register allocation on the x86.
120    
121    BENCHMARKS
122    ==========
123    
124       I've only perform the comparison on 110.25.
125    
126       The platforms are:
127    
128        HPPA  A four processor HP machine (E9000) with 5G of memory.
129        X86   A 300Hhz Pentium II with 128M of memory, and
130        SPARC An Ultra sparc 2 with 512M of memory.
131    
132       I used the following parameters for the SML benchmarks:
133    
134                 @SMLalloc
135         HPPA    256k
136         SPARC   512k
137         X86     256k
138    
139    COMPILATION TIME
140    ----------------
141       Here are the numbers comparing the compilation times of the compilers.
142       I've only compared 110.25 compiling the new sources versus
143       a fixpoint version of the new compiler compiling the same.
144    
145                     110.25                                  New
146               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
147         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
148         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
149         X86    999s    315s       94006+130691   987s    296s    108877+141957
150    
151                   110.25         New
152                Code Size      Code Size
153         HPPA   8596736         8561421
154         SPARC  8974299         8785143
155         X86    9029180         8716783
156    
157       So in summary, things are at least as good as before.   Dramatic
158       reduction in compilation is obtained on the Sparc; I can't explain it,
159       but it is reproducible.  Perhaps someone should try to reproduce this
160       on their own machines.
161    
162    SML BENCHMARKS
163    --------------
164    
165        On the average, all benchmarks perform at least as well as before.
166    
167          HPPA         Compilation Time     Spill+Reload      Run Time
168                     110.25  New            110.25    New   110.25  New
169    
170          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
171              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
172       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
173                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
174        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
175             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
176               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
177              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
178         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
179             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
180            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
181      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
182                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
183             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
184                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
185               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
186      --------------------------------------------------------------------------
187       Average                     6.12%                                   4.09%
188    
189          SPARC        Compilation Time     Spill+Reload      Run Time
190                     110.25  New            110.25    New   110.25  New
191    
192          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
193              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
194       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
195                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
196        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
197             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
198               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
199              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
200         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
201             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
202            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
203      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
204                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
205             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
206                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
207               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
208      ----------------------------------------------------------------------------
209       Average                     6.94%                                    2.64%
210    
211          X86          Compilation Time     Spill+Reload      Run Time
212                     110.25  New            110.25    New   110.25  New
213    
214          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
215              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
216       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
217                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
218        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
219             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
220               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
221              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
222         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
223             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
224            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
225      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
226                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
227             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
228                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
229               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
230      ----------------------------------------------------------------------------
231       Average                     1.22%                                     3.36%
232    
233  ----------------------------------------------------------------------  ----------------------------------------------------------------------
234  Name: Allen Leung  Name: Allen Leung
235  Date: 2000/03/23 16:25:00  Date: 2000/03/23 16:25:00

Legend:
Removed from v.583  
changed lines
  Added in v.585

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0