Diff of /sml/trunk/HISTORY

revision 575, Fri Mar 10 02:55:58 2000 UTC revision 585, Wed Mar 29 23:55:35 2000 UTC
# Line 13
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2000/03/29 18:00:00
18    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
19    Description:
20    
21       This update contains *MAJOR* changes to the way code is generated from CPS
22    in the module mlriscGen, and in various backend modules.
23    
24    CHANGES
25    =======
26    
27    1. MLRiscGen: forward propagation fix.
28    
29       There was a bug in forward propagation introduced at about the same time
30       as the MLRISC x86 backend, which prevented coalescing from being
31       performed effectively in loops.
32    
33       Effect: speed up of loops in RISC architectures.
34               By itself, this actually slowed down certain benchmarks on the x86.
35    
36    2. MLRiscGen:  forward propagating addresses from consing.
37    
38       I've changed the way consing code is generated.  Basically I separated
39       out the initialization part:
40    
41            store tag,   offset(allocptr)
42            store elem1, offset+4(allocptr)
43            store elem2, offset+8(allocptr)
44            ...
45            store elemn, offset+4n(allocptr)
46    
47       and the address computation part:
48    
49            celladdr <- offset+4+allocptr
50    
51       and move the address computation part
52    
53       Effect:  register pressure is generally lower as a result.  This
54                makes compilation of certain expressions much faster, such as
55                long lists with non-trivial elements.
56    
57                 [(0,0), (0,0), .... (0,0)]
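
The split described in item 2 can be sketched as a small generator of pseudo-instructions. This is an illustration only, not MLRiscGen code: the function name `emit_record_alloc` and the `word` parameter are hypothetical, and the 4-byte word size is taken from the offsets shown above.

```python
def emit_record_alloc(tag, elems, offset=0, word=4):
    """Return (init_stores, addr_computation) as pseudo-instruction strings.

    Hypothetical helper: it mirrors the listing above, keeping the
    initialization stores separate from the cell address computation.
    """
    # Initialization part: the tag word first, then one store per element.
    stores = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + word * i}(allocptr)")
    # Address computation part: the cell address points just past the tag.
    addr = f"celladdr <- {offset + word}+allocptr"
    return stores, addr


stores, addr = emit_record_alloc("tag", ["elem1", "elem2"], offset=8)
for line in stores:
    print(line)
print(addr)
```

Because the address computation is returned separately, a caller is free to place it elsewhere in the instruction stream, independently of the stores.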
58    
59    3. MLRiscGen: base pointer elimination.
60    
61        As part of the linkage mechanism, we generate the sequence:
62    
63         L:  ...  <- start of the code fragment
64    
65         L1:
66             base pointer <- linkreg - L1 + L
67    
68         The base pointer was then used for computing relocatable addresses
69       in the code fragment.  Frequently (such as in lots of continuations)
70       this is not needed.  We now eliminate this sequence whenever possible.
71    
72         For compile time efficiency, I'm using a very stupid local heuristic.
73       But in general, this should be done as a control flow analysis.
74    
75       Effect:  Smaller code size.  Speed up of most programs.
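
The arithmetic behind the base pointer sequence in item 3 can be checked with a short sketch. It assumes that linkreg holds the run-time address of L1 when the sequence executes, which the text above does not state explicitly; the load addresses and offsets are made-up values.

```python
def base_pointer(linkreg, off_L1, off_L):
    """base pointer <- linkreg - L1 + L, with L1 and L as fragment offsets."""
    return linkreg - off_L1 + off_L


# Hypothetical layout: L at offset 0x100, L1 at offset 0x180.
off_L, off_L1 = 0x100, 0x180

for load_addr in (0x40000, 0x7F0000):
    # Assumption: linkreg holds the run-time address of L1.
    linkreg = load_addr + off_L1
    bp = base_pointer(linkreg, off_L1, off_L)
    # The result is the run-time address of L, wherever the code was loaded.
    print(hex(load_addr), hex(bp))
```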
76    
77    4. Hppa back end
78    
79         Long jumps in span dependence resolution used to depend on the existence
80      of the base pointer.
81    
82         A jump to a long label L was expanded into the following sequence:
83    
84          LDIL %hi(L-8192), %r29
85          LDO  %lo(L-8192)(%r29), %r29
86          ADD  %r29, baseptr, %r29
87          BV,n %r0(%r29)
88    
89         In the presence of change (3) above, this will not work.  I've changed
90       it so that the following sequence of instructions is generated, which
91       doesn't mention the base pointer at all:
92    
93             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
94        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
95             BV,n  %r0(%r29)          /* Jump */
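
The following sketch traces the new three-instruction sequence, using only the behaviour given in the comments of the listing (BL leaves L' + 4 in %r29, and ADDIL adds the constant L-(L'+4)). It is not a model of real PA-RISC instruction semantics; the function name and the `load_offset` parameter are hypothetical.

```python
def long_jump_target(link_Lprime, link_L, load_offset=0):
    """Return the address reached by the BL / ADDIL / BV sequence.

    link_Lprime and link_L are the addresses of L' and L as seen by the
    assembler; load_offset is a hypothetical relocation applied at run time.
    """
    # The displacement is a constant fixed when the code is assembled.
    disp = link_L - (link_Lprime + 4)
    # BL,n L', %r29: per the listing's comment, L' + 4 -> %r29 (run-time value).
    r29 = (link_Lprime + load_offset) + 4
    # ADDIL L-(L'+4), %r29: add the constant displacement.
    r29 += disp
    # BV,n %r0(%r29): jump to the address now held in %r29.
    return r29


print(hex(long_jump_target(0x1000, 0x90000)))
print(hex(long_jump_target(0x1000, 0x90000, load_offset=0x500000)))
```

The target follows the code when it is relocated, and no base pointer appears anywhere in the computation.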
96    
97    5. Alpha back end
98    
99          New alpha instructions LDB/LDW have been added, as per Fermin's
100       suggestions.   This is unrelated to all other changes.
101    
102    6. X86 back end
103    
104         I've changed andl to testl in the floating point test sequence
105         whenever appropriate.  The Intel optimization guide states that
106         testl is preferable to andl.
107    
108    7. RA (x86 only)
109    
110         I've improved the spill propagation algorithm, using an approximation
111       of maximal weighted independent sets.   This seems to be necessary to
112       offset the slowdown introduced by (1).
113    
114         I'll write down the algorithm one of these days.
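
Since the algorithm has not been written down here, the following is only a generic greedy heuristic for approximating a maximal weighted independent set, shown to illustrate the term. It is not the register allocator's algorithm, and what the nodes, edges, and weights would be during spill propagation is an open assumption.

```python
def greedy_weighted_independent_set(weights, edges):
    """Greedy approximation of a maximal weighted independent set.

    weights: dict mapping node -> weight
    edges:   iterable of (u, v) pairs between nodes in `weights`
    """
    adj = {n: set() for n in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    chosen, blocked = set(), set()
    # Visit nodes by descending weight / (degree + 1); ties broken by name.
    order = sorted(weights, key=lambda n: (-weights[n] / (len(adj[n]) + 1), n))
    for n in order:
        if n not in blocked:
            chosen.add(n)
            # A chosen node excludes itself and all of its neighbours.
            blocked.add(n)
            blocked |= adj[n]
    return chosen


# Hypothetical example: a path a - b - c - d with made-up weights.
chosen = greedy_weighted_independent_set(
    {"a": 5, "b": 4, "c": 3, "d": 1},
    [("a", "b"), ("b", "c"), ("c", "d")],
)
print(sorted(chosen))  # ['a', 'c']
```

The result is always independent and cannot be extended by another node, but the heuristic gives no guarantee of maximum total weight.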
115    
116    8. MLRiscGen: frequencies
117    
118         I've added an annotation that states that all call gc blocks have zero
119       execution frequencies.  This improves register allocation on the x86.
120    
121    BENCHMARKS
122    ==========
123    
124       I've only performed the comparison against 110.25.
125    
126       The platforms are:
127    
128        HPPA  A four processor HP machine (E9000) with 5G of memory.
129        X86   A 300MHz Pentium II with 128M of memory, and
130        SPARC An Ultra sparc 2 with 512M of memory.
131    
132       I used the following parameters for the SML benchmarks:
133    
134                 @SMLalloc
135         HPPA    256k
136         SPARC   512k
137         X86     256k
138    
139    COMPILATION TIME
140    ----------------
141       Here are the numbers comparing the compilation times of the compilers.
142       I've only compared 110.25 compiling the new sources versus
143       a fixpoint version of the new compiler compiling the same.
144    
145                     110.25                                  New
146               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
147         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
148         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
149         X86    999s    315s       94006+130691   987s    296s    108877+141957
150    
151                   110.25         New
152                Code Size      Code Size
153         HPPA   8596736         8561421
154         SPARC  8974299         8785143
155         X86    9029180         8716783
156    
157       So in summary, things are at least as good as before.   A dramatic
158       reduction in compilation time is obtained on the Sparc; I can't explain it,
159       but it is reproducible.  Perhaps someone should try to reproduce this
160       on their own machines.
161    
162    SML BENCHMARKS
163    --------------
164    
165        On average, all benchmarks perform at least as well as before.
166    
167          HPPA         Compilation Time     Spill+Reload      Run Time
168                     110.25  New            110.25    New   110.25  New
169    
170          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
171              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
172       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
173                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
174        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
175             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
176               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
177              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
178         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
179             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
180            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
181      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
182                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
183             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
184                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
185               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
186      --------------------------------------------------------------------------
187       Average                     6.12%                                   4.09%
188    
189          SPARC        Compilation Time     Spill+Reload      Run Time
190                     110.25  New            110.25    New   110.25  New
191    
192          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
193              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
194       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
195                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
196        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
197             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
198               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
199              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
200         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
201             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
202            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
203      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
204                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
205             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
206                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
207               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
208      ----------------------------------------------------------------------------
209       Average                     6.94%                                    2.64%
210    
211          X86          Compilation Time     Spill+Reload      Run Time
212                     110.25  New            110.25    New   110.25  New
213    
214          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
215              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
216       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
217                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
218        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
219             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
220               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
221              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
222         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
223             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
224            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
225      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
226                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
227             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
228                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
229               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
230      ----------------------------------------------------------------------------
231       Average                     1.22%                                     3.36%
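
The tables do not say how the percentage columns are computed. The printed values appear consistent with (old / new - 1) * 100, and each "Average" row with the arithmetic mean of its column. The sketch below checks that reading against a few HPPA figures; the formula is an inference from the numbers, and small mismatches are expected because the printed times are rounded.

```python
def improvement_pct(old, new):
    """Percentage improvement of `new` over `old`, as inferred from the tables."""
    return (old / new - 1.0) * 100.0


# HPPA compilation times (110.25, New) and the percentage printed in the table.
samples = {
    "life":   (0.803, 0.703, 14.22),
    "lexgen": (6.190, 5.290, 17.01),
    "boyer":  (6.152, 5.708, 7.77),
}
for name, (old, new, printed) in samples.items():
    print(f"{name:8s} computed {improvement_pct(old, new):6.2f}%  printed {printed:6.2f}%")

# The HPPA compilation-time column, copied from the table above.
hppa_compile_pct = [4.75, 7.77, 4.32, 10.74, 11.32, 17.01, 14.22, 2.08,
                    -4.17, 9.53, 2.44, 6.84, -0.52, 4.56, 3.09, 3.94]
average = sum(hppa_compile_pct) / len(hppa_compile_pct)
print(f"average {average:.2f}%")
```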
232    
233    ----------------------------------------------------------------------
234    Name: Allen Leung
235    Date: 2000/03/23 16:25:00
236    Tag: leunga-20000323-fix_x86_alpha
237    Description:
238    
239    1. X86 fixes/changes
240    
241       a.  The old code generated for SETcc was completely wrong.
242           The Intel optimization guide is VERY misleading.
243    
244    2. ALPHA fixes/changes
245    
246       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
247       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
248       c.  Added reassociation code for address computation.
249    
250    ----------------------------------------------------------------------
251    Name: Allen Leung
252    Date: 2000/03/22 01:23:00
253    Tag: leunga-20000322-fix_x86_hppa_ra
254    Description:
255    
256    1. X86 fixes/changes
257    
258       a.  x86Rewrite bug with MUL3 (found by Lal)
259       b.  Added the instructions FSTS, FSTL
260    
261    2. PA-RISC fixes/changes
262    
263       a.  B label should not be a delay slot candidate!  Why did this work?
264       b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
265           as it should be.
266       c.  The assembly syntax for fstds and fstdd was wrong.
267       d.  Added the composite instruction COMICLR/LDO, which is the immediate
268           operand variant of COMCLR/LDO.
269    
270    3. Generic MLRISC
271    
272       a.  shuffle.sml rewritten to be slightly more efficient
273       b.  DIV bug in mltree-simplify fixed (found by Fermin)
274    
275    4. Register Allocator
276    
277       a.  I now release the interference graph earlier during spilling.
278           May improve memory usage.
279    
280    ----------------------------------------------------------------------
281    Name: Matthias Blume
282    Date: 2000/03/14 14:15:32
283    Tag: blume_main_v110p26p1_2
284    Description:
285    
286    1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
287    additional argument called "template" which is an optional string that
288    specifies the layout of the tool command line.  See the CM manual for
289    explanation.
290    
291    2. A special-purpose tool can be "registered" by simply dropping the
292    corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
293    directory where the .cm file lives that uses this tool.  (The
294    behavior/misfeature until now was to look for the tool description
295    files in the current working directory.)  As before, tool description
296    files could also be anchored -- in which case they can live anywhere
297    they like.  Following the recent e-mail discussion, this change should
298    make it easier to have special-purpose tools that are shipped together
299    with the sources of the program that uses them.
300    
301    ----------------------------------------------------------------------
302    Name: Matthias Blume
303    Date: 2000/03/10 07:48:34
304    Tag: blume_main_v110p26p1_1
305    Description:
306    
307    I added a re-written version of Dave's fixpt script to src/system.
308    Changes relative to the original version:
309      - sh-ified (not everybody has ksh)
310      - automatically figures out which architecture it runs on
311      - uses ./makeml a bit more cleverly
312      - never invokes ./installml (and, thus, does not clobber your
313        good and working installation of sml in case something goes wrong)
314      - accepts max iteration count using option "-iter <n>"
315      - accepts a "base" name using option "-base <base>"
316    
317    It does not build any extraneous heap images but directly rebuilds
318    bin- and boot-hierarchies using makeml's "-rebuild" switch. Finally,
319    it can incorporate existing bin- and boot- hierarchies.  For example,
320    suppose the base is set to "sml" (which is the default).  Then it
321    successively builds
322    
323            sml.bin.<arch>-unix and sml.boot.<arch>-unix
324    then    sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
325    then    sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
326    ...
327    then    sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix
328    
329    and so on.  If any of these already exist, it will just use what's
330    there.  In particular, many people will have the initial set of bin
331    and boot files around, so this saves time for at least one full
332    rebuild.  Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
333    <k>=1,2,... is normally not a good idea when invoking fixpt.  However,
334    they might be the result of an earlier partial run of fixpt (which
335    perhaps got accidentally killed).  In this case, fixpt will quickly
336    move through what exists before continuing where it left off earlier,
337    and, thus, saves a lot of time.
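
The naming scheme and the "use what's there" behaviour described above can be sketched as follows. This is not the fixpt script itself (which is a sh script): the function names are hypothetical, and treating a step as done only when both its bin and boot hierarchies exist is an assumption.

```python
def hierarchy_names(base, arch, k):
    """Names of the bin/boot hierarchies for iteration k (k = 0 has no suffix)."""
    stem = base if k == 0 else f"{base}{k}"
    return (f"{stem}.bin.{arch}-unix", f"{stem}.boot.{arch}-unix")


def first_missing(base, arch, existing, limit):
    """Smallest k in 0..limit whose hierarchies are not all in `existing`.

    `existing` is a set of directory names already on disk; `limit` is a
    hypothetical upper bound on the iterations to inspect.
    Returns None if every pair up to `limit` is present.
    """
    for k in range(limit + 1):
        if not all(name in existing for name in hierarchy_names(base, arch, k)):
            return k
    return None


# Hypothetical state left behind by an earlier partial run on x86.
on_disk = {
    "sml.bin.x86-unix", "sml.boot.x86-unix",
    "sml1.bin.x86-unix", "sml1.boot.x86-unix",
}
print(hierarchy_names("sml", "x86", 2))
print(first_missing("sml", "x86", on_disk, limit=5))
```

With the two existing sets on disk, the sketch resumes at iteration 2, which matches the description of fixpt moving quickly through what already exists.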
338    
339    ----------------------------------------------------------------------
340    Name: Allen Leung
341    Date: 00/03/10 02:20:00
342    Tag: leunga-20000310-fix_x86_asm_ra
343    Description:
344    
345    More assembly output problems involving the indexed addressing mode
346    on the x86 have been found and corrected. Thanks to Fermin Reig for the
347    fix.
348    
349    The interface and implementation of the register allocator have been changed
350    slightly to make it possible to skip the register allocation
351    phases completely and go directly to memory allocation.  This is needed
352    for C-- use.
353    
354    ----------------------------------------------------------------------
355  Name: Matthias Blume
356  Date: 00/03/09 10:23:53
357  Tag: blume_main_v110p26p1_0
