[smlnj] Diff of /sml/trunk/HISTORY
Diff between revision 572 (Thu Mar 9 02:43:06 2000 UTC) and revision 586 (Thu Mar 30 05:08:07 2000 UTC)
Description:

----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/29 18:00:00
Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
Description:

   This update contains *MAJOR* changes to the way code is generated from CPS
   in the module mlriscGen, and in various backend modules.

CHANGES
=======

1. MLRiscGen: forward propagation fix.

   There was a bug in forward propagation, introduced at about the same time
   as the MLRISC x86 backend, which prevented coalescing from being
   performed effectively in loops.

   Effect: speedup of loops on RISC architectures.
           By itself, this actually slowed down certain benchmarks on the x86.

2. MLRiscGen: forward propagating addresses from consing.

   I've changed the way consing code is generated.  Basically, I separated
   out the initialization part:

        store tag,   offset(allocptr)
        store elem1, offset+4(allocptr)
        store elem2, offset+8(allocptr)
        ...
        store elemn, offset+4n(allocptr)

   from the address computation part:

        celladdr <- offset+4+allocptr

   and forward-propagate the address computation part to its uses.

   Effect:  register pressure is generally lower as a result.  This
            makes compilation of certain expressions much faster, such as
            long lists with non-trivial elements:

             [(0,0), (0,0), .... (0,0)]

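The following toy sketch illustrates the idea in item 2. It assumes 4-byte words and uses invented names (`cons_record`, `use_address`); it is not MLRiscGen's code. The stores that initialize a record are produced immediately. The record address is returned as an expression that a later step can substitute into each use, so no register has to hold it during initialization.

```python
# Hypothetical toy model of the consing scheme in item 2 (not MLRiscGen code).
# The initializing stores are emitted at once; the record address is kept as
# a symbolic expression that can be propagated forward into each use instead
# of occupying a register for the whole initialization sequence.
from typing import List, Tuple

WORD = 4  # assumed word size in bytes, matching the offsets shown in item 2


def cons_record(tag: str, elems: List[str], offset: int) -> Tuple[List[str], str]:
    """Return (initialization stores, address expression) for one record."""
    stores = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + WORD * i}(allocptr)")
    # The cell address points just past the tag word; it is not computed here.
    return stores, f"{offset + WORD}+allocptr"


def use_address(addr_expr: str, template: str) -> str:
    """Substitute the address expression directly into a consumer instruction."""
    return template.format(addr=addr_expr)


stores, addr = cons_record("tag", ["e1", "e2"], 0)
print("\n".join(stores))
print(use_address(addr, "store {addr}, 16(allocptr)"))
```

In this model, a later instruction that needs the pair's address (for example, a list cell pointing to it) receives `4+allocptr` directly.
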
3. MLRiscGen: base pointer elimination.

   As part of the linkage mechanism, we generate the sequence:

     L:  ...  <- start of the code fragment

     L1:
         base pointer <- linkreg - L1 + L

   The base pointer was then used for computing relocatable addresses
   in the code fragment.  Frequently (for example, in many continuations)
   it is not needed.  We now eliminate this sequence whenever possible.

   For compile-time efficiency, I'm using a very stupid local heuristic.
   In general, though, this should be done with a control-flow analysis.

   Effect:  smaller code size.  Speedup of most programs.

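Below is a minimal sketch of the arithmetic in item 3, plus one naive way to decide that the setup can be dropped. It assumes `linkreg` holds the run-time address of L1. The check "no other instruction mentions baseptr" is a hypothetical stand-in, because the entry does not say what the local heuristic actually is.

```python
# Hypothetical sketch of base-pointer setup and a naive local elimination
# check.  Assumption: at run time `linkreg` holds the actual address of
# label L1, so  linkreg - L1 + L  is the actual address of L no matter
# where the fragment was loaded.  The "mentions baseptr" test below is an
# illustrative stand-in, not the heuristic actually used in MLRiscGen.
from typing import List


def base_pointer(load_addr: int, L: int, L1: int) -> int:
    linkreg = load_addr + L1          # run-time address of L1
    return linkreg - L1 + L           # run-time address of L


def maybe_drop_base_setup(fragment: List[str]) -> List[str]:
    """Drop the 'baseptr <- ...' setup if no other instruction reads baseptr."""
    setup = [i for i in fragment if i.startswith("baseptr <-")]
    others = [i for i in fragment if not i.startswith("baseptr <-")]
    if setup and not any("baseptr" in i for i in others):
        return others
    return fragment
```
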
4. HPPA back end

   Long jumps in span-dependence resolution used to depend on the existence
   of the base pointer.

   A jump to a long label L was expanded into the following sequence:

        LDIL %hi(L-8192), %r29
        LDO  %lo(L-8192)(%r29), %r29
        ADD  %r29, baseptr, %r29
        BV,n %r0(%r29)

   In the presence of change (3) above, this will not work.  I've changed
   it so that the following sequence of instructions is generated, which
   doesn't mention the base pointer at all:

           BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
      L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
           BV,n  %r0(%r29)          /* Jump */

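The following worked arithmetic covers the new HPPA sequence and takes its comments at face value: BL,n leaves L'+4 in %r29, and ADDIL adds L-(L'+4). The displacement depends only on the distance between the labels, so the computed target is L wherever the code is loaded. Instruction-encoding limits are ignored, and the function name is hypothetical.

```python
# Worked arithmetic for the long-jump sequence in item 4, assuming the
# register effects stated in its comments.  Addresses are plain integers;
# immediate-field widths and other encoding limits are ignored.


def long_jump_target(load_addr: int, L: int, L_prime: int) -> int:
    disp = L - (L_prime + 4)          # constant computed by the assembler
    r29 = load_addr + L_prime + 4     # link value written by BL,n
    return r29 + disp                 # address used by BV,n %r0(%r29)
```

The result equals `load_addr + L` for every load address, which is why no base pointer is needed.
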
5. Alpha back end

   New Alpha instructions LDB/LDW have been added, as per Fermin's
   suggestions.  This is unrelated to the other changes.

6. X86 back end

   I've changed andl to testl in the floating point test sequence
   whenever appropriate.  The Intel optimization guide states that
   testl is preferable to andl.

7. RA (x86 only)

   I've improved the spill propagation algorithm, using an approximation
   of maximal weighted independent sets.  This seems to be necessary to
   offset the slowdown on the x86 caused by (1).

   I'll write down the algorithm one of these days.

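The entry does not describe the spill propagation algorithm. The following is therefore only a generic greedy approximation of a maximum-weight independent set, included to make the underlying problem concrete. The weight/(degree+1) priority rule and the graph encoding are textbook choices, not Leung's.

```python
# A generic greedy approximation of a maximum-weight independent set.
# Illustrative only: the changelog does not describe the actual algorithm.
# `edges` must be symmetric (if u lists v, v lists u).
from typing import Dict, Set


def greedy_mwis(weights: Dict[str, float], edges: Dict[str, Set[str]]) -> Set[str]:
    """Pick vertices greedily by weight / (remaining degree + 1)."""
    remaining = set(weights)
    chosen: Set[str] = set()
    while remaining:
        best = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda v: weights[v] / (len(edges.get(v, set()) & remaining) + 1),
        )
        chosen.add(best)
        # Remove the chosen vertex and all of its neighbours.
        remaining -= {best} | edges.get(best, set())
    return chosen
```

Greedy selection like this is fast, but it is not optimal in general and can miss heavier sets on unfavourable graphs.
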
8. MLRiscGen: frequencies

   I've added an annotation stating that all call-GC blocks have zero
   execution frequency.  This improves register allocation on the x86.

BENCHMARKS
==========

   I've only performed the comparison against 110.25.

   The platforms are:

    HPPA  A four-processor HP machine (E9000) with 5GB of memory,
    X86   A 300MHz Pentium II with 128MB of memory, and
    SPARC An UltraSPARC 2 with 512MB of memory.

   I used the following parameters for the SML benchmarks:

             @SMLalloc
     HPPA    256k
     SPARC   512k
     X86     256k

COMPILATION TIME
----------------
   Here are the numbers comparing the compilation times of the compilers.
   I've only compared 110.25 compiling the new sources versus
   a fixpoint version of the new compiler compiling the same sources.

                       110.25                              New
           Total  Time in RA  Spill+Reload     Total  Time in RA  Spill+Reload
     HPPA   627s        116s     2684+3584      599s         95s     1003+1879
     SPARC  892s        173s     2891+3870      708s        116s     1004+1880
     X86    999s        315s  94006+130691      987s        296s 108877+141957

              110.25          New
           Code Size    Code Size
     HPPA    8596736      8561421
     SPARC   8974299      8785143
     X86     9029180      8716783

   So, in summary, things are at least as good as before.  A dramatic
   reduction in compilation time is obtained on the SPARC; I can't explain
   it, but it is reproducible.  Perhaps someone should try to reproduce
   this on their own machines.

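A few lines of arithmetic show the relative changes implied by the two tables above. This sketch uses (old - new) / old. That convention is chosen here for the check; the entry itself reports only the raw numbers, which are copied from the tables.

```python
# Relative reductions implied by the COMPILATION TIME tables, as
# (old - new) / old.  Numbers are copied from the tables above.
TOTAL_SECONDS = {"HPPA": (627, 599), "SPARC": (892, 708), "X86": (999, 987)}
CODE_SIZE = {"HPPA": (8596736, 8561421), "SPARC": (8974299, 8785143),
             "X86": (9029180, 8716783)}


def reduction_pct(old: float, new: float) -> float:
    return 100.0 * (old - new) / old


for arch in ("HPPA", "SPARC", "X86"):
    t = reduction_pct(*TOTAL_SECONDS[arch])
    s = reduction_pct(*CODE_SIZE[arch])
    print(f"{arch:6} compile time -{t:5.2f}%   code size -{s:4.2f}%")
```

On these numbers, the SPARC's total compilation time drops by about 20.6%, against roughly 4.5% on HPPA and 1.2% on the x86. Code size shrinks by 0.41%, 2.11%, and 3.46% respectively.
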
SML BENCHMARKS
--------------

    On average, all benchmarks perform at least as well as before.
    Percentages give the improvement of New relative to 110.25; a positive
    value means New is faster.

      HPPA         Compilation Time     Spill+Reload      Run Time
                 110.25  New            110.25    New   110.25  New

      barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
          boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
   count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
            fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
    knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
         lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
           life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
          logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
     mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
         mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
        nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
  ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
            ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
         simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
            tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
           vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
  --------------------------------------------------------------------------
   Average                     6.12%                                   4.09%

      SPARC        Compilation Time     Spill+Reload      Run Time
                 110.25  New            110.25    New   110.25  New

      barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
          boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
   count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
            fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
    knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
         lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
           life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
          logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
     mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
         mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
        nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
  ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
            ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
         simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
            tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
           vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
  ----------------------------------------------------------------------------
   Average                     6.94%                                    2.64%

      X86          Compilation Time     Spill+Reload      Run Time
                 110.25  New            110.25    New   110.25  New

      barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
          boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
   count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
            fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
    knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
         lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
           life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
          logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
     mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
         mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
        nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
  ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
            ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
         simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
            tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
           vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
  ----------------------------------------------------------------------------
   Average                     1.22%                                     3.36%

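The "Average" rows appear to be plain arithmetic means of the per-benchmark percentages, where each percentage is (old - new) / new. Both conventions are inferred from the table values here; the entry does not state them. The sketch reproduces the HPPA compilation-time average and the lexgen row.

```python
# Recomputing table values under the inferred conventions:
#   per-row percentage = (old - new) / new * 100   (positive = New is faster)
#   "Average"          = arithmetic mean of the per-row percentages
from statistics import mean

HPPA_COMPILE_PCT = [4.75, 7.77, 4.32, 10.74, 11.32, 17.01, 14.22, 2.08,
                    -4.17, 9.53, 2.44, 6.84, -0.52, 4.56, 3.09, 3.94]


def improvement_pct(old: float, new: float) -> float:
    """Positive when New is faster than 110.25."""
    return 100.0 * (old - new) / new


print(f"HPPA compile-time average: {mean(HPPA_COMPILE_PCT):.2f}%")
print(f"lexgen (HPPA) compile time: {improvement_pct(6.190, 5.290):.2f}%")
```

The times are printed rounded to three decimals, so recomputing a row from them can differ from the printed percentage. Mandelbrot on HPPA, for example, gives -3.75% from the printed times against -4.17% in the table.
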
----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/23 16:25:00
Tag: leunga-20000323-fix_x86_alpha
Description:

1. X86 fixes/changes

   a.  The old code generated for SETcc was completely wrong.
       The Intel optimization guide is VERY misleading.

2. ALPHA fixes/changes

   a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
   b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha().
   c.  Added reassociation code for address computation.

----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/22 01:23:00
Tag: leunga-20000322-fix_x86_hppa_ra
Description:

1. X86 fixes/changes

   a.  Fixed an x86Rewrite bug with MUL3 (found by Lal).
   b.  Added the instructions FSTS, FSTL.

2. PA-RISC fixes/changes

   a.  B label should not be a delay slot candidate!  Why did this work?
   b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
       as it should.
   c.  The assembly syntax for fstds and fstdd was wrong.
   d.  Added the composite instruction COMICLR/LDO, which is the immediate
       operand variant of COMCLR/LDO.

3. Generic MLRISC

   a.  shuffle.sml has been rewritten to be slightly more efficient.
   b.  Fixed a DIV bug in mltree-simplify (found by Fermin).

4. Register Allocator

   a.  I now release the interference graph earlier during spilling.
       This may improve memory usage.

----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/14 14:15:32
Tag: blume_main_v110p26p1_2
Description:

1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
additional argument called "template", an optional string that
specifies the layout of the tool command line.  See the CM manual for
an explanation.

2. A special-purpose tool can be "registered" by simply dropping the
corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
directory where the .cm file that uses this tool lives.  (The
behavior/misfeature until now was to look for the tool description
files in the current working directory.)  As before, tool description
files can also be anchored, in which case they can live anywhere
they like.  Following the recent e-mail discussion, this change should
make it easier to have special-purpose tools that are shipped together
with the sources of the program that uses them.

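Below is a hypothetical illustration of the lookup change in item 2: tool description files are now looked for next to the .cm file that uses the tool, rather than in the current working directory. The function name, the `tool` placeholder for `<...>`, and the order of the two candidates are assumptions. CM's real resolution logic, including anchored names, is not modeled.

```python
# Hypothetical sketch: candidate tool description files live in the
# directory of the .cm file that uses the tool (not the working directory).
import os
from typing import List


def tool_description_candidates(cm_file: str, tool: str) -> List[str]:
    """Paths where <tool>-tool.cm and <tool>-ext.cm would be looked for."""
    directory = os.path.dirname(os.path.abspath(cm_file))
    return [os.path.join(directory, f"{tool}-tool.cm"),
            os.path.join(directory, f"{tool}-ext.cm")]
```
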
----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/10 07:48:34
Tag: blume_main_v110p26p1_1
Description:

I added a rewritten version of Dave's fixpt script to src/system.
Changes relative to the original version:
  - sh-ified (not everybody has ksh)
  - automatically figures out which architecture it runs on
  - uses ./makeml a bit more cleverly
  - never invokes ./installml (and, thus, does not clobber your
    good and working installation of sml in case something goes wrong)
  - accepts max iteration count using option "-iter <n>"
  - accepts a "base" name using option "-base <base>"

It does not build any extraneous heap images but directly rebuilds
bin- and boot-hierarchies using makeml's "-rebuild" switch.  Finally,
it can incorporate existing bin- and boot-hierarchies.  For example,
suppose the base is set to "sml" (which is the default).  Then it
successively builds

        sml.bin.<arch>-unix and sml.boot.<arch>-unix
then    sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
then    sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
...
then    sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix

and so on.  If any of these already exist, it will just use what's
there.  In particular, many people will have the initial set of bin
and boot files around, so this saves time for at least one full
rebuild.  Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
<k>=1,2,... is normally not a good idea when invoking fixpt.  However,
they might be the result of an earlier partial run of fixpt (which
perhaps got accidentally killed).  In this case, fixpt will quickly
move through what exists before continuing where it left off earlier,
and, thus, saves a lot of time.

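The following hypothetical model covers the naming and resume behaviour described above. The real fixpt is an sh script. The function names and the meaning given to the iteration limit here are assumptions, and stopping once a fixpoint is reached is not modeled.

```python
# Hypothetical model of fixpt's stage naming and reuse of existing stages.
# Stage 0 is <base>, stage k >= 1 is <base><k>; a stage whose bin and boot
# hierarchies both already exist is reused instead of rebuilt.
from typing import List, Set, Tuple


def stage_dirs(base: str, arch: str, k: int) -> Tuple[str, str]:
    name = base if k == 0 else f"{base}{k}"
    return f"{name}.bin.{arch}-unix", f"{name}.boot.{arch}-unix"


def stages_to_build(base: str, arch: str, max_iter: int, existing: Set[str]) -> List[int]:
    """Return the stage numbers 0..max_iter whose hierarchies are missing."""
    todo = []
    for k in range(max_iter + 1):
        if not all(d in existing for d in stage_dirs(base, arch, k)):
            todo.append(k)
    return todo
```
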
----------------------------------------------------------------------
Name: Allen Leung
Date: 00/03/10 02:20:00
Tag: leunga-20000310-fix_x86_asm_ra
Description:

More assembly output problems involving the indexed addressing mode
on the x86 have been found and corrected.  Thanks to Fermin Reig for the
fix.

The interface and implementation of the register allocator have been changed
slightly to make it possible to skip the register allocation phases
completely and go directly to memory allocation.  This is needed
for C-- use.

----------------------------------------------------------------------
Name: Matthias Blume
Date: 00/03/09 10:23:53
Tag: blume_main_v110p26p1_0
Description:

* Complete reorganization of library names.  Many libraries have been
consolidated so that they share the same path anchor.  For example,
all MLRISC-related libraries are anchored at MLRISC, and most libraries
that are SML/NJ-specific are under "smlnj".  Notice that names like
host-cmb.cm or host-compiler.cm no longer exist.  See system/README
for a complete description of the new naming scheme.  Quick reference:

   host-cmb.cm        -> smlnj/cmb.cm
   host-compiler.cm   -> smlnj/compiler.cm
   full-cm.cm         -> smlnj/cm.cm
   <arch>-<os>.cm     -> smlnj/cmb/<arch>-<os>.cm
   <arch>-compiler.cm -> smlnj/compiler/<arch>.cm

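Below is a hypothetical helper that applies the quick-reference table above to an old library name; it is not part of CM. Pattern order matters: "<arch>-compiler.cm" must be tried before the generic "<arch>-<os>.cm" rule, because a name like "x86-compiler.cm" matches both. The example architecture and OS names are assumptions.

```python
# Hypothetical translator for the library-name quick reference above.
import re
from typing import Optional

FIXED = {
    "host-cmb.cm": "smlnj/cmb.cm",
    "host-compiler.cm": "smlnj/compiler.cm",
    "full-cm.cm": "smlnj/cm.cm",
}


def new_library_name(old: str) -> Optional[str]:
    if old in FIXED:
        return FIXED[old]
    # <arch>-compiler.cm  ->  smlnj/compiler/<arch>.cm  (checked first)
    m = re.fullmatch(r"([A-Za-z0-9]+)-compiler\.cm", old)
    if m:
        return f"smlnj/compiler/{m.group(1)}.cm"
    # <arch>-<os>.cm  ->  smlnj/cmb/<arch>-<os>.cm
    m = re.fullmatch(r"([A-Za-z0-9]+-[A-Za-z0-9]+)\.cm", old)
    if m:
        return f"smlnj/cmb/{m.group(1)}.cm"
    return None  # not covered by the quick reference
```
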
* Bug fixes in CM.
    - exceptions in user code are now passed through (i.e., they reach top level)
    - more bugs in paranoia mode fixed
    - bug related to checking group owners fixed

* New install.sh script that automagically fetches archive files:
  The new file config/srcarchiveurl must contain the URL of the
  (remote) directory that contains bin files (or other source archives).
  If install.sh does not find the archive locally, it tries to get
  it from that remote directory.
  This should simplify installation further:  For machines that have
  access to the internet, just fetch <version>-config.tgz, unpack it,
  edit config/targets, and go (run config/install.sh).  The script will
  fetch everything else that it might need all by itself.

  For CVS users, this mechanism is not relevant for source archives, but
  it is convenient for getting new sets of binfiles.

  Archives should be tar files compressed with either gzip, compress, or
  bzip2.  The script recognizes .tgz, .tar, .tar.gz, .tz, .tar.Z, and .tar.bz2.

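The following hypothetical sketch models the lookup order described for install.sh: use a local archive if one is present, otherwise try URLs under the directory named in config/srcarchiveurl. The suffix list is the one given above. The function name, the "first matching suffix wins" rule, and the example URL are assumptions, not install.sh's actual logic.

```python
# Hypothetical model of install.sh's local-then-remote archive lookup.
import os
from typing import Callable, List

SUFFIXES = (".tgz", ".tar", ".tar.gz", ".tz", ".tar.Z", ".tar.bz2")


def locate_archive(name: str, local_dir: str, remote_url: str,
                   exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return [local path] if found, else candidate remote URLs in order."""
    for suffix in SUFFIXES:
        path = os.path.join(local_dir, name + suffix)
        if exists(path):
            return [path]
    base = remote_url.rstrip("/")
    return [f"{base}/{name}{suffix}" for suffix in SUFFIXES]
```
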
----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/07 04:01:04
Tag: blume_main_v110_26_2
