
Diff of /sml/trunk/HISTORY


revision 578, Tue Mar 14 05:16:29 2000 UTC revision 590, Sat Apr 1 02:24:08 2000 UTC
# Line 11  Line 11 
11  Date:  Date:
12  Tag: <post-commit CVS tag>  Tag: <post-commit CVS tag>
13  Description:  Description:
14    
15    ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2000/03/31 21:15:00 EST
18    Tag: leunga-20000331-aliasing
19    Description:
20    
21    This update contains a rewritten (and hopefully more correct) module
22    for extracting aliasing information from CPS.
23    
24       To turn on this feature:
25    
26            Compiler.Control.CG.memDisambiguate := true
27    
28       To pretty print the region information with assembly
29    
30           Compiler.Control.MLRISC.getFlag "asm-show-region" := true;
31    
32       To control how many levels of aliasing information are printed, use:
33    
34           Compiler.Control.MLRISC.getInt "points-to-show-level" := n
35    
36       The default of n is 3.
37    
38    ----------------------------------------------------------------------
39    Name: David MacQueen
40    Date: 2000/03/31 11:15:00 EST
41    Tag: dbm-20000331-runtime_fix
42    Description:
43    
44    This update contains:
45    
46    1. runtime/c-lib/c-libraries.c
47       includes added in revision 1.2 caused compilation errors on hppa-hpux
48    
49    2. fix for bug 1556
50       system/Basis/Implementation/NJ/internal-signals.sml
51    
52    ----------------------------------------------------------------------
53    Name: Matthias Blume
54    Date: 2000/03/31 18:00:00 JST
55    Tag: blume_main_v110p26p2_1
56    Description:
57    
58    This update contains:
59    
60    1. A small change to CM's handling of stable libraries:
61       CM now maintains one "global" modmap that is used for all stable
62       libraries.  The use of such a global modmap maximizes sharing and
63       minimizes the need for re-traversing parts of environments during
64       modmap construction.  (However, this has minor impact since modmap
65       construction seems to account for just one percent or less of total
66       compile time.)
67    
68    2. I added a "genmap" phase to the statistics.  This is where I got the
69       "one percent" number (see above).
70    
71    3. CM's new tool parameter mechanism just became _even_ better. :)
72       - The parser understands named parameters and recursive options.
73       - The "make" and "shell" tools use these new features.
74         (This makes it a lot easier to cascade these tools.)
75       - There is a small syntax change: named parameters use a
76    
77           <name> : ( <option> ... )            or
78           <name> : <string>
79    
80         syntax.  Previously, named parameters were implemented in an
81         ad-hoc fashion by each tool individually (by parsing strings)
82         and had the form
83    
84           <name>=<string>
85    
86       See the CM manual for a full description of these issues.
87    
88    ----------------------------------------------------------------------
89    Name: Matthias Blume
90    Date: 2000/03/30 18:00:00 JST
91    Tag: blume_main_v110p26p2_0
92    Description:
93    
94    !!!!! WARNING !!!!!!
95    !!  New binfiles  !!
96    !!!!!!!!!!!!!!!!!!!!
97    
98    This update contains:
99    
100    1. Moderate changes to CM:
101    
102       - Changes to CM's tools mechanism.  In particular, it is now possible
103       to have tools that accept additional "command line" parameters
104       (specified in the .cm file at each instance where the tool's class is
105       used).
106    
107       This was done to accommodate the new "make" and "shell" tools which
108       facilitate fairly seamless hookup to portions of code managed using
109       Makefiles or Shell scripts.
110    
111       There are no classes "shared" or "private" anymore.  Instead, the
112       sharing annotation is now a parameter to the "sml" class.
113    
114       There is a bit of generic machinery for implementing one's own
115       tools that accept command-line parameters.  However, I am not yet fully
116       satisfied with that part, so expect changes here in the future.
117    
118       All existing tools are described in the CM manual.
119    
120       - Slightly better error handling.  (CM now suppresses many followup
121       error messages that tended to be more annoying than helpful.)
122    
123    2. Major changes to the compiler's static environment data structures.
124    
125       - no CMStaticEnv anymore.
126            - no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
127              but it is called Environment), no conversions between different
128              kinds of static environments
129    
130       - There is still a notion of a "modmap", but such modmaps are generated
131         on demand at the time when they are needed.  This sounds slow, but I
132         sped up the code that generates modmaps enough for this not to lead to
133         a slowdown of the compiler (at least I didn't detect any).
134    
135       - To facilitate rapid modmap generation, static environments now
136         contain an (optional) "modtree" structure.  Modtree annotations are
137         constructed by the unpickler during unpickling.  (This means that
138         the elaborator does not have to worry about modtrees at all.)
139         Modtrees have the advantage that they are compositional in the same
140         way as the environment data structure itself is compositional.
141         As a result, modtrees never hang on to parts of an environment that
142         has already been rendered "stale" by filtering or rebinding.
143    
144       - I went through many, many trials and errors before arriving at the
145         current solution.  (The initial idea of "linkpaths" did not work.)
146         But the result of all this is that I have touched a lot of files that
147         depend on the "modules" and "types" data structures (most of the
148         elaborator). There were a lot of changes during my "linkpath" trials
149         that could have been reverted to their original state but weren't.
150         Please, don't be too harsh on me for messing with this code a bit more
151         than what was strictly necessary...  (I _did_ resist the temptation
152         of doing any "global reformatting" to avoid an untimely death at
153         Dave's hands. :)
154    
155       - One positive aspect of the previous point:  At least I made sure that
156         all files that I touched now compile without warnings (other than
157         "polyEqual").
158    
159       - compiler now tends to run "leaner" (i.e., ties up less memory in
160         redundant modmaps)
161    
162    ----------------------------------------------------------------------
163    Name: Allen Leung
164    Date: 2000/03/29 18:00:00
165    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
166    Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
167    Description:
168    
169       This update contains *MAJOR* changes to the way code is generated from CPS
170    in the module mlriscGen, and in various backend modules.
171    
172    CHANGES
173    =======
174    
175    1. MLRiscGen: forward propagation fix.
176    
177       There was a bug in forward propagation introduced at about the same time
178       as the MLRISC x86 backend, which prevented coalescing from being
179       performed effectively in loops.
180    
181       Effect: speed up of loops on RISC architectures.
182               By itself, this actually slowed down certain benchmarks on the x86.
183    
184    2. MLRiscGen:  forward propagating addresses from consing.
185    
186       I've changed the way consing code is generated.  Basically I separated
187       out the initialization part:
188    
189            store tag,   offset(allocptr)
190            store elem1, offset+4(allocptr)
191            store elem2, offset+8(allocptr)
192            ...
193            store elemn, offset+4n(allocptr)
194    
195       and the address computation part:
196    
197            celladdr <- offset+4+allocptr
198    
199       and move the address computation part
200    
201       Effect:  register pressure is generally lower as a result.  This
202                makes compilation of certain expressions much faster, such as
203                long lists with non-trivial elements.
204    
205                 [(0,0), (0,0), .... (0,0)]
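
   The split described in item 2 can be pictured with a small sketch.  This is
   not the MLRiscGen code; emit_cons and its parameters are hypothetical names,
   and the 4-byte word size is taken from the offsets shown above.  It only
   shows that the initialization stores and the address computation are
   independent pieces that can be emitted separately.

```python
def emit_cons(elems, offset, word=4):
    """Return (initialization stores, address computation) for one record.

    Hypothetical helper: models the two parts listed in the entry as strings.
    """
    # Initialization part: the tag goes first, then each element one word apart.
    stores = [f"store tag, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + word * i}(allocptr)")
    # Address computation part: the cell address points just past the tag word.
    addr = f"celladdr <- {offset + word} + allocptr"
    return stores, addr


stores, addr = emit_cons(["elem1", "elem2"], offset=0)
for line in stores:
    print(line)
print(addr)
```

   Because the address computation no longer has to sit next to the stores, it
   does not need to occupy a register while the elements are being written.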
206    
207    3. MLRiscGen: base pointer elimination.
208    
209        As part of the linkage mechanism, we generate the sequence:
210    
211         L:  ...  <- start of the code fragment
212    
213         L1:
214             base pointer <- linkreg - L1 + L
215    
216         The base pointer was then used for computing relocatable addresses
217       in the code fragment.  Frequently (such as in lots of continuations)
218       this is not needed.  We now eliminate this sequence whenever possible.
219    
220         For compile time efficiency, I'm using a very stupid local heuristic.
221       But in general, this should be done as a control flow analysis.
222    
223       Effect:  Smaller code size.  Speed up of most programs.
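
   A small worked example of the base pointer arithmetic, assuming that
   linkreg holds the run-time address of L1 when the sequence executes and
   that L and L1 are offsets known at assembly time.  All numbers and names
   below are hypothetical.

```python
def base_pointer(linkreg, offset_l1, offset_l):
    # base pointer <- linkreg - L1 + L
    return linkreg - offset_l1 + offset_l


load_addr = 0x40000      # hypothetical run-time address of offset 0
L, L1 = 0x100, 0x1A0     # hypothetical assembly-time offsets of the two labels
linkreg = load_addr + L1 # assumption: linkreg points at L1 on entry

base = base_pointer(linkreg, L1, L)
print(hex(base))
```

   Under that assumption the result is the run-time address of L, wherever
   the code fragment happens to be loaded.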
224    
225    4. Hppa back end
226    
227         Long jumps in span dependence resolution used to depend on the existence
228      of the base pointer.
229    
230         A jump to a long label L was expanded into the following sequence:
231    
232          LDIL %hi(L-8192), %r29
233          LDO  %lo(L-8192)(%r29), %r29
234          ADD  %r29, baseptr, %r29
235          BV,n %r0(%r29)
236    
237         In the presence of change (3) above, this will not work.  I've changed
238       it so that the following sequence of instructions is generated, which
239       doesn't mention the base pointer at all:
240    
241             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
242        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
243             BV,n  %r0(%r29)          /* Jump */
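
   The arithmetic behind the new sequence can be checked with a short sketch.
   It models only what the comments above state (BL leaves L' + 4 in %r29,
   and the next instruction adds L-(L'+4)); it is not a model of the real
   PA-RISC instruction encodings.  The function name and the offsets are
   hypothetical.

```python
def long_jump_target(load_addr, off_lp, off_l):
    """Return the address reached by the three-instruction sequence."""
    r29 = load_addr + off_lp + 4   # BL,n  L', %r29   : L' + 4 -> %r29
    r29 += off_l - (off_lp + 4)    # ADDIL L-(L'+4), %r29
    return r29                     # BV,n  %r0(%r29)  : jump to %r29


# The same code loaded at three hypothetical addresses.
targets = [
    long_jump_target(base, off_lp=0x20, off_l=0x30000) - base
    for base in (0x0, 0x40000, 0x7F000)
]
print([hex(t) for t in targets])
```

   The jump always lands on L relative to the load address, which is why no
   base pointer is needed.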
244    
245    5. Alpha back end
246    
247          New alpha instructions LDB/LDW have been added, as per Fermin's
248       suggestions.   This is unrelated to all other changes.
249    
250    6. X86 back end
251    
252         I've changed andl to testl in the floating point test sequence
253         whenever appropriate.  The Intel optimization guide states that
254         testl is preferable to andl.
255    
256    7. RA (x86 only)
257    
258         I've improved the spill propagation algorithm, using an approximation
259       of maximal weighted independent sets.   This seems to be necessary to
260       offset the slowdown on the x86 caused by (1).
261    
262         I'll write down the algorithm one of these days.
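
   The entry does not describe the algorithm, so the following is only a
   generic illustration of what an approximation of a maximum weighted
   independent set can look like: a greedy heuristic that repeatedly picks
   the node with the best weight/(degree+1) score.  It is not the spill
   propagation code, and the graph, weights and function name are
   hypothetical.

```python
def greedy_weighted_independent_set(weights, edges):
    """Greedy approximation: pick high weight, low degree nodes first."""
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    remaining = set(weights)
    chosen = []
    while remaining:
        # Score by weight divided by (live degree + 1); sorted() makes ties stable.
        best = max(
            sorted(remaining),
            key=lambda v: weights[v] / (len(neighbors[v] & remaining) + 1),
        )
        chosen.append(best)
        # Drop the chosen node and everything adjacent to it.
        remaining -= neighbors[best] | {best}
    return chosen


weights = {"a": 4, "b": 10, "c": 4, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
chosen = greedy_weighted_independent_set(weights, edges)
print(chosen, sum(weights[v] for v in chosen))
```

   On this small path graph the heuristic happens to find the optimum; in
   general it only gives an approximation.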
263    
264    8. MLRiscGen: frequencies
265    
266         I've added an annotation that states that all call gc blocks have zero
267       execution frequencies.  This improves register allocation on the x86.
268    
269    BENCHMARKS
270    ==========
271    
272       I've only performed the comparison against 110.25.
273    
274       The platforms are:
275    
276        HPPA  A four processor HP machine (E9000) with 5G of memory.
277        X86   A 300MHz Pentium II with 128M of memory, and
278        SPARC An Ultra sparc 2 with 512M of memory.
279    
280       I used the following parameters for the SML benchmarks:
281    
282                 @SMLalloc
283         HPPA    256k
284         SPARC   512k
285         X86     256k
286    
287    COMPILATION TIME
288    ----------------
289       Here are the numbers comparing the compilation times of the compilers.
290       I've only compared 110.25 compiling the new sources versus
291       a fixpoint version of the new compiler compiling the same.
292    
293                     110.25                                  New
294               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
295         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
296         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
297         X86    999s    315s       94006+130691   987s    296s    108877+141957
298    
299                   110.25         New
300                Code Size      Code Size
301         HPPA   8596736         8561421
302         SPARC  8974299         8785143
303         X86    9029180         8716783
304    
305       So in summary, things are at least as good as before.   A dramatic
306       reduction in compilation time is obtained on the Sparc; I can't explain it,
307       but it is reproducible.  Perhaps someone should try to reproduce this
308       on their own machines.
309    
310    SML BENCHMARKS
311    --------------
312    
313        On average, the benchmarks perform at least as well as before.
314    
315          HPPA         Compilation Time     Spill+Reload      Run Time
316                     110.25  New            110.25    New   110.25  New
317    
318          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
319              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
320       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
321                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
322        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
323             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
324               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
325              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
326         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
327             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
328            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
329      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
330                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
331             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
332                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
333               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
334      --------------------------------------------------------------------------
335       Average                     6.12%                                   4.09%
336    
337          SPARC        Compilation Time     Spill+Reload      Run Time
338                     110.25  New            110.25    New   110.25  New
339    
340          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
341              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
342       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
343                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
344        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
345             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
346               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
347              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
348         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
349             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
350            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
351      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
352                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
353             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
354                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
355               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
356      ----------------------------------------------------------------------------
357       Average                     6.94%                                    2.64%
358    
359          X86          Compilation Time     Spill+Reload      Run Time
360                     110.25  New            110.25    New   110.25  New
361    
362          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
363              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
364       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
365                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
366        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
367             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
368               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
369              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
370         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
371             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
372            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
373      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
374                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
375             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
376                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
377               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
378      ----------------------------------------------------------------------------
379       Average                     1.22%                                     3.36%
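
   The entry does not say how the percentage columns were computed.  The
   figures appear consistent with old/new - 1, up to rounding of the times
   shown; the sketch below checks that reading against three HPPA
   compilation time rows.  The function name is hypothetical.

```python
def improvement(old, new):
    """Percentage improvement, read as old/new - 1 (an inferred formula)."""
    return (old / new - 1.0) * 100.0


# HPPA compilation times (110.25, New) copied from the table above.
boyer = improvement(6.152, 5.708)       # table says  7.77%
lexgen = improvement(6.190, 5.290)      # table says 17.01%
mandelbrot = improvement(0.077, 0.080)  # table says -4.17%

for name, value in [("boyer", boyer), ("lexgen", lexgen), ("mandelbrot", mandelbrot)]:
    print(f"{name:>12} {value:6.2f}%")
```

   Rows with very small times, such as mandelbrot, differ from the table by
   a few tenths of a percent, presumably because the table was computed from
   unrounded measurements.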
380    
381    ----------------------------------------------------------------------
382    Name: Allen Leung
383    Date: 2000/03/23 16:25:00
384    Tag: leunga-20000323-fix_x86_alpha
385    Description:
386    
387    1. X86 fixes/changes
388    
389       a.  The old code generated for SETcc was completely wrong.
390           The Intel optimization guide is VERY misleading.
391    
392    2. ALPHA fixes/changes
393    
394       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
395       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
396       c.  Added reassociation code for address computation.
397    
398    ----------------------------------------------------------------------
399    Name: Allen Leung
400    Date: 2000/03/22 01:23:00
401    Tag: leunga-20000322-fix_x86_hppa_ra
402    Description:
403    
404    1. X86 fixes/changes
405    
406       a.  x86Rewrite bug with MUL3 (found by Lal)
407       b.  Added the instructions FSTS, FSTL
408    
409    2. PA-RISC fixes/changes
410    
411       a.  B label should not be a delay slot candidate!  Why did this work?
412       b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
413           as it should be.
414       c.  The assembly syntax for fstds and fstdd was wrong.
415       d.  Added the composite instruction COMICLR/LDO, which is the immediate
416           operand variant of COMCLR/LDO.
417    
418    3. Generic MLRISC
419    
420       a.  shuffle.sml rewritten to be slightly more efficient
421       b.  DIV bug in mltree-simplify fixed (found by Fermin)
422    
423    4. Register Allocator
424    
425       a.  I now release the interference graph earlier during spilling.
426           May improve memory usage.
427    
428  ----------------------------------------------------------------------  ----------------------------------------------------------------------
429  Name: Matthias Blume  Name: Matthias Blume
430  Date: 2000/03/14 14:15:32  Date: 2000/03/14 14:15:32

