Diff of /sml/trunk/HISTORY


revision 578, Tue Mar 14 05:16:29 2000 UTC revision 589, Fri Mar 31 16:14:36 2000 UTC
# Line 11  Line 11 
11  Date:
12  Tag: <post-commit CVS tag>
13  Description:
14    
15    ----------------------------------------------------------------------
16    Name: David MacQueen
17    Date: 2000/03/31 11:15:00 EST
18    Tag: dbm-20000331-runtime_fix
19    Description:
20    
21    This update contains:
22    
23    1. runtime/c-lib/c-libraries.c
24       includes added in revision 1.2 caused compilation errors on hppa-hpux
25    
26    2. fix for bug 1556
27       system/Basis/Implementation/NJ/internal-signals.sml
28    
29    ----------------------------------------------------------------------
30    Name: Matthias Blume
31    Date: 2000/03/31 18:00:00 JST
32    Tag: blume_main_v110p26p2_1
33    Description:
34    
35    This update contains:
36    
37    1. A small change to CM's handling of stable libraries:
38       CM now maintains one "global" modmap that is used for all stable
39       libraries.  The use of such a global modmap maximizes sharing and
40       minimizes the need for re-traversing parts of environments during
41       modmap construction.  (However, this has minor impact since modmap
42       construction seems to account for just one percent or less of total
43       compile time.)
44    
45    2. I added a "genmap" phase to the statistics.  This is where I got the
46       "one percent" number (see above).
47    
48    3. CM's new tool parameter mechanism just became _even_ better. :)
49       - The parser understands named parameters and recursive options.
50       - The "make" and "shell" tools use these new features.
51         (This makes it a lot easier to cascade these tools.)
52       - There is a small syntax change: named parameters use a
53    
54           <name> : ( <option> ... )            or
55           <name> : <string>
56    
57         syntax.  Previously, named parameters were implemented in an
58         ad-hoc fashion by each tool individually (by parsing strings)
59         and had the form
60    
61           <name>=<string>
62    
63       See the CM manual for a full description of these issues.
64    
65    ----------------------------------------------------------------------
66    Name: Matthias Blume
67    Date: 2000/03/30 18:00:00 JST
68    Tag: blume_main_v110p26p2_0
69    Description:
70    
71    !!!!! WARNING !!!!!!
72    !!  New binfiles  !!
73    !!!!!!!!!!!!!!!!!!!!
74    
75    This update contains:
76    
77    1. Moderate changes to CM:
78    
79       - Changes to CM's tools mechanism.  In particular, it is now possible
80       to have tools that accept additional "command line" parameters
81       (specified in the .cm file at each instance where the tool's class is
82       used).
83    
84       This was done to accommodate the new "make" and "shell" tools which
85       facilitate fairly seamless hookup to portions of code managed using
86       Makefiles or Shell scripts.
87    
88       There are no classes "shared" or "private" anymore.  Instead, the
89       sharing annotation is now a parameter to the "sml" class.
90    
91       There is a bit of generic machinery for implementing one's own
92       tools that accept command-line parameters.  However, I am not yet fully
93       satisfied with that part, so expect changes here in the future.
94    
95       All existing tools are described in the CM manual.
96    
97       - Slightly better error handling.  (CM now suppresses many follow-up
98       error messages that tended to be more annoying than helpful.)
99    
100    2. Major changes to the compiler's static environment data structures.
101    
102       - no CMStaticEnv anymore.
103            - no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
104              but it is called Environment), no conversions between different
105              kinds of static environments
106    
107       - There is still a notion of a "modmap", but such modmaps are generated
108         on demand at the time when they are needed.  This sounds slow, but I
109         sped up the code that generates modmaps enough for this not to lead to
110         a slowdown of the compiler (at least I didn't detect any).
111    
112       - To facilitate rapid modmap generation, static environments now
113         contain an (optional) "modtree" structure.  Modtree annotations are
114         constructed by the unpickler during unpickling.  (This means that
115         the elaborator does not have to worry about modtrees at all.)
116         Modtrees have the advantage that they are compositional in the same
117         way as the environment data structure itself is compositional.
118         As a result, modtrees never hang on to parts of an environment that
119         has already been rendered "stale" by filtering or rebinding.
120    
121       - I went through many, many trials and errors before arriving at the
122         current solution.  (The initial idea of "linkpaths" did not work.)
123         But the result of all this is that I have touched a lot of files that
124         depend on the "modules" and "types" data structures (most of the
125         elaborator). There were a lot of changes during my "linkpath" trials
126         that could have been reverted to their original state but weren't.
127         Please, don't be too harsh on me for messing with this code a bit more
128         than what was strictly necessary...  (I _did_ resist the temptation
129         of doing any "global reformatting" to avoid an untimely death at
130         Dave's hands. :)
131    
132       - One positive aspect of the previous point:  At least I made sure that
133         all files that I touched now compile without warnings (other than
134         "polyEqual").
135    
136       - compiler now tends to run "leaner" (i.e., ties up less memory in
137         redundant modmaps)
138    
139    ----------------------------------------------------------------------
140    Name: Allen Leung
141    Date: 2000/03/29 18:00:00
142    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
143    Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
144    Description:
145    
146       This update contains *MAJOR* changes to the way code is generated from CPS
147    in the module mlriscGen, and in various backend modules.
148    
149    CHANGES
150    =======
151    
152    1. MLRiscGen: forward propagation fix.
153    
154       There was a bug in forward propagation introduced at about the same time
155       as the MLRISC x86 backend, which prevented coalescing from being
156       performed effectively in loops.
157    
158       Effect: speedup of loops on RISC architectures.
159               By itself, this actually slowed down certain benchmarks on the x86.
160    
161    2. MLRiscGen:  forward propagating addresses from consing.
162    
163       I've changed the way consing code is generated.  Basically I separated
164       out the initialization part:
165    
166            store tag,   offset(allocptr)
167            store elem1, offset+4(allocptr)
168            store elem2, offset+8(allocptr)
169            ...
170            store elemn, offset+4n(allocptr)
171    
172       and the address computation part:
173    
174            celladdr <- offset+4+allocptr
175    
176       and move the address computation part
177    
178       Effect:  register pressure is generally lower as a result.  This
179                makes compilation of certain expressions much faster, such as
180                long lists with non-trivial elements.
181    
182                 [(0,0), (0,0), .... (0,0)]
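
The split described in item 2 can be modelled roughly as follows. This is an illustrative Python sketch only, not the MLRiscGen code: the function name `cons_record` and the tuple encoding of the stores are hypothetical, and 4-byte words are assumed as in the listing above.

```python
def cons_record(tag, elems, offset, word=4):
    """Model the two parts of the consing code described in item 2.

    Returns (stores, celladdr_offset): the initialization stores, each as
    (value, offset_from_allocptr), and the offset that is added to
    allocptr to form the cell address.
    """
    # Initialization part: tag at `offset`, element i at offset + word*i.
    stores = [(tag, offset)]
    for i, elem in enumerate(elems, start=1):
        stores.append((elem, offset + word * i))
    # Address computation part: celladdr <- offset + word + allocptr,
    # so the cell pointer skips the tag word.
    celladdr_offset = offset + word
    return stores, celladdr_offset


stores, celladdr_offset = cons_record("tag", ["elem1", "elem2"], offset=8)
```

Keeping the address computation as a separate value is what lets it be handled independently of the stores; the sketch does not model where the code generator actually places it.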
183    
184    3. MLRiscGen: base pointer elimination.
185    
186        As part of the linkage mechanism, we generate the sequence:
187    
188         L:  ...  <- start of the code fragment
189    
190         L1:
191             base pointer <- linkreg - L1 + L
192    
193         The base pointer was then used for computing relocatable addresses
194       in the code fragment.  Frequently (such as in lots of continuations)
195       this is not needed.  We now eliminate this sequence whenever possible.
196    
197         For compile time efficiency, I'm using a very stupid local heuristic.
198       But in general, this should be done as a control flow analysis.
199    
200       Effect:  Smaller code size.  Speed up of most programs.
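
As a worked example of the arithmetic in the sequence above, here is a small Python sketch. It assumes that on entry the link register holds the run-time address of L1 and that L and L1 are assembly-time offsets within the code fragment; the function name and the concrete numbers are hypothetical.

```python
def base_pointer(linkreg, off_l1, off_l):
    """base pointer <- linkreg - L1 + L, with L1 and L as assembly-time offsets."""
    return linkreg - off_l1 + off_l


load_addr = 0x40000         # hypothetical run-time address of the code fragment
off_l, off_l1 = 0x0, 0x128  # hypothetical offsets of L and L1
# Assumption: linkreg holds the run-time address of L1.
bp = base_pointer(load_addr + off_l1, off_l1, off_l)
```

Under these assumptions the result is the run-time address of L regardless of where the code is loaded, which is why it can serve as the base for relocatable addresses.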
201    
202    4. Hppa back end
203    
204         Long jumps in span dependence resolution used to depend on the existence
205      of the base pointer.
206    
207         A jump to a long label L was expanded into the following sequence:
208    
209          LDIL %hi(L-8192), %r29
210          LDO  %lo(L-8192)(%r29), %r29
211          ADD  %r29, baseptr, %r29
212          BV,n %r0(%r29)
213    
214         In the presence of change (3) above, this will not work.  I've changed
215       it so that the following sequence of instructions is generated, which
216       doesn't mention the base pointer at all:
217    
218             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
219        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
220             BV,n  %r0(%r29)          /* Jump */
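
The new sequence can be checked with a little address arithmetic. The Python sketch below follows the comments in the listing (BL leaves L' + 4 in %r29, ADDIL adds the assembly-time distance L-(L'+4)); the function name, the load address and the offsets are hypothetical, and instruction encoding limits are ignored.

```python
def long_jump_target(load_addr, off_lprime, off_l):
    """Return the address reached by the BL / ADDIL / BV sequence."""
    # BL,n L', %r29: per the comment in the listing, %r29 receives L' + 4
    # as a run-time address.
    r29 = load_addr + off_lprime + 4
    # ADDIL L-(L'+4), %r29: add the assembly-time distance from L' + 4 to L.
    r29 += off_l - (off_lprime + 4)
    # BV,n %r0(%r29): jump to the address now held in %r29.
    return r29


target_a = long_jump_target(0x10000, 0x20, 0x48000)
target_b = long_jump_target(0x7F0000, 0x20, 0x48000)
```

In both cases the result is the load address plus the offset of L, so no base pointer is needed.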
221    
222    5. Alpha back end
223    
224          New alpha instructions LDB/LDW have been added, as per Fermin's
225       suggestions.   This is unrelated to all other changes.
226    
227    6. X86 back end
228    
229         I've changed andl to testl in the floating point test sequence
230         whenever appropriate.  The Intel optimization guide states that
231         testl is preferable to andl.
232    
233    7. RA (x86 only)
234    
235         I've improved the spill propagation algorithm, using an approximation
236       of maximal weighted independent sets.   This seems to be necessary to
237       offset the negative effect of the slowdown in (1).
238    
239         I'll write down the algorithm one of these days.
240    
241    8. MLRiscGen: frequencies
242    
243         I've added an annotation that states that all call gc blocks have zero
244       execution frequencies.  This improves register allocation on the x86.
245    
246    BENCHMARKS
247    ==========
248    
249       I've only performed the comparison against 110.25.
250    
251       The platforms are:
252    
253        HPPA  A four processor HP machine (E9000) with 5G of memory.
254        X86   A 300MHz Pentium II with 128M of memory, and
255        SPARC An Ultra sparc 2 with 512M of memory.
256    
257       I used the following parameters for the SML benchmarks:
258    
259                 @SMLalloc
260         HPPA    256k
261         SPARC   512k
262         X86     256k
263    
264    COMPILATION TIME
265    ----------------
266       Here are the numbers comparing the compilation times of the compilers.
267       I've only compared 110.25 compiling the new sources versus
268       a fixpoint version of the new compiler compiling the same.
269    
270                     110.25                                  New
271               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
272         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
273         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
274         X86    999s    315s       94006+130691   987s    296s    108877+141957
275    
276                   110.25         New
277                Code Size      Code Size
278         HPPA   8596736         8561421
279         SPARC  8974299         8785143
280         X86    9029180         8716783
281    
282       So in summary, things are at least as good as before.   A dramatic
283       reduction in compilation time is obtained on the Sparc; I can't explain it,
284       but it is reproducible.  Perhaps someone should try to reproduce this
285       on their own machines.
286    
287    SML BENCHMARKS
288    --------------
289    
290        On average, the benchmarks perform at least as well as before.
291    
292          HPPA         Compilation Time     Spill+Reload      Run Time
293                     110.25  New            110.25    New   110.25  New
294    
295          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
296              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
297       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
298                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
299        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
300             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
301               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
302              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
303         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
304             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
305            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
306      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
307                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
308             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
309                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
310               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
311      --------------------------------------------------------------------------
312       Average                     6.12%                                   4.09%
313    
314          SPARC        Compilation Time     Spill+Reload      Run Time
315                     110.25  New            110.25    New   110.25  New
316    
317          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
318              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
319       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
320                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
321        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
322             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
323               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
324              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
325         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
326             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
327            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
328      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
329                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
330             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
331                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
332               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
333      ----------------------------------------------------------------------------
334       Average                     6.94%                                    2.64%
335    
336          X86          Compilation Time     Spill+Reload      Run Time
337                     110.25  New            110.25    New   110.25  New
338    
339          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
340              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
341       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
342                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
343        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
344             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
345               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
346              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
347         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
348             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
349            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
350      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
351                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
352             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
353                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
354               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
355      ----------------------------------------------------------------------------
356       Average                     1.22%                                     3.36%
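
The entry does not say how the percentage columns were computed. They appear to be consistent with old/new - 1, presumably taken from unrounded timings, so recomputing from the rounded figures in the tables can differ in the last digit. A minimal Python sketch under that assumption (the function and variable names are hypothetical):

```python
def improvement(old, new):
    """Relative improvement of `new` over `old`, in percent (assumed formula)."""
    return (old / new - 1.0) * 100.0


# Figures taken from the HPPA table above.
barneshut_compile = improvement(3.158, 3.015)   # table lists 4.75%
lexgen_run = improvement(0.913, 0.788)          # table lists 15.86%
count_graphs_run = improvement(22.705, 23.073)  # table lists -1.60%
```

A negative value means the new compiler was slower on that benchmark.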
357    
358    ----------------------------------------------------------------------
359    Name: Allen Leung
360    Date: 2000/03/23 16:25:00
361    Tag: leunga-20000323-fix_x86_alpha
362    Description:
363    
364    1. X86 fixes/changes
365    
366       a.  The old code generated for SETcc was completely wrong.
367           The Intel optimization guide is VERY misleading.
368    
369    2. ALPHA fixes/changes
370    
371       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
372       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
373       c.  Added reassociation code for address computation.
374    
375    ----------------------------------------------------------------------
376    Name: Allen Leung
377    Date: 2000/03/22 01:23:00
378    Tag: leunga-20000322-fix_x86_hppa_ra
379    Description:
380    
381    1. X86 fixes/changes
382    
383       a.  x86Rewrite bug with MUL3 (found by Lal)
384       b.  Added the instructions FSTS, FSTL
385    
386    2. PA-RISC fixes/changes
387    
388       a.  B label should not be a delay slot candidate!  Why did this work?
389       b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
390           as it should be.
391       c.  The assembly syntax for fstds and fstdd was wrong.
392       d.  Added the composite instruction COMICLR/LDO, which is the immediate
393           operand variant of COMCLR/LDO.
394    
395    3. Generic MLRISC
396    
397       a.  shuffle.sml rewritten to be slightly more efficient
398       b.  DIV bug in mltree-simplify fixed (found by Fermin)
399    
400    4. Register Allocator
401    
402       a.  I now release the interference graph earlier during spilling.
403           May improve memory usage.
404    
405  ----------------------------------------------------------------------
406  Name: Matthias Blume
407  Date: 2000/03/14 14:15:32
