Diff of /sml/trunk/HISTORY

revision 577, Fri Mar 10 08:07:18 2000 UTC revision 588, Fri Mar 31 09:00:02 2000 UTC
# Line 11
11  Date:
12  Tag: <post-commit CVS tag>
13  Description:
14    
15    ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2000/03/31 18:00:00 JST
18    Tag: blume_main_v110p26p2_1
19    Description:
20    
21    This update contains:
22    
23    1. A small change to CM's handling of stable libraries:
24       CM now maintains one "global" modmap that is used for all stable
25       libraries.  The use of such a global modmap maximizes sharing and
26       minimizes the need for re-traversing parts of environments during
27       modmap construction.  (However, this has minor impact since modmap
28       construction seems to account for just one percent or less of total
29       compile time.)
30    
31    2. I added a "genmap" phase to the statistics.  This is where I got the
32       "one percent" number (see above).
33    
34    3. CM's new tool parameter mechanism just became _even_ better. :)
35       - The parser understands named parameters and recursive options.
36       - The "make" and "shell" tools use these new features.
37         (This makes it a lot easier to cascade these tools.)
38       - There is a small syntax change: named parameters use a
39    
40           <name> : ( <option> ... )            or
41           <name> : <string>
42    
43         syntax.  Previously, named parameters were implemented in an
44         ad-hoc fashion by each tool individually (by parsing strings)
45         and had the form
46    
47           <name>=<string>
48    
49       See the CM manual for a full description of these issues.
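
The named-parameter syntax described above can be illustrated with a toy parser. This is only a sketch in Python, not CM's actual parser: the tokenization rules (double-quoted strings, bare words) and the parameter names `target`, `options`, `verbose`, and `level` are assumptions made up for the example; the real grammar is the one documented in the CM manual.

```python
import re

# Toy tokenizer (an assumption, not CM's lexer): parentheses, colons,
# double-quoted strings, and bare words.
TOKEN = re.compile(r'\s*([():]|"[^"]*"|[^\s():"]+)')

def tokenize(text):
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def unquote(tok):
    # Strip one pair of surrounding double quotes, if present.
    if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
        return tok[1:-1]
    return tok

def parse_options(tokens, i=0, nested=False):
    """Parse a sequence of options starting at tokens[i].

    An option is a plain string, <name> : <string>, or
    <name> : ( <option> ... ), so options can nest recursively.
    Returns (options, index of the next unread token).
    """
    options = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == ")":
            if not nested:
                raise ValueError("unbalanced ')'")
            return options, i + 1
        if tok in (":", "("):
            raise ValueError(f"unexpected {tok!r}")
        if i + 1 < len(tokens) and tokens[i + 1] == ":":
            name, i = tok, i + 2
            if i >= len(tokens) or tokens[i] in (":", ")"):
                raise ValueError(f"missing value for {name!r}")
            if tokens[i] == "(":
                value, i = parse_options(tokens, i + 1, nested=True)
            else:
                value, i = unquote(tokens[i]), i + 1
            options.append((name, value))
        else:
            options.append(unquote(tok))
            i += 1
    if nested:
        raise ValueError("missing ')'")
    return options, i

def parse(text):
    options, _ = parse_options(tokenize(text))
    return options
```

For instance, `parse('target:"foo.sml" options:(verbose level:"2")')` yields a nested structure in which `options` carries its own option list, which is the property that makes cascading tools easier than with the old `<name>=<string>` strings.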
50    
51    ----------------------------------------------------------------------
52    Name: Matthias Blume
53    Date: 2000/03/30 18:00:00 JST
54    Tag: blume_main_v110p26p2_0
55    Description:
56    
57    !!!!! WARNING !!!!!!
58    !!  New binfiles  !!
59    !!!!!!!!!!!!!!!!!!!!
60    
61    This update contains:
62    
63    1. Moderate changes to CM:
64    
65       - Changes to CM's tools mechanism.  In particular, it is now possible
66       to have tools that accept additional "command line" parameters
67       (specified in the .cm file at each instance where the tool's class is
68       used).
69    
70       This was done to accommodate the new "make" and "shell" tools which
71       facilitate fairly seamless hookup to portions of code managed using
72       Makefiles or Shell scripts.
73    
74       There are no classes "shared" or "private" anymore.  Instead, the
75       sharing annotation is now a parameter to the "sml" class.
76    
77       There is a bit of generic machinery for implementing one's own
78       tools that accept command-line parameters.  However, I am not yet fully
79       satisfied with that part, so expect changes here in the future.
80    
81       All existing tools are described in the CM manual.
82    
83       - Slightly better error handling.  (CM now suppresses many followup
84       error messages that tended to be more annoying than helpful.)
85    
86    2. Major changes to the compiler's static environment data structures.
87    
88       - no CMStaticEnv anymore.
89            - no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
90              but it is called Environment), no conversions between different
91              kinds of static environments
92    
93       - There is still a notion of a "modmap", but such modmaps are generated
94         on demand at the time when they are needed.  This sounds slow, but I
95         sped up the code that generates modmaps enough for this not to lead to
96         a slowdown of the compiler (at least I didn't detect any).
97    
98       - To facilitate rapid modmap generation, static environments now
99         contain an (optional) "modtree" structure.  Modtree annotations are
100         constructed by the unpickler during unpickling.  (This means that
101         the elaborator does not have to worry about modtrees at all.)
102         Modtrees have the advantage that they are compositional in the same
103         way as the environment data structure itself is compositional.
104         As a result, modtrees never hang on to parts of an environment that
105         has already been rendered "stale" by filtering or rebinding.
106    
107       - I went through many, many trials and errors before arriving at the
108         current solution.  (The initial idea of "linkpaths" did not work.)
109         But the result of all this is that I have touched a lot of files that
110         depend on the "modules" and "types" data structures (most of the
111         elaborator). There were a lot of changes during my "linkpath" trials
112         that could have been reverted to their original state but weren't.
113         Please, don't be too harsh on me for messing with this code a bit more
114         than what was strictly necessary...  (I _did_ resist the temptation
115         of doing any "global reformatting" to avoid an untimely death at
116         Dave's hands. :)
117    
118       - One positive aspect of the previous point:  At least I made sure that
119         all files that I touched now compile without warnings (other than
120         "polyEqual").
121    
122       - compiler now tends to run "leaner" (i.e., ties up less memory in
123         redundant modmaps)
124    
125    ----------------------------------------------------------------------
126    Name: Allen Leung
127    Date: 2000/03/29 18:00:00
128    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
129    Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
130    Description:
131    
132       This update contains *MAJOR* changes to the way code is generated from CPS
133    in the module mlriscGen, and in various backend modules.
134    
135    CHANGES
136    =======
137    
138    1. MLRiscGen: forward propagation fix.
139    
140       There was a bug in forward propagation introduced at about the same time
141       as the MLRISC x86 backend, which prevented coalescing from being
142       performed effectively in loops.
143    
144       Effect: speed up of loops in RISC architectures.
145               By itself, this actually slowed down certain benchmarks on the x86.
146    
147    2. MLRiscGen:  forward propagating addresses from consing.
148    
149       I've changed the way consing code is generated.  Basically I separated
150       out the initialization part:
151    
152            store tag,   offset(allocptr)
153            store elem1, offset+4(allocptr)
154            store elem2, offset+8(allocptr)
155            ...
156            store elemn, offset+4n(allocptr)
157    
158       and the address computation part:
159    
160            celladdr <- offset+4+allocptr
161    
162       and move the address computation part
163    
164       Effect:  register pressure is generally lower as a result.  This
165                makes compilation of certain expressions much faster, such as
166                long lists with non-trivial elements.
167    
168                 [(0,0), (0,0), .... (0,0)]
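
The split between initialization and address computation can be sketched as follows. This Python fragment only prints the pseudo-instructions used above; the function name `cons_record` and the 4-byte word size (taken from the offsets in the listing) are assumptions of the sketch, not MLRiscGen code.

```python
def cons_record(tag, elems, offset, word=4):
    """Return (initialization stores, address computation) for one record.

    The two parts are produced separately, mirroring the description
    above: the stores only depend on allocptr and the offset, so the
    address computation does not have to sit right next to them.
    """
    init = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        init.append(f"store {elem}, {offset + word * i}(allocptr)")
    # The cell address points just past the tag word.
    addr = f"celladdr <- {offset + word}+allocptr"
    return init, addr
```

Because the address computation is a separate value, a code generator is free to emit it only where the cell address is actually needed, which is consistent with the lower register pressure reported above.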
169    
170    3. MLRiscGen: base pointer elimination.
171    
172        As part of the linkage mechanism, we generate the sequence:
173    
174         L:  ...  <- start of the code fragment
175    
176         L1:
177             base pointer <- linkreg - L1 + L
178    
179         The base pointer was then used for computing relocatable addresses
180       in the code fragment.  Frequently (such as in lots of continuations)
181       this is not needed.  We now eliminate this sequence whenever possible.
182    
183         For compile time efficiency, I'm using a very stupid local heuristic.
184       But in general, this should be done as a control flow analysis.
185    
186       Effect:  Smaller code size.  Speed up of most programs.
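
As a worked example of the base pointer arithmetic, the following Python sketch assumes that the link register holds the run-time address of L1 when the sequence executes (an assumption of this sketch; the text above only gives the formula). The load address and label offsets are made-up numbers.

```python
def base_pointer(linkreg, offset_L, offset_L1):
    # base pointer <- linkreg - L1 + L, with L and L1 taken as
    # offsets within the code object.
    return linkreg - offset_L1 + offset_L

def relocated(base, offset_L, offset_label):
    # Run-time address of another label in the same fragment,
    # computed relative to the base pointer.
    return base + (offset_label - offset_L)

# Hypothetical layout: fragment loaded at 0x4000, L at offset 0,
# L1 at offset 0x120, some other label at offset 0x200.
load_address = 0x4000
linkreg = load_address + 0x120          # assumed: address of L1 at run time
base = base_pointer(linkreg, 0x0, 0x120)
target = relocated(base, 0x0, 0x200)
```

Under these assumptions `base` equals the load address of L wherever the code is placed, which is why the sequence is only needed when the fragment actually computes relocatable addresses.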
187    
188    4. Hppa back end
189    
190         Long jumps in span dependence resolution used to depend on the existence
191      of the base pointer.
192    
193         A jump to a long label L was expanded into the following sequence:
194    
195          LDIL %hi(L-8192), %r29
196          LDO  %lo(L-8192)(%r29), %r29
197          ADD  %r29, baseptr, %r29
198          BV,n %r0(%r29)
199    
200         In the presence of change (3) above, this will not work.  I've changed
201       it so that the following sequence of instructions is generated, which
202       doesn't mention the base pointer at all:
203    
204             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
205        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
206             BV,n  %r0(%r29)          /* Jump */
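
The arithmetic behind the new sequence can be checked with a small Python model. It follows only the comments in the listing above (BL leaves L' + 4 in %r29, ADDIL adds the constant L-(L'+4)) and ignores instruction encoding and delay-slot details; the addresses used are made up.

```python
def long_jump_target(addr_L_prime, addr_L):
    """Model of the three-instruction sequence, arithmetic only."""
    r29 = addr_L_prime + 4                 # BL,n L', %r29
    displacement = addr_L - (addr_L_prime + 4)
    r29 = r29 + displacement               # ADDIL L-(L'+4), %r29
    return r29                             # BV,n %r0(%r29) jumps here

def displacement(offset_L_prime, offset_L):
    # The constant only depends on the distance between the labels,
    # so it is the same wherever the code is loaded.
    return offset_L - (offset_L_prime + 4)
```

Since the displacement is a difference of two labels in the same code object, it is unchanged by relocation, so no base pointer is required.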
207    
208    5. Alpha back end
209    
210          New alpha instructions LDB/LDW have been added, as per Fermin's
211       suggestions.   This is unrelated to all other changes.
212    
213    6. X86 back end
214    
215         I've changed andl to testl in the floating point test sequence
216         whenever appropriate.  The Intel optimization guide states that
217         testl is preferable to andl.
218    
219    7. RA (x86 only)
220    
221         I've improved the spill propagation algorithm, using an approximation
222       of maximal weighted independent sets.   This seems to be necessary to
223       alleviate the negative effect of the slowdown in (1).
224    
225         I'll write down the algorithm one of these days.
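
Since the algorithm itself is not written down here, the following Python sketch shows only a generic greedy approximation for weighted independent sets, not the method used in the register allocator. Node names, weights, and the tie-breaking rule are all assumptions of the sketch.

```python
def greedy_weighted_independent_set(weights, edges):
    """Pick nodes in order of decreasing weight, skipping any node
    adjacent to one already chosen.  The result is a maximal
    independent set, though not necessarily one of maximum weight."""
    adjacent = {v: set() for v in weights}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    chosen, blocked = set(), set()
    # Ties are broken by node name so the result is deterministic.
    for v in sorted(weights, key=lambda n: (-weights[n], n)):
        if v not in blocked:
            chosen.add(v)
            blocked |= adjacent[v]
    return chosen

# Hypothetical path graph a - b - c - d with made-up weights.
weights = {"a": 5, "b": 4, "c": 3, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
picked = greedy_weighted_independent_set(weights, edges)
```

On this small example the greedy choice happens to be optimal; in general it is only a heuristic.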
226    
227    8. MLRiscGen: frequencies
228    
229         I've added an annotation that states that all call gc blocks have zero
230       execution frequencies.  This improves register allocation on the x86.
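
One way to see why such an annotation can help is a frequency-weighted spill cost. The Python sketch below is a generic illustration under that assumption; the cost formula, block names, and numbers are hypothetical and not taken from the MLRISC register allocator.

```python
def spill_cost(refs_per_block, frequency):
    """Sum of references to a value, weighted by how often each
    block is estimated to execute."""
    return sum(count * frequency[block]
               for block, count in refs_per_block.items())

# A value referenced twice in a loop and five times in a call-gc block.
refs = {"loop": 2, "callgc": 5}

unannotated = spill_cost(refs, {"loop": 10.0, "callgc": 1.0})
annotated = spill_cost(refs, {"loop": 10.0, "callgc": 0.0})
```

With the call-gc block at frequency zero, references inside it no longer inflate the cost, so the allocator's decisions are driven by the code that actually runs often.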
231    
232    BENCHMARKS
233    ==========
234    
235       I've only performed the comparison against 110.25.
236    
237       The platforms are:
238    
239        HPPA  A four processor HP machine (E9000) with 5G of memory.
240        X86   A 300MHz Pentium II with 128M of memory, and
241        SPARC An Ultra sparc 2 with 512M of memory.
242    
243       I used the following parameters for the SML benchmarks:
244    
245                 @SMLalloc
246         HPPA    256k
247         SPARC   512k
248         X86     256k
249    
250    COMPILATION TIME
251    ----------------
252       Here are the numbers comparing the compilation times of the compilers.
253       I've only compared 110.25 compiling the new sources versus
254       a fixpoint version of the new compiler compiling the same.
255    
256                     110.25                                  New
257               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
258         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
259         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
260         X86    999s    315s       94006+130691   987s    296s    108877+141957
261    
262                   110.25         New
263                Code Size      Code Size
264         HPPA   8596736         8561421
265         SPARC  8974299         8785143
266         X86    9029180         8716783
267    
268       So in summary, things are at least as good as before.   Dramatic
269       reduction in compilation time is obtained on the Sparc; I can't explain it,
270       but it is reproducible.  Perhaps someone should try to reproduce this
271       on their own machines.
272    
273    SML BENCHMARKS
274    --------------
275    
276        On the average, all benchmarks perform at least as well as before.
277    
278          HPPA         Compilation Time     Spill+Reload      Run Time
279                     110.25  New            110.25    New   110.25  New
280    
281          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
282              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
283       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
284                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
285        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
286             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
287               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
288              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
289         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
290             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
291            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
292      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
293                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
294             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
295                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
296               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
297      --------------------------------------------------------------------------
298       Average                     6.12%                                   4.09%
299    
300          SPARC        Compilation Time     Spill+Reload      Run Time
301                     110.25  New            110.25    New   110.25  New
302    
303          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
304              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
305       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
306                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
307        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
308             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
309               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
310              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
311         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
312             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
313            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
314      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
315                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
316             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
317                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
318               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
319      ----------------------------------------------------------------------------
320       Average                     6.94%                                    2.64%
321    
322          X86          Compilation Time     Spill+Reload      Run Time
323                     110.25  New            110.25    New   110.25  New
324    
325          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
326              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
327       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
328                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
329        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
330             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
331               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
332              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
333         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
334             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
335            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
336      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
337                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
338             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
339                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
340               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
341      ----------------------------------------------------------------------------
342       Average                     1.22%                                     3.36%
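
The percentage columns in the tables above are not defined in the text. They appear to be consistent with (old - new) / new, so that positive numbers mean the new compiler is faster; this reading is an inference from the published figures, not something the entry states. A short Python check against a few HPPA rows:

```python
def improvement_percent(old, new):
    """Relative improvement of `new` over `old`, in percent,
    measured against the new time."""
    return (old - new) / new * 100.0

# HPPA rows copied from the table above.
lexgen_compile = round(improvement_percent(6.190, 5.290), 2)
life_compile = round(improvement_percent(0.803, 0.703), 2)
lexgen_run = round(improvement_percent(0.913, 0.788), 2)
```

These reproduce the 17.01%, 14.22%, and 15.86% entries. Some other rows differ in the last digit when recomputed this way, presumably because the table was generated from timings with more digits than are printed.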
343    
344    ----------------------------------------------------------------------
345    Name: Allen Leung
346    Date: 2000/03/23 16:25:00
347    Tag: leunga-20000323-fix_x86_alpha
348    Description:
349    
350    1. X86 fixes/changes
351    
352       a.  The old code generated for SETcc was completely wrong.
353           The Intel optimization guide is VERY misleading.
354    
355    2. ALPHA fixes/changes
356    
357       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
358       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
359       c.  Added reassociation code for address computation.
360    
361    ----------------------------------------------------------------------
362    Name: Allen Leung
363    Date: 2000/03/22 01:23:00
364    Tag: leunga-20000322-fix_x86_hppa_ra
365    Description:
366    
367    1. X86 fixes/changes
368    
369       a.  x86Rewrite bug with MUL3 (found by Lal)
370       b.  Added the instructions FSTS, FSTL
371    
372    2. PA-RISC fixes/changes
373    
374       a.  B label should not be a delay slot candidate!  Why did this work?
375       b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
376           as it should be.
377       c.  The assembly syntax for fstds and fstdd was wrong.
378       d.  Added the composite instruction COMICLR/LDO, which is the immediate
379           operand variant of COMCLR/LDO.
380    
381    3. Generic MLRISC
382    
383       a.  shuffle.sml rewritten to be slightly more efficient
384       b.  DIV bug in mltree-simplify fixed (found by Fermin)
385    
386    4. Register Allocator
387    
388       a.  I now release the interference graph earlier during spilling.
389           May improve memory usage.
390    
391    ----------------------------------------------------------------------
392    Name: Matthias Blume
393    Date: 2000/03/14 14:15:32
394    Tag: blume_main_v110p26p1_2
395    Description:
396    
397    1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
398    additional argument called "template" which is an optional string that
399    specifies the layout of the tool command line.  See the CM manual for
400    explanation.
401    
402    2. A special-purpose tool can be "registered" by simply dropping the
403    corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
404    directory where the .cm file lives that uses this tool.  (The
405    behavior/misfeature until now was to look for the tool description
406    files in the current working directory.)  As before, tool description
407    files could also be anchored -- in which case they can live anywhere
408    they like.  Following the recent e-mail discussion, this change should
409    make it easier to have special-purpose tools that are shipped together
410    with the sources of the program that uses them.
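
The new lookup rule can be sketched in Python as below. This is not CM's code: the function name and the `name` parameter (standing in for whatever replaces `<...>` in the file names) are hypothetical, and anchored tool description files are not modeled.

```python
from pathlib import PurePosixPath

def tool_description_candidates(cm_file, name):
    """Candidate tool description files for a tool used in `cm_file`.

    The files are looked up next to the .cm file that uses the tool,
    not in the current working directory.
    """
    directory = PurePosixPath(cm_file).parent
    return [directory / f"{name}-tool.cm", directory / f"{name}-ext.cm"]

# Hypothetical project layout and tool name.
candidates = tool_description_candidates("src/parser/sources.cm", "mytool")
```

The result does not depend on where the build is started from, which is what allows a special-purpose tool to ship in the same directory as the sources that use it.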
411    
412  ----------------------------------------------------------------------
413  Name: Matthias Blume
414  Date: 2000/03/10 07:48:34