[smlnj] Diff of /sml/trunk/HISTORY

revision 572, Thu Mar 9 02:43:06 2000 UTC revision 589, Fri Mar 31 16:14:36 2000 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: David MacQueen
17    Date: 2000/03/31 11:15:00 EST
18    Tag: dbm-20000331-runtime_fix
19    Description:
20    
21    This update contains:
22    
23    1. runtime/c-lib/c-libraries.c
24       includes added in revision 1.2 caused compilation errors on hppa-hpux
25    
26    2. fix for bug 1556
27       system/Basis/Implementation/NJ/internal-signals.sml
28    
29    ----------------------------------------------------------------------
30    Name: Matthias Blume
31    Date: 2000/03/31 18:00:00 JST
32    Tag: blume_main_v110p26p2_1
33    Description:
34    
35    This update contains:
36    
37    1. A small change to CM's handling of stable libraries:
38       CM now maintains one "global" modmap that is used for all stable
39       libraries.  The use of such a global modmap maximizes sharing and
40       minimizes the need for re-traversing parts of environments during
41       modmap construction.  (However, this has minor impact since modmap
42       construction seems to account for just one percent or less of total
43       compile time.)
44    
45    2. I added a "genmap" phase to the statistics.  This is where I got the
46       "one percent" number (see above).
47    
48    3. CM's new tool parameter mechanism just became _even_ better. :)
49       - The parser understands named parameters and recursive options.
50       - The "make" and "shell" tools use these new features.
51         (This makes it a lot easier to cascade these tools.)
52       - There is a small syntax change: named parameters use a
53    
54           <name> : ( <option> ... )            or
55           <name> : <string>
56    
57         syntax.  Previously, named parameters were implemented in an
58         ad-hoc fashion by each tool individually (by parsing strings)
59         and had the form
60    
61           <name>=<string>
62    
63       See the CM manual for a full description of these issues.
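
A minimal sketch of how the new named-parameter syntax could be parsed, written in Python purely for illustration (CM itself is written in SML, and its real parser is described in the CM manual). The function names and the sample option strings are hypothetical, and the sketch assumes that tokens are separated by whitespace and that `:`, `(` and `)` never occur inside a token.

```python
PUNCTUATION = (":", "(", ")")


def tokenize(text):
    # Assumption: ':', '(' and ')' always stand alone as tokens.
    for ch in PUNCTUATION:
        text = text.replace(ch, f" {ch} ")
    return text.split()


def parse_options(tokens, pos=0, closing=None):
    """Parse options up to `closing` (or end of input).

    Returns (options, next_position).  A plain option is a string; a
    named parameter is a (name, value) pair whose value is either a
    string or a nested option list.
    """
    options = []
    while pos < len(tokens) and tokens[pos] != closing:
        word = tokens[pos]
        if word in PUNCTUATION:
            raise ValueError(f"unexpected {word!r} at token {pos}")
        if pos + 1 < len(tokens) and tokens[pos + 1] == ":":
            pos += 2  # skip the name and the ':'
            if pos >= len(tokens):
                raise ValueError(f"missing value for {word!r}")
            if tokens[pos] == "(":
                # <name> : ( <option> ... )
                value, pos = parse_options(tokens, pos + 1, ")")
                if pos >= len(tokens):
                    raise ValueError("missing ')'")
                pos += 1  # skip the ')'
            elif tokens[pos] in PUNCTUATION:
                raise ValueError(f"bad value for {word!r}")
            else:
                # <name> : <string>
                value = tokens[pos]
                pos += 1
            options.append((word, value))
        else:
            options.append(word)
            pos += 1
    return options, pos


def parse(text):
    options, _ = parse_options(tokens=tokenize(text))
    return options
```

For example, `parse("foo target : all options : ( -j x : y )")` yields a plain option, a string-valued named parameter, and a nested option list, which is the recursive structure that the old `<name>=<string>` form could not express.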
64    
65    ----------------------------------------------------------------------
66    Name: Matthias Blume
67    Date: 2000/03/30 18:00:00 JST
68    Tag: blume_main_v110p26p2_0
69    Description:
70    
71    !!!!! WARNING !!!!!!
72    !!  New binfiles  !!
73    !!!!!!!!!!!!!!!!!!!!
74    
75    This update contains:
76    
77    1. Moderate changes to CM:
78    
79       - Changes to CM's tools mechanism.  In particular, it is now possible
80       to have tools that accept additional "command line" parameters
81       (specified in the .cm file at each instance where the tool's class is
82       used).
83    
84       This was done to accommodate the new "make" and "shell" tools which
85       facilitate fairly seamless hookup to portions of code managed using
86       Makefiles or Shell scripts.
87    
88       There are no classes "shared" or "private" anymore.  Instead, the
89       sharing annotation is now a parameter to the "sml" class.
90    
91       There is a bit of generic machinery for implementing one's own
92       tools that accept command-line parameters.  However, I am not yet fully
93       satisfied with that part, so expect changes here in the future.
94    
95       All existing tools are described in the CM manual.
96    
97       - Slightly better error handling.  (CM now suppresses many followup
98       error messages that tended to be more annoying than helpful.)
99    
100    2. Major changes to the compiler's static environment data structures.
101    
102       - no CMStaticEnv anymore.
103            - no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
104              but it is called Environment), no conversions between different
105              kinds of static environments
106    
107       - There is still a notion of a "modmap", but such modmaps are generated
108         on demand at the time when they are needed.  This sounds slow, but I
109         sped up the code that generates modmaps enough for this not to lead to
110         a slowdown of the compiler (at least I didn't detect any).
111    
112       - To facilitate rapid modmap generation, static environments now
113         contain an (optional) "modtree" structure.  Modtree annotations are
114         constructed by the unpickler during unpickling.  (This means that
115         the elaborator does not have to worry about modtrees at all.)
116         Modtrees have the advantage that they are compositional in the same
117         way as the environment data structure itself is compositional.
118         As a result, modtrees never hang on to parts of an environment that
119         has already been rendered "stale" by filtering or rebinding.
120    
121       - I went through many, many trials and errors before arriving at the
122         current solution.  (The initial idea of "linkpaths" did not work.)
123         But the result of all this is that I have touched a lot of files that
124         depend on the "modules" and "types" data structures (most of the
125         elaborator). There were a lot of changes during my "linkpath" trials
126         that could have been reverted to their original state but weren't.
127         Please, don't be too harsh on me for messing with this code a bit more
128         than what was strictly necessary...  (I _did_ resist the temptation
129         of doing any "global reformatting" to avoid an untimely death at
130         Dave's hands. :)
131    
132       - One positive aspect of the previous point:  At least I made sure that
133         all files that I touched now compile without warnings (other than
134         "polyEqual").
135    
136       - compiler now tends to run "leaner" (i.e., ties up less memory in
137         redundant modmaps)
138    
139    ----------------------------------------------------------------------
140    Name: Allen Leung
141    Date: 2000/03/29 18:00:00
142    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
143    Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
144    Description:
145    
146       This update contains *MAJOR* changes to the way code is generated from CPS
147    in the module mlriscGen, and in various backend modules.
148    
149    CHANGES
150    =======
151    
152    1. MLRiscGen: forward propagation fix.
153    
154       There was a bug in forward propagation introduced at about the same time
155       as the MLRISC x86 backend, which prevented coalescing from being
156       performed effectively in loops.
157    
158       Effect: speed up of loops in RISC architectures.
159               By itself, this actually slowed down certain benchmarks on the x86.
160    
161    2. MLRiscGen:  forward propagating addresses from consing.
162    
163       I've changed the way consing code is generated.  Basically I separated
164       out the initialization part:
165    
166            store tag,   offset(allocptr)
167            store elem1, offset+4(allocptr)
168            store elem2, offset+8(allocptr)
169            ...
170            store elemn, offset+4n(allocptr)
171    
172       and the address computation part:
173    
174            celladdr <- offset+4+allocptr
175    
176       and move the address computation part
177    
178       Effect:  register pressure is generally lower as a result.  This
179                makes compilation of certain expressions much faster, such as
180                long lists with non-trivial elements.
181    
182                 [(0,0), (0,0), .... (0,0)]
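
The split between initialization and address computation can be sketched as follows. This is an illustrative Python model, not MLRiscGen code: the function name, the textual instruction format, and the 4-byte word size are assumptions taken from the listing above.

```python
def emit_record(tag, elems, offset, word=4):
    """Return (init_stores, addr_computation) for one allocated record.

    The stores initialize the tag and the elements relative to allocptr;
    the address computation is returned separately so that a caller can
    place it elsewhere, independently of the stores.
    """
    init = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        init.append(f"store {elem}, {offset + word * i}(allocptr)")
    # The record's address points just past the tag word.
    addr = f"celladdr <- {offset + word}+allocptr"
    return init, addr


if __name__ == "__main__":
    stores, address = emit_record("tag", ["elem1", "elem2"], offset=8)
    for line in stores:
        print(line)
    print(address)
```

Keeping the two parts apart means the register holding `celladdr` only has to be live from the point where the address is actually computed, which is one way to read the lower register pressure reported above.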
183    
184    3. MLRiscGen: base pointer elimination.
185    
186        As part of the linkage mechanism, we generate the sequence:
187    
188         L:  ...  <- start of the code fragment
189    
190         L1:
191             base pointer <- linkreg - L1 + L
192    
193         The base pointer was then used for computing relocatable addresses
194       in the code fragment.  Frequently (such as in lots of continuations)
195       this is not needed.  We now eliminate this sequence whenever possible.
196    
197         For compile time efficiency, I'm using a very stupid local heuristic.
198       But in general, this should be done as a control flow analysis.
199    
200       Effect:  Smaller code size.  Speed up of most programs.
201    
202    4. Hppa back end
203    
204         Long jumps in span dependence resolution used to depend on the existence
205      of the base pointer.
206    
207         A jump to a long label L was expanded into the following sequence:
208    
209          LDIL %hi(L-8192), %r29
210          LDO  %lo(L-8192)(%r29), %r29
211          ADD  %r29, baseptr, %r29
212          BV,n %r0(%r29)
213    
214         In the presence of change (3) above, this will not work.  I've changed
215       it so that the following sequence of instructions is generated, which
216       doesn't mention the base pointer at all:
217    
218             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
219        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
220             BV,n  %r0(%r29)          /* Jump */
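
The reason the new sequence needs no base pointer can be checked with a little address arithmetic. The Python sketch below models only the arithmetic stated in the comments of the listing above (it is not a PA-RISC simulator); the function and parameter names are hypothetical.

```python
def long_jump_target(load_base, off_lprime, off_l):
    """Compute the address reached by the three-instruction sequence.

    load_base  -- address at which the code fragment happens to be loaded
    off_lprime -- offset of label L' within the fragment
    off_l      -- offset of the far label L within the fragment
    """
    addr_lprime = load_base + off_lprime
    r29 = addr_lprime + 4             # BL,n L', %r29 : L' + 4 -> %r29
    r29 += off_l - (off_lprime + 4)   # ADDIL L-(L'+4), %r29
    return r29                        # BV,n %r0(%r29) jumps here


if __name__ == "__main__":
    for base in (0x0, 0x10000, 0x7FFF0000):
        assert long_jump_target(base, 0x40, 0x9000) == base + 0x9000
```

The displacement `L-(L'+4)` is a difference of two labels in the same fragment, so it is a constant no matter where the fragment is loaded; the link register supplies the only run-time address that is needed.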
221    
222    5. Alpha back end
223    
224          New alpha instructions LDB/LDW have been added, as per Fermin's
225       suggestions.   This is unrelated to all other changes.
226    
227    6. X86 back end
228    
229         I've changed andl to testl in the floating point test sequence
230         whenever appropriate.  The Intel optimization guide states that
231         testl is preferable to andl.
232    
233    7. RA (x86 only)
234    
235         I've improved the spill propagation algorithm, using an approximation
236       of maximal weighted independent sets.   This seems to be necessary to
237       alleviate the negative effect of the slowdown in (1).
238    
239         I'll write down the algorithm one of these days.
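
Since the algorithm is not written down here, the following Python sketch should not be read as the allocator's actual method. It only illustrates what "an approximation of maximal weighted independent sets" can mean in general, using a generic textbook greedy heuristic (repeatedly pick the node with the best weight-to-degree ratio). The function name, the graph encoding, and the example weights are all assumptions.

```python
def greedy_weighted_independent_set(weights, edges):
    """Greedy approximation of a maximum weighted independent set.

    weights -- dict mapping node -> non-negative weight
    edges   -- iterable of (node, node) pairs
    Returns the chosen nodes in the order they were picked.
    """
    neighbours = {v: set() for v in weights}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    remaining = set(weights)
    chosen = []
    while remaining:
        # Prefer heavy nodes that rule out few other candidates.
        best = max(
            sorted(remaining),
            key=lambda v: weights[v] / (len(neighbours[v] & remaining) + 1),
        )
        chosen.append(best)
        remaining -= neighbours[best] | {best}
    return chosen


if __name__ == "__main__":
    # A path a - b - c: taking both ends (total 10) beats the middle (6).
    print(greedy_weighted_independent_set(
        {"a": 5, "b": 6, "c": 5}, [("a", "b"), ("b", "c")]))
```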
240    
241    8. MLRiscGen: frequencies
242    
243         I've added an annotation that states that all call gc blocks have zero
244       execution frequencies.  This improves register allocation on the x86.
245    
246    BENCHMARKS
247    ==========
248    
249       I've only performed the comparison against 110.25.
250    
251       The platforms are:
252    
253        HPPA  A four processor HP machine (E9000) with 5G of memory.
254        X86   A 300MHz Pentium II with 128M of memory, and
255        SPARC An Ultra sparc 2 with 512M of memory.
256    
257       I used the following parameters for the SML benchmarks:
258    
259                 @SMLalloc
260         HPPA    256k
261         SPARC   512k
262         X86     256k
263    
264    COMPILATION TIME
265    ----------------
266       Here are the numbers comparing the compilation times of the compilers.
267       I've only compared 110.25 compiling the new sources versus
268       a fixpoint version of the new compiler compiling the same.
269    
270                     110.25                                  New
271               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
272         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
273         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
274         X86    999s    315s       94006+130691   987s    296s    108877+141957
275    
276                   110.25         New
277                Code Size      Code Size
278         HPPA   8596736         8561421
279         SPARC  8974299         8785143
280         X86    9029180         8716783
281    
282       So in summary, things are at least as good as before.   Dramatic
283       reduction in compilation time is obtained on the Sparc; I can't explain it,
284       but it is reproducible.  Perhaps someone should try to reproduce this
285       on their own machines.
286    
287    SML BENCHMARKS
288    --------------
289    
290        On the average, all benchmarks perform at least as well as before.
291    
292          HPPA         Compilation Time     Spill+Reload      Run Time
293                     110.25  New            110.25    New   110.25  New
294    
295          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
296              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
297       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
298                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
299        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
300             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
301               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
302              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
303         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
304             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
305            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
306      ratio-regions  3.357  3.142  6.84%    0+0       0+0 116.225 113.173   2.70%
307                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
308             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
309                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
310               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
311      --------------------------------------------------------------------------
312       Average                     6.12%                                   4.09%
313    
314          SPARC        Compilation Time     Spill+Reload      Run Time
315                     110.25  New            110.25    New   110.25  New
316    
317          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
318              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
319       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
320                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
321        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
322             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
323               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
324              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
325         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
326             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
327            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
328      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
329                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
330             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
331                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
332               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
333      ----------------------------------------------------------------------------
334       Average                     6.94%                                    2.64%
335    
336          X86          Compilation Time     Spill+Reload      Run Time
337                     110.25  New            110.25    New   110.25  New
338    
339          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
340              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
341       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
342                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
343        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
344             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
345               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
346              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
347         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
348             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
349            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
350      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300 120.142 120.345  -0.17%
351                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
352             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
353                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
354               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
355      ----------------------------------------------------------------------------
356       Average                     1.22%                                     3.36%
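
The tables do not say how the percentage columns were computed. The printed figures appear to be consistent with old/new - 1, and the short Python sketch below makes that assumption explicit; the function name is hypothetical. Because the printed times are rounded to three decimals, recomputed values can differ from the tables in the last digit.

```python
def improvement_pct(old, new):
    """Percentage improvement of `new` over `old`, assuming old/new - 1.

    Positive means the new compiler is faster; negative means slower.
    """
    return (old / new - 1.0) * 100.0


if __name__ == "__main__":
    # lexgen on HPPA, taken from the table above.
    print(f"compile: {improvement_pct(6.190, 5.290):.2f}%")
    print(f"run:     {improvement_pct(0.913, 0.788):.2f}%")
```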
357    
358    ----------------------------------------------------------------------
359    Name: Allen Leung
360    Date: 2000/03/23 16:25:00
361    Tag: leunga-20000323-fix_x86_alpha
362    Description:
363    
364    1. X86 fixes/changes
365    
366       a.  The old code generated for SETcc was completely wrong.
367           The Intel optimization guide is VERY misleading.
368    
369    2. ALPHA fixes/changes
370    
371       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
372       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
373       c.  Added reassociation code for address computation.
374    
375    ----------------------------------------------------------------------
376    Name: Allen Leung
377    Date: 2000/03/22 01:23:00
378    Tag: leunga-20000322-fix_x86_hppa_ra
379    Description:
380    
381    1. X86 fixes/changes
382    
383       a.  x86Rewrite bug with MUL3 (found by Lal)
384       b.  Added the instructions FSTS, FSTL
385    
386    2. PA-RISC fixes/changes
387    
388       a.  B label should not be a delay slot candidate!  Why did this work?
389       b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
390           as it should be.
391       c.  The assembly syntax for fstds and fstdd was wrong.
392       d.  Added the composite instruction COMICLR/LDO, which is the immediate
393           operand variant of COMCLR/LDO.
394    
395    3. Generic MLRISC
396    
397       a.  shuffle.sml rewritten to be slightly more efficient
398       b.  DIV bug in mltree-simplify fixed (found by Fermin)
399    
400    4. Register Allocator
401    
402       a.  I now release the interference graph earlier during spilling.
403           May improve memory usage.
404    
405    ----------------------------------------------------------------------
406    Name: Matthias Blume
407    Date: 2000/03/14 14:15:32
408    Tag: blume_main_v110p26p1_2
409    Description:
410    
411    1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
412    additional argument called "template" which is an optional string that
413    specifies the layout of the tool command line.  See the CM manual for
414    explanation.
415    
416    2. A special-purpose tool can be "registered" by simply dropping the
417    corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
418    directory where the .cm file lives that uses this tool.  (The
419    behavior/misfeature until now was to look for the tool description
420    files in the current working directory.)  As before, tool description
421    files could also be anchored -- in which case they can live anywhere
422    they like.  Following the recent e-mail discussion, this change should
423    make it easier to have special-purpose tools that are shipped together
424    with the sources of the program that uses them.
425    
426    ----------------------------------------------------------------------
427    Name: Matthias Blume
428    Date: 2000/03/10 07:48:34
429    Tag: blume_main_v110p26p1_1
430    Description:
431    
432    I added a re-written version of Dave's fixpt script to src/system.
433    Changes relative to the original version:
434      - sh-ified (not everybody has ksh)
435      - automatically figures out which architecture it runs on
436      - uses ./makeml a bit more cleverly
437      - never invokes ./installml (and, thus, does not clobber your
438        good and working installation of sml in case something goes wrong)
439      - accepts max iteration count using option "-iter <n>"
440      - accepts a "base" name using option "-base <base>"
441    
442    It does not build any extraneous heap images but directly rebuilds
443    bin- and boot-hierarchies using makeml's "-rebuild" switch. Finally,
444    it can incorporate existing bin- and boot- hierarchies.  For example,
445    suppose the base is set to "sml" (which is the default).  Then it
446    successively builds
447    
448            sml.bin.<arch>-unix and sml.boot.<arch>-unix
449    then    sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
450    then    sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
451    ...
452    then    sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix
453    
454    and so on.  If any of these already exist, it will just use what's
455    there.  In particular, many people will have the initial set of bin
456    and boot files around, so this saves time for at least one full
457    rebuild.  Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
458    <k>=1,2,... is normally not a good idea when invoking fixpt.  However,
459    they might be the result of an earlier partial run of fixpt (which
460    perhaps got accidentally killed).  In this case, fixpt will quickly
461    move through what exists before continuing where it left off earlier,
462    and, thus, saves a lot of time.
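
The naming scheme and the resume behaviour described above can be sketched as follows. The real fixpt is a sh script; this Python model only mirrors the directory names given in the text, and the function names and the `exists` callback are hypothetical.

```python
def hierarchy_names(base, arch, iterations):
    """Return the (bin, boot) directory names for iterations 0..n."""
    names = []
    for k in range(iterations + 1):
        stem = base if k == 0 else f"{base}{k}"
        names.append((f"{stem}.bin.{arch}-unix", f"{stem}.boot.{arch}-unix"))
    return names


def first_missing(names, exists):
    """Index of the first iteration whose bin or boot hierarchy is absent.

    Everything before that index is reused as-is, which is how an
    interrupted run can continue where it left off.
    """
    for k, (bin_dir, boot_dir) in enumerate(names):
        if not (exists(bin_dir) and exists(boot_dir)):
            return k
    return len(names)


if __name__ == "__main__":
    names = hierarchy_names("sml", "x86", 3)
    present = {"sml.bin.x86-unix", "sml.boot.x86-unix",
               "sml1.bin.x86-unix", "sml1.boot.x86-unix"}
    print("resume at iteration", first_missing(names, present.__contains__))
```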
463    
464    ----------------------------------------------------------------------
465    Name: Allen Leung
466    Date: 00/03/10 02:20:00
467    Tag: leunga-20000310-fix_x86_asm_ra
468    Description:
469    
470    More assembly output problems involving the indexed addressing mode
471    on the x86 have been found and corrected. Thanks to Fermin Reig for the
472    fix.
473    
474    The interface and implementation of the register allocator have been changed
475    slightly to accommodate the possibility to skip the register allocation
476    phases completely and go directly to memory allocation.  This is needed
477    for C-- use.
478    
479    ----------------------------------------------------------------------
480    Name: Matthias Blume
481    Date: 00/03/09 10:23:53
482    Tag: blume_main_v110p26p1_0
483    Description:
484    
485    * Complete re-organization of library names.  Many libraries have been
486    consolidated so that they share the same path anchor.  For example,
487    all MLRISC-related libraries are anchored at MLRISC, most libraries that
488    are SML/NJ-specific are under "smlnj".  Notice that names like
489    host-cmb.cm or host-compiler.cm no longer exist.  See system/README
490    for a complete description of the new naming scheme.  Quick reference:
491    
492       host-cmb.cm        -> smlnj/cmb.cm
493       host-compiler.cm   -> smlnj/compiler.cm
494       full-cm.cm         -> smlnj/cm.cm
495       <arch>-<os>.cm     -> smlnj/cmb/<arch>-<os>.cm
496       <arch>-compiler.cm -> smlnj/compiler/<arch>.cm
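
The quick reference can be turned into a small renaming helper, for instance when updating old .cm files. The Python sketch below is illustrative only: the function name is hypothetical, and the sets of architecture and OS names are assumptions (the architectures are those mentioned in this file), so names outside them are returned unchanged.

```python
import re

# Exact renames from the quick reference.
EXACT = {
    "host-cmb.cm": "smlnj/cmb.cm",
    "host-compiler.cm": "smlnj/compiler.cm",
    "full-cm.cm": "smlnj/cm.cm",
}

ARCHS = {"alpha", "hppa", "sparc", "x86"}  # assumption: illustrative list
OSES = {"unix"}                            # assumption: illustrative list


def new_library_name(old):
    """Map an old library name to its new anchored name, if a rule applies."""
    if old in EXACT:
        return EXACT[old]
    match = re.fullmatch(r"([^-/]+)-([^-/]+)\.cm", old)
    if match and match.group(1) in ARCHS:
        arch, rest = match.groups()
        if rest == "compiler":
            # <arch>-compiler.cm -> smlnj/compiler/<arch>.cm
            return f"smlnj/compiler/{arch}.cm"
        if rest in OSES:
            # <arch>-<os>.cm -> smlnj/cmb/<arch>-<os>.cm
            return f"smlnj/cmb/{arch}-{rest}.cm"
    return old
```

The exact names are checked first because `host-compiler.cm` would otherwise look like an `<arch>-compiler.cm` entry.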
497    
498    * Bug fixes in CM.
499        - exceptions in user code are being passed through (i.e., reach top level)
500        - more bugs in paranoia mode fixed
501        - bug related to checking group owners fixed
502    
503    * New install.sh script that automagically fetches archive files:
504      The new file config/srcarchiveurl must contain the URL of the
505      (remote) directory that contains bin files (or other source archives).
506      If install.sh does not find the archive locally, it tries to get
507      it from that remote directory.
508      This should simplify installation further:  For machines that have
509      access to the internet, just fetch <version>-config.tgz, unpack it,
510      edit config/targets, and go (run config/install.sh).  The script will
511      fetch everything else that it might need all by itself.
512    
513      For CVS users, this mechanism is not relevant for source archives, but
514      it is convenient for getting new sets of binfiles.
515    
516      Archives should be tar files compressed with either gzip, compress, or
517      bzip2.  The script recognizes .tgz, .tar, .tar.gz, .tz, .tar.Z, and .tar.bz2.
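
The local-then-remote lookup can be sketched like this. install.sh is a shell script and its exact logic is not shown here, so this Python model rests on assumptions: the function names are hypothetical, the suffixes are written with a leading dot, and the order in which suffixes are tried is a guess. No download is performed; the sketch only decides where to look.

```python
# Suffixes listed above (assumption: each is meant with a leading dot).
SUFFIXES = (".tgz", ".tar", ".tar.gz", ".tz", ".tar.Z", ".tar.bz2")


def recognized_suffix(filename):
    """Return the longest recognized archive suffix of `filename`, or None."""
    matches = [s for s in SUFFIXES if filename.endswith(s)]
    return max(matches, key=len) if matches else None


def locate_archive(stem, local_files, remote_dir_url):
    """Prefer a local archive; otherwise list the remote URLs to try.

    remote_dir_url stands for the contents of config/srcarchiveurl.
    """
    for suffix in SUFFIXES:
        if stem + suffix in local_files:
            return ("local", stem + suffix)
    base = remote_dir_url.rstrip("/")
    return ("remote", [f"{base}/{stem}{suffix}" for suffix in SUFFIXES])
```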
518    
519    ----------------------------------------------------------------------
520  Name: Matthias Blume  Name: Matthias Blume
521  Date: 2000/03/07 04:01:04  Date: 2000/03/07 04:01:04
522  Tag: blume_main_v110_26_2  Tag: blume_main_v110_26_2
