[smlnj] Diff of /sml/trunk/HISTORY

Revision 570 (Wed Mar 8 17:30:13 2000 UTC) vs. revision 587 (Thu Mar 30 09:01:52 2000 UTC)

Description:

----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/30 18:00:00 JST
Tag: blume_main_v110p26p2_0
Description:

!!!!! WARNING !!!!!!
!!  New binfiles  !!
!!!!!!!!!!!!!!!!!!!!

This update contains:

1. Moderate changes to CM:

   - Changes to CM's tools mechanism.  In particular, it is now possible
     to have tools that accept additional "command line" parameters
     (specified in the .cm file at each instance where the tool's class is
     used).

     This was done to accommodate the new "make" and "shell" tools, which
     facilitate a fairly seamless hookup to portions of code managed using
     Makefiles or shell scripts.

     There are no classes "shared" or "private" anymore.  Instead, the
     sharing annotation is now a parameter to the "sml" class.

     There is a bit of generic machinery for implementing one's own
     tools that accept command-line parameters.  However, I am not yet fully
     satisfied with that part, so expect changes here in the future.

     All existing tools are described in the CM manual.

   - Slightly better error handling.  (CM now suppresses many follow-up
     error messages that tended to be more annoying than helpful.)

2. Major changes to the compiler's static environment data structures.

   - No CMStaticEnv anymore.
   - No CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
     but it is called Environment), and no conversions between different
     kinds of static environments.

   - There is still a notion of a "modmap", but such modmaps are generated
     on demand at the time when they are needed.  This sounds slow, but I
     sped up the code that generates modmaps enough for this not to lead to
     a slowdown of the compiler (at least I didn't detect any).

   - To facilitate rapid modmap generation, static environments now
     contain an (optional) "modtree" structure.  Modtree annotations are
     constructed by the unpickler during unpickling.  (This means that
     the elaborator does not have to worry about modtrees at all.)
     Modtrees have the advantage that they are compositional in the same
     way as the environment data structure itself is compositional.
     As a result, modtrees never hang on to parts of an environment that
     have already been rendered "stale" by filtering or rebinding.

   - I went through many, many trials and errors before arriving at the
     current solution.  (The initial idea of "linkpaths" did not work.)
     But the result of all this is that I have touched a lot of files that
     depend on the "modules" and "types" data structures (most of the
     elaborator).  There were a lot of changes during my "linkpath" trials
     that could have been reverted to their original state but weren't.
     Please don't be too harsh on me for messing with this code a bit more
     than what was strictly necessary...  (I _did_ resist the temptation
     of doing any "global reformatting" to avoid an untimely death at
     Dave's hands. :)

   - One positive aspect of the previous point:  At least I made sure that
     all files that I touched now compile without warnings (other than
     "polyEqual").

   - The compiler now tends to run "leaner" (i.e., it ties up less memory in
     redundant modmaps).
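The modtree idea can be illustrated with a toy model.  A modtree is combined the same way environments are combined, a filtered environment rebuilds its modtree only from the bindings it keeps, and a modmap is produced on demand by one walk over the tree.  This is a minimal sketch under those assumptions.  The classes `ModTree` and `Env` and the function `build_modmap` are hypothetical and are not the compiler's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModTree:
    """Hypothetical modtree node: `leaves` pairs a module stamp with some
    module info; `children` are the modtrees of combined environments."""
    leaves: Tuple[Tuple[str, str], ...] = ()
    children: Tuple["ModTree", ...] = ()

    @staticmethod
    def combine(*trees: "ModTree") -> "ModTree":
        # Composition mirrors environment composition: just a new node,
        # nothing is copied or flattened eagerly.
        return ModTree(children=tuple(trees))


@dataclass(frozen=True)
class Env:
    """Hypothetical static environment: symbol -> (binding, modtree)."""
    bindings: Tuple[Tuple[str, Tuple[str, ModTree]], ...]

    def modtree(self) -> ModTree:
        # The environment's modtree is built only from its current bindings.
        return ModTree.combine(*(mt for _, (_, mt) in self.bindings))

    def filter(self, keep) -> "Env":
        # Filtering drops bindings, and with them their modtrees, so the
        # result cannot hang on to stale parts of the old environment.
        return Env(tuple((s, b) for s, b in self.bindings if s in keep))


def build_modmap(tree: ModTree) -> Dict[str, str]:
    """Generate a modmap on demand by a single walk over the modtree."""
    modmap: Dict[str, str] = {}
    stack = [tree]
    while stack:
        node = stack.pop()
        for stamp, info in node.leaves:
            modmap.setdefault(stamp, info)
        stack.extend(node.children)
    return modmap
```

Filtering an environment down to one binding, for example, yields a modmap that mentions only that binding's module.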

----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/29 18:00:00
Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
Description:

   This update contains *MAJOR* changes to the way code is generated from CPS
   in the module mlriscGen, and in various backend modules.

CHANGES
=======

1. MLRiscGen: forward propagation fix.

   There was a bug in forward propagation, introduced at about the same time
   as the MLRISC x86 backend, which prevented coalescing from being
   performed effectively in loops.

   Effect: speedup of loops on RISC architectures.
           By itself, this actually slowed down certain benchmarks on the x86.

2. MLRiscGen: forward propagating addresses from consing.

   I've changed the way consing code is generated.  Basically I separated
   out the initialization part:

        store tag,   offset(allocptr)
        store elem1, offset+4(allocptr)
        store elem2, offset+8(allocptr)
        ...
        store elemn, offset+4n(allocptr)

   and the address computation part:

        celladdr <- offset+4+allocptr

   and moved the address computation part forward, to where the address
   is actually used.

   Effect:  register pressure is generally lower as a result.  This
            makes compilation of certain expressions much faster, such as
            long lists with non-trivial elements:

             [(0,0), (0,0), .... (0,0)]
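The two parts of the new consing code can be sketched as string generators.  This is a minimal sketch that assumes 4-byte words, as the offsets in the listing above suggest.  The functions `cons_init` and `cons_addr` are hypothetical and only mimic the pseudo-instructions shown, not MLRISC's real interface.

```python
WORD = 4  # the listing above steps by 4 bytes per field


def cons_init(offset, tag, elems):
    """Initialization part: the tag and each element are stored at
    consecutive words starting at `offset` from allocptr."""
    stores = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + WORD * i}(allocptr)")
    return stores


def cons_addr(offset):
    """Address computation part: the record pointer points just past the
    tag, i.e. celladdr = allocptr + offset + 4.  Because it is only
    allocptr plus a constant, it can be recomputed where it is used
    instead of occupying a register in between."""
    return f"celladdr <- {offset + WORD}+allocptr"
```

For a pair such as (0,0), `cons_init` produces three stores (tag, elem1, elem2).  `cons_addr` yields a single add that does not have to sit next to those stores.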

3. MLRiscGen: base pointer elimination.

   As part of the linkage mechanism, we generate the sequence:

     L:  ...  <- start of the code fragment

     L1:
         base pointer <- linkreg - L1 + L

   The base pointer was then used for computing relocatable addresses
   in the code fragment.  Frequently (such as in lots of continuations)
   this is not needed.  We now eliminate this sequence whenever possible.

   For compile time efficiency, I'm using a very stupid local heuristic.
   But in general, this should be done as a control flow analysis.

   Effect:  Smaller code size.  Speedup of most programs.

4. HPPA back end

   Long jumps in span dependence resolution used to depend on the existence
   of the base pointer.

   A jump to a long label L was expanded into the following sequence:

      LDIL %hi(L-8192), %r29
      LDO  %lo(L-8192)(%r29), %r29
      ADD  %r29, baseptr, %r29
      BV,n %r0(%r29)

   In the presence of change (3) above, this will not work.  I've changed
   it so that the following sequence of instructions is generated, which
   doesn't mention the base pointer at all:

          BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
     L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
          BV,n  %r0(%r29)          /* Jump */
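The new sequence's arithmetic can be simulated to show why it no longer needs the base pointer.  The link value L'+4 comes from the branch itself, and the displacement L-(L'+4) is fixed at assembly time.  This is a minimal sketch that takes the comments in the listing above at face value.  It checks only the address arithmetic, not instruction encodings or immediate-field limits, and `long_jump_target` is a hypothetical name.

```python
def long_jump_target(load_addr, Lp, L):
    """Simulate the arithmetic of the new long-jump sequence.

    Lp stands for L' and L for the target label, both as assemble-time
    offsets in the code fragment; load_addr is where the fragment ended up
    at run time.  Encodings and immediate-field limits are ignored."""
    r29 = load_addr + Lp + 4      # BL,n L', %r29: link register gets L' + 4
    r29 += L - (Lp + 4)           # ADDIL L-(L'+4): a constant known at assembly time
    return r29                    # BV,n %r0(%r29) jumps to this address
```

Whatever `load_addr` is, the result is `load_addr + L`, so no register holding the fragment's start address is required.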

5. Alpha back end

   New Alpha instructions LDB/LDW have been added, as per Fermin's
   suggestions.  This is unrelated to all other changes.

6. X86 back end

   I've changed andl to testl in the floating point test sequence
   whenever appropriate.  The Intel optimization guide states that
   testl is preferable to andl.

7. RA (x86 only)

   I've improved the spill propagation algorithm, using an approximation
   of maximal weighted independent sets.  This seems to be necessary to
   alleviate the x86 slowdown caused by (1).

   I'll write down the algorithm one of these days.
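The algorithm used here is not described yet, so the following shows only a common textbook-style greedy approximation of a maximum weighted independent set, for readers unfamiliar with the idea.  It is not Leung's algorithm.  Treating vertices as candidate spill propagations, edges as conflicts between them, and weights as estimated savings is an assumption made for illustration.  `greedy_mwis` is a hypothetical name.

```python
def greedy_mwis(weights, edges):
    """Greedy approximation of a maximum weighted independent set.

    weights: dict mapping vertex -> non-negative weight
    edges:   iterable of (u, v) pairs; adjacent vertices conflict
    """
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(weights)
    chosen = set()
    while alive:
        # Favor heavy vertices that eliminate few still-live neighbors;
        # ties are broken by name so the result is deterministic.
        best = max(alive, key=lambda v: (weights[v] / (len(adj[v] & alive) + 1), str(v)))
        chosen.add(best)
        alive -= adj[best] | {best}
    return chosen
```

On the path a-b-c with weights 3, 4, 3, the heuristic picks {a, c} (total 4) rather than the single heavier vertex b.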

8. MLRiscGen: frequencies

   I've added an annotation that states that all call gc blocks have zero
   execution frequencies.  This improves register allocation on the x86.

BENCHMARKS
==========

   I've only performed the comparison against 110.25.

   The platforms are:

    HPPA  A four-processor HP machine (E9000) with 5G of memory.
    X86   A 300MHz Pentium II with 128M of memory, and
    SPARC An Ultra sparc 2 with 512M of memory.

   I used the following parameters for the SML benchmarks:

             @SMLalloc
     HPPA    256k
     SPARC   512k
     X86     256k

COMPILATION TIME
----------------
   Here are the numbers comparing the compilation times of the compilers.
   I've only compared 110.25 compiling the new sources versus
   a fixpoint version of the new compiler compiling the same.

                      110.25                                 New
               Total   Time in RA   Spill+Reload    Total   Time in RA   Spill+Reload
   HPPA         627s         116s      2684+3584     599s          95s      1003+1879
   SPARC        892s         173s      2891+3870     708s         116s      1004+1880
   X86          999s         315s   94006+130691     987s         296s  108877+141957

                      110.25          New
                   Code Size    Code Size
   HPPA              8596736      8561421
   SPARC             8974299      8785143
   X86               9029180      8716783

   So in summary, things are at least as good as before.  A dramatic
   reduction in compilation time is obtained on the SPARC; I can't explain
   it, but it is reproducible.  Perhaps someone should try to reproduce this
   on their own machines.

SML BENCHMARKS
--------------

   On average, all benchmarks perform at least as well as before.

          HPPA    Compilation Time         Spill+Reload               Run Time
                110.25     New    Gain     110.25        New   110.25      New     Gain

     barnesHut   3.158   3.015   4.75%        1+1        0+0    2.980    2.922    2.00%
         boyer   6.152   5.708   7.77%        0+0        0+0    0.218    0.213    2.34%
  count-graphs   1.168   1.120   4.32%        0+0        0+0   22.705   23.073   -1.60%
           fft   0.877   0.792  10.74%        1+3        1+3    0.602    0.587    2.56%
   knuthBendix   3.180   2.857  11.32%        0+0        0+0    0.675    0.662    2.02%
        lexgen   6.190   5.290  17.01%        0+0        0+0    0.913    0.788   15.86%
          life   0.803   0.703  14.22%      25+25        0+0    0.153    0.140    9.52%
         logic   2.048   2.007   2.08%        6+6        1+1    4.133    4.008    3.12%
    mandelbrot   0.077   0.080  -4.17%        0+0        0+0    0.765    0.712    7.49%
        mlyacc  22.932  20.937   9.53%    154+181      32+57    0.468    0.430    8.91%
       nucleic   5.183   5.060   2.44%        2+2        0+0    0.125    0.120    4.17%
 ratio-regions   3.357   3.142   6.84%        0+0        0+0  116.225  113.173    2.70%
           ray   1.283   1.290  -0.52%        0+0        0+0    2.887    2.855    1.11%
        simple   6.307   6.032   4.56%      28+30        5+7    3.705    3.658    1.28%
           tsp   0.888   0.862   3.09%        0+0        0+0    7.040    6.893    2.13%
          vliw  24.378  23.455   3.94%    106+127      25+45    2.758    2.707    1.91%
---------------------------------------------------------------------------------------
       Average                   6.12%                                            4.09%

         SPARC    Compilation Time         Spill+Reload               Run Time
                110.25     New    Gain     110.25        New   110.25      New     Gain

     barnesHut   3.778   3.592   5.20%        2+2        0+0    3.648    3.453    5.65%
         boyer   6.632   6.110   8.54%        0+0        0+0    0.258    0.242    6.90%
  count-graphs   1.435   1.325   8.30%        0+0        0+0   33.672   34.737   -3.07%
           fft   0.980   0.940   4.26%        3+9        2+6    0.838    0.827    1.41%
   knuthBendix   3.590   3.138  14.39%        0+0        0+0    0.962    0.967   -0.52%
        lexgen   6.593   6.072   8.59%        1+1        0+0    1.077    1.078   -0.15%
          life   0.972   0.868  11.90%      26+26        0+0    0.143    0.140    2.38%
         logic   2.525   2.387   5.80%        7+7        1+1    5.625    5.158    9.05%
    mandelbrot   0.090   0.093  -3.57%        0+0        0+0    0.855    0.728   17.39%
        mlyacc  26.732  23.827  12.19%    162+189      32+57    0.550    0.560   -1.79%
       nucleic   6.233   6.197   0.59%        3+3        0+0    0.163    0.173   -5.77%
 ratio-regions   3.780   3.507   7.79%        0+0        0+0  133.993  131.035    2.26%
           ray   1.595   1.550   2.90%        1+1        0+0    3.440    3.418    0.63%
        simple   6.972   6.487   7.48%      29+32        5+7    3.523    3.525   -0.05%
           tsp   1.115   1.063   4.86%        0+0        0+0    7.393    7.265    1.77%
          vliw  27.765  24.818  11.87%    110+135      25+45    2.265    2.135    6.09%
---------------------------------------------------------------------------------------
       Average                   6.94%                                            2.64%

           X86    Compilation Time         Spill+Reload               Run Time
                110.25     New    Gain     110.25        New   110.25      New     Gain

     barnesHut   5.530   5.420   2.03%    593+893    597+915    3.532    3.440    2.66%
         boyer   8.768   7.747  13.19%    493+199    301+289    0.327    0.297   10.11%
  count-graphs   2.040   2.010   1.49%    298+394    315+457   26.578   28.660   -7.26%
           fft   1.327   1.302   1.92%    112+209    115+210    1.055    0.962    9.71%
   knuthBendix   5.218   5.475  -4.69%    451+598    510+650    0.928    0.932   -0.36%
        lexgen   9.970   9.623   3.60%   1014+841   1157+885    0.947    0.928    1.97%
          life   1.183   1.183   0.00%    162+182    145+148    0.127    0.103   22.58%
         logic   3.285   3.512  -6.45%    514+684    591+836    5.682    5.577    1.88%
    mandelbrot   0.147   0.143   2.33%      38+41      33+54    0.703    0.690    1.93%
        mlyacc  35.457  32.763   8.22%  3496+4564  3611+4860    0.552    0.550    0.30%
       nucleic   7.100   6.888   3.07%    239+168    201+158    0.175    0.173    0.96%
 ratio-regions   6.388   6.843  -6.65%   1182+257    981+300  120.142  120.345   -0.17%
           ray   2.332   2.338  -0.29%    346+398    402+494    3.593    3.540    1.51%
        simple   9.912   9.903   0.08%   1475+941  1579+1168    3.057    3.178   -3.83%
           tsp   1.623   1.532   5.98%    266+200    250+211    8.045    7.878    2.12%
          vliw  33.947  35.470  -4.29%  2629+2774  2877+3171    2.072    1.890    9.61%
---------------------------------------------------------------------------------------
       Average                   1.22%                                            3.36%
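The percentage columns appear to be computed as old/new - 1, with positive values meaning the new compiler is faster, and the "Average" rows appear to be plain arithmetic means of those percentages.  This is inferred from the numbers, not stated in the entry.  The formula reproduces, for example, the X86 knuthBendix compile-time entry and the HPPA compile-time average.  Small discrepancies in some other rows are most likely due to the times being rounded to three decimals.  The function name `gain` is hypothetical.

```python
from statistics import mean


def gain(old, new):
    """Percentage gain of `new` over `old` for a time measurement:
    positive means the new compiler is faster (old/new - 1)."""
    return (old / new - 1.0) * 100.0


# Per-benchmark compile-time percentages from the HPPA table above.
hppa_compile = [4.75, 7.77, 4.32, 10.74, 11.32, 17.01, 14.22, 2.08,
                -4.17, 9.53, 2.44, 6.84, -0.52, 4.56, 3.09, 3.94]

print(round(gain(5.218, 5.475), 2))   # X86 knuthBendix compile time: -4.69
print(round(mean(hppa_compile), 2))   # HPPA compile-time average: 6.12
```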

----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/23 16:25:00
Tag: leunga-20000323-fix_x86_alpha
Description:

1. X86 fixes/changes

   a.  The old code generated for SETcc was completely wrong.
       The Intel optimization guide is VERY misleading.

2. ALPHA fixes/changes

   a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
   b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha().
   c.  Added reassociation code for address computation.

----------------------------------------------------------------------
Name: Allen Leung
Date: 2000/03/22 01:23:00
Tag: leunga-20000322-fix_x86_hppa_ra
Description:

1. X86 fixes/changes

   a.  x86Rewrite bug with MUL3 (found by Lal).
   b.  Added the instructions FSTS, FSTL.

2. PA-RISC fixes/changes

   a.  B label should not be a delay slot candidate!  Why did this work?
   b.  ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
       as it should.
   c.  The assembly syntax for fstds and fstdd was wrong.
   d.  Added the composite instruction COMICLR/LDO, which is the immediate
       operand variant of COMCLR/LDO.

3. Generic MLRISC

   a.  shuffle.sml rewritten to be slightly more efficient.
   b.  DIV bug in mltree-simplify fixed (found by Fermin).

4. Register Allocator

   a.  I now release the interference graph earlier during spilling.
       This may improve memory usage.

----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/14 14:15:32
Tag: blume_main_v110p26p1_2
Description:

1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
additional argument called "template", which is an optional string that
specifies the layout of the tool command line.  See the CM manual for
an explanation.

2. A special-purpose tool can be "registered" by simply dropping the
corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
directory where the .cm file that uses this tool lives.  (The
behavior/misfeature until now was to look for the tool description
files in the current working directory.)  As before, tool description
files can also be anchored, in which case they can live anywhere
they like.  Following the recent e-mail discussion, this change should
make it easier to have special-purpose tools that are shipped together
with the sources of the program that uses them.
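The change in where CM looks for unanchored tool description files can be pictured as a path computation.  The sketch below contrasts the old lookup (the current working directory) with the new one (the directory of the .cm file that uses the tool).  The parameter `tool` stands for the <...> part of the file names.  Both function names are hypothetical, and anchored descriptions, which can live anywhere, are not modeled.

```python
from pathlib import PurePosixPath


def old_candidates(cwd, tool):
    """Previous behavior: description files looked up in the current
    working directory."""
    d = PurePosixPath(cwd)
    return [d / f"{tool}-tool.cm", d / f"{tool}-ext.cm"]


def new_candidates(cm_file, tool):
    """New behavior: description files looked up next to the .cm file that
    uses the tool, so they can ship together with that program's sources."""
    d = PurePosixPath(cm_file).parent
    return [d / f"{tool}-tool.cm", d / f"{tool}-ext.cm"]
```

With the new rule, `new_candidates("proj/src/sources.cm", "mytool")` points into proj/src, regardless of where the build was started.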

----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/10 07:48:34
Tag: blume_main_v110p26p1_1
Description:

I added a re-written version of Dave's fixpt script to src/system.
Changes relative to the original version:
  - sh-ified (not everybody has ksh)
  - automatically figures out which architecture it runs on
  - uses ./makeml a bit more cleverly
  - never invokes ./installml (and, thus, does not clobber your
    good and working installation of sml in case something goes wrong)
  - accepts a max iteration count using option "-iter <n>"
  - accepts a "base" name using option "-base <base>"

It does not build any extraneous heap images but directly rebuilds
bin- and boot-hierarchies using makeml's "-rebuild" switch.  Finally,
it can incorporate existing bin- and boot-hierarchies.  For example,
suppose the base is set to "sml" (which is the default).  Then it
successively builds

        sml.bin.<arch>-unix and sml.boot.<arch>-unix
then    sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
then    sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
...
then    sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix

and so on.  If any of these already exist, it will just use what's
there.  In particular, many people will have the initial set of bin
and boot files around, so this saves time for at least one full
rebuild.  Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
<k>=1,2,... lying around is normally not a good idea when invoking fixpt.
However, they might be the result of an earlier partial run of fixpt
(which perhaps got accidentally killed).  In this case, fixpt will quickly
move through what exists before continuing where it left off earlier,
and, thus, saves a lot of time.
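The naming scheme and the "reuse what exists" behavior can be sketched as a small planning function.  The sketch assumes a set is reused only when both its bin and boot hierarchies are present.  It also leaves out the actual rebuilding and any fixpoint test.  `set_names` and `plan` are hypothetical names, not part of the fixpt script, which is a shell script.

```python
def set_names(base, k, arch):
    """Names of the k-th bin/boot pair; k = 0 uses the bare base name."""
    prefix = base if k == 0 else f"{base}{k}"
    return (f"{prefix}.bin.{arch}-unix", f"{prefix}.boot.{arch}-unix")


def plan(base, arch, iterations, existing):
    """Return (k, action) for k = 0..iterations: sets already present in
    `existing` are reused (assumed: both halves must exist), the rest
    must be built, each from the previous set."""
    steps = []
    for k in range(iterations + 1):
        bin_dir, boot_dir = set_names(base, k, arch)
        reuse = bin_dir in existing and boot_dir in existing
        steps.append((k, "reuse" if reuse else "build"))
    return steps
```

With only the initial sml.bin.x86-unix and sml.boot.x86-unix present, `plan("sml", "x86", 2, ...)` reuses step 0 and builds steps 1 and 2.  That is the one full rebuild the entry says is saved.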

----------------------------------------------------------------------
Name: Allen Leung
Date: 00/03/10 02:20:00
Tag: leunga-20000310-fix_x86_asm_ra
Description:

More assembly output problems involving the indexed addressing mode
on the x86 have been found and corrected.  Thanks to Fermin Reig for the
fix.

The interface and implementation of the register allocator have been changed
slightly to accommodate the possibility of skipping the register allocation
phases completely and going directly to memory allocation.  This is needed
for C-- use.

----------------------------------------------------------------------
Name: Matthias Blume
Date: 00/03/09 10:23:53
Tag: blume_main_v110p26p1_0
Description:

* Complete re-organization of library names.  Many libraries have been
  consolidated so that they share the same path anchor.  For example,
  all MLRISC-related libraries are anchored at MLRISC, and most libraries
  that are SML/NJ-specific are under "smlnj".  Notice that names like
  host-cmb.cm or host-compiler.cm no longer exist.  See system/README
  for a complete description of the new naming scheme.  Quick reference:

     host-cmb.cm        -> smlnj/cmb.cm
     host-compiler.cm   -> smlnj/compiler.cm
     full-cm.cm         -> smlnj/cm.cm
     <arch>-<os>.cm     -> smlnj/cmb/<arch>-<os>.cm
     <arch>-compiler.cm -> smlnj/compiler/<arch>.cm

* Bug fixes in CM.
    - exceptions in user code are now passed through (i.e., reach top level)
    - more bugs in paranoia mode fixed
    - bug related to checking group owners fixed

* New install.sh script that automagically fetches archive files:
  The new file config/srcarchiveurl must contain the URL of the
  (remote) directory that contains bin files (or other source archives).
  If install.sh does not find the archive locally, it tries to get
  it from that remote directory.
  This should simplify installation further:  For machines that have
  access to the internet, just fetch <version>-config.tgz, unpack it,
  edit config/targets, and go (run config/install.sh).  The script will
  fetch everything else that it might need all by itself.

  For CVS users, this mechanism is not relevant for source archives, but
  it is convenient for getting new sets of binfiles.

  Archives should be tar files compressed with either gzip, compress, or
  bzip2.  The script recognizes .tgz, .tar, .tar.gz, .tz, .tar.Z, and .tar.bz2.
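The local-first, remote-second lookup can be sketched as follows.  The suffix list is the one given above, but the order in which suffixes are tried is an assumption.  The function `locate_archive` is hypothetical and is not how install.sh, a shell script, is written.  `local_files` stands in for a directory listing, so the sketch is easy to test.

```python
import posixpath

# Suffixes listed in the entry above; the search order is an assumption.
SUFFIXES = [".tgz", ".tar", ".tar.gz", ".tz", ".tar.Z", ".tar.bz2"]


def locate_archive(name, local_dir, local_files, remote_url):
    """Return ('local', path) for the first archive found locally, otherwise
    ('remote', [candidate URLs]) to try in the directory whose URL is
    stored in config/srcarchiveurl."""
    for suf in SUFFIXES:
        path = posixpath.join(local_dir, name + suf)
        if path in local_files:
            return ("local", path)
    base = remote_url.rstrip("/")
    return ("remote", [f"{base}/{name}{suf}" for suf in SUFFIXES])
```

A machine that already has the archive never touches the network.  Otherwise the URLs are tried in suffix order.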

----------------------------------------------------------------------
Name: Matthias Blume
Date: 2000/03/07 04:01:04
Tag: blume_main_v110_26_2
Description:
- size info in BOOTLIST
     * no fixed upper limits for number of bootfiles or length of
       bootfile names in runtime
     * falling back to old behavior if no BOOTLIST size info found
- allocation size heuristics in .run-sml
     * tries to read cache size from /proc/cpuinfo (this is important for
       small-cache Celeron systems!)
- install.sh robustified
- CM manual updates
- paranoid mode
     * no more CMB.deliver() (i.e., all done by CMB.make())
     * can re-use existing sml.boot.* files
     * init.cmi now treated as library
     * library stamps for consistency checks
- sml.boot.<arch>-<os>/PIDMAP file
     * This file is read by the CM startup code.  This is used to minimize
       the amount of dynamic state that needs to be stowed away for the
       purpose of sharing between interactive system and user code.
- CM.Anchor.anchor instead of CM.Anchor.{set,cancel}
     * Upon request by Elsa.  Anchors now controlled by get-set-pair
       like most other CM state variables.
- Compiler.CMSA eliminated
     * No longer supported by CM anyway.
- fixed bugs in pickler that kept biting Stefan
     * past refs to past refs (was caused by the possibility that
       ad-hoc sharing is more discriminating than hash-cons sharing)
     * integer overflow on LargeInt.minInt
- ml-{lex,yacc} build scripts now use new mechanism
  for building standalone programs
- fixed several gcc -Wall warnings that were caused by missing header
  files, missing initializations, etc., in runtime (not all warnings
  eliminated, though)
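Reading the cache size from /proc/cpuinfo amounts to picking out the "cache size" field that Linux reports on x86.  The sketch below shows only that parsing step.  How .run-sml (a shell script) turns the cache size into an allocation size is not described here, so that part is left out.  Both function names are hypothetical.

```python
import re


def cache_size_kb(cpuinfo_text):
    """Return the first 'cache size' value (in KB) found in the text of
    /proc/cpuinfo, or None if the field is absent."""
    m = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_cache_size_kb(path="/proc/cpuinfo"):
    """Read and parse the file; fall back to None where it does not exist."""
    try:
        with open(path) as f:
            return cache_size_kb(f.read())
    except OSError:
        return None
```

Returning None when the file or field is missing mirrors the need for a fallback on systems without /proc/cpuinfo.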
