Diff of /sml/trunk/HISTORY


revision 580, Wed Mar 22 06:33:52 2000 UTC revision 601, Thu Apr 6 04:38:14 2000 UTC
# Line 13  Line 13 
13  Description:
14  ----------------------------------------------------------------------
15  Name: Allen Leung
16    Date: 2000/04/06 00:36:00 EST
17    Tag: leunga-20000406-peephole-x86-SSA
18    Description:
19    
20    1.  New Peephole code
21    
22    2.  Minor improvement to X86 instruction selection
23    
24    3.  Various fixes to SSA and machine description -> code translator
25    
26    ----------------------------------------------------------------------
27    Name: Matthias Blume
28    Date: 2000/04/05 12:30:00 JST
29    Tag: blume_main_v110p26p2_3
30    Description:
31    
32    This update just merges three minor cosmetic updates to CM's sources
33    to get ready for the 110.27 code freeze on Friday.  No functionality
34    has changed.
35    
36    ----------------------------------------------------------------------
37    Name: Allen Leung
38    Date: 2000/04/04 19:39:00 EST
39    Tag: leunga-20000404-x86-asm
40    Description:
41    
42    1.  Fixed a problem in X86 assembly.
43    
44        Things like
45    
46           jmp %eax
47           jmp (%eax)
48    
49        should be output as
50    
51           jmp *%eax
52           jmp *(%eax)
53    
54    2.  Assembly output
55    
56          Added a new flag
57    
58              "asm-indent-copies" (defaults to false)
59    
60          When this flag is on, parallel copies will be indented an extra level.
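The operand rule from item 1 above can be illustrated with a minimal Python sketch. This is not the MLRISC assembly emitter: the function name emit_jmp and the string-based operand test are assumptions made purely for illustration. In AT&T syntax, a jump through a register or memory operand takes a "*" prefix, while a jump to a label does not.

```python
def emit_jmp(operand: str) -> str:
    """Format an x86 jmp in AT&T syntax (illustrative only)."""
    # Hypothetical operand test: register operands start with '%', and
    # memory operands contain a parenthesized base such as (%eax) or 8(%eax).
    is_indirect = operand.startswith("%") or "(" in operand
    prefix = "*" if is_indirect else ""
    return f"jmp {prefix}{operand}"
```

For example, emit_jmp("%eax") yields "jmp *%eax", while a plain label such as "L1" is left without the prefix.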
61    
62    ----------------------------------------------------------------------
63    Name: Allen Leung
64    Date: 2000/04/04 03:18:00 EST
65    Tag: leunga-20000404-C--Moby
66    Description:
67    
68        All of these fixes are related to C--, Moby, and my own optimization
69        stuff; so they shouldn't affect SML/NJ.
70    
71    1.  X86
72    
73        Various fixes related to floating point, and extensions.
74    
75    2.  Alpha
76    
77        Some extra patterns related to loads with signed/zero extension
78        provided by Fermin.
79    
80    3.  Assembly
81    
82        When generating assembly, resolve the value of client defined constants,
83        instead of generating symbolic values.  This is controlled by the
84        new flag "asm-resolve-constants", which defaults to true.
85    
86    4.  Machine Descriptions
87    
88        a. The precedence parser was slightly broken when parsing infixr symbols.
89        b. The type generalizing code had the bound variables reversed, resulting
90           in a problem during arity raising.
91        c. Various fixes in machine descriptions.
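For readers unfamiliar with the issue in item 4a, here is a generic precedence-climbing sketch in Python showing how right-associative (infixr) operators are usually handled. It is not the machine description tool's parser, and the operator table and its precedence values are hypothetical.

```python
# Operator table: symbol -> (precedence, associativity). Hypothetical values.
OPS = {"+": (6, "left"), "*": (7, "left"), "::": (5, "right")}


def parse(tokens):
    """Parse a flat token list into a nested tuple tree by precedence climbing."""
    pos = 0

    def climb(min_prec):
        nonlocal pos
        lhs = tokens[pos]  # atoms are plain strings
        pos += 1
        while pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            prec, assoc = OPS[op]
            if prec < min_prec:
                break
            pos += 1
            # infixr: recurse at the same precedence so the right operand
            # absorbs later occurrences of the operator; left-associative
            # operators require a strictly higher precedence on the right.
            rhs = climb(prec if assoc == "right" else prec + 1)
            lhs = (op, lhs, rhs)
        return lhs

    return climb(0)
```

With this table, the tokens a :: b :: c group to the right, while a + b + c groups to the left.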
92    
93    ----------------------------------------------------------------------
94    Name: Matthias Blume
95    Date: 2000/04/03 16:05:00 JST
96    Tag: blume_main_v110p26p2_2
97    Description:
98    
99    I eliminated coreEnv from compInfo.  Access to the "Core" structure is
100    now done via the ordinary static environment that is context to each
101    compilation unit.
102    
103    To this end, I arranged for "structure _Core" to be bound in the
104    pervasive environment instead of "structure Core".  Core access is done via
105    _Core (which can never be accidentally rebound because _Core is not a
106    legal surface-syntax symbol).
107    
108    The current solution is much cleaner because the core environment is
109    now simply part of the pervasive environment which is part of every
110    compilation unit's context anyway.  In particular, this eliminates all
111    special-case handling that was necessary until now in order to deal
112    with dynamic and symbolic parts of the core environment.
113    
114    Remaining hackery (to bind the "magic" symbol _Core) is localized in the
115    compilation manager's bootstrap compiler (actually: in the "init group"
116    handling).  See the comments in src/system/smlnj/init/init.cmi for
117    more details.
118    
119    I also tried to track down all mentions of "Core" (as string argument
120    to Symbol.strSymbol) in the compiler and replaced them with a
121    reference to the new CoreSym.coreSym.  Seems cleaner since the actual
122    name appears in one place only.
123    
124    Binfile and bootfile format have not changed, but the switchover from
125    the old "init.cmi" to the new one is a bit tricky, so I supplied new
126    bootfiles anyway.
127    
128    ----------------------------------------------------------------------
129    Name: Allen Leung
130    Date: 2000/04/02 21:17:00 EST
131    Tag: leunga-20000402-mltree
132    Description:
133    
134       1. Renamed the constructor CALL in MLTREE by popular demand.
135       2. Added a bunch of files from my repository.  These are currently
136          used by other non-SMLNJ backends.
137    
138    ----------------------------------------------------------------------
139    Name: Allen Leung
140    Date: 2000/03/31 21:15:00 EST
141    Tag: leunga-20000331-aliasing
142    Description:
143    
144    This update contains a rewritten (and hopefully more correct) module
145    for extracting aliasing information from CPS.
146    
147       To turn on this feature:
148    
149            Compiler.Control.CG.memDisambiguate := true
150    
151       To pretty print the region information with assembly
152    
153           Compiler.Control.MLRISC.getFlag "asm-show-region" := true;
154    
155       To control how many levels of aliasing information are printed, use:
156    
157           Compiler.Control.MLRISC.getInt "points-to-show-level" := n
158    
159       The default of n is 3.
160    
161    ----------------------------------------------------------------------
162    Name: David MacQueen
163    Date: 2000/03/31 11:15:00 EST
164    Tag: dbm-20000331-runtime_fix
165    Description:
166    
167    This update contains:
168    
169    1. runtime/c-lib/c-libraries.c
170       includes added in revision 1.2 caused compilation errors on hppa-hpux
171    
172    2. fix for bug 1556
173       system/Basis/Implementation/NJ/internal-signals.sml
174    
175    ----------------------------------------------------------------------
176    Name: Matthias Blume
177    Date: 2000/03/31 18:00:00 JST
178    Tag: blume_main_v110p26p2_1
179    Description:
180    
181    This update contains:
182    
183    1. A small change to CM's handling of stable libraries:
184       CM now maintains one "global" modmap that is used for all stable
185       libraries.  The use of such a global modmap maximizes sharing and
186       minimizes the need for re-traversing parts of environments during
187       modmap construction.  (However, this has minor impact since modmap
188       construction seems to account for just one percent or less of total
189       compile time.)
190    
191    2. I added a "genmap" phase to the statistics.  This is where I got the
192       "one percent" number (see above).
193    
194    3. CM's new tool parameter mechanism just became _even_ better. :)
195       - The parser understands named parameters and recursive options.
196       - The "make" and "shell" tools use these new features.
197         (This makes it a lot easier to cascade these tools.)
198       - There is a small syntax change: named parameters use a
199    
200           <name> : ( <option> ... )            or
201           <name> : <string>
202    
203         syntax.  Previously, named parameters were implemented in an
204         ad-hoc fashion by each tool individually (by parsing strings)
205         and had the form
206    
207           <name>=<string>
208    
209       See the CM manual for a full description of these issues.
210    
211    ----------------------------------------------------------------------
212    Name: Matthias Blume
213    Date: 2000/03/30 18:00:00 JST
214    Tag: blume_main_v110p26p2_0
215    Description:
216    
217    !!!!! WARNING !!!!!!
218    !!  New binfiles  !!
219    !!!!!!!!!!!!!!!!!!!!
220    
221    This update contains:
222    
223    1. Moderate changes to CM:
224    
225       - Changes to CM's tools mechanism.  In particular, it is now possible
226       to have tools that accept additional "command line" parameters
227       (specified in the .cm file at each instance where the tool's class is
228       used).
229    
230       This was done to accommodate the new "make" and "shell" tools which
231       facilitate fairly seamless hookup to portions of code managed using
232       Makefiles or Shell scripts.
233    
234       There are no classes "shared" or "private" anymore.  Instead, the
235       sharing annotation is now a parameter to the "sml" class.
236    
237       There is a bit of generic machinery for implementing one's own
238       tools that accept command-line parameters.  However, I am not yet fully
239       satisfied with that part, so expect changes here in the future.
240    
241       All existing tools are described in the CM manual.
242    
243       - Slightly better error handling.  (CM now suppresses many followup
244       error messages that tended to be more annoying than helpful.)
245    
246    2. Major changes to the compiler's static environment data structures.
247    
248       - no CMStaticEnv anymore.
249            - no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
250              but it is called Environment), no conversions between different
251              kinds of static environments
252    
253       - There is still a notion of a "modmap", but such modmaps are generated
254         on demand at the time when they are needed.  This sounds slow, but I
255         sped up the code that generates modmaps enough for this not to lead to
256         a slowdown of the compiler (at least I didn't detect any).
257    
258       - To facilitate rapid modmap generation, static environments now
259         contain an (optional) "modtree" structure.  Modtree annotations are
260         constructed by the unpickler during unpickling.  (This means that
261         the elaborator does not have to worry about modtrees at all.)
262         Modtrees have the advantage that they are compositional in the same
263         way as the environment data structure itself is compositional.
264         As a result, modtrees never hang on to parts of an environment that
265         has already been rendered "stale" by filtering or rebinding.
266    
267       - I went through many, many trials and errors before arriving at the
268         current solution.  (The initial idea of "linkpaths" did not work.)
269         But the result of all this is that I have touched a lot of files that
270         depend on the "modules" and "types" data structures (most of the
271         elaborator). There were a lot of changes during my "linkpath" trials
272         that could have been reverted to their original state but weren't.
273         Please, don't be too harsh on me for messing with this code a bit more
274         than what was strictly necessary...  (I _did_ resist the temptation
275         of doing any "global reformatting" to avoid an untimely death at
276         Dave's hands. :)
277    
278       - One positive aspect of the previous point:  At least I made sure that
279         all files that I touched now compile without warnings (other than
280         "polyEqual").
281    
282       - compiler now tends to run "leaner" (i.e., ties up less memory in
283         redundant modmaps)
284    
285    ----------------------------------------------------------------------
286    Name: Allen Leung
287    Date: 2000/03/29 18:00:00
288    Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
289    Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
290    Description:
291    
292       This update contains *MAJOR* changes to the way code is generated from CPS
293    in the module mlriscGen, and in various backend modules.
294    
295    CHANGES
296    =======
297    
298    1. MLRiscGen: forward propagation fix.
299    
300       There was a bug in forward propagation introduced at about the same time
301       as the MLRISC x86 backend, which prevented coalescing from being
302       performed effectively in loops.
303    
304       Effect: speed up of loops in RISC architectures.
305               By itself, this actually slowed down certain benchmarks on the x86.
306    
307    2. MLRiscGen:  forward propagating addresses from consing.
308    
309       I've changed the way consing code is generated.  Basically I separated
310       out the initialization part:
311    
312            store tag,   offset(allocptr)
313            store elem1, offset+4(allocptr)
314            store elem2, offset+8(allocptr)
315            ...
316            store elemn, offset+4n(allocptr)
317    
318       and the address computation part:
319    
320            celladdr <- offset+4+allocptr
321    
322       and move the address computation part
323    
324       Effect:  register pressure is generally lower as a result.  This
325                makes compilation of certain expressions much faster, such as
326                long lists with non-trivial elements.
327    
328                 [(0,0), (0,0), .... (0,0)]
329    
330    3. MLRiscGen: base pointer elimination.
331    
332        As part of the linkage mechanism, we generate the sequence:
333    
334         L:  ...  <- start of the code fragment
335    
336         L1:
337             base pointer <- linkreg - L1 + L
338    
339         The base pointer was then used for computing relocatable addresses
340       in the code fragment.  Frequently (such as in lots of continuations)
341       this is not needed.  We now eliminate this sequence whenever possible.
342    
343         For compile time efficiency, I'm using a very stupid local heuristic.
344       But in general, this should be done as a control flow analysis.
345    
346       Effect:  Smaller code size.  Speed up of most programs.
347    
348    4. Hppa back end
349    
350         Long jumps in span dependence resolution used to depend on the existence
351      of the base pointer.
352    
353         A jump to a long label L was expanded into the following sequence:
354    
355          LDIL %hi(L-8192), %r29
356          LDO  %lo(L-8192)(%r29), %r29
357          ADD  %r29, baseptr, %r29
358          BV,n %r0(%r29)
359    
360         In the presence of change (3) above, this will not work.  I've changed
361       it so that the following sequence of instructions is generated, which
362       doesn't mention the base pointer at all:
363    
364             BL,n  L', %r29           /* branch and link, L' + 4 -> %r29 */
365        L':  ADDIL L-(L'+4), %r29     /* Compute address of L */
366             BV,n  %r0(%r29)          /* Jump */
367    
368    5. Alpha back end
369    
370          New alpha instructions LDB/LDW have been added, as per Fermin's
371       suggestions.   This is unrelated to all other changes.
372    
373    6. X86 back end
374    
375         I've changed andl to testl in the floating point test sequence
376         whenever appropriate.  The Intel optimization guide states that
377         testl is preferable to andl.
378    
379    7. RA (x86 only)
380    
381         I've improved the spill propagation algorithm, using an approximation
382       of maximal weighted independent sets.   This seems to be necessary to
383       alleviate the slowdown introduced by change (1).
384    
385         I'll write down the algorithm one of these days.
386    
387    8. MLRiscGen: frequencies
388    
389         I've added an annotation that states that all call gc blocks have zero
390       execution frequencies.  This improves register allocation on the x86.
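The split described in change (2) above, between the initialization stores and the cell address computation, can be sketched in Python as follows. This only mirrors the pseudo-instructions shown in that item; it is not MLRiscGen code, and the function name cons_record and its parameters are hypothetical.

```python
WORD = 4  # bytes per word, matching the offsets shown in change (2)


def cons_record(tag, elems, offset, allocptr="allocptr"):
    """Return (init_stores, address_computation) as pseudo-instruction strings."""
    # Initialization part: the tag word first, then one store per element.
    stores = [f"store {tag}, {offset}({allocptr})"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + WORD * i}({allocptr})")
    # Address computation part: the cell address points just past the tag.
    addr = f"celladdr <- {offset + WORD}+{allocptr}"
    return stores, addr
```

Keeping the two parts as separate values is what lets a code generator place the address computation independently of the stores.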
391    
392    BENCHMARKS
393    ==========
394    
395       I've only performed the comparison against 110.25.
396    
397       The platforms are:
398    
399        HPPA  A four processor HP machine (E9000) with 5G of memory.
400        X86   A 300MHz Pentium II with 128M of memory, and
401        SPARC An Ultra sparc 2 with 512M of memory.
402    
403       I used the following parameters for the SML benchmarks:
404    
405                 @SMLalloc
406         HPPA    256k
407         SPARC   512k
408         X86     256k
409    
410    COMPILATION TIME
411    ----------------
412       Here are the numbers comparing the compilation times of the compilers.
413       I've only compared 110.25 compiling the new sources versus
414       a fixpoint version of the new compiler compiling the same.
415    
416                     110.25                                  New
417               Total  Time in RA  Spill+Reload   Total  Time In RA Spill+Reload
418         HPPA   627s    116s        2684+3584     599s    95s       1003+1879
419         SPARC  892s    173s        2891+3870     708s    116s      1004+1880
420         X86    999s    315s       94006+130691   987s    296s    108877+141957
421    
422                   110.25         New
423                Code Size      Code Size
424         HPPA   8596736         8561421
425         SPARC  8974299         8785143
426         X86    9029180         8716783
427    
428       So in summary, things are at least as good as before.   A dramatic
429       reduction in compilation time is obtained on the SPARC; I can't explain it,
430       but it is reproducible.  Perhaps someone should try to reproduce this
431       on their own machines.
432    
433    SML BENCHMARKS
434    --------------
435    
436        On the average, all benchmarks perform at least as well as before.
437    
438          HPPA         Compilation Time     Spill+Reload      Run Time
439                     110.25  New            110.25    New   110.25  New
440    
441          barnesHut  3.158  3.015  4.75%    1+1       0+0   2.980  2.922   2.00%
442              boyer  6.152  5.708  7.77%    0+0       0+0   0.218  0.213   2.34%
443       count-graphs  1.168  1.120  4.32%    0+0       0+0  22.705 23.073  -1.60%
444                fft  0.877  0.792 10.74%    1+3       1+3   0.602  0.587   2.56%
445        knuthBendix  3.180  2.857 11.32%    0+0       0+0   0.675  0.662   2.02%
446             lexgen  6.190  5.290 17.01%    0+0       0+0   0.913  0.788  15.86%
447               life  0.803  0.703 14.22%   25+25      0+0   0.153  0.140   9.52%
448              logic  2.048  2.007  2.08%    6+6       1+1   4.133  4.008   3.12%
449         mandelbrot  0.077  0.080 -4.17%    0+0       0+0   0.765  0.712   7.49%
450             mlyacc 22.932 20.937  9.53%  154+181    32+57  0.468  0.430   8.91%
451            nucleic  5.183  5.060  2.44%    2+2       0+0   0.125  0.120   4.17%
452      ratio-regions  3.357  3.142  6.84%    0+0       0+0  116.225 113.173 2.70%
453                ray  1.283  1.290 -0.52%    0+0       0+0   2.887  2.855   1.11%
454             simple  6.307  6.032  4.56%   28+30      5+7   3.705  3.658   1.28%
455                tsp  0.888  0.862  3.09%    0+0       0+0   7.040  6.893   2.13%
456               vliw 24.378 23.455  3.94%  106+127    25+45  2.758  2.707   1.91%
457      --------------------------------------------------------------------------
458       Average                     6.12%                                   4.09%
459    
460          SPARC        Compilation Time     Spill+Reload      Run Time
461                     110.25  New            110.25    New   110.25  New
462    
463          barnesHut  3.778  3.592  5.20%    2+2       0+0   3.648  3.453    5.65%
464              boyer  6.632  6.110  8.54%    0+0       0+0   0.258  0.242    6.90%
465       count-graphs  1.435  1.325  8.30%    0+0       0+0  33.672 34.737   -3.07%
466                fft  0.980  0.940  4.26%    3+9       2+6   0.838  0.827    1.41%
467        knuthBendix  3.590  3.138 14.39%    0+0       0+0   0.962  0.967   -0.52%
468             lexgen  6.593  6.072  8.59%    1+1       0+0   1.077  1.078   -0.15%
469               life  0.972  0.868 11.90%   26+26      0+0   0.143  0.140    2.38%
470              logic  2.525  2.387  5.80%    7+7       1+1   5.625  5.158    9.05%
471         mandelbrot  0.090  0.093 -3.57%    0+0       0+0   0.855  0.728   17.39%
472             mlyacc 26.732 23.827 12.19%  162+189    32+57  0.550  0.560   -1.79%
473            nucleic  6.233  6.197  0.59%    3+3       0+0   0.163  0.173   -5.77%
474      ratio-regions  3.780  3.507  7.79%    0+0       0+0 133.993 131.035   2.26%
475                ray  1.595  1.550  2.90%    1+1       0+0   3.440  3.418    0.63%
476             simple  6.972  6.487  7.48%   29+32      5+7   3.523  3.525   -0.05%
477                tsp  1.115  1.063  4.86%    0+0       0+0   7.393  7.265    1.77%
478               vliw 27.765 24.818 11.87%  110+135    25+45  2.265  2.135    6.09%
479      ----------------------------------------------------------------------------
480       Average                     6.94%                                    2.64%
481    
482          X86          Compilation Time     Spill+Reload      Run Time
483                     110.25  New            110.25    New   110.25  New
484    
485          barnesHut  5.530  5.420  2.03%  593+893   597+915   3.532  3.440   2.66%
486              boyer  8.768  7.747 13.19%  493+199   301+289   0.327  0.297  10.11%
487       count-graphs  2.040  2.010  1.49%  298+394   315+457  26.578 28.660  -7.26%
488                fft  1.327  1.302  1.92%  112+209   115+210   1.055  0.962   9.71%
489        knuthBendix  5.218  5.475 -4.69%  451+598   510+650   0.928  0.932  -0.36%
490             lexgen  9.970  9.623  3.60% 1014+841  1157+885   0.947  0.928   1.97%
491               life  1.183  1.183  0.00%  162+182   145+148   0.127  0.103  22.58%
492              logic  3.285  3.512 -6.45%  514+684   591+836   5.682  5.577   1.88%
493         mandelbrot  0.147  0.143  2.33%   38+41     33+54    0.703  0.690   1.93%
494             mlyacc 35.457 32.763  8.22% 3496+4564 3611+4860  0.552  0.550   0.30%
495            nucleic  7.100  6.888  3.07%  239+168   201+158   0.175  0.173   0.96%
496      ratio-regions  6.388  6.843 -6.65% 1182+257   981+300  120.142 120.345 -0.17%
497                ray  2.332  2.338 -0.29%  346+398   402+494   3.593  3.540   1.51%
498             simple  9.912  9.903  0.08% 1475+941  1579+1168  3.057  3.178  -3.83%
499                tsp  1.623  1.532  5.98%  266+200   250+211   8.045  7.878   2.12%
500               vliw 33.947 35.470 -4.29% 2629+2774 2877+3171  2.072  1.890   9.61%
501      ----------------------------------------------------------------------------
502       Average                     1.22%                                     3.36%
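The source does not state how the percentage columns in the tables above were computed. They appear consistent with (old / new - 1) * 100, which is an inference from the printed figures rather than a documented formula. A minimal Python sketch under that assumption (the function name improvement_pct is hypothetical):

```python
def improvement_pct(old: float, new: float) -> float:
    """Percentage by which `new` improves on `old`; positive means faster."""
    # Assumed formula, inferred from the tables rather than stated in the text.
    return (old / new - 1.0) * 100.0
```

Results recomputed from the rounded timings can differ from the printed percentages in the last digit, presumably because the originals were derived from unrounded measurements.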
503    
504    ----------------------------------------------------------------------
505    Name: Allen Leung
506    Date: 2000/03/23 16:25:00
507    Tag: leunga-20000323-fix_x86_alpha
508    Description:
509    
510    1. X86 fixes/changes
511    
512       a.  The old code generated for SETcc was completely wrong.
513           The Intel optimization guide is VERY misleading.
514    
515    2. ALPHA fixes/changes
516    
517       a.  Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
518       b.  Added a new mode byteWordLoadStores to the functor parameter to Alpha()
519       c.  Added reassociation code for address computation.
520    
521    ----------------------------------------------------------------------
522    Name: Allen Leung
523  Date: 2000/03/22 01:23:00
524  Tag: leunga-20000322-fix_x86_hppa_ra
525  Description:
