Annotation of /sml/trunk/HISTORY

Revision 586

1 : dbm 570 This is the HISTORY file for the Yale SML/NJ CVS repository.
2 :    
3 :     An entry should be made for _every_ commit to the repository.
4 :     The entries in this file will be used when creating the README
5 :     for new versions, so keep that in mind when writing the
6 :     description.
7 :    
8 :     The form of an entry should be:
9 :    
10 :     Name:
11 :     Date:
12 :     Tag: <post-commit CVS tag>
13 :     Description:
14 : leunga 585
15 : leunga 576 ----------------------------------------------------------------------
16 : leunga 580 Name: Allen Leung
17 : leunga 585 Date: 2000/03/29 18:00:00
18 :     Tag: leunga-20000327-mlriscGen_hppa_alpha_x86
19 : leunga 586 Boot files (optional): ftp://react-ilp.cs.nyu.edu/leunga/110.26.1-sml.boot.x86-unix-20000330.tar.gz
20 : leunga 585 Description:
21 :    
22 :     This update contains *MAJOR* changes to the way code is generated from CPS
23 :     in the module mlriscGen, and in various backend modules.
24 :    
25 :     CHANGES
26 :     =======
27 :    
28 :     1. MLRiscGen: forward propagation fix.
29 :    
30 :     There was a bug in forward propagation introduced at about the same time
31 :     as the MLRISC x86 backend, which prevented coalescing from being
32 :     performed effectively in loops.
33 :    
34 :     Effect: speed up of loops in RISC architectures.
35 :     By itself, this actually slowed down certain benchmarks on the x86.
36 :    
37 :     2. MLRiscGen: forward propagating addresses from consing.
38 :    
39 :     I've changed the way consing code is generated. Basically I separated
40 :     out the initialization part:
41 :    
42 :     store tag, offset(allocptr)
43 :     store elem1, offset+4(allocptr)
44 :     store elem2, offset+8(allocptr)
45 :     ...
46 :     store elemn, offset+4n(allocptr)
47 :    
48 :     and the address computation part:
49 :    
50 :     celladdr <- offset+4+allocptr
51 :    
52 :     and move the address computation part
53 :    
54 :     Effect: register pressure is generally lower as a result. This
55 :     makes compilation of certain expressions much faster, such as
56 :     long lists with non-trivial elements.
57 :    
58 :     [(0,0), (0,0), .... (0,0)]
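
The split described in item 2 can be illustrated with a small sketch. This is not the MLRiscGen code: the helper name `emit_record`, the 4-byte word size, and the textual instruction format are assumptions made only to show the initialization stores and the address computation as two separate pieces.

```python
WORD = 4  # assumed word size in bytes, matching the offsets shown in the entry


def emit_record(tag, elems, offset):
    """Return (init_stores, address_computation) for one heap record.

    Hypothetical helper: the stores initialize the cell relative to
    allocptr, and the address computation is returned separately so a
    code generator could place it independently of the stores.
    """
    stores = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + WORD * i}(allocptr)")
    # The cell address points just past the tag word.
    address = f"celladdr <- {offset + WORD}+allocptr"
    return stores, address


stores, address = emit_record("tag", ["elem1", "elem2"], 0)
for line in stores + [address]:
    print(line)
```

Keeping the address computation as its own value is what would let it be placed next to the first use of `celladdr`, which is one plausible reading of why register pressure drops.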
59 :    
60 :     3. MLRiscGen: base pointer elimination.
61 :    
62 :     As part of the linkage mechanism, we generate the sequence:
63 :    
64 :     L: ... <- start of the code fragment
65 :    
66 :     L1:
67 :     base pointer <- linkreg - L1 + L
68 :    
69 :     The base pointer was then used for computing relocatable addresses
70 :     in the code fragment. Frequently (such as in lots of continuations)
71 :     this is not needed. We now eliminate this sequence whenever possible.
72 :    
73 :     For compile time efficiency, I'm using a very stupid local heuristic.
74 :     But in general, this should be done as a control flow analysis.
75 :    
76 :     Effect: Smaller code size. Speed up of most programs.
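
The arithmetic behind the eliminated sequence can be checked with a short sketch. It assumes that `linkreg` holds the run-time address of `L1` when the sequence executes; the function names, the load address, and the label offsets are hypothetical.

```python
def base_pointer(linkreg, off_L1, off_L):
    """Model of `base pointer <- linkreg - L1 + L`.

    linkreg: assumed run-time address of L1
    off_L1, off_L: assembly-time offsets of the labels in the fragment
    """
    return linkreg - off_L1 + off_L


def relocate(base, off_L, off_target):
    """Run-time address of a label, computed relative to the base pointer."""
    return base + (off_target - off_L)


load_address = 0x40000                      # hypothetical load address
off_L, off_L1, off_data = 0x0, 0x30, 0x80   # hypothetical label offsets

base = base_pointer(load_address + off_L1, off_L1, off_L)
print(hex(base), hex(relocate(base, off_L, off_data)))
```

When a fragment never calls something like `relocate`, the base pointer is dead and the sequence can be dropped, which is the case the change targets.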
77 :    
78 :     4. Hppa back end
79 :    
80 :     Long jumps in span dependence resolution used to depend on the existence
81 :     of the base pointer.
82 :    
83 :     A jump to a long label L was expanded into the following sequence:
84 :    
85 :     LDIL %hi(L-8192), %r29
86 :     LDO %lo(L-8192)(%r29), %r29
87 :     ADD %r29, baseptr, %r29
88 :     BV,n %r0(%r29)
89 :    
90 :     In the presence of change (3) above, this will not work. I've changed
91 :     it so that the following sequence of instructions is generated, which
92 :     doesn't mention the base pointer at all:
93 :    
94 :     BL,n L', %r29 /* branch and link, L' + 4 -> %r29 */
95 :     L': ADDIL L-(L'+4), %r29 /* Compute address of L */
96 :     BV,n %r0(%r29) /* Jump */
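
A quick way to see why the new sequence needs no base pointer is to model only the address arithmetic given in its comments. The sketch below is not a PA-RISC simulator; it assumes 4-byte instructions and that the displacement `L-(L'+4)` is an assembly-time constant. The function name and the offsets are hypothetical.

```python
INSN = 4  # assumed instruction size in bytes


def long_jump_target(load_address, off_bl, off_L):
    """Address reached by the BL/ADDIL/BV sequence, per its comments.

    load_address: where the code fragment happens to be loaded
    off_bl:       offset of the BL instruction within the fragment
    off_L:        offset of the far label L within the fragment
    """
    l_prime = load_address + off_bl + INSN   # L' is the instruction after BL
    r29 = l_prime + INSN                     # BL,n L', %r29 : L' + 4 -> %r29
    # ADDIL L-(L'+4), %r29 : the displacement uses offsets only
    r29 += off_L - (off_bl + INSN + INSN)
    return r29                               # BV,n %r0(%r29) jumps here


for load_address in (0x10000, 0x7A000):
    print(hex(long_jump_target(load_address, 0x40, 0x9000)))
```

The load address enters only through the value that the branch-and-link leaves in `%r29`, so the result is `load_address + off_L` wherever the fragment is placed.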
97 :    
98 :     5. Alpha back end
99 :    
100 :     New alpha instructions LDB/LDW have been added, as per Fermin's
101 :     suggestions. This is unrelated to all other changes.
102 :    
103 :     6. X86 back end
104 :    
105 :     I've changed andl to testl in the floating point test sequence
106 :     whenever appropriate. The Intel optimization guide states that
107 :     testl is preferable to andl.
108 :    
109 :     7. RA (x86 only)
110 :    
111 :     I've improved the spill propagation algorithm, using an approximation
112 :     of maximal weighted independent sets. This seems to be necessary to
113 :     alleviate the negative effect in light of the slow down in (1).
114 :    
115 :     I'll write down the algorithm one of these days.
116 :    
117 :     8. MLRiscGen: frequencies
118 :    
119 :     I've added an annotation that states that all call gc blocks have zero
120 :     execution frequencies. This improves register allocation on the x86.
121 :    
122 :     BENCHMARKS
123 :     ==========
124 :    
125 :     I've only performed the comparison against 110.25.
126 :    
127 :     The platforms are:
128 :    
129 :     HPPA   A four processor HP machine (E9000) with 5G of memory,
130 :     X86    A 300MHz Pentium II with 128M of memory, and
131 :     SPARC  An Ultra sparc 2 with 512M of memory.
132 :    
133 :     I used the following parameters for the SML benchmarks:
134 :    
135 :     @SMLalloc
136 :     HPPA 256k
137 :     SPARC 512k
138 :     X86 256k
139 :    
140 :     COMPILATION TIME
141 :     ----------------
142 :     Here are the numbers comparing the compilation times of the compilers.
143 :     I've only compared 110.25 compiling the new sources versus
144 :     a fixpoint version of the new compiler compiling the same.
145 :    
146 :             110.25                             New
147 :             Total Time   In RA   Spill+Reload  Total Time   In RA   Spill+Reload
148 :     HPPA          627s    116s      2684+3584        599s     95s      1003+1879
149 :     SPARC         892s    173s      2891+3870        708s    116s      1004+1880
150 :     X86           999s    315s   94006+130691        987s    296s  108877+141957
151 :    
152 :               110.25       New
153 :               Code Size    Code Size
154 :     HPPA      8596736      8561421
155 :     SPARC     8974299      8785143
156 :     X86       9029180      8716783
157 :    
158 :     So in summary, things are at least as good as before. A dramatic
159 :     reduction in compilation time is obtained on the Sparc; I can't explain it,
160 :     but it is reproducible. Perhaps someone should try to reproduce this
161 :     on their own machines.
162 :    
163 :     SML BENCHMARKS
164 :     --------------
165 :    
166 :     On average, the benchmarks perform at least as well as before.
167 :    
168 :     HPPA             Compilation Time          Spill+Reload         Run Time
169 :                    110.25     New             110.25        New   110.25      New
170 :
171 :     barnesHut       3.158   3.015   4.75%        1+1        0+0    2.980    2.922   2.00%
172 :     boyer           6.152   5.708   7.77%        0+0        0+0    0.218    0.213   2.34%
173 :     count-graphs    1.168   1.120   4.32%        0+0        0+0   22.705   23.073  -1.60%
174 :     fft             0.877   0.792  10.74%        1+3        1+3    0.602    0.587   2.56%
175 :     knuthBendix     3.180   2.857  11.32%        0+0        0+0    0.675    0.662   2.02%
176 :     lexgen          6.190   5.290  17.01%        0+0        0+0    0.913    0.788  15.86%
177 :     life            0.803   0.703  14.22%      25+25        0+0    0.153    0.140   9.52%
178 :     logic           2.048   2.007   2.08%        6+6        1+1    4.133    4.008   3.12%
179 :     mandelbrot      0.077   0.080  -4.17%        0+0        0+0    0.765    0.712   7.49%
180 :     mlyacc         22.932  20.937   9.53%    154+181      32+57    0.468    0.430   8.91%
181 :     nucleic         5.183   5.060   2.44%        2+2        0+0    0.125    0.120   4.17%
182 :     ratio-regions   3.357   3.142   6.84%        0+0        0+0  116.225  113.173   2.70%
183 :     ray             1.283   1.290  -0.52%        0+0        0+0    2.887    2.855   1.11%
184 :     simple          6.307   6.032   4.56%      28+30        5+7    3.705    3.658   1.28%
185 :     tsp             0.888   0.862   3.09%        0+0        0+0    7.040    6.893   2.13%
186 :     vliw           24.378  23.455   3.94%    106+127      25+45    2.758    2.707   1.91%
187 :     -------------------------------------------------------------------------------------
188 :     Average                         6.12%                                           4.09%
189 :    
190 :     SPARC            Compilation Time          Spill+Reload         Run Time
191 :                    110.25     New             110.25        New   110.25      New
192 :
193 :     barnesHut       3.778   3.592   5.20%        2+2        0+0    3.648    3.453   5.65%
194 :     boyer           6.632   6.110   8.54%        0+0        0+0    0.258    0.242   6.90%
195 :     count-graphs    1.435   1.325   8.30%        0+0        0+0   33.672   34.737  -3.07%
196 :     fft             0.980   0.940   4.26%        3+9        2+6    0.838    0.827   1.41%
197 :     knuthBendix     3.590   3.138  14.39%        0+0        0+0    0.962    0.967  -0.52%
198 :     lexgen          6.593   6.072   8.59%        1+1        0+0    1.077    1.078  -0.15%
199 :     life            0.972   0.868  11.90%      26+26        0+0    0.143    0.140   2.38%
200 :     logic           2.525   2.387   5.80%        7+7        1+1    5.625    5.158   9.05%
201 :     mandelbrot      0.090   0.093  -3.57%        0+0        0+0    0.855    0.728  17.39%
202 :     mlyacc         26.732  23.827  12.19%    162+189      32+57    0.550    0.560  -1.79%
203 :     nucleic         6.233   6.197   0.59%        3+3        0+0    0.163    0.173  -5.77%
204 :     ratio-regions   3.780   3.507   7.79%        0+0        0+0  133.993  131.035   2.26%
205 :     ray             1.595   1.550   2.90%        1+1        0+0    3.440    3.418   0.63%
206 :     simple          6.972   6.487   7.48%      29+32        5+7    3.523    3.525  -0.05%
207 :     tsp             1.115   1.063   4.86%        0+0        0+0    7.393    7.265   1.77%
208 :     vliw           27.765  24.818  11.87%    110+135      25+45    2.265    2.135   6.09%
209 :     -------------------------------------------------------------------------------------
210 :     Average                         6.94%                                           2.64%
211 :    
212 :     X86              Compilation Time          Spill+Reload         Run Time
213 :                    110.25     New             110.25        New   110.25      New
214 :
215 :     barnesHut       5.530   5.420   2.03%    593+893    597+915    3.532    3.440   2.66%
216 :     boyer           8.768   7.747  13.19%    493+199    301+289    0.327    0.297  10.11%
217 :     count-graphs    2.040   2.010   1.49%    298+394    315+457   26.578   28.660  -7.26%
218 :     fft             1.327   1.302   1.92%    112+209    115+210    1.055    0.962   9.71%
219 :     knuthBendix     5.218   5.475  -4.69%    451+598    510+650    0.928    0.932  -0.36%
220 :     lexgen          9.970   9.623   3.60%   1014+841   1157+885    0.947    0.928   1.97%
221 :     life            1.183   1.183   0.00%    162+182    145+148    0.127    0.103  22.58%
222 :     logic           3.285   3.512  -6.45%    514+684    591+836    5.682    5.577   1.88%
223 :     mandelbrot      0.147   0.143   2.33%      38+41      33+54    0.703    0.690   1.93%
224 :     mlyacc         35.457  32.763   8.22%  3496+4564  3611+4860    0.552    0.550   0.30%
225 :     nucleic         7.100   6.888   3.07%    239+168    201+158    0.175    0.173   0.96%
226 :     ratio-regions   6.388   6.843  -6.65%   1182+257    981+300  120.142  120.345  -0.17%
227 :     ray             2.332   2.338  -0.29%    346+398    402+494    3.593    3.540   1.51%
228 :     simple          9.912   9.903   0.08%   1475+941  1579+1168    3.057    3.178  -3.83%
229 :     tsp             1.623   1.532   5.98%    266+200    250+211    8.045    7.878   2.12%
230 :     vliw           33.947  35.470  -4.29%  2629+2774  2877+3171    2.072    1.890   9.61%
231 :     -------------------------------------------------------------------------------------
232 :     Average                         1.22%                                           3.36%
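
The entry does not say how the percentage columns were computed. Judging from the numbers, they appear to be the old time divided by the new time, minus one. The sketch below encodes that inferred formula; `improvement` is a hypothetical name, and rows with very small times do not reproduce exactly, presumably because the listed times are rounded.

```python
def improvement(old, new):
    """Percentage by which `old` exceeds `new` (inferred formula, see above)."""
    return (old / new - 1.0) * 100.0


# lexgen on HPPA: compilation time and run time as listed in the table
print(f"{improvement(6.190, 5.290):.2f}%")
print(f"{improvement(0.913, 0.788):.2f}%")
```

A positive value means the new compiler is faster on that measurement; a negative value means it is slower.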
233 :    
234 :     ----------------------------------------------------------------------
235 :     Name: Allen Leung
236 : leunga 583 Date: 2000/03/23 16:25:00
237 :     Tag: leunga-20000323-fix_x86_alpha
238 :     Description:
239 :    
240 :     1. X86 fixes/changes
241 :    
242 :     a. The old code generated for SETcc was completely wrong.
243 :     The Intel optimization guide is VERY misleading.
244 :    
245 :     2. ALPHA fixes/changes
246 :    
247 :     a. Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
248 :     b. Added a new mode byteWordLoadStores to the functor parameter to Alpha()
249 :     c. Added reassociation code for address computation.
250 :    
251 :     ----------------------------------------------------------------------
252 :     Name: Allen Leung
253 : leunga 580 Date: 2000/03/22 01:23:00
254 :     Tag: leunga-20000322-fix_x86_hppa_ra
255 :     Description:
256 :    
257 :     1. X86 fixes/changes
258 :    
259 :     a. x86Rewrite bug with MUL3 (found by Lal)
260 :     b. Added the instructions FSTS, FSTL
261 :    
262 :     2. PA-RISC fixes/changes
263 :    
264 :     a. B label should not be a delay slot candidate! Why did this work?
265 :     b. ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
266 :     as it should be.
267 :     c. The assembly syntax for fstds and fstdd was wrong.
268 :     d. Added the composite instruction COMICLR/LDO, which is the immediate
269 :     operand variant of COMCLR/LDO.
270 :    
271 :     3. Generic MLRISC
272 :    
273 :     a. shuffle.sml rewritten to be slightly more efficient
274 :     b. DIV bug in mltree-simplify fixed (found by Fermin)
275 :    
276 :     4. Register Allocator
277 :    
278 :     a. I now release the interference graph earlier during spilling.
279 :     May improve memory usage.
280 :    
281 :     ----------------------------------------------------------------------
282 : blume 577 Name: Matthias Blume
283 : blume 578 Date: 2000/03/14 14:15:32
284 :     Tag: blume_main_v110p26p1_2
285 :     Description:
286 :    
287 :     1. Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
288 :     additional argument called "template" which is an optional string that
289 :     specifies the layout of the tool command line. See the CM manual for
290 :     explanation.
291 :    
292 :     2. A special-purpose tool can be "registered" by simply dropping the
293 :     corresponding <...>-tool.cm (and/or <...>-ext.cm) into the same
294 :     directory where the .cm file lives that uses this tool. (The
295 :     behavior/misfeature until now was to look for the tool description
296 :     files in the current working directory.) As before, tool description
297 :     files could also be anchored -- in which case they can live anywhere
298 :     they like. Following the recent e-mail discussion, this change should
299 :     make it easier to have special-purpose tools that are shipped together
300 :     with the sources of the program that uses them.
301 :    
302 :     ----------------------------------------------------------------------
303 :     Name: Matthias Blume
304 : blume 577 Date: 2000/03/10 07:48:34
305 :     Tag: blume_main_v110p26p1_1
306 :     Description:
307 :    
308 :     I added a re-written version of Dave's fixpt script to src/system.
309 :     Changes relative to the original version:
310 :     - sh-ified (not everybody has ksh)
311 :     - automatically figures out which architecture it runs on
312 :     - uses ./makeml a bit more cleverly
313 :     - never invokes ./installml (and, thus, does not clobber your
314 :     good and working installation of sml in case something goes wrong)
315 :     - accepts max iteration count using option "-iter <n>"
316 :     - accepts a "base" name using option "-base <base>"
317 :    
318 :     It does not build any extraneous heap images but directly rebuilds
319 :     bin- and boot-hierarchies using makeml's "-rebuild" switch. Finally,
320 :     it can incorporate existing bin- and boot- hierarchies. For example,
321 :     suppose the base is set to "sml" (which is the default). Then it
322 :     successively builds
323 :    
324 :     sml.bin.<arch>-unix and sml.boot.<arch>-unix
325 :     then sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
326 :     then sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
327 :     ...
328 :     then sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix
329 :    
330 :     and so on. If any of these already exist, it will just use what's
331 :     there. In particular, many people will have the initial set of bin
332 :     and boot files around, so this saves time for at least one full
333 :     rebuild. Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
334 :     <k>=1,2,... is normally not a good idea when invoking fixpt. However,
335 :     they might be the result of an earlier partial run of fixpt (which
336 :     perhaps got accidentally killed). In this case, fixpt will quickly
337 :     move through what exists before continuing where it left off earlier,
338 :     and, thus, saves a lot of time.
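
The naming scheme and the resume behavior described above can be sketched as follows. This is an illustration, not the fixpt script itself: the function names are hypothetical, `existing` stands in for a directory listing, and treating an iteration as done only when both its bin and boot hierarchies are present is an assumption.

```python
def hierarchy_names(base, arch, k):
    """Names of the bin and boot hierarchies for iteration k (k = 0 is the base)."""
    stem = base if k == 0 else f"{base}{k}"
    return (f"{stem}.bin.{arch}-unix", f"{stem}.boot.{arch}-unix")


def first_missing(base, arch, existing, max_iter):
    """Smallest k whose hierarchies are not both present, or None if all exist."""
    for k in range(max_iter + 1):
        if not all(name in existing for name in hierarchy_names(base, arch, k)):
            return k
    return None


existing = {
    "sml.bin.x86-unix", "sml.boot.x86-unix",
    "sml1.bin.x86-unix", "sml1.boot.x86-unix",
}
print(hierarchy_names("sml", "x86", 2))
print(first_missing("sml", "x86", existing, max_iter=3))
```

With the listing above, a run would skip iterations 0 and 1 and start rebuilding at `sml2`, which corresponds to the case of a partial earlier run.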
339 :    
340 :     ----------------------------------------------------------------------
341 : leunga 576 Name: Allen Leung
342 :     Date: 2000/03/10 02:20:00
343 :     Tag: leunga-20000310-fix_x86_asm_ra
344 :     Description:
345 : dbm 570
346 : leunga 576 More assembly output problems involving the indexed addressing mode
347 :     on the x86 have been found and corrected. Thanks to Fermin Reig for the
348 :     fix.
349 :    
350 :     The interface and implementation of the register allocator have been changed
351 :     slightly to accommodate the possibility to skip the register allocation
352 :     phases completely and go directly to memory allocation. This is needed
353 :     for C-- use.
354 :    
355 : dbm 570 ----------------------------------------------------------------------
356 : blume 572 Name: Matthias Blume
357 : blume 575 Date: 2000/03/09 10:23:53
358 :     Tag: blume_main_v110p26p1_0
359 :     Description:
360 :    
361 :     * Complete re-organization of library names. Many libraries have been
362 :     consolidated so that they share the same path anchor. For example,
363 :     all MLRISC-related libraries are anchored at MLRISC, most libraries that
364 :     are SML/NJ-specific are under "smlnj". Notice that names like
365 :     host-cmb.cm or host-compiler.cm no longer exist. See system/README
366 :     for a complete description of the new naming scheme. Quick reference:
367 :    
368 :     host-cmb.cm -> smlnj/cmb.cm
369 :     host-compiler.cm -> smlnj/compiler.cm
370 :     full-cm.cm -> smlnj/cm.cm
371 :     <arch>-<os>.cm -> smlnj/cmb/<arch>-<os>.cm
372 :     <arch>-compiler.cm -> smlnj/compiler/<arch>.cm
373 :    
374 :     * Bug fixes in CM.
375 :     - exceptions in user code are being passed through (i.e., reach top level)
376 :     - more bugs in paranoia mode fixed
377 :     - bug related to checking group owners fixed
378 :    
379 :     * New install.sh script that automagically fetches archive files:
380 :     The new file config/srcarchiveurl must contain the URL of the
381 :     (remote) directory that contains bin files (or other source archives).
382 :     If install.sh does not find the archive locally, it tries to get
383 :     it from that remote directory.
384 :     This should simplify installation further: For machines that have
385 :     access to the internet, just fetch <version>-config.tgz, unpack it,
386 :     edit config/targets, and go (run config/install.sh). The script will
387 :     fetch everything else that it might need all by itself.
388 :    
389 :     For CVS users, this mechanism is not relevant for source archives, but
390 :     it is convenient for getting new sets of binfiles.
391 :    
392 :     Archives should be tar files compressed with either gzip, compress, or
393 :     bzip2. The script recognizes .tgz, .tar, .tar.gz, .tz, .tar.Z, and .tar.bz2.
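
Recognizing an archive by its suffix can be sketched as below. This is not taken from install.sh, which is a shell script. The suffix spellings, in particular `.tz` and `.tar.gz` with leading dots, are assumed, and `archive_suffix` and the file names are hypothetical.

```python
# Suffixes listed in the entry, with leading dots assumed throughout.
RECOGNIZED = (".tgz", ".tar", ".tar.gz", ".tz", ".tar.Z", ".tar.bz2")


def archive_suffix(filename):
    """Return the longest recognized suffix of filename, or None."""
    matches = [s for s in RECOGNIZED if filename.endswith(s)]
    return max(matches, key=len) if matches else None


for name in ("110.26-config.tgz", "bin.x86-unix.tar.bz2", "README"):
    print(name, "->", archive_suffix(name))
```

The match is case-sensitive on purpose, since `.tar.Z` conventionally uses an uppercase `Z`.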
394 :    
395 :     ----------------------------------------------------------------------
396 :     Name: Matthias Blume
397 : blume 572 Date: 2000/03/07 04:01:04
398 :     Tag: blume_main_v110_26_2
399 : dbm 570 Description:
400 : blume 572 - size info in BOOTLIST
401 :     * no fixed upper limits for number of bootfiles or length of
402 :     bootfile names in runtime
403 :     * falling back to old behavior if no BOOTLIST size info found
404 :     - allocation size heuristics in .run-sml
405 :     * tries to read cache size from /proc/cpuinfo (this is important for
406 :     small-cache Celeron systems!)
407 :     - install.sh robustified
408 :     - CM manual updates
409 :     - paranoid mode
410 :     * no more CMB.deliver() (i.e., all done by CMB.make())
411 :     * can re-use existing sml.boot.* files
412 :     * init.cmi now treated as library
413 :     * library stamps for consistency checks
414 :     - sml.boot.<arch>-<os>/PIDMAP file
415 :     * This file is read by the CM startup code. This is used to minimize
416 :     the amount of dynamic state that needs to be stowed away for the
417 :     purpose of sharing between interactive system and user code.
418 :     - CM.Anchor.anchor instead of CM.Anchor.{set,cancel}
419 :     * Upon request by Elsa. Anchors now controlled by get-set-pair
420 :     like most other CM state variables.
421 :     - Compiler.CMSA eliminated
422 :     * No longer supported by CM anyway.
423 :     - fixed bugs in pickler that kept biting Stefan
424 :     * past refs to past refs (was caused by the possibility that
425 :     ad-hoc sharing is more discriminating than hash-cons sharing)
426 :     * integer overflow on LargeInt.minInt
427 :     - ml-{lex,yacc} build scripts now use new mechanism
428 :     for building standalone programs
429 :     - fixed several gcc -Wall warnings that were caused by missing header
430 :     files, missing initializations, etc., in runtime (not all warnings
431 :     eliminated, though)
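
Reading the cache size from /proc/cpuinfo, as mentioned for .run-sml above, could look roughly like this. The real heuristic lives in a shell script and its mapping from cache size to allocation size is not given here, so the sketch covers only the parsing step. It assumes the `cache size : <n> KB` line format that Linux reports on x86; `cache_size_kb` and the sample text are hypothetical.

```python
import re


def cache_size_kb(cpuinfo_text):
    """Return the first reported cache size in KB, or None if absent."""
    match = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


sample = "model name\t: Celeron (Mendocino)\ncache size\t: 128 KB\n"
print(cache_size_kb(sample))
```

Returning `None` when the line is missing leaves room for a caller to fall back to a default allocation size.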
