1 : |
dbm |
610 |
|
2 : |
|
|
S M L / N J
|
3 : |
|
|
|
4 : |
|
|
1 1 0 . 2 7 N E W S
|
5 : |
|
|
|
6 : |
|
|
April 10, 2000
|
7 : |
|
|
|
8 : |
|
|
WARNING
|
9 : |
|
|
|
10 : |
|
|
This version is intended for compiler hackers.
|
11 : |
|
|
We are in the midst of substantial structural changes,
|
12 : |
|
|
and this is a snapshot.
|
13 : |
|
|
|
14 : |
|
|
http://cm.bell-labs.com/cm/cs/what/smlnj/index.html
|
15 : |
|
|
|
16 : |
|
|
------------------------------------------------------------------------
|
17 : |
|
|
Summary:
|
18 : |
|
|
|
19 : |
dbm |
619 |
* This version has some minor tweaks to FLINT (after the major merge
|
20 : |
|
|
in 110.26). Work continues on tuning FLINT and the various optimizations
|
21 : |
|
|
it implements.
|
22 : |
dbm |
610 |
|
23 : |
dbm |
619 |
* CM has been revised extensively, and the modmap environment mechanism
|
24 : |
|
|
supporting stubbified pickles has been reworked completely. The pathconfig
|
25 : |
|
|
file has been simplified. Installation scripts have been further
|
26 : |
|
|
modified. See src/system/README and the latest version of the
|
27 : |
|
|
CM manual at
|
28 : |
dbm |
610 |
|
29 : |
dbm |
619 |
<http://www.kurims.kyoto-u.ac.jp/~blume/SMLNJ-DEV/manual/index.html>
|
30 : |
|
|
<http://www.kurims.kyoto-u.ac.jp/~blume/SMLNJ-DEV/manual.ps>
|
31 : |
dbm |
616 |
|
32 : |
dbm |
619 |
for further information about these changes.
|
33 : |
dbm |
616 |
|
34 : |
dbm |
619 |
* MLRISC, and particularly the x86 back end, has been modified extensively.
|
35 : |
dbm |
610 |
|
36 : |
dbm |
619 |
* There are a few updates to the SML/NJ Library.
|
37 : |
dbm |
610 |
|
38 : |
dbm |
619 |
* Reported bug fixes:
|
39 : |
|
|
1556. (jhr) signal race condition
|
40 : |
|
|
Some CM bugs (not recorded)
|
41 : |
dbm |
610 |
|
42 : |
dbm |
619 |
* Distribution file names have been simplified. They no longer start
|
43 : |
|
|
with the version number (e.g. "110.27-config.tar.gz" is now
|
44 : |
|
|
simply "config.tar.gz"). The boot directory tarballs are now
|
45 : |
|
|
"boot.alpha32-unix.tar.gz", etc. (i.e. no version number and the
|
46 : |
|
|
"sml." prefix is dropped). The new install script will restore
|
47 : |
|
|
the usual name (e.g. "sml.boot.alpha32-unix") when the tarball is
|
48 : |
|
|
unpacked. [We dropped the initial "sml." for the boot tarballs to
|
49 : |
|
|
get the file names under 28 characters because of a limitation of
|
50 : |
|
|
the Bell Labs ftp server.]
|
51 : |
|
|
The version README file is still named 110.27-README, however.
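
The renaming rules above can be illustrated with a small Python sketch. It is not the install script itself: the helper names strip_version and unpacked_boot_dir are hypothetical, and the sketch assumes only what the text states (the version prefix is dropped, and boot tarballs regain the "sml." prefix when unpacked).

```python
def strip_version(name: str, version: str = "110.27") -> str:
    """Drop a leading '<version>-' prefix, as the new distribution names do."""
    prefix = version + "-"
    return name[len(prefix):] if name.startswith(prefix) else name


def unpacked_boot_dir(tarball: str) -> str:
    """Directory name restored when a boot tarball is unpacked (hypothetical helper)."""
    suffix = ".tar.gz"
    if not (tarball.startswith("boot.") and tarball.endswith(suffix)):
        raise ValueError("not a boot tarball: " + tarball)
    # Re-attach the "sml." prefix that was dropped from the tarball name.
    return "sml." + tarball[: -len(suffix)]


print(strip_version("110.27-config.tar.gz"))
print(unpacked_boot_dir("boot.alpha32-unix.tar.gz"))
```

Note that "boot.alpha32-unix.tar.gz" is 24 characters long, while the same name with the "sml." prefix would be exactly 28, which is consistent with the stated limit of under 28 characters.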
|
52 : |
dbm |
610 |
|
53 : |
dbm |
619 |
110.27-README
|
54 : |
|
|
HISTORY
|
55 : |
|
|
MLRISC.tar.gz
|
56 : |
|
|
boot.alpha32-unix.tar.gz
|
57 : |
|
|
boot.hppa-unix.tar.gz
|
58 : |
|
|
boot.ppc-unix.tar.gz
|
59 : |
|
|
boot.sparc-unix.tar.gz
|
60 : |
|
|
boot.x86-unix.tar.gz
|
61 : |
|
|
cm.tar.gz
|
62 : |
|
|
compiler.tar.gz
|
63 : |
|
|
config.tar.gz
|
64 : |
|
|
ml-burg.tar.gz
|
65 : |
|
|
ml-lex.tar.gz
|
66 : |
|
|
ml-yacc.tar.gz
|
67 : |
|
|
runtime.tar.gz
|
68 : |
|
|
smlnj-lib.tar.gz
|
69 : |
|
|
system.tar.gz
|
70 : |
dbm |
616 |
|
71 : |
dbm |
619 |
======================================================================
|
72 : |
|
|
Details of changes
|
73 : |
dbm |
616 |
|
74 : |
dbm |
610 |
|
75 : |
|
|
======================================================================
|
76 : |
|
|
FLINT:
|
77 : |
|
|
======================================================================
|
78 : |
|
|
|
79 : |
|
|
Name: Stefan
|
80 : |
|
|
Date: 2000/04/07 10:00:00 EDT
|
81 : |
|
|
Tag: monnier-20000406-branch-handling
|
82 : |
|
|
Description:
|
83 : |
|
|
|
84 : |
|
|
Improved handling of branches (mostly those generated from
|
85 : |
|
|
polymorphic equality), removed switchoff and changed the
|
86 : |
|
|
default optimization settings (more cpsopt and less flintopt).
|
87 : |
|
|
|
88 : |
|
|
|
89 : |
|
|
======================================================================
|
90 : |
|
|
MLRISC:
|
91 : |
|
|
======================================================================
|
92 : |
|
|
|
93 : |
leunga |
617 |
1. Register Allocator
|
94 : |
dbm |
610 |
|
95 : |
leunga |
617 |
a. The interface and implementation of the register allocator have been
|
96 : |
|
|
changed slightly to accommodate the possibility of skipping
|
97 : |
|
|
the register allocation phases completely and going directly to
|
98 : |
|
|
memory allocation. This is needed for C-- use.
|
99 : |
dbm |
610 |
|
100 : |
leunga |
617 |
b. I've improved the spill propagation algorithm, using an approximation
|
101 : |
|
|
of maximal weighted independent sets. This affects only the x86
|
102 : |
|
|
platform.
|
103 : |
dbm |
610 |
|
104 : |
leunga |
617 |
2. MLTREE
|
105 : |
dbm |
610 |
|
106 : |
leunga |
617 |
a. Renamed the constructor CALL in MLTREE by popular demand.
|
107 : |
dbm |
610 |
|
108 : |
leunga |
617 |
3. X86
|
109 : |
dbm |
610 |
|
110 : |
leunga |
617 |
a. More assembly output problems involving the indexed addressing mode
|
111 : |
|
|
on the x86 have been found and corrected. Thanks to Fermin Reig for the
|
112 : |
|
|
fix.
|
113 : |
dbm |
610 |
|
114 : |
leunga |
617 |
b. x86Rewrite bug with MUL3 (found by Lal)
|
115 : |
|
|
|
116 : |
|
|
c. Added the instructions FSTS, FSTL
|
117 : |
|
|
|
118 : |
|
|
d. The old code generated for SETcc was completely wrong.
|
119 : |
|
|
The Intel optimization guide is VERY misleading.
|
120 : |
|
|
|
121 : |
|
|
e. Various fixes related to floating point, and extensions.
|
122 : |
|
|
|
123 : |
|
|
f. Things like
|
124 : |
|
|
|
125 : |
|
|
jmp %eax
|
126 : |
|
|
jmp (%eax)
|
127 : |
|
|
|
128 : |
|
|
are now output as
|
129 : |
|
|
|
130 : |
|
|
jmp *%eax
|
131 : |
|
|
jmp *(%eax)
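
The change in item f can be mimicked on already-printed assembly text. The Python sketch below is only an illustration and is not the MLRISC emitter: it assumes AT&T syntax, where an indirect jump operand is prefixed with '*', and it handles only the two operand forms shown above (a bare register and a parenthesized register). The function name mark_indirect_jump is hypothetical.

```python
import re

# Matches "jmp" followed by an operand that starts with '%' or '(',
# i.e. an indirect operand that is not yet marked with '*'.
_INDIRECT_JMP = re.compile(r"^(\s*jmp\s+)(?=[%(])")


def mark_indirect_jump(line: str) -> str:
    """Insert '*' before the operand of an unmarked indirect jmp."""
    return _INDIRECT_JMP.sub(r"\1*", line)


for text in ("jmp %eax", "jmp (%eax)", "jmp *%eax", "jmp L1"):
    print(mark_indirect_jump(text))
```

Direct jumps to labels and operands that already carry '*' are left unchanged.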
|
132 : |
|
|
|
133 : |
|
|
g. Yet another fix for x86 assembly for idivl, imull, mull and friends.
|
134 : |
|
|
|
135 : |
|
|
h. I've changed andl to testl in the floating point test sequence
|
136 : |
|
|
whenever appropriate. The Intel optimization guide states that
|
137 : |
|
|
testl is preferable to andl.
|
138 : |
|
|
|
139 : |
|
|
4. Alpha
|
140 : |
|
|
|
141 : |
|
|
a. Some extra patterns related to loads with signed/zero extension
|
142 : |
|
|
provided by Fermin.
|
143 : |
|
|
b. Added the instructions LDBU, LDWU, STB, STW as per Fermin's suggestion.
|
144 : |
|
|
c. Added a new mode byteWordLoadStores to the functor parameter to Alpha()
|
145 : |
|
|
d. Added reassociation code for address computation.
|
146 : |
|
|
|
147 : |
|
|
5. PA-RISC
|
148 : |
|
|
|
149 : |
dbm |
610 |
a. B label should not be a delay slot candidate! Why did this work?
|
150 : |
|
|
b. ADDT(32, REG(32, r), LI n) now generates one instruction instead of two,
|
151 : |
|
|
as it should be.
|
152 : |
|
|
c. The assembly syntax for fstds and fstdd was wrong.
|
153 : |
|
|
d. Added the composite instruction COMICLR/LDO, which is the immediate
|
154 : |
|
|
operand variant of COMCLR/LDO.
|
155 : |
leunga |
617 |
e. Long jumps in span dependence resolution used to depend on the existence
|
156 : |
|
|
of the base pointer in the SML/NJ runtime.
|
157 : |
dbm |
610 |
|
158 : |
leunga |
617 |
A jump to a long label L was expanded into the following sequence:
|
159 : |
|
|
|
160 : |
|
|
LDIL %hi(L-8192), %r29
|
161 : |
|
|
LDO %lo(L-8192)(%r29), %r29
|
162 : |
|
|
ADD %r29, baseptr, %r29
|
163 : |
|
|
BV,n %r0(%r29)
|
164 : |
dbm |
610 |
|
165 : |
leunga |
617 |
I've changed it so that the following sequence of instructions
|
166 : |
|
|
is generated, which doesn't mention the base pointer at all:
|
167 : |
|
|
|
168 : |
|
|
BL,n L', %r29 /* branch and link, L' + 4 -> %r29 */
|
169 : |
|
|
L': ADDIL L-(L'+4), %r29 /* Compute address of L */
|
170 : |
|
|
BV,n %r0(%r29) /* Jump */
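
The reason the new sequence needs no base pointer can be checked with a little arithmetic. The Python sketch below follows the simplified description in the comments above (BL leaves L' + 4 in %r29, ADDIL adds a constant known at link time) and does not model PA-RISC instruction encodings or delay slots. The function and parameter names are hypothetical.

```python
def long_jump_target(load_base: int, off_lprime: int, off_l: int) -> int:
    """Simulate the three-instruction sequence for code loaded at load_base."""
    addr_lprime = load_base + off_lprime

    # BL,n L', %r29 : %r29 receives L' + 4 (a run-time address).
    r29 = addr_lprime + 4

    # ADDIL L-(L'+4), %r29 : the displacement depends only on offsets
    # within the code object, so it is the same wherever the code is loaded.
    displacement = off_l - (off_lprime + 4)
    r29 += displacement

    # BV,n %r0(%r29) : jump to the address held in %r29.
    return r29


for base in (0x0, 0x1000, 0x7FFF0000):
    print(hex(long_jump_target(base, 0x40, 0x123456)))
```

For every load address the result equals load_base plus the offset of L, which is the run-time address of L.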
|
171 : |
|
|
|
172 : |
|
|
6. Generic MLRISC
|
173 : |
|
|
|
174 : |
dbm |
610 |
a. shuffle.sml rewritten to be slightly more efficient
|
175 : |
|
|
b. DIV bug in mltree-simplify fixed (found by Fermin)
|
176 : |
|
|
|
177 : |
leunga |
617 |
7. Assembly Output
|
178 : |
dbm |
610 |
|
179 : |
leunga |
617 |
a. When generating assembly, resolve the value of client defined constants,
|
180 : |
|
|
instead of generating symbolic values. This is controlled by the
|
181 : |
|
|
new flag "asm-resolve-constants", which defaults to true.
|
182 : |
dbm |
610 |
|
183 : |
leunga |
617 |
b. Added a new flag
|
184 : |
dbm |
610 |
|
185 : |
leunga |
617 |
"asm-indent-copies" (default to false)
|
186 : |
dbm |
610 |
|
187 : |
leunga |
617 |
When this flag is on, parallel copies will be indented an extra level.
|
188 : |
dbm |
610 |
|
189 : |
|
|
|
190 : |
leunga |
617 |
8. Machine Descriptions/Generation
|
191 : |
dbm |
610 |
|
192 : |
leunga |
617 |
a. The precedence parser was slightly broken when parsing infixr symbols.
|
193 : |
|
|
b. The type generalizing code had the bound variables reversed, resulting
|
194 : |
|
|
in a problem during arity raising.
|
195 : |
|
|
c. Various fixes in machine descriptions.
|
196 : |
dbm |
610 |
|
197 : |
leunga |
617 |
======================================================================
|
198 : |
|
|
CPS->MLRISC Code Generation
|
199 : |
|
|
======================================================================
|
200 : |
dbm |
610 |
|
201 : |
leunga |
617 |
This release contains *MAJOR* changes to the way code is generated from CPS
|
202 : |
|
|
in the module mlriscGen, and in various backend modules.
|
203 : |
dbm |
610 |
|
204 : |
leunga |
617 |
1. Forward propagation fix.
|
205 : |
|
|
|
206 : |
dbm |
610 |
There was a bug in forward propagation introduced at about the same time
|
207 : |
|
|
as the MLRISC x86 backend, which prevented coalescing from being
|
208 : |
|
|
performed effectively in loops.
|
209 : |
|
|
|
210 : |
|
|
Effect: speed up of loops in RISC architectures.
|
211 : |
|
|
By itself, this actually slowed down certain benchmarks on the x86.
|
212 : |
|
|
|
213 : |
leunga |
617 |
2. Forward propagating addresses from consing.
|
214 : |
dbm |
610 |
|
215 : |
|
|
I've changed the way consing code is generated. Basically I separated
|
216 : |
|
|
out the initialization part:
|
217 : |
|
|
|
218 : |
|
|
store tag, offset(allocptr)
|
219 : |
|
|
store elem1, offset+4(allocptr)
|
220 : |
|
|
store elem2, offset+8(allocptr)
|
221 : |
|
|
...
|
222 : |
|
|
store elemn, offset+4n(allocptr)
|
223 : |
|
|
|
224 : |
|
|
and the address computation part:
|
225 : |
|
|
|
226 : |
|
|
celladdr <- offset+4+allocptr
|
227 : |
|
|
|
228 : |
|
|
and move the address computation part
|
229 : |
|
|
|
230 : |
|
|
Effect: register pressure is generally lower as a result. This
|
231 : |
|
|
makes compilation of certain expressions much faster, such as
|
232 : |
|
|
long lists with non-trivial elements.
|
233 : |
|
|
|
234 : |
|
|
[(0,0), (0,0), .... (0,0)]
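
The split between initialization and address computation can be written out for a concrete record. The Python sketch below only prints pseudo-instructions in the notation used above. It assumes 4-byte words, as the offsets in the text suggest, and the function name cons_record is hypothetical.

```python
from typing import List, Tuple


def cons_record(tag: str, elems: List[str], offset: int) -> Tuple[List[str], str]:
    """Return (initialization stores, address computation) for one record."""
    # Initialization part: the tag word, then one store per element.
    stores = [f"store {tag}, {offset}(allocptr)"]
    for i, elem in enumerate(elems, start=1):
        stores.append(f"store {elem}, {offset + 4 * i}(allocptr)")

    # Address computation part: the cell address points just past the tag.
    address = f"celladdr <- {offset + 4}+allocptr"
    return stores, address


stores, address = cons_record("tag", ["elem1", "elem2"], offset=8)
print("\n".join(stores))
print(address)
```

Because the address computation is a separate step, it can be placed independently of the stores.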
|
235 : |
|
|
|
236 : |
leunga |
617 |
3. Base pointer elimination.
|
237 : |
dbm |
610 |
|
238 : |
|
|
As part of the linkage mechanism, we generate the sequence:
|
239 : |
|
|
|
240 : |
|
|
L: ... <- start of the code fragment
|
241 : |
|
|
|
242 : |
|
|
L1:
|
243 : |
|
|
base pointer <- linkreg - L1 + L
|
244 : |
|
|
|
245 : |
|
|
The base pointer was then used for computing relocatable addresses
|
246 : |
|
|
in the code fragment. Frequently (such as in lots of continuations)
|
247 : |
|
|
this is not needed. We now eliminate this sequence whenever possible.
|
248 : |
|
|
|
249 : |
|
|
For compile time efficiency, I'm using a very stupid local heuristic.
|
250 : |
|
|
But in general, this should be done as a control flow analysis.
|
251 : |
|
|
|
252 : |
|
|
Effect: Smaller code size. Speed up of most programs.
|
253 : |
|
|
|
254 : |
|
|
|
255 : |
leunga |
617 |
4. Frequency annotations
|
256 : |
dbm |
610 |
|
257 : |
|
|
I've added an annotation that states that all call gc blocks have zero
|
258 : |
|
|
execution frequencies. This improves register allocation on the x86.
|
259 : |
|
|
|
260 : |
|
|
BENCHMARKS
|
261 : |
|
|
==========
|
262 : |
|
|
|
263 : |
|
|
I've only performed the comparison against 110.25.
|
264 : |
|
|
|
265 : |
|
|
The platforms are:
|
266 : |
|
|
|
267 : |
|
|
HPPA A four processor HP machine (E9000) with 5G of memory.
|
268 : |
|
|
X86 A 300MHz Pentium II with 128M of memory, and
|
269 : |
|
|
SPARC An Ultra sparc 2 with 512M of memory.
|
270 : |
|
|
|
271 : |
|
|
I used the following parameters for the SML benchmarks:
|
272 : |
|
|
|
273 : |
|
|
@SMLalloc
|
274 : |
|
|
HPPA 256k
|
275 : |
|
|
SPARC 512k
|
276 : |
|
|
X86 256k
|
277 : |
|
|
|
278 : |
|
|
COMPILATION TIME
|
279 : |
|
|
----------------
|
280 : |
|
|
Here are the numbers comparing the compilation times of the compilers.
|
281 : |
|
|
I've only compared 110.25 compiling the new sources versus
|
282 : |
|
|
a fixpoint version of the new compiler compiling the same.
|
283 : |
|
|
|
284 : |
|
|
                   110.25                              New
|
285 : |
|
|
       Total  Time in RA   Spill+Reload   Total  Time in RA   Spill+Reload
|
286 : |
|
|
HPPA    627s        116s      2684+3584    599s         95s      1003+1879
|
287 : |
|
|
SPARC   892s        173s      2891+3870    708s        116s      1004+1880
|
288 : |
|
|
X86     999s        315s   94006+130691    987s        296s  108877+141957
|
289 : |
|
|
|
290 : |
|
|
110.25 New
|
291 : |
|
|
Code Size Code Size
|
292 : |
|
|
HPPA 8596736 8561421
|
293 : |
|
|
SPARC 8974299 8785143
|
294 : |
|
|
X86 9029180 8716783
|
295 : |
|
|
|
296 : |
|
|
So in summary, things are at least as good as before. Dramatic
|
297 : |
|
|
reduction in compilation time is obtained on the Sparc; I can't explain it,
|
298 : |
|
|
but it is reproducible. Perhaps someone should try to reproduce this
|
299 : |
|
|
on their own machines.
|
300 : |
|
|
|
301 : |
|
|
SML BENCHMARKS
|
302 : |
|
|
--------------
|
303 : |
|
|
|
304 : |
|
|
On the average, all benchmarks perform at least as well as before.
|
305 : |
|
|
|
306 : |
|
|
HPPA Compilation Time Spill+Reload Run Time
|
307 : |
|
|
110.25 New 110.25 New 110.25 New
|
308 : |
|
|
|
309 : |
|
|
barnesHut 3.158 3.015 4.75% 1+1 0+0 2.980 2.922 2.00%
|
310 : |
|
|
boyer 6.152 5.708 7.77% 0+0 0+0 0.218 0.213 2.34%
|
311 : |
|
|
count-graphs 1.168 1.120 4.32% 0+0 0+0 22.705 23.073 -1.60%
|
312 : |
|
|
fft 0.877 0.792 10.74% 1+3 1+3 0.602 0.587 2.56%
|
313 : |
|
|
knuthBendix 3.180 2.857 11.32% 0+0 0+0 0.675 0.662 2.02%
|
314 : |
|
|
lexgen 6.190 5.290 17.01% 0+0 0+0 0.913 0.788 15.86%
|
315 : |
|
|
life 0.803 0.703 14.22% 25+25 0+0 0.153 0.140 9.52%
|
316 : |
|
|
logic 2.048 2.007 2.08% 6+6 1+1 4.133 4.008 3.12%
|
317 : |
|
|
mandelbrot 0.077 0.080 -4.17% 0+0 0+0 0.765 0.712 7.49%
|
318 : |
|
|
mlyacc 22.932 20.937 9.53% 154+181 32+57 0.468 0.430 8.91%
|
319 : |
|
|
nucleic 5.183 5.060 2.44% 2+2 0+0 0.125 0.120 4.17%
|
320 : |
|
|
ratio-regions 3.357 3.142 6.84% 0+0 0+0 116.225 113.173 2.70%
|
321 : |
|
|
ray 1.283 1.290 -0.52% 0+0 0+0 2.887 2.855 1.11%
|
322 : |
|
|
simple 6.307 6.032 4.56% 28+30 5+7 3.705 3.658 1.28%
|
323 : |
|
|
tsp 0.888 0.862 3.09% 0+0 0+0 7.040 6.893 2.13%
|
324 : |
|
|
vliw 24.378 23.455 3.94% 106+127 25+45 2.758 2.707 1.91%
|
325 : |
|
|
--------------------------------------------------------------------------
|
326 : |
|
|
Average 6.12% 4.09%
|
327 : |
|
|
|
328 : |
|
|
SPARC Compilation Time Spill+Reload Run Time
|
329 : |
|
|
110.25 New 110.25 New 110.25 New
|
330 : |
|
|
|
331 : |
|
|
barnesHut 3.778 3.592 5.20% 2+2 0+0 3.648 3.453 5.65%
|
332 : |
|
|
boyer 6.632 6.110 8.54% 0+0 0+0 0.258 0.242 6.90%
|
333 : |
|
|
count-graphs 1.435 1.325 8.30% 0+0 0+0 33.672 34.737 -3.07%
|
334 : |
|
|
fft 0.980 0.940 4.26% 3+9 2+6 0.838 0.827 1.41%
|
335 : |
|
|
knuthBendix 3.590 3.138 14.39% 0+0 0+0 0.962 0.967 -0.52%
|
336 : |
|
|
lexgen 6.593 6.072 8.59% 1+1 0+0 1.077 1.078 -0.15%
|
337 : |
|
|
life 0.972 0.868 11.90% 26+26 0+0 0.143 0.140 2.38%
|
338 : |
|
|
logic 2.525 2.387 5.80% 7+7 1+1 5.625 5.158 9.05%
|
339 : |
|
|
mandelbrot 0.090 0.093 -3.57% 0+0 0+0 0.855 0.728 17.39%
|
340 : |
|
|
mlyacc 26.732 23.827 12.19% 162+189 32+57 0.550 0.560 -1.79%
|
341 : |
|
|
nucleic 6.233 6.197 0.59% 3+3 0+0 0.163 0.173 -5.77%
|
342 : |
|
|
ratio-regions 3.780 3.507 7.79% 0+0 0+0 133.993 131.035 2.26%
|
343 : |
|
|
ray 1.595 1.550 2.90% 1+1 0+0 3.440 3.418 0.63%
|
344 : |
|
|
simple 6.972 6.487 7.48% 29+32 5+7 3.523 3.525 -0.05%
|
345 : |
|
|
tsp 1.115 1.063 4.86% 0+0 0+0 7.393 7.265 1.77%
|
346 : |
|
|
vliw 27.765 24.818 11.87% 110+135 25+45 2.265 2.135 6.09%
|
347 : |
|
|
----------------------------------------------------------------------------
|
348 : |
|
|
Average 6.94% 2.64%
|
349 : |
|
|
|
350 : |
|
|
X86 Compilation Time Spill+Reload Run Time
|
351 : |
|
|
110.25 New 110.25 New 110.25 New
|
352 : |
|
|
|
353 : |
|
|
barnesHut 5.530 5.420 2.03% 593+893 597+915 3.532 3.440 2.66%
|
354 : |
|
|
boyer 8.768 7.747 13.19% 493+199 301+289 0.327 0.297 10.11%
|
355 : |
|
|
count-graphs 2.040 2.010 1.49% 298+394 315+457 26.578 28.660 -7.26%
|
356 : |
|
|
fft 1.327 1.302 1.92% 112+209 115+210 1.055 0.962 9.71%
|
357 : |
|
|
knuthBendix 5.218 5.475 -4.69% 451+598 510+650 0.928 0.932 -0.36%
|
358 : |
|
|
lexgen 9.970 9.623 3.60% 1014+841 1157+885 0.947 0.928 1.97%
|
359 : |
|
|
life 1.183 1.183 0.00% 162+182 145+148 0.127 0.103 22.58%
|
360 : |
|
|
logic 3.285 3.512 -6.45% 514+684 591+836 5.682 5.577 1.88%
|
361 : |
|
|
mandelbrot 0.147 0.143 2.33% 38+41 33+54 0.703 0.690 1.93%
|
362 : |
|
|
mlyacc 35.457 32.763 8.22% 3496+4564 3611+4860 0.552 0.550 0.30%
|
363 : |
|
|
nucleic 7.100 6.888 3.07% 239+168 201+158 0.175 0.173 0.96%
|
364 : |
|
|
ratio-regions 6.388 6.843 -6.65% 1182+257 981+300 120.142 120.345 -0.17%
|
365 : |
|
|
ray 2.332 2.338 -0.29% 346+398 402+494 3.593 3.540 1.51%
|
366 : |
|
|
simple 9.912 9.903 0.08% 1475+941 1579+1168 3.057 3.178 -3.83%
|
367 : |
|
|
tsp 1.623 1.532 5.98% 266+200 250+211 8.045 7.878 2.12%
|
368 : |
|
|
vliw 33.947 35.470 -4.29% 2629+2774 2877+3171 2.072 1.890 9.61%
|
369 : |
|
|
----------------------------------------------------------------------------
|
370 : |
|
|
Average 1.22% 3.36%
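
The notes do not say how the percentage columns in the three tables were computed. The figures appear consistent with (old / new - 1) * 100, up to rounding of the displayed times, and the "Average" rows with a plain arithmetic mean. Both are inferences from the numbers, not statements from the author. A Python sketch to check this on the HPPA compilation-time column:

```python
def improvement(old: float, new: float) -> float:
    """Percentage by which old exceeds new; positive means the new compiler wins."""
    return (old / new - 1.0) * 100.0


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# HPPA compilation-time percentages, copied from the table above.
HPPA_COMPILE = [4.75, 7.77, 4.32, 10.74, 11.32, 17.01, 14.22, 2.08,
                -4.17, 9.53, 2.44, 6.84, -0.52, 4.56, 3.09, 3.94]

print(round(improvement(3.158, 3.015), 2))  # barnesHut on HPPA, from displayed times
print(round(mean(HPPA_COMPILE), 2))         # compare with the "Average" row
```

Small differences from the printed percentages are expected, since the table shows times rounded to three decimals.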
|
371 : |
|
|
|
372 : |
|
|
|
373 : |
leunga |
617 |
|
374 : |
|
|
Aliasing
|
375 : |
|
|
---------
|
376 : |
dbm |
610 |
This update contains a rewritten (and hopefully more correct) module
|
377 : |
|
|
for extracting aliasing information from CPS.
|
378 : |
|
|
|
379 : |
|
|
To turn on this feature:
|
380 : |
|
|
|
381 : |
|
|
Compiler.Control.CG.memDisambiguate := true
|
382 : |
|
|
|
383 : |
|
|
To pretty print the region information with assembly
|
384 : |
|
|
|
385 : |
|
|
Compiler.Control.MLRISC.getFlag "asm-show-region" := true;
|
386 : |
|
|
|
387 : |
|
|
To control how many levels of aliasing information are printed, use:
|
388 : |
|
|
|
389 : |
|
|
Compiler.Control.MLRISC.getInt "points-to-show-level" := n
|
390 : |
|
|
|
391 : |
|
|
The default of n is 3.
|
392 : |
|
|
|
393 : |
|
|
======================================================================
|
394 : |
blume |
618 |
Boot code and glue scripts
|
395 : |
dbm |
610 |
======================================================================
|
396 : |
|
|
|
397 : |
blume |
618 |
Size info in BOOTLIST
|
398 : |
dbm |
610 |
|
399 : |
blume |
618 |
The BOOTLIST file now has an optional first line that specifies an
|
400 : |
|
|
upper bound on the number of boot files and an upper bound on the
|
401 : |
|
|
length of each individual name. With this, there are no longer
|
402 : |
|
|
hard-wired restrictions on these values in the runtime system.
|
403 : |
|
|
(If the specification is missing in BOOTLIST, the runtime system
|
404 : |
|
|
falls back to its old behavior, i.e., hard-wired defaults.)
|
405 : |
dbm |
610 |
|
406 : |
blume |
618 |
Allocation-size heuristics in .run-sml
|
407 : |
dbm |
610 |
|
408 : |
blume |
618 |
The .run-sml script tries to read the processor cache size from
|
409 : |
|
|
/proc/cpuinfo. This works on Linux and is important for small-cache
|
410 : |
|
|
Celeron systems that suffer badly when allocation size is set too
|
411 : |
|
|
high.
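
A rough Python illustration of the first step, extracting the cache size from /proc/cpuinfo text. The real .run-sml is a shell script, and how it maps cache size to an allocation size is not described here, so that part is left out. The sketch assumes the Linux x86 line format "cache size : <n> KB". The function name and sample text are hypothetical.

```python
import re
from typing import Optional


def cache_size_kb(cpuinfo_text: str) -> Optional[int]:
    """Return the first 'cache size' entry in KB, or None if there is none."""
    match = re.search(r"^cache size\s*:\s*(\d+)\s*KB", cpuinfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up sample in the usual /proc/cpuinfo layout.
SAMPLE = "model name\t: Celeron\ncache size\t: 128 KB\nflags\t\t: fpu\n"
print(cache_size_kb(SAMPLE))
```

On a Linux system the same function could be applied to the contents of /proc/cpuinfo. When no such line is present it returns None, and a caller would fall back to a default.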
|
412 : |
dbm |
610 |
|
413 : |
blume |
618 |
Install script
|
414 : |
dbm |
610 |
|
415 : |
blume |
618 |
- Written in a more modular fashion (using shell functions).
|
416 : |
|
|
- Made more robust.
|
417 : |
|
|
- Automagically fetches archive files over the network if they do not
|
418 : |
|
|
exist locally. Thus, you only need to fetch config.tar.gz yourself.
|
419 : |
|
|
Unpack it and go!
|
420 : |
|
|
(Requires "wget" or "lynx" to be installed on the system and a
|
421 : |
|
|
live connection to the internet. Moreover, the contents of
|
422 : |
|
|
config/srcarchiveurl must be set properly.)
|
423 : |
|
|
For CVS users, this may be convenient when fetching new sets of binfiles.
|
424 : |
|
|
- Handles archive files with or without version number and compressed
|
425 : |
|
|
with one of "gzip", "compress", or "bzip2". Recognized suffixes are
|
426 : |
|
|
".tar.gz", ".tgz", ".tar", ".tar.Z", and ".tar.bz2".
|
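
The suffix handling listed above can be sketched as a lookup table. This Python fragment only illustrates the classification step and is not the install script. The pairing of suffixes with programs follows the usual conventions for these tools, and the function name classify_archive is hypothetical.

```python
from typing import Optional, Tuple

# Recognized suffix -> compression program; None means an uncompressed tar file.
SUFFIXES = [
    (".tar.gz", "gzip"),
    (".tgz", "gzip"),
    (".tar.Z", "compress"),
    (".tar.bz2", "bzip2"),
    (".tar", None),
]


def classify_archive(filename: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (name without suffix, compressor), or None if not recognized."""
    for suffix, tool in SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)], tool
    return None


for name in ("config.tar.gz", "110.27-runtime.tar.bz2", "cm.tar", "cm.zip"):
    print(name, classify_archive(name))
```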
427 : |
dbm |
610 |
|
428 : |
blume |
618 |
PIDMAP file
|
429 : |
dbm |
610 |
|
430 : |
blume |
618 |
There is a file called PIDMAP in the bootfile directory.
|
431 : |
|
|
It is used to minimize the amount of dynamic state that needs to be
|
432 : |
|
|
stowed away for the purpose of sharing between interactive system
|
433 : |
|
|
and user code.
|
434 : |
dbm |
610 |
|
435 : |
blume |
618 |
Building standalone programs
|
436 : |
dbm |
610 |
|
437 : |
blume |
618 |
The command ml-build can be used to build standalone programs.
|
438 : |
|
|
ml-build takes three arguments:
|
439 : |
dbm |
610 |
|
440 : |
blume |
618 |
1. the name of the CM library that implements and exports the "main"
|
441 : |
|
|
function of your program
|
442 : |
|
|
2. the name of the "main" function of your program as exported by 1.
|
443 : |
|
|
(The function must have a type that makes it suitable as an argument
|
444 : |
|
|
to SMLofNJ.exportFn.)
|
445 : |
|
|
3. the name of the heapfile to be generated
|
446 : |
dbm |
610 |
|
447 : |
blume |
618 |
Other build scripts
|
448 : |
dbm |
610 |
|
449 : |
blume |
618 |
ml-{lex,yacc} build scripts now make use of the new mechanism for
|
450 : |
|
|
building standalone programs.
|
451 : |
dbm |
610 |
|
452 : |
blume |
618 |
Fixpoint script
|
453 : |
dbm |
610 |
|
454 : |
blume |
618 |
I added a re-written version of Dave's fixpt script to src/system.
|
455 : |
|
|
Changes relative to the original version:
|
456 : |
|
|
- sh-ified (not everybody has ksh)
|
457 : |
|
|
- automatically figures out which architecture it runs on
|
458 : |
|
|
- uses ./makeml a bit more cleverly
|
459 : |
|
|
- never invokes ./installml (and, thus, does not clobber your
|
460 : |
|
|
good and working installation of sml in case something goes wrong)
|
461 : |
|
|
- accepts max iteration count using option "-iter <n>"
|
462 : |
|
|
- accepts a "base" name using option "-base <base>"
|
463 : |
dbm |
610 |
|
464 : |
blume |
618 |
It does not build any extraneous heap images but directly rebuilds
|
465 : |
|
|
bin- and boot-hierarchies using makeml's "-rebuild" switch. Finally,
|
466 : |
|
|
it can incorporate existing bin- and boot- hierarchies. For example,
|
467 : |
|
|
suppose the base is set to "sml" (which is the default). Then it
|
468 : |
|
|
successively builds
|
469 : |
dbm |
610 |
|
470 : |
blume |
618 |
sml.bin.<arch>-unix and sml.boot.<arch>-unix
|
471 : |
|
|
then sml1.bin.<arch>-unix and sml1.boot.<arch>-unix
|
472 : |
|
|
then sml2.bin.<arch>-unix and sml2.boot.<arch>-unix
|
473 : |
|
|
...
|
474 : |
|
|
then sml<n>.bin.<arch>-unix and sml<n>.boot.<arch>-unix
|
475 : |
dbm |
610 |
|
476 : |
blume |
618 |
and so on. If any of these already exist, it will just use what's
|
477 : |
|
|
there. In particular, many people will have the initial set of bin
|
478 : |
|
|
and boot files around, so this saves time for at least one full
|
479 : |
|
|
rebuild. Having sets of the form <base><k>.{bin,boot}.<arch>-unix for
|
480 : |
|
|
<k>=1,2,... is normally not a good idea when invoking fixpt. However,
|
481 : |
|
|
they might be the result of an earlier partial run of fixpt (which
|
482 : |
|
|
perhaps got accidentally killed). In this case, fixpt will quickly
|
483 : |
|
|
move through what exists before continuing where it left off earlier,
|
484 : |
|
|
and, thus, saves a lot of time.
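
The naming scheme and the resume behavior described above can be modelled in a few lines of Python. This is a sketch of the description, not of the fixpt script: the parameter last_k is a hypothetical stand-in for the iteration bound, whose exact relation to "-iter <n>" is not specified here, and the function names are made up.

```python
from typing import Iterator, Optional, Set, Tuple


def fixpt_dirs(arch: str, base: str = "sml", last_k: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield (bin directory, boot directory) names in build order."""
    for k in range(last_k + 1):
        stem = base if k == 0 else f"{base}{k}"
        yield f"{stem}.bin.{arch}-unix", f"{stem}.boot.{arch}-unix"


def first_to_build(existing: Set[str], arch: str, base: str = "sml",
                   last_k: int = 3) -> Optional[int]:
    """Index of the first set that is not fully present, or None if all exist."""
    for k, (bin_dir, boot_dir) in enumerate(fixpt_dirs(arch, base, last_k)):
        if bin_dir not in existing or boot_dir not in existing:
            return k
    return None


have = {"sml.bin.x86-unix", "sml.boot.x86-unix"}
print(list(fixpt_dirs("x86", last_k=2)))
print(first_to_build(have, "x86"))
```

With only the initial bin and boot sets present, the first set that still has to be built is the one with index 1.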
|
485 : |
dbm |
610 |
|
486 : |
blume |
618 |
Runtime system code
|
487 : |
dbm |
610 |
|
488 : |
blume |
618 |
- fixed several gcc -Wall warnings that were caused by missing header
|
489 : |
|
|
files, missing initializations, etc., in runtime (not all warnings
|
490 : |
|
|
eliminated, though)
|
491 : |
|
|
- had to "un-fix" some of them later because they broke the HPPA compile
|
492 : |
dbm |
610 |
|
493 : |
blume |
618 |
======================================================================
|
494 : |
|
|
CM
|
495 : |
|
|
======================================================================
|
496 : |
dbm |
610 |
|
497 : |
blume |
618 |
Several manual updates
|
498 : |
dbm |
610 |
|
499 : |
blume |
618 |
I always try to keep the manual in sync with CM's latest features.
|
500 : |
dbm |
610 |
|
501 : |
blume |
618 |
Bootstrap compilation
|
502 : |
dbm |
610 |
|
503 : |
blume |
618 |
No more "CMB.deliver"
|
504 : |
dbm |
610 |
|
505 : |
blume |
618 |
- All work is done by CMB.make (as it used to be in the old CM).
|
506 : |
|
|
- CMB.make can be used even with existing bootfiles, i.e., bootfiles do
|
507 : |
|
|
not have to be removed beforehand.
|
508 : |
|
|
- In "paranoid mode" CM checks a stable library's CRC checksum to
|
509 : |
|
|
verify that it is "valid". (In "normal mode", such checks do not
|
510 : |
|
|
occur.) Paranoid mode is used for bootstrap compilation. This is
|
511 : |
|
|
what makes it possible to re-use existing bootfiles.
|
512 : |
dbm |
610 |
|
513 : |
blume |
618 |
Initial glue code (init.cmi)
|
514 : |
dbm |
610 |
|
515 : |
blume |
618 |
- treated as a genuine library now
|
516 : |
|
|
- there are no more "built-in" modules
|
517 : |
dbm |
610 |
|
518 : |
blume |
618 |
CM API
|
519 : |
dbm |
610 |
|
520 : |
blume |
618 |
CM.Anchor.anchor instead of CM.Anchor.{set,cancel}
|
521 : |
|
|
- Upon request by Elsa. Anchors are now controlled by a get-set pair
|
522 : |
|
|
like most other CM state variables.
|
523 : |
dbm |
610 |
|
524 : |
blume |
618 |
CM tools:
|
525 : |
|
|
- It is now possible to have tools that accept additional
|
526 : |
|
|
"command line" parameters (specified in the .cm file at each
|
527 : |
|
|
instance where the tool's class is used).
|
528 : |
dbm |
610 |
|
529 : |
blume |
618 |
- The parser understands named parameters and recursive options.
|
530 : |
dbm |
610 |
|
531 : |
blume |
618 |
- new "make" and "shell" tools added
|
532 : |
|
|
* facilitate fairly seamless hookup to portions of code
|
533 : |
|
|
managed using Makefiles or Shell scripts.
|
534 : |
dbm |
610 |
|
535 : |
blume |
618 |
- There are no classes "shared" or "private" anymore. Instead,
|
536 : |
|
|
the sharing annotation is now a parameter to the "sml" class.
|
537 : |
dbm |
610 |
|
538 : |
blume |
618 |
- Tools.registerStdShellCmdTool (from smlnj/cm/tool.cm) takes an
|
539 : |
|
|
additional argument called "template" which is an optional
|
540 : |
|
|
string that specifies the layout of the tool command line. See
|
541 : |
|
|
the CM manual for explanation.
|
542 : |
dbm |
610 |
|
543 : |
blume |
618 |
- A special-purpose tool can be "registered" by simply dropping
|
544 : |
|
|
the corresponding <...>-tool.cm (and/or <...>-ext.cm) into the
|
545 : |
|
|
same directory where the .cm file lives that uses this tool.
|
546 : |
|
|
(The behavior/misfeature until now was to look for the tool
|
547 : |
|
|
description files in the current working directory.) As
|
548 : |
|
|
before, tool description files could also be anchored -- in
|
549 : |
|
|
which case they can live anywhere they like. Following the
|
550 : |
|
|
recent e-mail discussion, this change should make it easier to
|
551 : |
|
|
have special-purpose tools that are shipped together with the
|
552 : |
|
|
sources of the program that uses them.
|
553 : |
|
|
Bug: such a tool does not get un-registered after being done
|
554 : |
dbm |
610 |
|
555 : |
blume |
618 |
Library names
|
556 : |
dbm |
610 |
|
557 : |
blume |
618 |
Library names have been completely re-organized.
|
558 : |
|
|
Many libraries have been consolidated so that they share the same
|
559 : |
|
|
path anchor. For example, all MLRISC-related libraries are
|
560 : |
|
|
anchored at MLRISC, most libraries that are SML/NJ-specific are
|
561 : |
|
|
under "smlnj". Notice that names like host-cmb.cm or
|
562 : |
|
|
host-compiler.cm no longer exist. See system/README for a
|
563 : |
|
|
complete description of the new naming scheme. Quick reference:
|
564 : |
dbm |
610 |
|
565 : |
blume |
618 |
host-cmb.cm -> smlnj/cmb.cm
|
566 : |
|
|
host-compiler.cm -> smlnj/compiler.cm
|
567 : |
|
|
full-cm.cm -> smlnj/cm.cm
|
568 : |
|
|
<arch>-<os>.cm -> smlnj/cmb/<arch>-<os>.cm
|
569 : |
|
|
<arch>-compiler.cm -> smlnj/compiler/<arch>.cm
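
The quick reference can be turned into a small translation helper, for example when updating old .cm files. The Python sketch below encodes only the five rules listed above. It treats any remaining name of the form <x>-<y>.cm as an <arch>-<os> name, which is an assumption, and the function name new_library_name is hypothetical.

```python
# Old names that map to a fixed new name.
FIXED = {
    "host-cmb.cm": "smlnj/cmb.cm",
    "host-compiler.cm": "smlnj/compiler.cm",
    "full-cm.cm": "smlnj/cm.cm",
}


def new_library_name(old: str) -> str:
    """Translate an old library name according to the quick reference."""
    if old in FIXED:
        return FIXED[old]
    if not old.endswith(".cm"):
        raise ValueError("not a .cm name: " + old)
    stem = old[: -len(".cm")]
    if stem.endswith("-compiler"):
        # <arch>-compiler.cm -> smlnj/compiler/<arch>.cm
        return "smlnj/compiler/" + stem[: -len("-compiler")] + ".cm"
    if stem.count("-") == 1:
        # Assumed to be <arch>-<os>.cm -> smlnj/cmb/<arch>-<os>.cm
        return "smlnj/cmb/" + old
    raise ValueError("no rule for: " + old)


for name in ("host-cmb.cm", "x86-compiler.cm", "sparc-unix.cm"):
    print(name, "->", new_library_name(name))
```

See system/README for the authoritative naming scheme; names outside these five patterns are rejected rather than guessed.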
|
570 : |
dbm |
610 |
|
571 : |
blume |
618 |
CM bug fixes
|
572 : |
dbm |
610 |
|
573 : |
blume |
618 |
- exceptions in user code are being passed through (i.e., reach top level)
|
574 : |
|
|
- more bugs in paranoia mode fixed
|
575 : |
|
|
- bug related to checking group owners fixed
|
576 : |
|
|
- better error handling (suppresses many followup-messages)
|
577 : |
dbm |
610 |
|
578 : |
blume |
618 |
Internals
|
579 : |
dbm |
610 |
|
580 : |
blume |
618 |
"Global" modmap:
|
581 : |
|
|
CM now maintains one "global" modmap that is used for all stable
|
582 : |
|
|
libraries. The use of such a global modmap maximizes sharing and
|
583 : |
|
|
minimizes the need for re-traversing parts of environments during
|
584 : |
|
|
modmap construction. (However, this has minor impact since modmap
|
585 : |
|
|
construction seems to account for just one percent or less of total
|
586 : |
|
|
compile time.)
|
587 : |
dbm |
610 |
|
588 : |
blume |
618 |
======================================================================
|
589 : |
|
|
Compiler Internals
|
590 : |
|
|
======================================================================
|
591 : |
dbm |
610 |
|
592 : |
blume |
618 |
Environment data structures: major changes
|
593 : |
dbm |
610 |
|
594 : |
blume |
618 |
No CMStaticEnv anymore.
|
595 : |
|
|
- no CMEnv, no "BareEnvironment" (actually, _only_ BareEnvironment,
|
596 : |
|
|
but it is called Environment), no conversions between different
|
597 : |
|
|
kinds of static environments
|
598 : |
dbm |
610 |
|
599 : |
blume |
618 |
- There is still a notion of a "modmap", but such modmaps are generated
|
600 : |
|
|
on demand at the time when they are needed. This sounds slow, but I
|
601 : |
|
|
sped up the code that generates modmaps enough for this not to lead to
|
602 : |
|
|
a slowdown of the compiler (at least I didn't detect any).
|
603 : |
dbm |
610 |
|
604 : |
blume |
618 |
- To facilitate rapid modmap generation, static environments now
|
605 : |
|
|
contain an (optional) "modtree" structure. Modtree annotations are
|
606 : |
|
|
constructed by the unpickler during unpickling. (This means that
|
607 : |
|
|
the elaborator does not have to worry about modtrees at all.)
|
608 : |
|
|
Modtrees have the advantage that they are compositional in the same
|
609 : |
|
|
way as the environment data structure itself is compositional.
|
610 : |
|
|
As a result, modtrees never hang on to parts of an environment that
|
611 : |
|
|
has already been rendered "stale" by filtering or rebinding.
|
612 : |
dbm |
610 |
|
613 : |
blume |
618 |
- all files that I touched now compile without warnings (other than
|
614 : |
|
|
"polyEqual").
|
615 : |
|
|
|
616 : |
|
|
- compiler now tends to run "leaner" (i.e., ties up less memory in
|
617 : |
|
|
redundant modmaps)
|
618 : |
|
|
|
619 : |
|
|
Stats phase "genmap" added
|
620 : |
|
|
|
621 : |
|
|
- measures time spent during on-the-fly modmap generation
|
622 : |
|
|
|
623 : |
|
|
Changes on behalf of CM
|
624 : |
|
|
|
625 : |
|
|
Compiler.CMSA eliminated
|
626 : |
|
|
- No longer supported by CM anyway.
|
627 : |
|
|
|
628 : |
|
|
Fixed bugs in pickler that kept biting Stefan
|
629 : |
|
|
- past refs to past refs (was caused by the possibility that
|
630 : |
|
|
ad-hoc sharing is more discriminating than hash-cons sharing)
|
631 : |
|
|
- integer overflow on LargeInt.minInt
|
632 : |
|
|
|
633 : |
|
|
Handling of "core" environment:
|
634 : |
|
|
|
635 : |
|
|
I eliminated coreEnv from compInfo. Access to the "Core"
|
636 : |
|
|
structure is now done via the ordinary static environment that is
|
637 : |
|
|
context to each compilation unit.
|
638 : |
|
|
|
639 : |
|
|
To this end, I arranged that instead of "structure Core" a
|
640 : |
|
|
"structure _Core" is bound in the pervasive environment. Core
|
641 : |
|
|
access is done via _Core (which can never be accidentally rebound
|
642 : |
|
|
because _Core is not a legal surface-syntax symbol).
|
643 : |
|
|
|
644 : |
|
|
The current solution is much cleaner because the core environment
|
645 : |
|
|
is now simply part of the pervasive environment which is part of
|
646 : |
|
|
every compilation unit's context anyway. In particular, this
|
647 : |
|
|
eliminates all special-case handling that was necessary until now
|
648 : |
|
|
in order to deal with dynamic and symbolic parts of the core
|
649 : |
|
|
environment.
|
650 : |
|
|
|
651 : |
|
|
Remaining hackery (to bind the "magic" symbol _Core) is localized
|
652 : |
|
|
in the compilation manager's bootstrap compiler (actually: in the
|
653 : |
|
|
"init group" handling). See the comments in
|
654 : |
|
|
src/system/smlnj/init/init.cmi for more details.
|
655 : |
|
|
|
656 : |
|
|
I also tried to track down all mentions of "Core" (as string
|
657 : |
|
|
argument to Symbol.strSymbol) in the compiler and replaced them
|
658 : |
|
|
with a reference to the new CoreSym.coreSym. Seems cleaner since
|
659 : |
|
|
the actual name appears in one place only.
|