Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1067, Fri Feb 15 17:08:17 2002 UTC revision 1124, Thu Mar 7 19:49:22 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Lal George
17    Date: 2002/03/07 14:44:24 EST 2002
18    Tag: george-20020307-weighted-block-placement
19    
20    Tested the weighted block placement optimization on all architectures
21    (except the hppa) using AMPL to generate the block and edge frequencies.
22    Changes were required in the machine properties to correctly
23    categorize trap instructions. There is an MLRISC flag
24    "weighted-block-placement" that can be used to enable weighted block
25    placement, but this will be ineffective without block/edge
26    frequencies (coming soon).
27    
28    
29    ----------------------------------------------------------------------
30    Name: Lal George
31    Date: 2002/03/05 17:24:48 EST
32    Tag: george-20020305-linkage-cluster
33    
34    In order to support the block placement optimization, a new cluster
35    is generated as the very first cluster (called the linkage cluster).
36    It contains a single jump to the 'real' entry point for the compilation
37    unit. Block placement has no effect on the linkage cluster itself, but
38    all the other clusters  have full freedom in the manner in which they
39    reorder blocks or functions.
40    
41    On the x86 the typical linkage code that is generated is:
42       ----------------------
43            .align 2
44       L0:
45            addl    $L1-L0, 72(%esp)
46            jmp     L1
47    
48    
49            .align  2
50       L1:
51       ----------------------
52    
53    72(%esp) is the memory location for the stdlink register. This
54    must contain the address of the CPS function being called. In the
55    above example, it contains the address of  L0; before
56    calling L1 (the real entry point for the compilation unit), it
57    must contain the address for L1, and hence
58    
59            addl $L1-L0, 72(%esp)
60    
61    I have tested this on all architectures except the hppa.The increase
62    in code size is of course negligible
63    
64    ----------------------------------------------------------------------
65    Name: Allen Leung
66    Date: 2002/03/03 13:20:00 EST
67    Tag: leunga-20020303-mlrisc-tools
68    
69      Added #[ ... ] expressions to mlrisc tools
70    
71    ----------------------------------------------------------------------
72    Name: Matthias Blume
73    Date: 2002/02/27 12:29:00 EST
74    Tag: blume-20020227-cdebug
75    Description:
76    
77    - made types in structure C and C_Debug to be equal
78    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
79    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
80    
81    ----------------------------------------------------------------------
82    Name: Matthias Blume
83    Date: 2002/02/26 12:00:00 EST
84    Tag: blume-20020226-ffi
85    Description:
86    
87    1. Fixed a minor bug in CM's "noweb" tool:
88       If numbering is turned off, then truly don't number (i.e., do not
89       supply the -L option to noweb).  The previous behavior was to supply
90       -L'' -- which caused noweb to use the "default" line numbering scheme.
91       Thanks to Chris Richards for pointing this out (and supplying the fix).
92    
93    2. Once again, I reworked some aspects of the FFI:
94    
95       A. The incomplete/complete type business:
96    
97       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
98         gone!
99       - ML types representing an incomplete type are now *equal* to
100         ML types representing their corresponding complete types (just like
101         in C).  This is still safe because ml-nlffigen will not generate
102         RTTI for incomplete types, nor will it generate functions that
103         require access to such RTTI.   But when ML code generated from both
104         incomplete and complete versions of the C type meet, the ML types
105         are trivially interoperable.
106    
107         NOTE:  These changes restore the full generality of the translation
108         (which was previously lost when I eliminated functorization)!
109    
110       B. Enum types:
111    
112       - Structure C now has a type constructor "enum" that is similar to
113         how the "su" constructor works.  However, "enum" is not a phantom
114         type because each "T enum" has values (and is isomorphic to
115         MLRep.Signed.int).
116       - There are generic access operations for enum objects (using
117         MLRep.Signed.int).
118       - ml-nlffigen will generate a structure E_foo for each "enum foo".
119         * The structure contains the definition of type "mlrep" (the ML-side
120         representation type of the enum).  Normally, mlrep is the same
121         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
122         then mlrep will be defined as a datatype -- thus facilitating
123         pattern matching on mlrep values.
124         ("-ec" will be suppressed if there are duplicate values in an
125          enumeration.)
126         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
127         will be generated for each C enum constant xxx.
128         * Conversion functions m2i and i2m convert between mlrep and
129         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
130         * Coversion functions c and ml convert between mlrep and "tag enum".
131         * Access functions (get/set) fetch and store mlrep values.
132       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
133         enumerations are merged into one single enumeration represented by
134         structure E_'.
135    
136    ----------------------------------------------------------------------
137    Name: Allen Leung
138    Date: 2002/02/25 04:45:00 EST
139    Tag: leunga-20020225-cps-spill
140    
141    This is a new implementation of the CPS spill phase.
142    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
143    In case of problems, replace it with the old file spill.sml
144    
145    The current compiler runs into some serious performance problems when
146    constructing a large record.  This can happen when we try to compile a
147    structure with many items.  Even a very simple structure like the following
148    makes the compiler slow down.
149    
150        structure Foo = struct
151           val x_1 = 0w1 : Word32.int
152           val x_2 = 0w2 : Word32.int
153           val x_3 = 0w3 : Word32.int
154           ...
155           val x_N = 0wN : Word32.int
156        end
157    
158    The following table shows the compile time, from N=1000 to N=4000,
159    with the old compiler:
160    
161    N
162    1000   CPS 100 spill                           0.04u  0.00s  0.00g
163           MLRISC ra                               0.06u  0.00s  0.05g
164              (spills = 0 reloads = 0)
165           TOTAL                                   0.63u  0.07s  0.21g
166    
167    1100   CPS 100 spill                           8.25u  0.32s  0.64g
168           MLRISC ra                               5.68u  0.59s  3.93g
169              (spills = 0 reloads = 0)
170           TOTAL                                   14.71u  0.99s  4.81g
171    
172    1500   CPS 100 spill                           58.55u  2.34s  1.74g
173           MLRISC ra                               5.54u  0.65s  3.91g
174              (spills = 543 reloads = 1082)
175           TOTAL                                   65.40u  3.13s  6.00g
176    
177    2000   CPS 100 spill                           126.69u  4.84s  3.08g
178           MLRISC ra                               0.80u  0.10s  0.55g
179              (spills = 42 reloads = 84)
180           TOTAL                                   129.42u  5.10s  4.13g
181    
182    3000   CPS 100 spill                           675.59u  19.03s  11.64g
183           MLRISC ra                               2.69u  0.27s  1.38g
184              (spills = 62 reloads = 124)
185           TOTAL                                   682.48u  19.61s  13.99g
186    
187    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
188           MLRISC ra                               4.96u  0.27s  2.72g
189              (spills = 85 reloads = 170)
190           TOTAL                                   2375.26u  57.21s  48.00g
191    
192    As you can see the old cps spill module suffers from some serious
193    performance problem.  But since I cannot decipher the old code fully,
194    instead of patching the problems up, I'm reimplementing it
195    with a different algorithm.  The new code is more modular,
196    smaller when compiled, and substantially faster
197    (O(n log n) time and O(n) space).  Timing of the new spill module:
198    
199    4000  CPS 100 spill                           0.02u  0.00s  0.00g
200          MLRISC ra                               0.25u  0.02s  0.15g
201             (spills=1 reloads=3)
202          TOTAL                                   7.74u  0.34s  1.62g
203    
204    Implementation details:
205    
206    As far as I can tell, the purpose of the CPS spill module is to make sure the
207    number of live variables at any program point (the bandwidth)
208    does not exceed a certain limit, which is determined by the
209    size of the spill area.
210    
211    When the bandwidth is too large, we decrease the register pressure by
212    packing live variables into spill records.  How we achieve this is
213    completely different than what we did in the old code.
214    
215    First, there is something about the MLRiscGen code generator
216    that we should be aware of:
217    
218    o MLRiscGen performs code motion!
219    
220       In particular, it will move floating point computations and
221       address computations involving only the heap pointer to
222       their use sites (if there is only a single use).
223       What this means is that if we have a CPS record construction
224       statement
225    
226           RECORD(k,vl,w,e)
227    
228       we should never count the new record address w as live if w
229       has only one use (which is often the case).
230    
231       We should do something similar to floating point, but the transformation
232       there is much more complex, so I won't deal with that.
233    
234    Secondly, there are now two new cps primops at our disposal:
235    
236     1. rawrecord of record_kind option
237        This pure operator allocates some uninitialized storage from the heap.
238        There are two forms:
239    
240         rawrecord NONE [INT n]  allocates a tagless record of length n
241         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
242                                     and initializes the tag.
243    
244     2. rawupdate of cty
245          rawupdate cty (v,i,x)
246          Assigns to x to the ith component of record v.
247          The storelist is not updated.
248    
249    We use these new primops for both spilling and increment record construction.
250    
251     1. Spilling.
252    
253        This is implemented with a linear scan algorithm (but generalized
254        to trees).  The algorithm will create a single spill record at the
255        beginning of the cps function and use rawupdate to spill to it,
256        and SELECT or SELp to reload from it.  So both spills and reloads
257        are fine-grain operations.  In contrast, in the old algorithm
258        "spills" have to be bundled together in records.
259    
260        Ideally, we should sink the spill record construction to where
261        it is needed.  We can even split the spill record into multiple ones
262        at the places where they are needed.  But CPS is not a good
263        representation for global code motion, so I'll keep it simple and
264        am not attempting this.
265    
266     2. Incremental record construction (aka record splitting).
267    
268        Long records with many component values which are simulatenously live
269        (recall that single use record addresses are not considered to
270         be live) are constructed with rawrecord and rawupdate.
271        We allocate space on the heap with rawrecord first, then gradually
272        fill it in with rawupdate.  This is the technique suggested to me
273        by Matthias.
274    
275        Some restrictions on when this is applicable:
276        1. It is not a VECTOR record.  The code generator currently does not handle
277           this case. VECTOR record uses double indirection like arrays.
278        2. All the record component values are defined in the same "basic block"
279           as the record constructor.  This is to prevent speculative
280           record construction.
281    
282    ----------------------------------------------------------------------
283    Name: Allen Leung
284    Date: 2002/02/22 01:02:00 EST
285    Tag: leunga-20020222-mlrisc-tools
286    
287    Minor bug fixes in the parser and rewriter
288    
289    ----------------------------------------------------------------------
290    Name: Allen Leung
291    Date: 2002/02/21 20:20:00 EST
292    Tag: leunga-20020221-peephole
293    
294    Regenerated the peephole files.  Some contained typos in the specification
295    and some didn't compile because of pretty printing bugs in the old version
296    of 'nowhere'.
297    
298    ----------------------------------------------------------------------
299    Name: Allen Leung
300    Date: 2002/02/19 20:20:00 EST
301    Tag: leunga-20020219-mlrisc-tools
302    Description:
303    
304       Minor bug fixes to the mlrisc-tools library:
305    
306       1.  Fixed up parsing colon suffixed keywords
307       2.  Added the ability to shut the error messages up
308       3.  Reimplemented the pretty printer and fixed up/improved
309           the pretty printing of handle and -> types.
310       4.  Fixed up generation of literal symbols in the nowhere tool.
311       5.  Added some SML keywords to to sml.sty
312    
313    ----------------------------------------------------------------------
314    Name: Matthias Blume
315    Date: 2002/02/19 16:20:00 EST
316    Tag: blume-20020219-cmffi
317    Description:
318    
319    A wild mix of changes, some minor, some major:
320    
321    * All C FFI-related libraries are now anchored under $c:
322        $/c.cm      --> $c/c.cm
323        $/c-int.cm  --> $c/internals/c-int.cm
324        $/memory.cm --> $c/memory/memory.cm
325    
326    * "make" tool (in CM) now treats its argument pathname slightly
327      differently:
328        1. If the native expansion is an absolute name, then before invoking
329           the "make" command on it, CM will apply OS.Path.mkRelative
330           (with relativeTo = OS.FileSys.getDir()) to it.
331        2. The argument will be passed through to subsequent phases of CM
332           processing without "going native".  In particular, if the argument
333           was an anchored path, then "make" will not lose track of that anchor.
334    
335    * Compiler backends now "know" their respective C calling conventions
336      instead of having to be told about it by ml-nlffigen.  This relieves
337      ml-nlffigen from one of its burdens.
338    
339    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
340    
341    * Export C_DEBUG and C_Debug from $c/c.cm.
342    
343    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
344      subtyping relationship between incomplete pointers and their complete
345      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
346      with the convention of instantiating 'o with ('t, 'c) obj whenever
347      the pointer target type is complete.  In the incomplete case, 'o
348      will be instantiated with some "'c iobj" -- a type obtained by
349      using one of the functors PointerToIncompleteType or PointerToCompleteType.
350    
351      Operations that work on both incomplete and complete pointer types are
352      typed as taking an 'o ptr while operations that require the target to
353      be known are typed as taking some ('t, 'c) obj ptr.
354    
355      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
356      where void is an eqtype without any values.  This makes it possible
357      to work on voidptr values using functions meant to operate on light
358      incomplete pointers.
359    
360    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
361      been vastly simplified.
362    
363    ----------------------------------------------------------------------
364    Name: Matthias Blume
365    Date: 2002/02/19 10:48:00 EST
366    Tag: blume-20020219-pqfix
367    Description:
368    
369    Applied Chris Okasaki's bug fix for priority queues.
370    
371    ----------------------------------------------------------------------
372    Name: Matthias Blume
373    Date: 2002/02/15 17:05:00
374    Tag: Release_110_39
375    Description:
376    
377    Last-minute retagging is becoming a tradition... :-(
378    
379    This is the working release 110.39.
380    
381    ----------------------------------------------------------------------
382    Name: Matthias Blume
383    Date: 2002/02/15 16:00:00 EST
384    Tag: Release_110_39-orig
385    Description:
386    
387    Working release 110.39.  New bootfiles.
388    
389    (Update: There was a small bug in the installer so it wouldn't work
390    with all shells.  So I retagged. -Matthias)
391    
392    ----------------------------------------------------------------------
393    Name: Matthias Blume
394    Date: 2002/02/15 14:17:00 EST
395    Tag: blume-20020215-showbindings
396    Description:
397    
398    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
399    the latter can be useful for exploring what bindings are available at
400    the interactive prompt.  (The first function returns only the list
401    of symbols that are really bound, the second prints those but also the
402    ones that CM's autoloading mechanism knows about.)
403    
404    ----------------------------------------------------------------------
405  Name: Matthias Blume  Name: Matthias Blume
406  Date: 2002/02/15 12:08:00 EST  Date: 2002/02/15 12:08:00 EST
407  Tag: blume-20020215-iptrs  Tag: blume-20020215-iptrs

Legend:
Removed from v.1067  
changed lines
  Added in v.1124

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0