[smlnj] Diff of /sml/trunk/HISTORY

revision 1073, Fri Feb 15 22:07:38 2002 UTC revision 1127, Fri Mar 8 01:35:33 2002 UTC
# Line 13  Line 13 
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/03/07 20:45:00 EST
18    Tag: leunga-20020307-x86-cmov
19    Description:
20    
21       Bug fixes for CMOVcc on x86.
22    
23       1. Added machine code generation for CMOVcc.
24       2. CMOVcc is now generated in preference to SETcc on PentiumPro or above.
25       3. CMOVcc cannot have an immediate operand as argument.
26    
27    ----------------------------------------------------------------------
28    Name: Matthias Blume
29    Date: 2002/03/07 16:15:00 EST
30    Tag: blume-20020307-controls
31    Description:
32    
33    This is a very large but mostly boring patch which makes (almost)
34    every tuneable compiler knob (i.e., pretty much everything under
35    Control.* plus a few other things) configurable via both the command
36    line and environment variables, in the style that CM has used for its
37    own configuration until now.
38    
39    Try starting sml with '-h' (or, if you are brave, '-H')
40    
41    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
42    implements the underlying generic mechanism.
43    
44    The interface to some of the existing such facilities has changed somewhat.
45    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
46    (The getFoo interface is still there for backward-compatibility, but its
47    use is deprecated.)
48    
49    The ml-build script passes -Cxxx=yyy command-line arguments through so
50    that one can now twiddle the compiler settings when using this "batch"
51    compiler.
52    
53    TODO items:
54    
55    We should go through and throw out all controls that are no longer
56    connected to anything.  Moreover, we should go through and provide
57    meaningful (and correct!) documentation strings for those controls
58    that still are connected.
59    
60    Currently, multiple calls to Controls.new are accepted (only the first
61    has any effect).  Eventually we should make sure that every control
62    is being made (via Controls.new) exactly once.  Future access can then
63    be done using Controls.acc.
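The "only the first call has any effect" behaviour can be pictured with a small registry keyed by name. This is a minimal sketch only: newControl, accControl, registry and NoSuchControl are hypothetical stand-ins, and the real Controls.new and Controls.acc signatures are not given in this entry and may well differ.

```sml
(* Hypothetical registry of integer-valued controls, keyed by name.
   Not the actual Controls implementation. *)
exception NoSuchControl of string

val registry : (string * int ref) list ref = ref []

fun lookup name = List.find (fn (n, _) => n = name) (!registry)

(* Stand-in for Controls.new: a repeated call returns the existing cell
   and ignores the new default. *)
fun newControl (name, default) =
    case lookup name of
        SOME (_, cell) => cell
      | NONE => let val cell = ref default
                in registry := (name, cell) :: !registry; cell end

(* Stand-in for Controls.acc: access only, never creates. *)
fun accControl name =
    case lookup name of
        SOME (_, cell) => cell
      | NONE => raise NoSuchControl name

val a = newControl ("demo-knob", 1)
val b = newControl ("demo-knob", 99)   (* second call: no effect *)
val sameCell = (a = b)
val value = !(accControl "demo-knob")
```

Once every control is created exactly once, the lookup in newControl could instead report a duplicate as an error.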
64    
65    Finally, it would probably be a good idea to use the getter-setter
66    interface to controls rather than ref cells.  For the time being, both
67    styles are provided by the Controls module, but getter-setter pairs are
68    better if thread-safety is of any concern because they can be wrapped.
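To illustrate why getter-setter pairs can be wrapped while bare ref cells cannot, here is a minimal sketch. The type control and the functions fromRef and wrap are hypothetical names chosen for this example; they are not taken from the Controls module. The enter/leave hooks stand in for whatever a thread-safe version would do (for instance acquiring and releasing a lock).

```sml
(* Hypothetical interface; not the actual Controls signature. *)
type 'a control = { get : unit -> 'a, set : 'a -> unit }

(* View a plain ref cell through the getter-setter interface. *)
fun fromRef (r : 'a ref) : 'a control =
    { get = fn () => !r, set = fn v => r := v }

(* Run every access between the enter and leave hooks. *)
fun wrap (enter : unit -> unit, leave : unit -> unit)
         (c : 'a control) : 'a control =
    { get = fn () => (enter (); #get c () before leave ()),
      set = fn v => (enter (); #set c v; leave ()) }

(* Demo: count accesses instead of taking a real lock. *)
val accesses = ref 0
val knob = wrap (fn () => accesses := !accesses + 1, fn () => ())
                (fromRef (ref 10))
val () = #set knob 42
val seen = #get knob ()
val count = !accesses
```

A client holding a raw ref cell can read and write it directly, so there is no place to interpose such hooks.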
69    
70    *****************************************
71    
72    One bug fix: The function blockPlacement in three of the MLRISC
73    backpatch files used to be hard-wired to one of two possibilities at
74    link time (according to the value of the placementFlag).  But (I
75    think) it should rather sense the flag every time.
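The difference between the two behaviours can be sketched as follows. This is not the MLRISC code: placementFlag here is a local ref, and defaultPlacement and weightedPlacement are hypothetical stand-ins that merely keep or reverse a list of block numbers.

```sml
(* Hypothetical stand-ins for the flag and the two placement strategies. *)
val placementFlag = ref false
fun defaultPlacement (blocks : int list) = blocks
fun weightedPlacement (blocks : int list) = List.rev blocks

(* Old behaviour: the choice is made once, when this binding is evaluated. *)
val blockPlacementFixed =
    if !placementFlag then weightedPlacement else defaultPlacement

(* New behaviour: the flag is consulted on every call. *)
fun blockPlacement blocks =
    if !placementFlag then weightedPlacement blocks
    else defaultPlacement blocks

val () = placementFlag := true
val fixed = blockPlacementFixed [1, 2, 3]   (* still the default strategy *)
val fresh = blockPlacement [1, 2, 3]        (* sees the updated flag *)
```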
76    
77    *****************************************
78    
79    Other assorted changes (by other people who did not supply a HISTORY entry):
80    
81    1. the cross-module inliner now works much better (Monnier)
82    2. representation of weights, frequencies, and probabilities in MLRISC
83       changed in preparation for using those for weighted block placement
84       (Reppy, George)
85    
86    ----------------------------------------------------------------------
87    Name: Lal George
88    Date: 2002/03/07 14:44:24 EST
89    Tag: george-20020307-weighted-block-placement
90    
91    Tested the weighted block placement optimization on all architectures
92    (except the hppa) using AMPL to generate the block and edge frequencies.
93    Changes were required in the machine properties to correctly
94    categorize trap instructions. There is an MLRISC flag
95    "weighted-block-placement" that can be used to enable weighted block
96    placement, but this will be ineffective without block/edge
97    frequencies (coming soon).
98    
99    
100    ----------------------------------------------------------------------
101    Name: Lal George
102    Date: 2002/03/05 17:24:48 EST
103    Tag: george-20020305-linkage-cluster
104    
105    In order to support the block placement optimization, a new cluster
106    is generated as the very first cluster (called the linkage cluster).
107    It contains a single jump to the 'real' entry point for the compilation
108    unit. Block placement has no effect on the linkage cluster itself, but
109    all the other clusters have full freedom in the manner in which they
110    reorder blocks or functions.
111    
112    On the x86 the typical linkage code that is generated is:
113       ----------------------
114            .align 2
115       L0:
116            addl    $L1-L0, 72(%esp)
117            jmp     L1
118    
119    
120            .align  2
121       L1:
122       ----------------------
123    
124    72(%esp) is the memory location for the stdlink register. This
125    must contain the address of the CPS function being called. On entry
126    to the code above, it contains the address of L0; before
127    calling L1 (the real entry point for the compilation unit), it
128    must contain the address of L1, and hence the instruction
129    
130            addl $L1-L0, 72(%esp)
131    
132    I have tested this on all architectures except the hppa. The increase
133    in code size is, of course, negligible.
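The stdlink adjustment can be checked with simple arithmetic. The addresses below are made up purely for illustration; only the relationship between them matters.

```sml
(* Made-up addresses for the two labels; any values would do. *)
val addrL0 = 0wx1000 : word
val addrL1 = 0wx1008 : word

(* On entry, stdlink holds the address of L0. *)
val stdlinkOnEntry = addrL0

(* Effect of "addl $L1-L0, 72(%esp)": add the assembly-time constant. *)
val stdlinkAfterAdd = stdlinkOnEntry + (addrL1 - addrL0)

(* stdlink now holds the address of the real entry point. *)
val pointsAtL1 = (stdlinkAfterAdd = addrL1)
```

Because L1-L0 is a constant difference of two labels, the adjustment stays correct wherever the code ends up being loaded.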
134    
135    ----------------------------------------------------------------------
136    Name: Allen Leung
137    Date: 2002/03/03 13:20:00 EST
138    Tag: leunga-20020303-mlrisc-tools
139    
140      Added #[ ... ] expressions to mlrisc tools
141    
142    ----------------------------------------------------------------------
143    Name: Matthias Blume
144    Date: 2002/02/27 12:29:00 EST
145    Tag: blume-20020227-cdebug
146    Description:
147    
148    - made the types in structures C and C_Debug equal
149    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
150    - there is no longer a C_Int_Debug (C_Debug is directly derived from C)
151    
152    ----------------------------------------------------------------------
153    Name: Matthias Blume
154    Date: 2002/02/26 12:00:00 EST
155    Tag: blume-20020226-ffi
156    Description:
157    
158    1. Fixed a minor bug in CM's "noweb" tool:
159       If numbering is turned off, then truly don't number (i.e., do not
160       supply the -L option to noweb).  The previous behavior was to supply
161       -L'' -- which caused noweb to use the "default" line numbering scheme.
162       Thanks to Chris Richards for pointing this out (and supplying the fix).
163    
164    2. Once again, I reworked some aspects of the FFI:
165    
166       A. The incomplete/complete type business:
167    
168       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
169         gone!
170       - ML types representing an incomplete type are now *equal* to
171         ML types representing their corresponding complete types (just like
172         in C).  This is still safe because ml-nlffigen will not generate
173         RTTI for incomplete types, nor will it generate functions that
174         require access to such RTTI.   But when ML code generated from both
175         incomplete and complete versions of the C type meet, the ML types
176         are trivially interoperable.
177    
178         NOTE:  These changes restore the full generality of the translation
179         (which was previously lost when I eliminated functorization)!
180    
181       B. Enum types:
182    
183       - Structure C now has a type constructor "enum" that is similar to
184         how the "su" constructor works.  However, "enum" is not a phantom
185         type because each "T enum" has values (and is isomorphic to
186         MLRep.Signed.int).
187       - There are generic access operations for enum objects (using
188         MLRep.Signed.int).
189       - ml-nlffigen will generate a structure E_foo for each "enum foo".
190         * The structure contains the definition of type "mlrep" (the ML-side
191           representation type of the enum).  Normally, mlrep is the same
192           as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
193           then mlrep will be defined as a datatype -- thus facilitating
194           pattern matching on mlrep values.
195           ("-ec" will be suppressed if there are duplicate values in an
196            enumeration.)
197         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
198           will be generated for each C enum constant xxx.
199         * Conversion functions m2i and i2m convert between mlrep and
200           MLRep.Signed.int.  (Without "-ec", these functions are identities.)
201         * Conversion functions c and ml convert between mlrep and "tag enum".
202         * Access functions (get/set) fetch and store mlrep values.
203       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
204         enumerations are merged into one single enumeration represented by
205         structure E_'.
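As a rough picture of what the "-ec" form could look like, here is a hand-written sketch for a hypothetical C declaration "enum color { RED, GREEN, BLUE = 10 }". It is not actual ml-nlffigen output: plain int stands in for MLRep.Signed.int, the c, ml, get and set functions are omitted, and raising Domain for an unknown integer is an assumption made for this example.

```sml
(* Hand-written illustration for:  enum color { RED, GREEN, BLUE = 10 };
   Not generated code; int stands in for MLRep.Signed.int. *)
structure E_color = struct
    (* With "-ec", mlrep is a datatype, so clients can pattern-match. *)
    datatype mlrep = e_RED | e_GREEN | e_BLUE

    (* mlrep to integer *)
    fun m2i e_RED = 0
      | m2i e_GREEN = 1
      | m2i e_BLUE = 10

    (* integer to mlrep; behaviour on unknown integers is an assumption *)
    fun i2m 0 = e_RED
      | i2m 1 = e_GREEN
      | i2m 10 = e_BLUE
      | i2m _ = raise Domain
end

fun describe E_color.e_RED = "red"
  | describe E_color.e_GREEN = "green"
  | describe E_color.e_BLUE = "blue"

val roundTrip = E_color.m2i (E_color.i2m 10)
val name = describe (E_color.i2m 1)
```

The sketch also hints at why duplicate values are a problem for "-ec": if two constants shared one integer, i2m would have no unique constructor to return.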
206    
207    ----------------------------------------------------------------------
208    Name: Allen Leung
209    Date: 2002/02/25 04:45:00 EST
210    Tag: leunga-20020225-cps-spill
211    
212    This is a new implementation of the CPS spill phase.
213    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml.
214    In case of problems, replace it with the old file spill.sml.
215    
216    The current compiler runs into some serious performance problems when
217    constructing a large record.  This can happen when we try to compile a
218    structure with many items.  Even a very simple structure like the following
219    makes the compiler slow down.
220    
221        structure Foo = struct
222           val x_1 = 0w1 : Word32.word
223           val x_2 = 0w2 : Word32.word
224           val x_3 = 0w3 : Word32.word
225           ...
226           val x_N = 0wN : Word32.word
227        end
228    
229    The following table shows the compile times for N=1000 to N=4000
230    with the old compiler:
231    
232    N
233    1000   CPS 100 spill                           0.04u  0.00s  0.00g
234           MLRISC ra                               0.06u  0.00s  0.05g
235              (spills = 0 reloads = 0)
236           TOTAL                                   0.63u  0.07s  0.21g
237    
238    1100   CPS 100 spill                           8.25u  0.32s  0.64g
239           MLRISC ra                               5.68u  0.59s  3.93g
240              (spills = 0 reloads = 0)
241           TOTAL                                   14.71u  0.99s  4.81g
242    
243    1500   CPS 100 spill                           58.55u  2.34s  1.74g
244           MLRISC ra                               5.54u  0.65s  3.91g
245              (spills = 543 reloads = 1082)
246           TOTAL                                   65.40u  3.13s  6.00g
247    
248    2000   CPS 100 spill                           126.69u  4.84s  3.08g
249           MLRISC ra                               0.80u  0.10s  0.55g
250              (spills = 42 reloads = 84)
251           TOTAL                                   129.42u  5.10s  4.13g
252    
253    3000   CPS 100 spill                           675.59u  19.03s  11.64g
254           MLRISC ra                               2.69u  0.27s  1.38g
255              (spills = 62 reloads = 124)
256           TOTAL                                   682.48u  19.61s  13.99g
257    
258    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
259           MLRISC ra                               4.96u  0.27s  2.72g
260              (spills = 85 reloads = 170)
261           TOTAL                                   2375.26u  57.21s  48.00g
262    
263    As you can see, the old cps spill module suffers from some serious
264    performance problems.  But since I cannot decipher the old code fully,
265    instead of patching the problems up, I'm reimplementing it
266    with a different algorithm.  The new code is more modular,
267    smaller when compiled, and substantially faster
268    (O(n log n) time and O(n) space).  Timing of the new spill module:
269    
270    4000  CPS 100 spill                           0.02u  0.00s  0.00g
271          MLRISC ra                               0.25u  0.02s  0.15g
272             (spills=1 reloads=3)
273          TOTAL                                   7.74u  0.34s  1.62g
274    
275    Implementation details:
276    
277    As far as I can tell, the purpose of the CPS spill module is to make sure the
278    number of live variables at any program point (the bandwidth)
279    does not exceed a certain limit, which is determined by the
280    size of the spill area.
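The notion of bandwidth can be made concrete with a toy model. The sketch below assumes straight-line code in which each variable is described only by the position where it is defined and the position of its last use; the names range, liveAt and bandwidth are hypothetical, and the real phase works on CPS trees rather than on such a list.

```sml
(* Toy model: one definition position and one last-use position
   per variable.  Not the data structures of the spill phase. *)
type range = { def : int, lastUse : int }

(* A variable is live at p if it is defined at or before p
   and still has a use after p. *)
fun liveAt (ranges : range list) p =
    List.length
      (List.filter (fn {def, lastUse} => def <= p andalso p < lastUse) ranges)

(* Bandwidth: the maximum live count over all positions. *)
fun bandwidth (ranges : range list) =
    let val final =
            foldl (fn ({lastUse, ...} : range, m) => Int.max (lastUse, m))
                  0 ranges
        fun loop (p, best) =
            if p > final then best
            else loop (p + 1, Int.max (best, liveAt ranges p))
    in loop (0, 0) end

val example = [ {def = 0, lastUse = 5},
                {def = 1, lastUse = 2},
                {def = 3, lastUse = 6} ]

(* With a hypothetical limit of 1, this code would need spilling. *)
val needsSpill = bandwidth example > 1
```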
281    
282    When the bandwidth is too large, we decrease the register pressure by
283    packing live variables into spill records.  How we achieve this is
284    completely different from what we did in the old code.
285    
286    First, there is something about the MLRiscGen code generator
287    that we should be aware of:
288    
289    o MLRiscGen performs code motion!
290    
291       In particular, it will move floating point computations and
292       address computations involving only the heap pointer to
293       their use sites (if there is only a single use).
294       What this means is that if we have a CPS record construction
295       statement
296    
297           RECORD(k,vl,w,e)
298    
299       we should never count the new record address w as live if w
300       has only one use (which is often the case).
301    
302       We should do something similar for floating point, but the transformation
303       there is much more complex, so I won't deal with that.
304    
305    Secondly, there are now two new cps primops at our disposal:
306    
307     1. rawrecord of record_kind option
308        This pure operator allocates some uninitialized storage from the heap.
309        There are two forms:
310    
311         rawrecord NONE [INT n]  allocates a tagless record of length n
312         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
313                                     and initializes the tag.
314    
315     2. rawupdate of cty
316          rawupdate cty (v,i,x)
317          Assigns x to the ith component of record v.
318          The storelist is not updated.
319    
320    We use these new primops for both spilling and incremental record construction.
321    
322     1. Spilling.
323    
324        This is implemented with a linear scan algorithm (but generalized
325        to trees).  The algorithm will create a single spill record at the
326        beginning of the cps function and use rawupdate to spill to it,
327        and SELECT or SELp to reload from it.  So both spills and reloads
328        are fine-grain operations.  In contrast, in the old algorithm
329        "spills" have to be bundled together in records.
330    
331        Ideally, we should sink the spill record construction to where
332        it is needed.  We can even split the spill record into multiple ones
333        at the places where they are needed.  But CPS is not a good
334        representation for global code motion, so I'll keep it simple and
335        am not attempting this.
336    
337     2. Incremental record construction (aka record splitting).
338    
339        Long records with many component values which are simultaneously live
340        (recall that single use record addresses are not considered to
341         be live) are constructed with rawrecord and rawupdate.
342        We allocate space on the heap with rawrecord first, then gradually
343        fill it in with rawupdate.  This is the technique suggested to me
344        by Matthias.
345    
346        Some restrictions on when this is applicable:
347        1. It is not a VECTOR record.  The code generator currently does not handle
348           this case.  A VECTOR record uses double indirection, like arrays.
349        2. All the record component values are defined in the same "basic block"
350           as the record constructor.  This is to prevent speculative
351           record construction.
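The allocate-then-fill idea behind incremental record construction can be mimicked at the source level. In this toy model an array stands in for the uninitialized heap storage; the functions rawrecord and rawupdate below are ordinary SML functions that merely borrow the primop names, and buildIncrementally is a hypothetical helper.

```sml
(* Toy model only: an array plays the role of raw heap storage. *)
fun rawrecord n : Word32.word array = Array.array (n, 0w0)
fun rawupdate (v : Word32.word array, i, x) = Array.update (v, i, x)

(* Allocate first, then fill one component at a time, so that only
   a single component value needs to be live at any moment. *)
fun buildIncrementally n =
    let val r = rawrecord n
        fun fill i =
            if i >= n then r
            else (rawupdate (r, i, Word32.fromInt (i + 1)); fill (i + 1))
    in fill 0 end

val r = buildIncrementally 4000
val firstField = Array.sub (r, 0)
val lastField = Array.sub (r, 3999)
```

Building all 4000 values first and then constructing the record in one step is the pattern that drives the bandwidth up.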
352    
353    ----------------------------------------------------------------------
354    Name: Allen Leung
355    Date: 2002/02/22 01:02:00 EST
356    Tag: leunga-20020222-mlrisc-tools
357    
358    Minor bug fixes in the parser and rewriter
359    
360    ----------------------------------------------------------------------
361    Name: Allen Leung
362    Date: 2002/02/21 20:20:00 EST
363    Tag: leunga-20020221-peephole
364    
365    Regenerated the peephole files.  Some contained typos in the specification
366    and some didn't compile because of pretty printing bugs in the old version
367    of 'nowhere'.
368    
369    ----------------------------------------------------------------------
370    Name: Allen Leung
371    Date: 2002/02/19 20:20:00 EST
372    Tag: leunga-20020219-mlrisc-tools
373    Description:
374    
375       Minor bug fixes to the mlrisc-tools library:
376    
377       1.  Fixed up parsing colon suffixed keywords
378       2.  Added the ability to shut the error messages up
379       3.  Reimplemented the pretty printer and fixed up/improved
380           the pretty printing of handle and -> types.
381       4.  Fixed up generation of literal symbols in the nowhere tool.
382       5.  Added some SML keywords to sml.sty
383    
384    ----------------------------------------------------------------------
385    Name: Matthias Blume
386    Date: 2002/02/19 16:20:00 EST
387    Tag: blume-20020219-cmffi
388    Description:
389    
390    A wild mix of changes, some minor, some major:
391    
392    * All C FFI-related libraries are now anchored under $c:
393        $/c.cm      --> $c/c.cm
394        $/c-int.cm  --> $c/internals/c-int.cm
395        $/memory.cm --> $c/memory/memory.cm
396    
397    * "make" tool (in CM) now treats its argument pathname slightly
398      differently:
399        1. If the native expansion is an absolute name, then before invoking
400           the "make" command on it, CM will apply OS.Path.mkRelative
401           (with relativeTo = OS.FileSys.getDir()) to it.
402        2. The argument will be passed through to subsequent phases of CM
403           processing without "going native".  In particular, if the argument
404           was an anchored path, then "make" will not lose track of that anchor.
405    
406    * Compiler backends now "know" their respective C calling conventions
407      instead of having to be told about them by ml-nlffigen.  This relieves
408      ml-nlffigen of one of its burdens.
409    
410    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
411    
412    * Export C_DEBUG and C_Debug from $c/c.cm.
413    
414    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
415      subtyping relationship between incomplete pointers and their complete
416      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
417      with the convention of instantiating 'o with ('t, 'c) obj whenever
418      the pointer target type is complete.  In the incomplete case, 'o
419      will be instantiated with some "'c iobj" -- a type obtained by
420      using one of the functors PointerToIncompleteType or PointerToCompleteType.
421    
422      Operations that work on both incomplete and complete pointer types are
423      typed as taking an 'o ptr while operations that require the target to
424      be known are typed as taking some ('t, 'c) obj ptr.
425    
426      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
427      where void is an eqtype without any values.  This makes it possible
428      to work on voidptr values using functions meant to operate on light
429      incomplete pointers.
430    
431    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
432      been vastly simplified.
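For the "make" tool item above, the effect of OS.Path.mkRelative can be tried in isolation. The sketch assumes a Unix-style path syntax and uses a fixed, made-up directory in place of OS.FileSys.getDir() so that the result does not depend on where it is run.

```sml
(* Made-up directory standing in for OS.FileSys.getDir (). *)
val cwd = "/usr/local/src"

(* An absolute name below cwd becomes relative to it. *)
val rel = OS.Path.mkRelative { path = "/usr/local/src/lib/foo",
                               relativeTo = cwd }

(* A name that is already relative is returned unchanged. *)
val same = OS.Path.mkRelative { path = "lib/foo", relativeTo = cwd }
```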
433    
434    ----------------------------------------------------------------------
435    Name: Matthias Blume
436    Date: 2002/02/19 10:48:00 EST
437    Tag: blume-20020219-pqfix
438    Description:
439    
440    Applied Chris Okasaki's bug fix for priority queues.
441    
442    ----------------------------------------------------------------------
443  Name: Matthias Blume
444  Date: 2002/02/15 17:05:00
445  Tag: Release_110_39
