Diff of /sml/trunk/HISTORY

revision 1069, Fri Feb 15 21:00:05 2002 UTC revision 1131, Mon Mar 11 15:20:52 2002 UTC
# Line 13  Line 13 
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/03/11 10:30:00 EST
18    Tag: leunga-20020310-runtime-string0
19    Description:
20    
21       The representation of the empty string now points to a
22    legal null terminated C string instead of unit.  It is now possible
23    to convert an ML string into C string with InlineT.CharVector.getData.
24    This compiles into one single machine instruction.
25    
26    ----------------------------------------------------------------------
27    Name: Allen Leung
28    Date: 2002/03/10 23:55:00 EST
29    Tag: leunga-20020310-x86-call
30    Description:
31    
32       Added machine code generation for the CALL instruction (relative displacement mode).
33    
34    ----------------------------------------------------------------------
35  Name: Matthias Blume
36  Date: 2002/03/08 16:05:00
37    Tag: blume-20020308-entrypoints
38    Description:
39    
40    Version number bumped to 110.39.1.  NEW BOOTFILES!
41    
42    Entrypoints: non-zero offset into a code object where execution should begin.
43    
44    - Added the notion of an entrypoint to CodeObj.
45    - Added reading/writing of entrypoint info to Binfile.
46    - Made runtime system bootloader aware of entrypoints.
47    - Use the address of the label of the first function given to mlriscGen
48      as the entrypoint.  This address is currently always 0, but it will
49      not be 0 once we turn on block placement.
50    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
51      with entry points) from mlriscGen.
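
To make the entrypoint idea above concrete, here is a minimal sketch in Python. It is not SML/NJ's actual Binfile format: the `CodeObj` class, the 8-byte header layout, and all function names are assumptions made purely for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass
class CodeObj:
    """Hypothetical code object: raw machine code plus an entrypoint offset."""
    code: bytes
    entrypoint: int = 0  # offset into `code` where execution should begin


def write_code_obj(obj: CodeObj) -> bytes:
    # Assumed layout: 4-byte code length, 4-byte entrypoint, then the code.
    if not 0 <= obj.entrypoint < max(len(obj.code), 1):
        raise ValueError("entrypoint must lie inside the code object")
    return struct.pack("<II", len(obj.code), obj.entrypoint) + obj.code


def read_code_obj(blob: bytes) -> CodeObj:
    size, entry = struct.unpack_from("<II", blob, 0)
    return CodeObj(code=blob[8:8 + size], entrypoint=entry)


def start_address(load_address: int, obj: CodeObj) -> int:
    # A loader that is aware of entrypoints jumps to base + offset,
    # not unconditionally to the base of the code object.
    return load_address + obj.entrypoint
```

With an entrypoint of 0 (the current situation described above) `start_address` is just the load address; a non-zero offset only matters once block placement moves the first function away from the start.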
52    
53    ----------------------------------------------------------------------
54    Name: Allen Leung
55    Date: 2002/03/07 20:45:00 EST
56    Tag: leunga-20020307-x86-cmov
57    Description:
58    
59       Bug fixes for CMOVcc on x86.
60    
61       1. Added machine code generation for CMOVcc
62       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
63       3. CMOVcc cannot have an immediate operand as argument.
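
The three points above can be sketched as a tiny instruction-selection helper. This is an illustrative Python model, not the MLRISC x86 backend: the function name, the `has_cmov` parameter, and the choice of `%ecx` as scratch register are all assumptions.

```python
def select_cond_move(cond, dst, src, has_cmov, tmp="%ecx"):
    """Return AT&T-syntax instructions for `if cond then dst := src`.

    Returns None when CMOVcc is unavailable, so that the caller can fall
    back to its SETcc-based sequence.
    """
    if not has_cmov:  # CPU older than the PentiumPro
        return None
    insns = []
    if src.startswith("$"):
        # CMOVcc has no immediate form: load the constant into a register
        # first.  MOV leaves the condition flags untouched, so this is safe.
        insns.append(f"movl {src}, {tmp}")
        src = tmp
    insns.append(f"cmov{cond} {src}, {dst}")
    return insns
```

For example, `select_cond_move("e", "%eax", "$1", True)` first materializes the constant in the scratch register and then emits the conditional move.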
64    
65    ----------------------------------------------------------------------
66    Name: Matthias Blume
67    Date: 2002/03/07 16:15:00 EST
68    Tag: blume-20020307-controls
69    Description:
70    
71    This is a very large but mostly boring patch which makes (almost)
72    every tuneable compiler knob (i.e., pretty much everything under
73    Control.* plus a few other things) configurable via both the command
74    line and environment variables in the style CM did its configuration
75    until now.
76    
77    Try starting sml with '-h' (or, if you are brave, '-H')
78    
79    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
80    implements the underlying generic mechanism.
81    
82    The interface to some of the existing such facilities has changed somewhat.
83    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
84    (The getFoo interface is still there for backward-compatibility, but its
85    use is deprecated.)
86    
87    The ml-build script passes -Cxxx=yyy command-line arguments through so
88    that one can now twiddle the compiler settings when using this "batch"
89    compiler.
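
As a rough illustration of configuring controls from both the environment and `-Cxxx=yyy` arguments, consider the Python sketch below. It does not reproduce the Controls module: the function name, the `DEMO_` environment prefix, the name-mangling rule, and the precedence (command line overrides environment) are assumptions.

```python
def resolve_controls(defaults, environ, argv, env_prefix="DEMO_"):
    """Resolve control values: defaults < environment < -Cname=value arguments.

    Returns the resolved values and the arguments that were not consumed.
    """
    values = dict(defaults)
    for name in defaults:
        # Hypothetical mangling: "opt.level" -> "DEMO_OPT_LEVEL"
        env_name = env_prefix + name.upper().replace("-", "_").replace(".", "_")
        if env_name in environ:
            values[name] = environ[env_name]
    rest = []
    for arg in argv:
        if arg.startswith("-C") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            if name not in defaults:
                raise KeyError(f"unknown control: {name}")
            values[name] = value
        else:
            rest.append(arg)  # passed through untouched
    return values, rest
```

Passing unconsumed arguments through unchanged mirrors how a wrapper script such as ml-build can forward `-C` settings without interpreting them itself.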
90    
91    TODO items:
92    
93    We should go through and throw out all controls that are no longer
94    connected to anything.  Moreover, we should go through and provide
95    meaningful (and correct!) documentation strings for those controls
96    that still are connected.
97    
98    Currently, multiple calls to Controls.new are accepted (only the first
99    has any effect).  Eventually we should make sure that every control
100    is being made (via Controls.new) exactly once.  Future access can then
101    be done using Controls.acc.
102    
103    Finally, it would probably be a good idea to use the getter-setter
104    interface to controls rather than ref cells.  For the time being, both
105    styles are provided by the Controls module, but getter-setter pairs are
106    better if thread-safety is of any concern because they can be wrapped.
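
The point about wrapping getter-setter pairs can be sketched as follows. This is a generic Python illustration with hypothetical names (`make_control`, `synchronized`), not the interface of the Controls module.

```python
import threading


def make_control(initial):
    """Return a (get, set) pair closing over one private value."""
    cell = [initial]

    def get():
        return cell[0]

    def set_(value):
        cell[0] = value

    return get, set_


def synchronized(get, set_):
    """Wrap a getter-setter pair so that every access holds a lock."""
    lock = threading.Lock()

    def locked_get():
        with lock:
            return get()

    def locked_set(value):
        with lock:
            set_(value)

    return locked_get, locked_set
```

A bare ref cell offers no such hook: clients read and write it directly, so there is no place to insert the lock after the fact.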
107    
108    *****************************************
109    
110    One bug fix: The function blockPlacement in three of the MLRISC
111    backpatch files used to be hard-wired to one of two possibilities at
112    link time (according to the value of the placementFlag).  But (I
113    think) it should rather sense the flag every time.
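
The difference between hard-wiring the choice once and sensing the flag on every call can be shown with a small Python model. The flag, the two placement functions, and the block representation are hypothetical stand-ins for the MLRISC code.

```python
placement_flag = {"weighted": False}  # stand-in for the real control


def default_placement(blocks):
    return list(blocks)


def weighted_placement(blocks):
    # blocks are (name, weight) pairs; heaviest first
    return sorted(blocks, key=lambda b: -b[1])


# Old behavior: the choice is made once, when this binding is evaluated,
# so later changes to the flag are never noticed.
block_placement_hardwired = (
    weighted_placement if placement_flag["weighted"] else default_placement
)


def block_placement(blocks):
    # Fixed behavior: consult the flag on every call.
    impl = weighted_placement if placement_flag["weighted"] else default_placement
    return impl(blocks)
```

After setting `placement_flag["weighted"] = True`, only `block_placement` switches to the weighted strategy; the hard-wired binding keeps using whatever was selected when it was created.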
114    
115    *****************************************
116    
117    Other assorted changes (by other people who did not supply a HISTORY entry):
118    
119    1. the cross-module inliner now works much better (Monnier)
120    2. representation of weights, frequencies, and probabilities in MLRISC
121       changed in preparation of using those for weighted block placement
122       (Reppy, George)
123    
124    ----------------------------------------------------------------------
125    Name: Lal George
126    Date: 2002/03/07 14:44:24 EST
127    Tag: george-20020307-weighted-block-placement
128    
129    Tested the weighted block placement optimization on all architectures
130    (except the hppa) using AMPL to generate the block and edge frequencies.
131    Changes were required in the machine properties to correctly
132    categorize trap instructions. There is an MLRISC flag
133    "weighted-block-placement" that can be used to enable weighted block
134    placement, but this will be ineffective without block/edge
135    frequencies (coming soon).
136    
137    
138    ----------------------------------------------------------------------
139    Name: Lal George
140    Date: 2002/03/05 17:24:48 EST
141    Tag: george-20020305-linkage-cluster
142    
143    In order to support the block placement optimization, a new cluster
144    is generated as the very first cluster (called the linkage cluster).
145    It contains a single jump to the 'real' entry point for the compilation
146    unit. Block placement has no effect on the linkage cluster itself, but
147    all the other clusters  have full freedom in the manner in which they
148    reorder blocks or functions.
149    
150    On the x86 the typical linkage code that is generated is:
151       ----------------------
152            .align 2
153       L0:
154            addl    $L1-L0, 72(%esp)
155            jmp     L1
156    
157    
158            .align  2
159       L1:
160       ----------------------
161    
162    72(%esp) is the memory location for the stdlink register. This
163    must contain the address of the CPS function being called. In the
164    above example, it contains the address of  L0; before
165    calling L1 (the real entry point for the compilation unit), it
166    must contain the address for L1, and hence
167    
168            addl $L1-L0, 72(%esp)
169    
170    I have tested this on all architectures except the hppa.  The increase
171    in code size is of course negligible.
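
The stdlink adjustment described above is plain address arithmetic. The following Python sketch models the two linkage instructions; the function name and the sample addresses are made up for illustration.

```python
def run_linkage_cluster(addr_L0, addr_L1, stdlink):
    """Model the linkage cluster: returns (next program counter, new stdlink)."""
    # addl $L1-L0, 72(%esp)
    # L1-L0 is a constant known at assembly time, so the code stays
    # position independent.
    stdlink = stdlink + (addr_L1 - addr_L0)
    # jmp L1
    pc = addr_L1
    return pc, stdlink
```

On entry stdlink holds the address of L0 (the function that was called), so after the add it holds the address of L1, which is exactly what the real entry point expects.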
172    
173    ----------------------------------------------------------------------
174    Name: Allen Leung
175    Date: 2002/03/03 13:20:00 EST
176    Tag: leunga-20020303-mlrisc-tools
177    
178      Added #[ ... ] expressions to mlrisc tools
179    
180    ----------------------------------------------------------------------
181    Name: Matthias Blume
182    Date: 2002/02/27 12:29:00 EST
183    Tag: blume-20020227-cdebug
184    Description:
185    
186    - made the types in structures C and C_Debug equal
187    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
188    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
189    
190    ----------------------------------------------------------------------
191    Name: Matthias Blume
192    Date: 2002/02/26 12:00:00 EST
193    Tag: blume-20020226-ffi
194    Description:
195    
196    1. Fixed a minor bug in CM's "noweb" tool:
197       If numbering is turned off, then truly don't number (i.e., do not
198       supply the -L option to noweb).  The previous behavior was to supply
199       -L'' -- which caused noweb to use the "default" line numbering scheme.
200       Thanks to Chris Richards for pointing this out (and supplying the fix).
201    
202    2. Once again, I reworked some aspects of the FFI:
203    
204       A. The incomplete/complete type business:
205    
206       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
207         gone!
208       - ML types representing an incomplete type are now *equal* to
209         ML types representing their corresponding complete types (just like
210         in C).  This is still safe because ml-nlffigen will not generate
211         RTTI for incomplete types, nor will it generate functions that
212         require access to such RTTI.   But when ML code generated from both
213         incomplete and complete versions of the C type meet, the ML types
214         are trivially interoperable.
215    
216         NOTE:  These changes restore the full generality of the translation
217         (which was previously lost when I eliminated functorization)!
218    
219       B. Enum types:
220    
221       - Structure C now has a type constructor "enum" that is similar to
222         how the "su" constructor works.  However, "enum" is not a phantom
223         type because each "T enum" has values (and is isomorphic to
224         MLRep.Signed.int).
225       - There are generic access operations for enum objects (using
226         MLRep.Signed.int).
227       - ml-nlffigen will generate a structure E_foo for each "enum foo".
228         * The structure contains the definition of type "mlrep" (the ML-side
229         representation type of the enum).  Normally, mlrep is the same
230         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
231         then mlrep will be defined as a datatype -- thus facilitating
232         pattern matching on mlrep values.
233         ("-ec" will be suppressed if there are duplicate values in an
234          enumeration.)
235         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
236         will be generated for each C enum constant xxx.
237         * Conversion functions m2i and i2m convert between mlrep and
238         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
239         * Conversion functions c and ml convert between mlrep and "tag enum".
240         * Access functions (get/set) fetch and store mlrep values.
241       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
242         enumerations are merged into one single enumeration represented by
243         structure E_'.
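
The relationship between "-ec", duplicate enum values, and the m2i/i2m conversions described under "Enum types" can be modeled roughly as below. This Python sketch only mimics the described behavior; `make_enum_rep` and its dictionary result are hypothetical and say nothing about the code ml-nlffigen actually emits.

```python
def make_enum_rep(constants, want_datatype):
    """Model the ML-side representation chosen for one C enum.

    constants: dict mapping C enum constant names to their integer values.
    want_datatype: True models invoking the generator with "-ec".
    """
    values = list(constants.values())
    has_duplicates = len(set(values)) != len(values)
    if want_datatype and not has_duplicates:
        # mlrep behaves like a datatype: one constructor (modeled here by
        # its name) per constant, with real conversions to and from ints.
        by_value = {v: n for n, v in constants.items()}
        return {
            "datatype": True,
            "m2i": lambda m: constants[m],
            "i2m": lambda i: by_value[i],
        }
    # mlrep is just the integer type, so both conversions are identities.
    return {"datatype": False, "m2i": lambda m: m, "i2m": lambda i: i}
```

Duplicate values force the integer representation because an int-to-constructor mapping would no longer be well defined.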
244    
245    ----------------------------------------------------------------------
246    Name: Allen Leung
247    Date: 2002/02/25 04:45:00 EST
248    Tag: leunga-20020225-cps-spill
249    
250    This is a new implementation of the CPS spill phase.
251    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
252    In case of problems, replace it with the old file spill.sml
253    
254    The current compiler runs into some serious performance problems when
255    constructing a large record.  This can happen when we try to compile a
256    structure with many items.  Even a very simple structure like the following
257    makes the compiler slow down.
258    
259        structure Foo = struct
260           val x_1 = 0w1 : Word32.word
261           val x_2 = 0w2 : Word32.word
262           val x_3 = 0w3 : Word32.word
263           ...
264           val x_N = 0wN : Word32.word
265        end
265        end
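
To reproduce such a test input for a given N, a small generator script is enough. The Python sketch below is an assumption about how one might produce the file (the function name and output layout are not from the original experiment); it writes the type as `Word32.word`, the type name exported by the Basis `Word32` structure.

```python
def make_big_structure(n, name="Foo"):
    """Return SML source for a structure with n word-valued bindings."""
    lines = [f"structure {name} = struct"]
    for i in range(1, n + 1):
        lines.append(f"   val x_{i} = 0w{i} : Word32.word")
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Usage: python gen.py 4000 > big.sml
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    sys.stdout.write(make_big_structure(count))
```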
266    
267    The following table shows the compile time, from N=1000 to N=4000,
268    with the old compiler:
269    
270    N
271    1000   CPS 100 spill                           0.04u  0.00s  0.00g
272           MLRISC ra                               0.06u  0.00s  0.05g
273              (spills = 0 reloads = 0)
274           TOTAL                                   0.63u  0.07s  0.21g
275    
276    1100   CPS 100 spill                           8.25u  0.32s  0.64g
277           MLRISC ra                               5.68u  0.59s  3.93g
278              (spills = 0 reloads = 0)
279           TOTAL                                   14.71u  0.99s  4.81g
280    
281    1500   CPS 100 spill                           58.55u  2.34s  1.74g
282           MLRISC ra                               5.54u  0.65s  3.91g
283              (spills = 543 reloads = 1082)
284           TOTAL                                   65.40u  3.13s  6.00g
285    
286    2000   CPS 100 spill                           126.69u  4.84s  3.08g
287           MLRISC ra                               0.80u  0.10s  0.55g
288              (spills = 42 reloads = 84)
289           TOTAL                                   129.42u  5.10s  4.13g
290    
291    3000   CPS 100 spill                           675.59u  19.03s  11.64g
292           MLRISC ra                               2.69u  0.27s  1.38g
293              (spills = 62 reloads = 124)
294           TOTAL                                   682.48u  19.61s  13.99g
295    
296    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
297           MLRISC ra                               4.96u  0.27s  2.72g
298              (spills = 85 reloads = 170)
299           TOTAL                                   2375.26u  57.21s  48.00g
300    
301    As you can see, the old cps spill module suffers from some serious
302    performance problems.  But since I cannot decipher the old code fully,
303    instead of patching the problems up, I'm reimplementing it
304    with a different algorithm.  The new code is more modular,
305    smaller when compiled, and substantially faster
306    (O(n log n) time and O(n) space).  Timing of the new spill module:
307    
308    4000  CPS 100 spill                           0.02u  0.00s  0.00g
309          MLRISC ra                               0.25u  0.02s  0.15g
310             (spills=1 reloads=3)
311          TOTAL                                   7.74u  0.34s  1.62g
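
As a rough sanity check on the old timings, one can estimate the growth exponent k in time ~ N^k from pairs of measurements. The Python sketch below uses only the user-time figures from the table above; the helper name is hypothetical, and with so few data points the estimate is indicative at best.

```python
import math

# CPU seconds ("u" column) of the old "CPS 100 spill" phase, copied from
# the table above.
old_spill_seconds = {
    1100: 8.25,
    1500: 58.55,
    2000: 126.69,
    3000: 675.59,
    4000: 2362.82,
}


def growth_exponent(n1, n2, times=old_spill_seconds):
    """Estimate k in time ~ N**k from two measurements."""
    return math.log(times[n2] / times[n1]) / math.log(n2 / n1)
```

Doubling N from 2000 to 4000 gives an exponent a little above 4, far from the O(n log n) behavior claimed for the new module.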
312    
313    Implementation details:
314    
315    As far as I can tell, the purpose of the CPS spill module is to make sure the
316    number of live variables at any program point (the bandwidth)
317    does not exceed a certain limit, which is determined by the
318    size of the spill area.
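
The notion of bandwidth can be illustrated with a backward liveness pass. The Python sketch below handles only a straight-line sequence of bindings, whereas real CPS terms are trees; the instruction encoding and function name are assumptions.

```python
def bandwidth(instrs, live_out=()):
    """Maximum number of simultaneously live variables.

    instrs: list of (defined_var_or_None, [used_vars]) in program order.
    live_out: variables still needed after the last instruction.
    """
    live = set(live_out)
    widest = len(live)
    # Walk backwards: a definition kills its variable, a use makes it live.
    for defined, used in reversed(instrs):
        live.discard(defined)
        live.update(used)
        widest = max(widest, len(live))
    return widest
```

Spilling is needed exactly when this number exceeds the limit; for instance, `c = a + b` following definitions of `a` and `b` has bandwidth 2 at the point just before `c` is computed.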
319    
320    When the bandwidth is too large, we decrease the register pressure by
321    packing live variables into spill records.  How we achieve this is
322    completely different than what we did in the old code.
323    
324    First, there is something about the MLRiscGen code generator
325    that we should be aware of:
326    
327    o MLRiscGen performs code motion!
328    
329       In particular, it will move floating point computations and
330       address computations involving only the heap pointer to
331       their use sites (if there is only a single use).
332       What this means is that if we have a CPS record construction
333       statement
334    
335           RECORD(k,vl,w,e)
336    
337       we should never count the new record address w as live if w
338       has only one use (which is often the case).
339    
340       We should do something similar to floating point, but the transformation
341       there is much more complex, so I won't deal with that.
342    
343    Secondly, there are now two new cps primops at our disposal:
344    
345     1. rawrecord of record_kind option
346        This pure operator allocates some uninitialized storage from the heap.
347        There are two forms:
348    
349         rawrecord NONE [INT n]  allocates a tagless record of length n
350         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
351                                     and initializes the tag.
352    
353     2. rawupdate of cty
354          rawupdate cty (v,i,x)
355          Assigns x to the ith component of record v.
356          The storelist is not updated.
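
The intended semantics of the two primops can be modeled in a few lines of Python. This is only a behavioral model under stated assumptions: records are dictionaries, the tag is an opaque pair, and the store list is not modeled at all.

```python
UNINIT = object()  # marks storage that has been allocated but not written


def rawrecord(kind, n):
    """Allocate n uninitialized slots.

    kind is None for a tagless record; otherwise the tag is initialized.
    """
    tag = None if kind is None else (kind, n)
    return {"tag": tag, "slots": [UNINIT] * n}


def rawupdate(record, i, x):
    """Store x into slot i of record (no store-list bookkeeping)."""
    record["slots"][i] = x


def select(i, record):
    """Read slot i back, as a reload from a spill record would."""
    value = record["slots"][i]
    if value is UNINIT:
        raise ValueError("reading an uninitialized slot")
    return value
```

For example, a spill record with two slots would be created with `rawrecord(None, 2)`, written with `rawupdate`, and read back with `select`.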
357    
358    We use these new primops for both spilling and incremental record construction.
359    
360     1. Spilling.
361    
362        This is implemented with a linear scan algorithm (but generalized
363        to trees).  The algorithm will create a single spill record at the
364        beginning of the cps function and use rawupdate to spill to it,
365        and SELECT or SELp to reload from it.  So both spills and reloads
366        are fine-grain operations.  In contrast, in the old algorithm
367        "spills" have to be bundled together in records.
368    
369        Ideally, we should sink the spill record construction to where
370        it is needed.  We can even split the spill record into multiple ones
371        at the places where they are needed.  But CPS is not a good
372        representation for global code motion, so I'll keep it simple and
373        am not attempting this.
374    
375     2. Incremental record construction (aka record splitting).
376    
377        Long records with many component values which are simultaneously live
378        (recall that single use record addresses are not considered to
379         be live) are constructed with rawrecord and rawupdate.
380        We allocate space on the heap with rawrecord first, then gradually
381        fill it in with rawupdate.  This is the technique suggested to me
382        by Matthias.
383    
384        Some restrictions on when this is applicable:
385        1. It is not a VECTOR record.  The code generator currently does not handle
386           this case.  VECTOR records use double indirection, like arrays.
387        2. All the record component values are defined in the same "basic block"
388           as the record constructor.  This is to prevent speculative
389           record construction.
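
The shape of incremental record construction can be sketched as a rewrite from one big RECORD into a rawrecord followed by interleaved definitions and rawupdates. The Python below emits pseudo-operations as tuples; the operation names, the `kind` label, and the input encoding are illustrative assumptions, not the CPS datatypes used by the compiler.

```python
def incremental_record(kind, dest, field_defs):
    """Build a record field by field instead of all at once.

    field_defs: list of (var, expr) pairs, all defined in the same basic
    block as the record itself (restriction 2 above).
    """
    if kind == "VECTOR":
        # Restriction 1 above: vector records are not handled this way.
        raise ValueError("VECTOR records cannot be built incrementally")
    # Allocate the storage first ...
    ops = [("rawrecord", kind, len(field_defs), dest)]
    for i, (var, expr) in enumerate(field_defs):
        # ... then compute each component and store it right away, so the
        # component is dead again before the next one is computed.
        ops.append(("let", var, expr))
        ops.append(("rawupdate", dest, i, var))
    return ops
```

With this ordering only the record address and one component are live at a time, instead of all components being live just before a single RECORD.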
390    
391    ----------------------------------------------------------------------
392    Name: Allen Leung
393    Date: 2002/02/22 01:02:00 EST
394    Tag: leunga-20020222-mlrisc-tools
395    
396    Minor bug fixes in the parser and rewriter
397    
398    ----------------------------------------------------------------------
399    Name: Allen Leung
400    Date: 2002/02/21 20:20:00 EST
401    Tag: leunga-20020221-peephole
402    
403    Regenerated the peephole files.  Some contained typos in the specification
404    and some didn't compile because of pretty printing bugs in the old version
405    of 'nowhere'.
406    
407    ----------------------------------------------------------------------
408    Name: Allen Leung
409    Date: 2002/02/19 20:20:00 EST
410    Tag: leunga-20020219-mlrisc-tools
411    Description:
412    
413       Minor bug fixes to the mlrisc-tools library:
414    
415       1.  Fixed up parsing colon suffixed keywords
416       2.  Added the ability to shut the error messages up
417       3.  Reimplemented the pretty printer and fixed up/improved
418           the pretty printing of handle and -> types.
419       4.  Fixed up generation of literal symbols in the nowhere tool.
420       5.  Added some SML keywords to sml.sty
421    
422    ----------------------------------------------------------------------
423    Name: Matthias Blume
424    Date: 2002/02/19 16:20:00 EST
425    Tag: blume-20020219-cmffi
426    Description:
427    
428    A wild mix of changes, some minor, some major:
429    
430    * All C FFI-related libraries are now anchored under $c:
431        $/c.cm      --> $c/c.cm
432        $/c-int.cm  --> $c/internals/c-int.cm
433        $/memory.cm --> $c/memory/memory.cm
434    
435    * "make" tool (in CM) now treats its argument pathname slightly
436      differently:
437        1. If the native expansion is an absolute name, then before invoking
438           the "make" command on it, CM will apply OS.Path.mkRelative
439           (with relativeTo = OS.FileSys.getDir()) to it.
440        2. The argument will be passed through to subsequent phases of CM
441           processing without "going native".  In particular, if the argument
442           was an anchored path, then "make" will not lose track of that anchor.
443    
444    * Compiler backends now "know" their respective C calling conventions
445      instead of having to be told about it by ml-nlffigen.  This relieves
446      ml-nlffigen from one of its burdens.
447    
448    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
449    
450    * Export C_DEBUG and C_Debug from $c/c.cm.
451    
452    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
453      subtyping relationship between incomplete pointers and their complete
454      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
455      with the convention of instantiating 'o with ('t, 'c) obj whenever
456      the pointer target type is complete.  In the incomplete case, 'o
457      will be instantiated with some "'c iobj" -- a type obtained by
458      using one of the functors PointerToIncompleteType or PointerToCompleteType.
459    
460      Operations that work on both incomplete and complete pointer types are
461      typed as taking an 'o ptr while operations that require the target to
462      be known are typed as taking some ('t, 'c) obj ptr.
463    
464      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
465      where void is an eqtype without any values.  This makes it possible
466      to work on voidptr values using functions meant to operate on light
467      incomplete pointers.
468    
469    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
470      been vastly simplified.
471    
472    ----------------------------------------------------------------------
473    Name: Matthias Blume
474    Date: 2002/02/19 10:48:00 EST
475    Tag: blume-20020219-pqfix
476    Description:
477    
478    Applied Chris Okasaki's bug fix for priority queues.
479    
480    ----------------------------------------------------------------------
481    Name: Matthias Blume
482    Date: 2002/02/15 17:05:00
483  Tag: Release_110_39
484  Description:
485    
486    Last-minute retagging is becoming a tradition... :-(
487    
488    This is the working release 110.39.
489    
490    ----------------------------------------------------------------------
491    Name: Matthias Blume
492    Date: 2002/02/15 16:00:00 EST
493    Tag: Release_110_39-orig
494    Description:
495    
496  Working release 110.39.  New bootfiles.
497    
498    (Update: There was a small bug in the installer so it wouldn't work
499    with all shells.  So I retagged. -Matthias)
500    
501  ----------------------------------------------------------------------
502  Name: Matthias Blume
503  Date: 2002/02/15 14:17:00 EST
