Diff of /sml/trunk/HISTORY

revision 1069, Fri Feb 15 21:00:05 2002 UTC revision 1137, Tue Mar 12 22:28:55 2002 UTC
# Line 11  Line 11 
11  Date: yyyy/mm/dd  Date: yyyy/mm/dd
12  Tag: <post-commit CVS tag>  Tag: <post-commit CVS tag>
13  Description:  Description:
14    ----------------------------------------------------------------------
15    Name: Matthias Blume
16    Date: 2002/03/12 17:30:00 EST
17    Tag: blume-20020312-version110392
18    Description:
19    
20    Yet another version number bump (because of small changes to the
21    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
22    
23    Changes:
24    
25      The new pid generation scheme described a few weeks ago was overly
26      complicated.  I implemented a new mechanism that is simpler and
27      provides a bit more "stability":  Once CM has seen a compilation
28      unit, it keeps its identity constant (as long as you do not delete
29      those crucial CM/GUID/* files).  This means that when you change
30      an interface, compile, then go back to the old interface, and
31      compile again, you arrive at the original pid.
32    
33      There now also is a mechanism that instructs CM to use the plain
34      environment hash as a module's pid (effectively making its GUID
35      the empty string).  For this, "noguid" must be specified as an
36      option to the .sml file in question within its .cm file.
37      This is most useful for code that is being generated by tools such
38      as ml-nlffigen (because during development programmers tend to
39      erase the tool's entire output directory tree including CM's cached
40      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
41      revert to the old, broken behavior of SML/NJ), but in specific cases
42      where there is no danger of interface confusion, its use is ok
43      (I think).
44    
45      ml-nlffigen by default generates "noguid" annotations.  They can be
46      turned off by specifying -guid in its command line.
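
The identity-preserving behavior described in this entry can be illustrated with a small model. This is only a sketch in Python, not CM's implementation: the hash function, the `guid_cache` dictionary (standing in for the CM/GUID/* files), and the names `guid_for` and `pid` are all assumptions made for illustration.

```python
import hashlib
import uuid

# Stand-in for the CM/GUID/* files: one stable GUID per compilation unit.
guid_cache = {}

def guid_for(unit, noguid=False):
    """Return the unit's GUID, creating it the first time the unit is seen."""
    if noguid:
        return ""  # "noguid": the GUID is effectively the empty string
    if unit not in guid_cache:
        guid_cache[unit] = uuid.uuid4().hex
    return guid_cache[unit]

def pid(unit, env_hash, noguid=False):
    """Model a pid as the environment hash combined with the unit's GUID."""
    guid = guid_for(unit, noguid)
    if guid == "":
        return env_hash  # plain environment hash is used as the pid
    return hashlib.sha1((env_hash + guid).encode()).hexdigest()
```

In this model, changing the interface (a different `env_hash`) and then changing it back yields the original pid as long as `guid_cache` survives; clearing the cache plays the role of deleting the CM/GUID/* files.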
47    
48    ----------------------------------------------------------------------
49    Name: Lal George
50    Date: 2002/03/12 14:42:36 EST
51    Tag: george-20020312-frequency-computation
52    Description:
53    
54    Integrated jump chaining and static block frequency into the
55    compiler. More details and numbers later.
56    
57    ----------------------------------------------------------------------
58    Name: Lal George
59    Date: 2002/03/11 22:38:53 EST
60    Tag: george-20020311-jump-chain-elim
61    Description:
62    
63    Tested the jump chain elimination on all architectures (except the
64    hppa).  This is on by default right now and is profitable for the
65    alpha and x86; however, it may not be profitable for the sparc and ppc
66    when compiling the compiler.
67    
68    The gc test will typically jump to a label at the end of the cluster,
69    where there is another jump to an external cluster containing the actual
70    code to invoke gc. This is to allow factoring of common gc invocation
71    sequences. That is to say, we generate:
72    
73            f:
74               testgc
75               ja   L1      % jump if above to L1
76    
77            L1:
78               jmp L2
79    
80    
81    After jump chain elimination the 'ja L1' instruction is converted to
82    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
83    up being implemented in their long form (if L2 is far away) using:
84    
85            jbe     L3      % jump if below or equal to L3
86            jmp     L2
87         L3:
88            ...
89    
90    
91    For large compilation units L2  may be far away.
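
The retargeting step described in this entry can be sketched as follows. This is an illustrative Python model, not the MLRISC pass: the instruction representation (pairs of opcode and target label) and the function names are assumptions.

```python
def eliminate_jump_chains(jump_only):
    """Map every label to the final target of its chain of unconditional jumps.

    jump_only maps a label whose block is a single 'jmp target' to that target.
    """
    def resolve(label):
        seen = set()
        # Follow jumps until a real block is reached; 'seen' guards against cycles.
        while label in jump_only and label not in seen:
            seen.add(label)
            label = jump_only[label]
        return label
    return {label: resolve(label) for label in jump_only}

def retarget(instrs, jump_only):
    """Rewrite branch targets so they skip over jump-only blocks."""
    final = eliminate_jump_chains(jump_only)
    return [(op, final.get(target, target)) for op, target in instrs]
```

For the example in this entry, `retarget([("ja", "L1")], {"L1": "L2"})` rewrites the branch to target L2. A real pass would also have to weigh whether the retargeted branch still fits its short form, which is the sparc and ppc concern raised above.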
92    
93    
94  ----------------------------------------------------------------------  ----------------------------------------------------------------------
95  Name: Matthias Blume  Name: Matthias Blume
96  Date: 2002/02/15 16:00:00 EST  Date: 2002/03/11 13:30:00 EST
97    Tag: blume-20020311-mltreeeval
98    Description:
99    
100    A functor parameter was missing.
101    
102    ----------------------------------------------------------------------
103    Name: Allen Leung
104    Date: 2002/03/11 10:30:00 EST
105    Tag: leunga-20020310-runtime-string0
106    Description:
107    
108       The representation of the empty string now points to a
109    legal null-terminated C string instead of unit.  It is now possible
110    to convert an ML string into a C string with InlineT.CharVector.getData.
111    This compiles into one single machine instruction.
112    
113    ----------------------------------------------------------------------
114    Name: Allen Leung
115    Date: 2002/03/10 23:55:00 EST
116    Tag: leunga-20020310-x86-call
117    Description:
118    
119       Added machine code generation for the CALL instruction (relative displacement mode)
120    
121    ----------------------------------------------------------------------
122    Name: Matthias Blume
123    Date: 2002/03/08 16:05:00
124    Tag: blume-20020308-entrypoints
125    Description:
126    
127    Version number bumped to 110.39.1.  NEW BOOTFILES!
128    
129    Entrypoints: non-zero offset into a code object where execution should begin.
130    
131    - Added the notion of an entrypoint to CodeObj.
132    - Added reading/writing of entrypoint info to Binfile.
133    - Made runtime system bootloader aware of entrypoints.
134    - Use the address of the label of the first function given to mlriscGen
135      as the entrypoint.  This address is currently always 0, but it will
136      not be 0 once we turn on block placement.
137    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
138      with entry points) from mlriscGen.
139    
140    ----------------------------------------------------------------------
141    Name: Allen Leung
142    Date: 2002/03/07 20:45:00 EST
143    Tag: leunga-20020307-x86-cmov
144    Description:
145    
146       Bug fixes for CMOVcc on x86.
147    
148       1. Added machine code generation for CMOVcc
149       2. CMOVcc is now generated in preference to SETcc on PentiumPro or above.
150       3. CMOVcc cannot have an immediate operand as argument.
151    
152    ----------------------------------------------------------------------
153    Name: Matthias Blume
154    Date: 2002/03/07 16:15:00 EST
155    Tag: blume-20020307-controls
156    Description:
157    
158    This is a very large but mostly boring patch which makes (almost)
159    every tuneable compiler knob (i.e., pretty much everything under
160    Control.* plus a few other things) configurable via both the command
161    line and environment variables in the style CM did its configuration
162    until now.
163    
164    Try starting sml with '-h' (or, if you are brave, '-H')
165    
166    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
167    implements the underlying generic mechanism.
168    
169    The interface to some of the existing such facilities has changed somewhat.
170    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
171    (The getFoo interface is still there for backward-compatibility, but its
172    use is deprecated.)
173    
174    The ml-build script passes -Cxxx=yyy command-line arguments through so
175    that one can now twiddle the compiler settings when using this "batch"
176    compiler.
177    
178    TODO items:
179    
180    We should go through and throw out all controls that are no longer
181    connected to anything.  Moreover, we should go through and provide
182    meaningful (and correct!) documentation strings for those controls
183    that still are connected.
184    
185    Currently, multiple calls to Controls.new are accepted (only the first
186    has any effect).  Eventually we should make sure that every control
187    is being made (via Controls.new) exactly once.  Future access can then
188    be done using Controls.acc.
189    
190    Finally, it would probably be a good idea to use the getter-setter
191    interface to controls rather than ref cells.  For the time being, both
192    styles are provided by the Controls module, but getter-setter pairs are
193    better if thread-safety is of any concern because they can be wrapped.
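
A rough model of such a controls registry, written in Python rather than SML, may help make the TODO items concrete. Everything here is an assumption for illustration: the class `ControlRegistry`, the environment variable naming (upper case with underscores), and the rule that command-line settings override the environment. The methods `new` and `acc` only borrow their names from Controls.new and Controls.acc; the sketch enforces the stricter "exactly once" registration that the TODO item proposes, and it hands out getter-setter pairs that are wrapped in a lock.

```python
import threading

class ControlRegistry:
    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def new(self, name, default):
        """Register a control exactly once; a second registration is an error."""
        with self._lock:
            if name in self._values:
                raise KeyError(f"control {name!r} already registered")
            self._values[name] = default
        return self.acc(name)

    def acc(self, name):
        """Return a (getter, setter) pair for an already registered control."""
        if name not in self._values:
            raise KeyError(name)
        def get():
            with self._lock:
                return self._values[name]
        def set_(value):
            with self._lock:
                self._values[name] = value
        return get, set_

    def configure(self, argv, environ):
        """Apply environment settings first, then -Cname=value arguments."""
        for name in list(self._values):
            env_name = name.upper().replace("-", "_").replace(".", "_")
            if env_name in environ:
                self.acc(name)[1](environ[env_name])
        for arg in argv:
            if arg.startswith("-C") and "=" in arg:
                name, value = arg[2:].split("=", 1)
                if name in self._values:
                    self.acc(name)[1](value)
```

Because callers only ever see the getter and setter, the locking can be changed or removed without touching them, which is the advantage over bare ref cells mentioned above.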
194    
195    *****************************************
196    
197    One bug fix: The function blockPlacement in three of the MLRISC
198    backpatch files used to be hard-wired to one of two possibilities at
199    link time (according to the value of the placementFlag).  But (I
200    think) it should rather sense the flag every time.
201    
202    *****************************************
203    
204    Other assorted changes (by other people who did not supply a HISTORY entry):
205    
206    1. the cross-module inliner now works much better (Monnier)
207    2. representation of weights, frequencies, and probabilities in MLRISC
208       changed in preparation for using those for weighted block placement
209       (Reppy, George)
210    
211    ----------------------------------------------------------------------
212    Name: Lal George
213    Date: 2002/03/07 14:44:24 EST
214    Tag: george-20020307-weighted-block-placement
215    
216    Tested the weighted block placement optimization on all architectures
217    (except the hppa) using AMPL to generate the block and edge frequencies.
218    Changes were required in the machine properties to correctly
219    categorize trap instructions. There is an MLRISC flag
220    "weighted-block-placement" that can be used to enable weighted block
221    placement, but this will be ineffective without block/edge
222    frequencies (coming soon).
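
This entry does not say which placement algorithm MLRISC uses. As a general illustration of how edge frequencies can drive block layout, here is a Python sketch of a greedy chain-merging heuristic; the function name, the data representation, and the choice of heuristic are all assumptions.

```python
def place_blocks(blocks, edges):
    """Order blocks so that heavy edges tend to become fall-throughs.

    blocks: list of block names, entry block first.
    edges:  dict mapping (src, dst) to an execution frequency.
    """
    entry = blocks[0]
    chain_of = {b: [b] for b in blocks}   # each block starts in its own chain
    # Visit edges from heaviest to lightest.
    for (src, dst), _weight in sorted(edges.items(), key=lambda e: -e[1]):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one;
        # the entry block must stay at the head of its chain.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    layout, seen = [], set()
    for blk in blocks:                    # emit chains in original block order
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout
```

Without block and edge frequencies every edge would carry the same weight, so the result would depend only on the order in which edges happen to be listed, which matches the remark above that the flag is ineffective until frequencies are available.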
223    
224    
225    ----------------------------------------------------------------------
226    Name: Lal George
227    Date: 2002/03/05 17:24:48 EST
228    Tag: george-20020305-linkage-cluster
229    
230    In order to support the block placement optimization, a new cluster
231    is generated as the very first cluster (called the linkage cluster).
232    It contains a single jump to the 'real' entry point for the compilation
233    unit. Block placement has no effect on the linkage cluster itself, but
234    all the other clusters  have full freedom in the manner in which they
235    reorder blocks or functions.
236    
237    On the x86 the typical linkage code that is generated is:
238       ----------------------
239            .align 2
240       L0:
241            addl    $L1-L0, 72(%esp)
242            jmp     L1
243    
244    
245            .align  2
246       L1:
247       ----------------------
248    
249    72(%esp) is the memory location for the stdlink register. This
250    must contain the address of the CPS function being called. In the
251    above example, it contains the address of  L0; before
252    calling L1 (the real entry point for the compilation unit), it
253    must contain the address for L1, and hence
254    
255            addl $L1-L0, 72(%esp)
256    
257    I have tested this on all architectures except the hppa.  The increase
258    in code size is of course negligible.
259    
260    ----------------------------------------------------------------------
261    Name: Allen Leung
262    Date: 2002/03/03 13:20:00 EST
263    Tag: leunga-20020303-mlrisc-tools
264    
265      Added #[ ... ] expressions to mlrisc tools
266    
267    ----------------------------------------------------------------------
268    Name: Matthias Blume
269    Date: 2002/02/27 12:29:00 EST
270    Tag: blume-20020227-cdebug
271    Description:
272    
273    - made the types in structures C and C_Debug equal
274    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
275    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
276    
277    ----------------------------------------------------------------------
278    Name: Matthias Blume
279    Date: 2002/02/26 12:00:00 EST
280    Tag: blume-20020226-ffi
281    Description:
282    
283    1. Fixed a minor bug in CM's "noweb" tool:
284       If numbering is turned off, then truly don't number (i.e., do not
285       supply the -L option to noweb).  The previous behavior was to supply
286       -L'' -- which caused noweb to use the "default" line numbering scheme.
287       Thanks to Chris Richards for pointing this out (and supplying the fix).
288    
289    2. Once again, I reworked some aspects of the FFI:
290    
291       A. The incomplete/complete type business:
292    
293       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
294         gone!
295       - ML types representing an incomplete type are now *equal* to
296         ML types representing their corresponding complete types (just like
297         in C).  This is still safe because ml-nlffigen will not generate
298         RTTI for incomplete types, nor will it generate functions that
299         require access to such RTTI.   But when ML code generated from both
300         incomplete and complete versions of the C type meet, the ML types
301         are trivially interoperable.
302    
303         NOTE:  These changes restore the full generality of the translation
304         (which was previously lost when I eliminated functorization)!
305    
306       B. Enum types:
307    
308       - Structure C now has a type constructor "enum" that is similar to
309         how the "su" constructor works.  However, "enum" is not a phantom
310         type because each "T enum" has values (and is isomorphic to
311         MLRep.Signed.int).
312       - There are generic access operations for enum objects (using
313         MLRep.Signed.int).
314       - ml-nlffigen will generate a structure E_foo for each "enum foo".
315         * The structure contains the definition of type "mlrep" (the ML-side
316         representation type of the enum).  Normally, mlrep is the same
317         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
318         then mlrep will be defined as a datatype -- thus facilitating
319         pattern matching on mlrep values.
320         ("-ec" will be suppressed if there are duplicate values in an
321          enumeration.)
322         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
323         will be generated for each C enum constant xxx.
324         * Conversion functions m2i and i2m convert between mlrep and
325         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
326         * Conversion functions c and ml convert between mlrep and "tag enum".
327         * Access functions (get/set) fetch and store mlrep values.
328       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
329         enumerations are merged into one single enumeration represented by
330         structure E_'.
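
The "-ec" style of enum representation described in this entry can be mimicked in Python to show the idea. This is an analogy only, not output of ml-nlffigen: the C type `enum color`, its constants, and the helper `can_use_ec` are hypothetical, and raising an error for an integer with no matching constant is this sketch's own choice.

```python
from enum import Enum

class Color(Enum):
    """Models mlrep for a hypothetical C 'enum color' generated with -ec."""
    e_red = 0
    e_green = 1
    e_blue = 2

def m2i(m):
    """Convert an mlrep value to its integer representation."""
    return m.value

def i2m(i):
    """Convert an integer to mlrep; raises ValueError if no constant matches."""
    return Color(i)

def can_use_ec(constants):
    """A datatype representation only works if no two constants share a value."""
    values = list(constants.values())
    return len(values) == len(set(values))
```

The `can_use_ec` check reflects why "-ec" is suppressed for enumerations with duplicate values: two constructors could not both be the image of the same integer. Without "-ec" the representation is the integer itself, so both conversions are identities.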
331    
332    ----------------------------------------------------------------------
333    Name: Allen Leung
334    Date: 2002/02/25 04:45:00 EST
335    Tag: leunga-20020225-cps-spill
336    
337    This is a new implementation of the CPS spill phase.
338    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
339    In case of problems, replace it with the old file spill.sml
340    
341    The current compiler runs into some serious performance problems when
342    constructing a large record.  This can happen when we try to compile a
343    structure with many items.  Even a very simple structure like the following
344    makes the compiler slow down.
345    
346        structure Foo = struct
347           val x_1 = 0w1 : Word32.int
348           val x_2 = 0w2 : Word32.int
349           val x_3 = 0w3 : Word32.int
350           ...
351           val x_N = 0wN : Word32.int
352        end
353    
354    The following table shows the compile time, from N=1000 to N=4000,
355    with the old compiler:
356    
357    N
358    1000   CPS 100 spill                           0.04u  0.00s  0.00g
359           MLRISC ra                               0.06u  0.00s  0.05g
360              (spills = 0 reloads = 0)
361           TOTAL                                   0.63u  0.07s  0.21g
362    
363    1100   CPS 100 spill                           8.25u  0.32s  0.64g
364           MLRISC ra                               5.68u  0.59s  3.93g
365              (spills = 0 reloads = 0)
366           TOTAL                                   14.71u  0.99s  4.81g
367    
368    1500   CPS 100 spill                           58.55u  2.34s  1.74g
369           MLRISC ra                               5.54u  0.65s  3.91g
370              (spills = 543 reloads = 1082)
371           TOTAL                                   65.40u  3.13s  6.00g
372    
373    2000   CPS 100 spill                           126.69u  4.84s  3.08g
374           MLRISC ra                               0.80u  0.10s  0.55g
375              (spills = 42 reloads = 84)
376           TOTAL                                   129.42u  5.10s  4.13g
377    
378    3000   CPS 100 spill                           675.59u  19.03s  11.64g
379           MLRISC ra                               2.69u  0.27s  1.38g
380              (spills = 62 reloads = 124)
381           TOTAL                                   682.48u  19.61s  13.99g
382    
383    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
384           MLRISC ra                               4.96u  0.27s  2.72g
385              (spills = 85 reloads = 170)
386           TOTAL                                   2375.26u  57.21s  48.00g
387    
388    As you can see the old cps spill module suffers from some serious
389    performance problems.  But since I cannot decipher the old code fully,
390    instead of patching the problems up, I'm reimplementing it
391    with a different algorithm.  The new code is more modular,
392    smaller when compiled, and substantially faster
393    (O(n log n) time and O(n) space).  Timing of the new spill module:
394    
395    4000  CPS 100 spill                           0.02u  0.00s  0.00g
396          MLRISC ra                               0.25u  0.02s  0.15g
397             (spills=1 reloads=3)
398          TOTAL                                   7.74u  0.34s  1.62g
399    
400    Implementation details:
401    
402    As far as I can tell, the purpose of the CPS spill module is to make sure the
403    number of live variables at any program point (the bandwidth)
404    does not exceed a certain limit, which is determined by the
405    size of the spill area.
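
The notion of bandwidth can be made concrete with a small Python sketch. It is a simplification and not the spill phase itself: it handles only a straight-line sequence of definitions (real CPS functions are trees), and the representation of instructions as (defined variable, used variables) pairs is an assumption.

```python
def max_bandwidth(instrs):
    """Return the largest number of simultaneously live variables.

    instrs is a straight-line list of (defined_var, used_vars) pairs.
    A variable is live from its definition up to its last use.
    """
    last_use = {}
    for i, (_defined, uses) in enumerate(instrs):
        for v in uses:
            last_use[v] = i
    live, peak = set(), 0
    for i, (defined, uses) in enumerate(instrs):
        # Variables whose last use is here are dead once the operands are read.
        live -= {v for v in uses if last_use[v] == i}
        if defined in last_use:           # never-used definitions are not live
            live.add(defined)
        peak = max(peak, len(live))
    return peak
```

Spilling would be needed whenever `max_bandwidth(instrs)` exceeds the limit imposed by the size of the spill area.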
406    
407    When the bandwidth is too large, we decrease the register pressure by
408    packing live variables into spill records.  How we achieve this is
409    completely different than what we did in the old code.
410    
411    First, there is something about the MLRiscGen code generator
412    that we should be aware of:
413    
414    o MLRiscGen performs code motion!
415    
416       In particular, it will move floating point computations and
417       address computations involving only the heap pointer to
418       their use sites (if there is only a single use).
419       What this means is that if we have a CPS record construction
420       statement
421    
422           RECORD(k,vl,w,e)
423    
424       we should never count the new record address w as live if w
425       has only one use (which is often the case).
426    
427       We should do something similar to floating point, but the transformation
428       there is much more complex, so I won't deal with that.
429    
430    Secondly, there are now two new cps primops at our disposal:
431    
432     1. rawrecord of record_kind option
433        This pure operator allocates some uninitialized storage from the heap.
434        There are two forms:
435    
436         rawrecord NONE [INT n]  allocates a tagless record of length n
437         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
438                                     and initializes the tag.
439    
440     2. rawupdate of cty
441          rawupdate cty (v,i,x)
442          Assigns x to the ith component of record v.
443          The storelist is not updated.
444    
445    We use these new primops for both spilling and incremental record construction.
446    
447     1. Spilling.
448    
449        This is implemented with a linear scan algorithm (but generalized
450        to trees).  The algorithm will create a single spill record at the
451        beginning of the cps function and use rawupdate to spill to it,
452        and SELECT or SELp to reload from it.  So both spills and reloads
453        are fine-grain operations.  In contrast, in the old algorithm
454        "spills" have to be bundled together in records.
455    
456        Ideally, we should sink the spill record construction to where
457        it is needed.  We can even split the spill record into multiple ones
458        at the places where they are needed.  But CPS is not a good
459        representation for global code motion, so I'll keep it simple and
460        am not attempting this.
461    
462     2. Incremental record construction (aka record splitting).
463    
464        Long records with many component values which are simultaneously live
465        (recall that single use record addresses are not considered to
466         be live) are constructed with rawrecord and rawupdate.
467        We allocate space on the heap with rawrecord first, then gradually
468        fill it in with rawupdate.  This is the technique suggested to me
469        by Matthias.
470    
471        Some restrictions on when this is applicable:
472        1. It is not a VECTOR record.  The code generator currently does not handle
473           this case. VECTOR record uses double indirection like arrays.
474        2. All the record component values are defined in the same "basic block"
475           as the record constructor.  This is to prevent speculative
476           record construction.
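
Incremental record construction as described above can be modeled in a few lines of Python. The functions `rawrecord` and `rawupdate` here only imitate the behavior of the two primops on a Python list; tags, the cty argument, and the heap are left out, and `build_incrementally` is a hypothetical helper.

```python
def rawrecord(n):
    """Allocate uninitialized storage for a record of length n."""
    return [None] * n

def rawupdate(record, i, x):
    """Assign x to the ith component of record (no store-list bookkeeping)."""
    record[i] = x

def build_incrementally(n, compute):
    """Construct a long record without keeping all n field values live."""
    rec = rawrecord(n)
    for i in range(n):
        # Each value is stored as soon as it is computed, so at most one
        # component value plus the record address is live at any point.
        rawupdate(rec, i, compute(i))
    return tuple(rec)
```

Contrast this with building the record in one step, which requires all n component values to be live at the point of construction.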
477    
478    ----------------------------------------------------------------------
479    Name: Allen Leung
480    Date: 2002/02/22 01:02:00 EST
481    Tag: leunga-20020222-mlrisc-tools
482    
483    Minor bug fixes in the parser and rewriter
484    
485    ----------------------------------------------------------------------
486    Name: Allen Leung
487    Date: 2002/02/21 20:20:00 EST
488    Tag: leunga-20020221-peephole
489    
490    Regenerated the peephole files.  Some contained typos in the specification
491    and some didn't compile because of pretty printing bugs in the old version
492    of 'nowhere'.
493    
494    ----------------------------------------------------------------------
495    Name: Allen Leung
496    Date: 2002/02/19 20:20:00 EST
497    Tag: leunga-20020219-mlrisc-tools
498    Description:
499    
500       Minor bug fixes to the mlrisc-tools library:
501    
502       1.  Fixed up parsing colon suffixed keywords
503       2.  Added the ability to shut the error messages up
504       3.  Reimplemented the pretty printer and fixed up/improved
505           the pretty printing of handle and -> types.
506       4.  Fixed up generation of literal symbols in the nowhere tool.
507       5.  Added some SML keywords to sml.sty
508    
509    ----------------------------------------------------------------------
510    Name: Matthias Blume
511    Date: 2002/02/19 16:20:00 EST
512    Tag: blume-20020219-cmffi
513    Description:
514    
515    A wild mix of changes, some minor, some major:
516    
517    * All C FFI-related libraries are now anchored under $c:
518        $/c.cm      --> $c/c.cm
519        $/c-int.cm  --> $c/internals/c-int.cm
520        $/memory.cm --> $c/memory/memory.cm
521    
522    * "make" tool (in CM) now treats its argument pathname slightly
523      differently:
524        1. If the native expansion is an absolute name, then before invoking
525           the "make" command on it, CM will apply OS.Path.mkRelative
526           (with relativeTo = OS.FileSys.getDir()) to it.
527        2. The argument will be passed through to subsequent phases of CM
528           processing without "going native".  In particular, if the argument
529           was an anchored path, then "make" will not lose track of that anchor.
530    
531    * Compiler backends now "know" their respective C calling conventions
532      instead of having to be told about it by ml-nlffigen.  This relieves
533      ml-nlffigen from one of its burdens.
534    
535    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
536    
537    * Export C_DEBUG and C_Debug from $c/c.cm.
538    
539    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
540      subtyping relationship between incomplete pointers and their complete
541      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
542      with the convention of instantiating 'o with ('t, 'c) obj whenever
543      the pointer target type is complete.  In the incomplete case, 'o
544      will be instantiated with some "'c iobj" -- a type obtained by
545      using one of the functors PointerToIncompleteType or PointerToCompleteType.
546    
547      Operations that work on both incomplete and complete pointer types are
548      typed as taking an 'o ptr while operations that require the target to
549      be known are typed as taking some ('t, 'c) obj ptr.
550    
551      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
552      where void is an eqtype without any values.  This makes it possible
553      to work on voidptr values using functions meant to operate on light
554      incomplete pointers.
555    
556    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
557      been vastly simplified.
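
The first rule for the "make" tool's argument in this entry (absolute native names are made relative to the current directory) has a close analogue in Python's standard library. The sketch below only illustrates that rule; `make_argument` is a hypothetical name, and Python's `os.path.relpath` stands in for OS.Path.mkRelative.

```python
import os

def make_argument(native_path, cwd=None):
    """Return the path that would be handed to the "make" command."""
    cwd = os.getcwd() if cwd is None else cwd
    if os.path.isabs(native_path):
        # Absolute names are rewritten relative to the working directory.
        return os.path.relpath(native_path, start=cwd)
    return native_path  # relative names pass through unchanged
```

On a POSIX system, `make_argument("/a/b/c/Makefile", cwd="/a/b")` gives `c/Makefile`.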
558    
559    ----------------------------------------------------------------------
560    Name: Matthias Blume
561    Date: 2002/02/19 10:48:00 EST
562    Tag: blume-20020219-pqfix
563    Description:
564    
565    Applied Chris Okasaki's bug fix for priority queues.
566    
567    ----------------------------------------------------------------------
568    Name: Matthias Blume
569    Date: 2002/02/15 17:05:00
570  Tag: Release_110_39  Tag: Release_110_39
571  Description:  Description:
572    
573    Last-minute retagging is becoming a tradition... :-(
574    
575    This is the working release 110.39.
576    
577    ----------------------------------------------------------------------
578    Name: Matthias Blume
579    Date: 2002/02/15 16:00:00 EST
580    Tag: Release_110_39-orig
581    Description:
582    
583  Working release 110.39.  New bootfiles.  Working release 110.39.  New bootfiles.
584    
585    (Update: There was a small bug in the installer so it wouldn't work
586    with all shells.  So I retagged. -Matthias)
587    
588  ----------------------------------------------------------------------  ----------------------------------------------------------------------
589  Name: Matthias Blume  Name: Matthias Blume
590  Date: 2002/02/15 14:17:00 EST  Date: 2002/02/15 14:17:00 EST

