
Diff of /sml/trunk/HISTORY


revision 1076, Tue Feb 19 15:47:18 2002 UTC revision 1140, Wed Mar 13 18:59:03 2002 UTC
# Line 11
11  Date: yyyy/mm/dd
12  Tag: <post-commit CVS tag>
13  Description:
14    ----------------------------------------------------------------------
15    Name: Matthias Blume
16    Date: 2002/03/13 14:00:00 EST
17    Tag: blume-20020313-overload-etc
18    Description:
19    
20    1. Added _overload as a synonym for overload for backward compatibility.
21       (Control.overloadKW must be true for either version to be accepted.)
22    
23    2. Fixed bug in install script that caused more things to be installed
24       than what was requested in config/targets.
25    
26    3. Made CM aware of the (_)overload construct so that autoloading
27       works.
28    
29    ----------------------------------------------------------------------
30    Name: Matthias Blume
31    Date: 2002/03/12 22:03:00 EST
32    Tag: blume-20020312-url
33    Description:
34    
35    Forgot to update BOOT and srcarchiveurl.
36    
37    ----------------------------------------------------------------------
38    Name: Matthias Blume
39    Date: 2002/03/12 17:30:00 EST
40    Tag: blume-20020312-version110392
41    Description:
42    
43    Yet another version number bump (because of small changes to the
44    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
45    
46    Changes:
47    
48      The new pid generation scheme described a few weeks ago was overly
49      complicated.  I implemented a new mechanism that is simpler and
50      provides a bit more "stability":  Once CM has seen a compilation
51      unit, it keeps its identity constant (as long as you do not delete
52      those crucial CM/GUID/* files).  This means that when you change
53      an interface, compile, then go back to the old interface, and
54      compile again, you arrive at the original pid.
55    
56      There now also is a mechanism that instructs CM to use the plain
57      environment hash as a module's pid (effectively making its GUID
58      the empty string).  For this, "noguid" must be specified as an
59      option to the .sml file in question within its .cm file.
60      This is most useful for code that is being generated by tools such
61      as ml-nlffigen (because during development programmers tend to
62      erase the tool's entire output directory tree including CM's cached
63      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
64      revert to the old, broken behavior of SML/NJ), but in specific cases
65      where there is no danger of interface confusion, its use is ok
66      (I think).
67    
68      ml-nlffigen by default generates "noguid" annotations.  They can be
69      turned off by specifying -guid in its command line.
70    
71    ----------------------------------------------------------------------
72    Name: Lal George
73    Date: 2002/03/12 14:42:36 EST
74    Tag: george-20020312-frequency-computation
75    Description:
76    
77    Integrated jump chaining and static block frequency into the
78    compiler. More details and numbers later.
79    
80    ----------------------------------------------------------------------
81    Name: Lal George
82    Date: 2002/03/11 22:38:53 EST
83    Tag: george-20020311-jump-chain-elim
84    Description:
85    
86    Tested the jump chain elimination on all architectures (except the
87    hppa).  This is on by default right now and is profitable for the
88    alpha and x86; however, it may not be profitable for the sparc and ppc
89    when compiling the compiler.
90    
91    The gc test will typically jump to a label at the end of the cluster,
92    where there is another jump to an external cluster containing the actual
93    code to invoke gc. This is to allow factoring of common gc invocation
94    sequences. That is to say, we generate:
95    
96            f:
97               testgc
98               ja   L1      % jump if above to L1
99    
100            L1:
101               jmp L2
102    
103    
104    After jump chain elimination the 'ja L1' instruction is converted to
105    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
106    up being implemented in their long form (if L2 is far away) using:
107    
108            jbe     L3      % jump if below or equal to L3
109            jmp     L2
110         L3:
111            ...
112    
113    
114    For large compilation units L2  may be far away.
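
The retargeting step described above can be sketched in SML as follows. This is a toy model under stated assumptions, not the MLRISC implementation: the `block` datatype, the association list of labels, and the function names are all invented for illustration.

```sml
(* Toy model: a label's block is either a lone unconditional jump to
   another label, or ordinary code.  All names are illustrative. *)
datatype block = Jump of string | Code

(* Look up a label's block in an association list. *)
fun lookup (blocks : (string * block) list) lab =
    case List.find (fn (l, _) => l = lab) blocks of
        SOME (_, b) => SOME b
      | NONE => NONE

(* Follow unconditional jumps until reaching real code or a label that is
   not defined locally.  `seen` guards against cycles such as L1: jmp L1. *)
fun resolve blocks lab =
    let fun go (l, seen) =
            if List.exists (fn s => s = l) seen then l
            else case lookup blocks l of
                     SOME (Jump l') => go (l', l :: seen)
                   | _ => l
    in go (lab, []) end

(* The example from the text: f contains code, L1 only jumps to L2,
   and L2 lives in another cluster. *)
val blocks = [("f", Code), ("L1", Jump "L2")]
val target = resolve blocks "L1"
```

A branch such as 'ja L1' would then be rewritten to use `target` as its destination. The sketch says nothing about whether the rewritten branch still fits in its short encoding, which is the cost discussed above for the sparc and ppc.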
115    
116    
117    ----------------------------------------------------------------------
118    Name: Matthias Blume
119    Date: 2002/03/11 13:30:00 EST
120    Tag: blume-20020311-mltreeeval
121    Description:
122    
123    A functor parameter was missing.
124    
125    ----------------------------------------------------------------------
126    Name: Allen Leung
127    Date: 2002/03/11 10:30:00 EST
128    Tag: leunga-20020310-runtime-string0
129    Description:
130    
131       The representation of the empty string now points to a
132    legal null terminated C string instead of unit.  It is now possible
133    to convert an ML string into C string with InlineT.CharVector.getData.
134    This compiles into one single machine instruction.
135    
136    ----------------------------------------------------------------------
137    Name: Allen Leung
138    Date: 2002/03/10 23:55:00 EST
139    Tag: leunga-20020310-x86-call
140    Description:
141    
142       Added machine code generation for CALL instruction (relative displacement mode)
143    
144    ----------------------------------------------------------------------
145    Name: Matthias Blume
146    Date: 2002/03/08 16:05:00
147    Tag: blume-20020308-entrypoints
148    Description:
149    
150    Version number bumped to 110.39.1.  NEW BOOTFILES!
151    
152    Entrypoints: non-zero offset into a code object where execution should begin.
153    
154    - Added the notion of an entrypoint to CodeObj.
155    - Added reading/writing of entrypoint info to Binfile.
156    - Made runtime system bootloader aware of entrypoints.
157    - Use the address of the label of the first function given to mlriscGen
158      as the entrypoint.  This address is currently always 0, but it will
159      not be 0 once we turn on block placement.
160    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
161      with entry points) from mlriscGen.
162    
163    ----------------------------------------------------------------------
164    Name: Allen Leung
165    Date: 2002/03/07 20:45:00 EST
166    Tag: leunga-20020307-x86-cmov
167    Description:
168    
169       Bug fixes for CMOVcc on x86.
170    
171       1. Added machine code generation for CMOVcc
172       2. CMOVcc is now generated in preference to SETcc on PentiumPro or above.
173       3. CMOVcc cannot have an immediate operand as argument.
174    
175    ----------------------------------------------------------------------
176    Name: Matthias Blume
177    Date: 2002/03/07 16:15:00 EST
178    Tag: blume-20020307-controls
179    Description:
180    
181    This is a very large but mostly boring patch which makes (almost)
182    every tuneable compiler knob (i.e., pretty much everything under
183    Control.* plus a few other things) configurable via both the command
184    line and environment variables in the style CM did its configuration
185    until now.
186    
187    Try starting sml with '-h' (or, if you are brave, '-H')
188    
189    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
190    implements the underlying generic mechanism.
191    
192    The interface to some of the existing such facilities has changed somewhat.
193    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
194    (The getFoo interface is still there for backward-compatibility, but its
195    use is deprecated.)
196    
197    The ml-build script passes -Cxxx=yyy command-line arguments through so
198    that one can now twiddle the compiler settings when using this "batch"
199    compiler.
200    
201    TODO items:
202    
203    We should go through and throw out all controls that are no longer
204    connected to anything.  Moreover, we should go through and provide
205    meaningful (and correct!) documentation strings for those controls
206    that still are connected.
207    
208    Currently, multiple calls to Controls.new are accepted (only the first
209    has any effect).  Eventually we should make sure that every control
210    is being made (via Controls.new) exactly once.  Future access can then
211    be done using Controls.acc.
212    
213    Finally, it would probably be a good idea to use the getter-setter
214    interface to controls rather than ref cells.  For the time being, both
215    styles are provided by the Controls module, but getter-setter pairs are
216    better if thread-safety is of any concern because they can be wrapped.
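
To illustrate why getter-setter pairs can be wrapped while bare ref cells cannot, here is a minimal sketch. The `control` type and the functions `mkControl` and `wrap` are hypothetical and are not the actual signature of the Controls module.

```sml
(* Hypothetical shape of a control; not the real CONTROLS signature. *)
type 'a control = { get : unit -> 'a, set : 'a -> unit }

(* Build a getter-setter pair on top of a private ref cell. *)
fun mkControl (init : 'a) : 'a control =
    let val cell = ref init
    in { get = fn () => !cell, set = fn v => cell := v } end

(* Wrap an existing control.  `enter` and `leave` stand in for whatever
   bracketing a client needs, e.g. acquiring and releasing a lock. *)
fun wrap (enter : unit -> unit, leave : unit -> unit)
         ({get, set} : 'a control) : 'a control =
    { get = fn () => (enter (); let val v = get () in leave (); v end),
      set = fn v => (enter (); set v; leave ()) }

(* Usage: record every access in a log instead of taking a real lock. *)
val log = ref ([] : string list)
val c = wrap (fn () => log := "enter" :: !log,
              fn () => log := "leave" :: !log)
             (mkControl 0)
val () = #set c 5
val five = #get c ()
```

A client holding a raw `int ref` can read and write it directly, so no such bracketing could be interposed after the fact.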
217    
218    *****************************************
219    
220    One bug fix: The function blockPlacement in three of the MLRISC
221    backpatch files used to be hard-wired to one of two possibilities at
222    link time (according to the value of the placementFlag).  But (I
223    think) it should rather sense the flag every time.
224    
225    *****************************************
226    
227    Other assorted changes (by other people who did not supply a HISTORY entry):
228    
229    1. the cross-module inliner now works much better (Monnier)
230    2. representation of weights, frequencies, and probabilities in MLRISC
231       changed in preparation of using those for weighted block placement
232       (Reppy, George)
233    
234    ----------------------------------------------------------------------
235    Name: Lal George
236    Date: 2002/03/07 14:44:24 EST
237    Tag: george-20020307-weighted-block-placement
238    
239    Tested the weighted block placement optimization on all architectures
240    (except the hppa) using AMPL to generate the block and edge frequencies.
241    Changes were required in the machine properties to correctly
242    categorize trap instructions. There is an MLRISC flag
243    "weighted-block-placement" that can be used to enable weighted block
244    placement, but this will be ineffective without block/edge
245    frequencies (coming soon).
246    
247    
248    ----------------------------------------------------------------------
249    Name: Lal George
250    Date: 2002/03/05 17:24:48 EST
251    Tag: george-20020305-linkage-cluster
252    
253    In order to support the block placement optimization, a new cluster
254    is generated as the very first cluster (called the linkage cluster).
255    It contains a single jump to the 'real' entry point for the compilation
256    unit. Block placement has no effect on the linkage cluster itself, but
257    all the other clusters  have full freedom in the manner in which they
258    reorder blocks or functions.
259    
260    On the x86 the typical linkage code that is generated is:
261       ----------------------
262            .align 2
263       L0:
264            addl    $L1-L0, 72(%esp)
265            jmp     L1
266    
267    
268            .align  2
269       L1:
270       ----------------------
271    
272    72(%esp) is the memory location for the stdlink register. This
273    must contain the address of the CPS function being called. In the
274    above example, it contains the address of  L0; before
275    calling L1 (the real entry point for the compilation unit), it
276    must contain the address for L1, and hence
277    
278            addl $L1-L0, 72(%esp)
279    
280    I have tested this on all architectures except the hppa.  The increase
281    in code size is of course negligible.
282    
283    ----------------------------------------------------------------------
284    Name: Allen Leung
285    Date: 2002/03/03 13:20:00 EST
286    Tag: leunga-20020303-mlrisc-tools
287    
288      Added #[ ... ] expressions to mlrisc tools
289    
290    ----------------------------------------------------------------------
291    Name: Matthias Blume
292    Date: 2002/02/27 12:29:00 EST
293    Tag: blume-20020227-cdebug
294    Description:
295    
296    - made the types in structures C and C_Debug equal
297    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
298    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
299    
300    ----------------------------------------------------------------------
301    Name: Matthias Blume
302    Date: 2002/02/26 12:00:00 EST
303    Tag: blume-20020226-ffi
304    Description:
305    
306    1. Fixed a minor bug in CM's "noweb" tool:
307       If numbering is turned off, then truly don't number (i.e., do not
308       supply the -L option to noweb).  The previous behavior was to supply
309       -L'' -- which caused noweb to use the "default" line numbering scheme.
310       Thanks to Chris Richards for pointing this out (and supplying the fix).
311    
312    2. Once again, I reworked some aspects of the FFI:
313    
314       A. The incomplete/complete type business:
315    
316       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
317         gone!
318       - ML types representing an incomplete type are now *equal* to
319         ML types representing their corresponding complete types (just like
320         in C).  This is still safe because ml-nlffigen will not generate
321         RTTI for incomplete types, nor will it generate functions that
322         require access to such RTTI.   But when ML code generated from both
323         incomplete and complete versions of the C type meet, the ML types
324         are trivially interoperable.
325    
326         NOTE:  These changes restore the full generality of the translation
327         (which was previously lost when I eliminated functorization)!
328    
329       B. Enum types:
330    
331       - Structure C now has a type constructor "enum" that is similar to
332         how the "su" constructor works.  However, "enum" is not a phantom
333         type because each "T enum" has values (and is isomorphic to
334         MLRep.Signed.int).
335       - There are generic access operations for enum objects (using
336         MLRep.Signed.int).
337       - ml-nlffigen will generate a structure E_foo for each "enum foo".
338         * The structure contains the definition of type "mlrep" (the ML-side
339         representation type of the enum).  Normally, mlrep is the same
340         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
341         then mlrep will be defined as a datatype -- thus facilitating
342         pattern matching on mlrep values.
343         ("-ec" will be suppressed if there are duplicate values in an
344          enumeration.)
345         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
346         will be generated for each C enum constant xxx.
347         * Conversion functions m2i and i2m convert between mlrep and
348         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
349         * Conversion functions c and ml convert between mlrep and "tag enum".
350         * Access functions (get/set) fetch and store mlrep values.
351       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
352         enumerations are merged into one single enumeration represented by
353         structure E_'.
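
As a rough picture of the "-ec" flavour, the following hand-written sketch shows the general shape described above for a hypothetical C declaration `enum color { RED, GREEN, BLUE };`. It is not actual ml-nlffigen output: plain `int` stands in for MLRep.Signed.int, the c/ml conversions and get/set accessors are omitted, and raising `Domain` on an unknown integer is an assumption.

```sml
structure E_color = struct
    (* With "-ec", mlrep is a datatype with one constructor per constant. *)
    datatype mlrep = e_RED | e_GREEN | e_BLUE

    (* mlrep -> integer representation *)
    fun m2i e_RED = 0
      | m2i e_GREEN = 1
      | m2i e_BLUE = 2

    (* integer representation -> mlrep (failure behaviour is assumed) *)
    fun i2m 0 = e_RED
      | i2m 1 = e_GREEN
      | i2m 2 = e_BLUE
      | i2m _ = raise Domain
end

(* Pattern matching on mlrep values, which "-ec" makes possible. *)
fun name E_color.e_RED = "red"
  | name E_color.e_GREEN = "green"
  | name E_color.e_BLUE = "blue"

val roundTrip = E_color.m2i (E_color.i2m 1)
```

If two constants shared a value, `i2m` could not map that integer back to a unique constructor, which is presumably why "-ec" is suppressed for enumerations with duplicate values.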
354    
355    ----------------------------------------------------------------------
356    Name: Allen Leung
357    Date: 2002/02/25 04:45:00 EST
358    Tag: leunga-20020225-cps-spill
359    
360    This is a new implementation of the CPS spill phase.
361    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
362    In case of problems, replace it with the old file spill.sml
363    
364    The current compiler runs into some serious performance problems when
365    constructing a large record.  This can happen when we try to compile a
366    structure with many items.  Even a very simple structure like the following
367    makes the compiler slow down.
368    
369        structure Foo = struct
370           val x_1 = 0w1 : Word32.word
371           val x_2 = 0w2 : Word32.word
372           val x_3 = 0w3 : Word32.word
373           ...
374           val x_N = 0wN : Word32.word
375        end
376    
377    The following table shows the compile time, from N=1000 to N=4000,
378    with the old compiler:
379    
380    N
381    1000   CPS 100 spill                           0.04u  0.00s  0.00g
382           MLRISC ra                               0.06u  0.00s  0.05g
383              (spills = 0 reloads = 0)
384           TOTAL                                   0.63u  0.07s  0.21g
385    
386    1100   CPS 100 spill                           8.25u  0.32s  0.64g
387           MLRISC ra                               5.68u  0.59s  3.93g
388              (spills = 0 reloads = 0)
389           TOTAL                                   14.71u  0.99s  4.81g
390    
391    1500   CPS 100 spill                           58.55u  2.34s  1.74g
392           MLRISC ra                               5.54u  0.65s  3.91g
393              (spills = 543 reloads = 1082)
394           TOTAL                                   65.40u  3.13s  6.00g
395    
396    2000   CPS 100 spill                           126.69u  4.84s  3.08g
397           MLRISC ra                               0.80u  0.10s  0.55g
398              (spills = 42 reloads = 84)
399           TOTAL                                   129.42u  5.10s  4.13g
400    
401    3000   CPS 100 spill                           675.59u  19.03s  11.64g
402           MLRISC ra                               2.69u  0.27s  1.38g
403              (spills = 62 reloads = 124)
404           TOTAL                                   682.48u  19.61s  13.99g
405    
406    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
407           MLRISC ra                               4.96u  0.27s  2.72g
408              (spills = 85 reloads = 170)
409           TOTAL                                   2375.26u  57.21s  48.00g
410    
411    As you can see the old cps spill module suffers from some serious
412    performance problems.  But since I cannot decipher the old code fully,
413    instead of patching the problems up, I'm reimplementing it
414    with a different algorithm.  The new code is more modular,
415    smaller when compiled, and substantially faster
416    (O(n log n) time and O(n) space).  Timing of the new spill module:
417    
418    4000  CPS 100 spill                           0.02u  0.00s  0.00g
419          MLRISC ra                               0.25u  0.02s  0.15g
420             (spills=1 reloads=3)
421          TOTAL                                   7.74u  0.34s  1.62g
422    
423    Implementation details:
424    
425    As far as I can tell, the purpose of the CPS spill module is to make sure the
426    number of live variables at any program point (the bandwidth)
427    does not exceed a certain limit, which is determined by the
428    size of the spill area.
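
The notion of bandwidth can be made concrete with a small sketch. It assumes a flat numbering of program points and a list of live ranges, both simplifications invented for illustration; real CPS code is tree-shaped and the spill module does not work this way.

```sml
(* A variable is modelled by its live range: the first and last program
   point (inclusive) at which it is live. *)
type range = int * int

(* Number of variables live at program point p. *)
fun liveAt (ranges : range list) p =
    length (List.filter (fn (lo, hi) => lo <= p andalso p <= hi) ranges)

(* Bandwidth: the maximum number of simultaneously live variables over
   program points 0 .. n-1. *)
fun bandwidth (ranges : range list) n =
    let fun loop (p, best) =
            if p >= n then best
            else loop (p + 1, Int.max (best, liveAt ranges p))
    in loop (0, 0) end

val ranges = [(0, 3), (1, 2), (2, 5), (4, 5)]
val bw = bandwidth ranges 6      (* 3, reached at program point 2 *)
val needsSpill = bw > 2          (* true for a hypothetical limit of 2 *)
```

Spilling is needed exactly when the bandwidth exceeds the limit; the limit of 2 here is arbitrary.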
429    
430    When the bandwidth is too large, we decrease the register pressure by
431    packing live variables into spill records.  How we achieve this is
432    completely different than what we did in the old code.
433    
434    First, there is something about the MLRiscGen code generator
435    that we should be aware of:
436    
437    o MLRiscGen performs code motion!
438    
439       In particular, it will move floating point computations and
440       address computations involving only the heap pointer to
441       their use sites (if there is only a single use).
442       What this means is that if we have a CPS record construction
443       statement
444    
445           RECORD(k,vl,w,e)
446    
447       we should never count the new record address w as live if w
448       has only one use (which is often the case).
449    
450       We should do something similar to floating point, but the transformation
451       there is much more complex, so I won't deal with that.
452    
453    Secondly, there are now two new cps primops at our disposal:
454    
455     1. rawrecord of record_kind option
456        This pure operator allocates some uninitialized storage from the heap.
457        There are two forms:
458    
459         rawrecord NONE [INT n]  allocates a tagless record of length n
460         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
461                                     and initializes the tag.
462    
463     2. rawupdate of cty
464          rawupdate cty (v,i,x)
465          Assigns x to the ith component of record v.
466          The storelist is not updated.
467    
468    We use these new primops for both spilling and incremental record construction.
469    
470     1. Spilling.
471    
472        This is implemented with a linear scan algorithm (but generalized
473        to trees).  The algorithm will create a single spill record at the
474        beginning of the cps function and use rawupdate to spill to it,
475        and SELECT or SELp to reload from it.  So both spills and reloads
476        are fine-grain operations.  In contrast, in the old algorithm
477        "spills" have to be bundled together in records.
478    
479        Ideally, we should sink the spill record construction to where
480        it is needed.  We can even split the spill record into multiple ones
481        at the places where they are needed.  But CPS is not a good
482        representation for global code motion, so I'll keep it simple and
483        am not attempting this.
484    
485     2. Incremental record construction (aka record splitting).
486    
487        Long records with many component values which are simultaneously live
488        (recall that single use record addresses are not considered to
489         be live) are constructed with rawrecord and rawupdate.
490        We allocate space on the heap with rawrecord first, then gradually
491        fill it in with rawupdate.  This is the technique suggested to me
492        by Matthias.
493    
494        Some restrictions on when this is applicable:
495        1. It is not a VECTOR record.  The code generator currently does not handle
496           this case. VECTOR record uses double indirection like arrays.
497        2. All the record component values are defined in the same "basic block"
498           as the record constructor.  This is to prevent speculative
499           record construction.
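
The allocate-then-fill pattern of incremental record construction can be imitated at the source level. This is only a simulation under stated assumptions: an `int array` stands in for uninitialized heap storage, and the two helper functions merely mimic the roles of the rawrecord and rawupdate primops, which are not accessible like this from user code.

```sml
(* Stand-ins for the primops; names reused for readability only. *)
fun rawrecord n = Array.array (n, 0)
fun rawupdate (v, i, x) = Array.update (v, i, x)

(* Build a record of n fields one component at a time.  Each component is
   computed and stored immediately, so only the record address and one
   component value need to be live at any point, instead of all n values
   being live just before a single RECORD. *)
fun buildIncrementally (n, compute : int -> int) =
    let val r = rawrecord n
        fun fill i =
            if i >= n then ()
            else (rawupdate (r, i, compute i); fill (i + 1))
    in fill 0; Array.vector r end

val v = buildIncrementally (4, fn i => i * i)   (* holds 0, 1, 4, 9 *)
```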
500    
501    ----------------------------------------------------------------------
502    Name: Allen Leung
503    Date: 2002/02/22 01:02:00 EST
504    Tag: leunga-20020222-mlrisc-tools
505    
506    Minor bug fixes in the parser and rewriter
507    
508    ----------------------------------------------------------------------
509    Name: Allen Leung
510    Date: 2002/02/21 20:20:00 EST
511    Tag: leunga-20020221-peephole
512    
513    Regenerated the peephole files.  Some contained typos in the specification
514    and some didn't compile because of pretty printing bugs in the old version
515    of 'nowhere'.
516    
517    ----------------------------------------------------------------------
518    Name: Allen Leung
519    Date: 2002/02/19 20:20:00 EST
520    Tag: leunga-20020219-mlrisc-tools
521    Description:
522    
523       Minor bug fixes to the mlrisc-tools library:
524    
525       1.  Fixed up parsing colon suffixed keywords
526       2.  Added the ability to shut the error messages up
527       3.  Reimplemented the pretty printer and fixed up/improved
528           the pretty printing of handle and -> types.
529       4.  Fixed up generation of literal symbols in the nowhere tool.
530       5.  Added some SML keywords to sml.sty
531    
532    ----------------------------------------------------------------------
533    Name: Matthias Blume
534    Date: 2002/02/19 16:20:00 EST
535    Tag: blume-20020219-cmffi
536    Description:
537    
538    A wild mix of changes, some minor, some major:
539    
540    * All C FFI-related libraries are now anchored under $c:
541        $/c.cm      --> $c/c.cm
542        $/c-int.cm  --> $c/internals/c-int.cm
543        $/memory.cm --> $c/memory/memory.cm
544    
545    * "make" tool (in CM) now treats its argument pathname slightly
546      differently:
547        1. If the native expansion is an absolute name, then before invoking
548           the "make" command on it, CM will apply OS.Path.mkRelative
549           (with relativeTo = OS.FileSys.getDir()) to it.
550        2. The argument will be passed through to subsequent phases of CM
551           processing without "going native".  In particular, if the argument
552           was an anchored path, then "make" will not lose track of that anchor.
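
Step 1 can be tried out directly with the Basis function it names. The sketch below passes a fixed directory instead of calling OS.FileSys.getDir () so that the result is reproducible; the helper `relativize` and all paths are hypothetical, and Unix-style path syntax is assumed.

```sml
(* Turn an absolute native path into one relative to the directory cwd;
   leave relative paths untouched. *)
fun relativize (cwd : string) (path : string) =
    if OS.Path.isAbsolute path
    then OS.Path.mkRelative { path = path, relativeTo = cwd }
    else path

val a = relativize "/home/user/project" "/home/user/project/lib/target"
val b = relativize "/home/user/project" "/home/user/other/target"
val c = relativize "/home/user/project" "src/target"
```

Under these assumptions `a` should come out as a path below the working directory, `b` as a path that climbs out of it with "..", and `c` unchanged.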
553    
554    * Compiler backends now "know" their respective C calling conventions
555      instead of having to be told about it by ml-nlffigen.  This relieves
556      ml-nlffigen from one of its burdens.
557    
558    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
559    
560    * Export C_DEBUG and C_Debug from $c/c.cm.
561    
562    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
563      subtyping relationship between incomplete pointers and their complete
564      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
565      with the convention of instantiating 'o with ('t, 'c) obj whenever
566      the pointer target type is complete.  In the incomplete case, 'o
567      will be instantiated with some "'c iobj" -- a type obtained by
568      using one of the functors PointerToIncompleteType or PointerToCompleteType.
569    
570      Operations that work on both incomplete and complete pointer types are
571      typed as taking an 'o ptr while operations that require the target to
572      be known are typed as taking some ('t, 'c) obj ptr.
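
The typing discipline in this paragraph can be sketched with phantom types. Everything below is a toy model: the structure `ToyC`, its operations, and the representation of a pointer as a bare word are assumptions made for illustration and do not reflect the real ml-nlffi-lib interface.

```sml
structure ToyC :> sig
    type ('t, 'c) obj      (* phantom: a complete target type *)
    type 'c iobj           (* phantom: an incomplete target type *)
    type 'o ptr
    val null    : unit -> 'o ptr
    (* Works on both complete and incomplete pointers. *)
    val isNull  : 'o ptr -> bool
    (* Stands in for any operation that needs a complete target. *)
    val advance : ('t, 'c) obj ptr * int -> ('t, 'c) obj ptr
end = struct
    type ('t, 'c) obj = unit
    type 'c iobj = unit
    type 'o ptr = word
    fun null () = 0w0
    fun isNull p = (p = 0w0)
    fun advance (p, n) = p + Word.fromInt n
end

val p : (int, unit) ToyC.obj ToyC.ptr = ToyC.null ()   (* complete *)
val q : unit ToyC.iobj ToyC.ptr = ToyC.null ()         (* incomplete *)
val ok1 = ToyC.isNull p
val ok2 = ToyC.isNull q
val p' = ToyC.advance (p, 1)
(* ToyC.advance (q, 1) would be rejected by the type checker. *)
```

Because `obj` and `iobj` are distinct abstract types, the restriction is enforced entirely at compile time and costs nothing at run time.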
573    
574      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
575      where void is an eqtype without any values.  This makes it possible
576      to work on voidptr values using functions meant to operate on light
577      incomplete pointers.
578    
579    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
580      been vastly simplified.
581    
582  ----------------------------------------------------------------------
583  Name: Matthias Blume

