[smlnj] Diff of /sml/trunk/HISTORY

revision 1085, Fri Feb 22 00:15:55 2002 UTC revision 1148, Fri Mar 15 21:38:57 2002 UTC
# Line 13  Line 13 
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/03/15 16:40:00 EST
18    Tag: blume-20020315-basis
19    Description:
20    
21    Provided (preliminary?) implementations for
22    
23      {String,Substring}.{concatWith,isSuffix,isSubstring}
24    
25    and
26    
27      Substring.full
28    
29    Those are in the Basis spec but they were missing in SML/NJ.
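A short usage sketch of the functions named above, written against the signatures in the Basis specification. The string values and binding names are made up for illustration.

```sml
(* Joining strings with a separator. *)
val joined = String.concatWith ", " ["a", "b", "c"]

(* Suffix and substring tests on plain strings. *)
val endsInSml = String.isSuffix ".sml" "history.sml"
val hasBasis = String.isSubstring "Basis" "the Basis spec"

(* Substring.full views a whole string as a substring. *)
val whole = Substring.full "HISTORY"
val sameText = Substring.string whole = "HISTORY"

(* The Substring variants take substrings where String takes strings. *)
val joinedSubs =
      Substring.concatWith "/" [Substring.full "sml", Substring.full "trunk"]
```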
30    
31    ----------------------------------------------------------------------
32    Name: Matthias Blume
33    Date: 2002/03/14 21:30:00 EST
34    Tag: blume-20020314-controls
35    Description:
36    
37    Controls:
38    ---------
39    
40    1. Factored out the recently-added Controls : CONTROLS stuff and put
41       it into its own library $/controls-lib.cm.  The source tree for
42       this is under src/smlnj-lib/Controls.
43    
44    2. Changed the names of types and functions in this interface, so they
45       make a bit more "sense":
46    
47          module -> registry
48          'a registry -> 'a group
49    
50    3. The interface now deals in ref cells only.  The getter/setter interface
51       is (mostly) gone.
52    
53    4. Added a function that lets one register an already-existing ref cell.
54    
55    5. Made the corresponding modifications to the rest of the code so that
56       everything compiles again.
57    
58    6. Changed the implementation of Controls.MLRISC back to something closer
59       to the original.  In particular, this module (and therefore MLRISC)
60       does not depend on Controls.  There now is some link-time code in
61       int-sys.sml that registers the MLRISC controls with the Controls
62       module.
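To illustrate items 3 and 4, here is a minimal sketch of a name-to-ref-cell table that can adopt an already-existing cell. It is not the actual Controls interface: the structure MiniControls and its members table, new, register, create and find are hypothetical names chosen for this example.

```sml
(* Hypothetical sketch: NOT the real Controls interface. *)
structure MiniControls =
struct
  (* A table maps control names to the ref cells holding their values. *)
  type 'a table = (string * 'a ref) list ref

  fun new () : 'a table = ref []

  (* Register an already-existing ref cell under a name. *)
  fun register (t : 'a table) (name : string, cell : 'a ref) =
        t := (name, cell) :: !t

  (* Create a fresh cell with a default value and register it. *)
  fun create t (name, default) =
        let val cell = ref default
        in register t (name, cell); cell
        end

  (* Look up a cell by name. *)
  fun find (t : 'a table) (name : string) =
        Option.map (fn (_, cell) => cell)
                   (List.find (fn (n, _) => n = name) (!t))
end

val verbose = ref false                       (* an existing flag *)
val flags : bool MiniControls.table = MiniControls.new ()
val () = MiniControls.register flags ("verbose", verbose)
val () = case MiniControls.find flags "verbose" of
             SOME cell => cell := true
           | NONE => ()
```

Because the table stores the cell itself rather than a copy, setting the control through the table is visible to code that still holds the original ref.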
63    
64    CM:
65    ---
66    
67      * One can now specify the lambda-split aggressiveness in init.cmi.
68    
69    ----------------------------------------------------------------------
70    Name: Allen Leung
71    Date: 2002/03/13 17:30:00 EST
72    Tag: leunga-20020313-x86-fp-unary
73    Description:
74    
75    Bug fix for:
76    
77    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
78    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
79    > - fun f(x,(y,z)) = Real.~ y;
80    > [autoloading]
81    > [autoloading done]
82    >       fchsl   (%eax), 184(%esp)
83    > Error: MLRisc bug: X86MCEmitter.emitInstr
84    >
85    > uncaught exception Error
86    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
87    
88    The problem was that the code generator did not generate any fp registers
89    in this case, and the ra didn't know that it needed to run the X86FP phase to
90    translate the pseudo fp instruction.   This only happened with unary fp
91    operators in certain situations.
92    
93    ----------------------------------------------------------------------
94    Name: Matthias Blume
95    Date: 2002/03/13 14:00:00 EST
96    Tag: blume-20020313-overload-etc
97    Description:
98    
99    1. Added _overload as a synonym for overload for backward compatibility.
100       (Control.overloadKW must be true for either version to be accepted.)
101    
102    2. Fixed bug in install script that caused more things to be installed
103       than what was requested in config/targets.
104    
105    3. Made CM aware of the (_)overload construct so that autoloading
106       works.
107    
108    ----------------------------------------------------------------------
109    Name: Matthias Blume
110    Date: 2002/03/12 22:03:00 EST
111    Tag: blume-20020312-url
112    Description:
113    
114    Forgot to update BOOT and srcarchiveurl.
115    
116    ----------------------------------------------------------------------
117    Name: Matthias Blume
118    Date: 2002/03/12 17:30:00 EST
119    Tag: blume-20020312-version110392
120    Description:
121    
122    Yet another version number bump (because of small changes to the
123    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
124    
125    Changes:
126    
127      The new pid generation scheme described a few weeks ago was overly
128      complicated.  I implemented a new mechanism that is simpler and
129      provides a bit more "stability":  Once CM has seen a compilation
130      unit, it keeps its identity constant (as long as you do not delete
131      those crucial CM/GUID/* files).  This means that when you change
132      an interface, compile, then go back to the old interface, and
133      compile again, you arrive at the original pid.
134    
135      There now also is a mechanism that instructs CM to use the plain
136      environment hash as a module's pid (effectively making its GUID
137      the empty string).  For this, "noguid" must be specified as an
138      option to the .sml file in question within its .cm file.
139      This is most useful for code that is being generated by tools such
140      as ml-nlffigen (because during development programmers tend to
141      erase the tool's entire output directory tree including CM's cached
142      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
143      revert to the old, broken behavior of SML/NJ), but in specific cases
144      where there is no danger of interface confusion, its use is ok
145      (I think).
146    
147      ml-nlffigen by default generates "noguid" annotations.  They can be
148      turned off by specifying -guid in its command line.
149    
150    ----------------------------------------------------------------------
151    Name: Lal George
152    Date: 2002/03/12 14:42:36 EST
153    Tag: george-20020312-frequency-computation
154    Description:
155    
156    Integrated jump chaining and static block frequency into the
157    compiler. More details and numbers later.
158    
159    ----------------------------------------------------------------------
160    Name: Lal George
161    Date: 2002/03/11 22:38:53 EST
162    Tag: george-20020311-jump-chain-elim
163    Description:
164    
165    Tested the jump chain elimination on all architectures (except the
166    hppa).  This is on by default right now and is profitable for the
167    alpha and x86; however, it may not be profitable for the sparc and ppc
168    when compiling the compiler.
169    
170    The gc test will typically jump to a label at the end of the cluster,
171    where there is another jump to an external cluster containing the actual
172    code to invoke gc. This is to allow factoring of common gc invocation
173    sequences. That is to say, we generate:
174    
175            f:
176               testgc
177               ja   L1      % jump if above to L1
178    
179            L1:
180               jmp L2
181    
182    
183    After jump chain elimination the 'ja L1' instruction is converted to
184    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
185    up being implemented in their long form (if L2 is far away) using:
186    
187            jbe     L3      % jump if below or equal to L3
188            jmp     L2
189         L3:
190            ...
191    
192    
193    For large compilation units L2  may be far away.
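The retargeting step described above can be sketched on a toy instruction set. This is an illustration under stated assumptions, not the MLRISC implementation: the datatype instr, the block type, and the functions resolve and eliminate are all hypothetical.

```sml
(* Hypothetical miniature instruction set, for illustration only. *)
datatype instr
  = Jmp of string            (* unconditional jump to a label *)
  | Ja of string             (* conditional jump (jump if above) *)
  | Other of string          (* anything else *)

(* A block is a label plus its instructions. *)
type block = string * instr list

(* Follow label -> label links through blocks that consist of a single
   unconditional jump.  The fuel counter guards against cycles. *)
fun resolve (blocks : block list) label =
      let fun target l =
                case List.find (fn (l', _) => l' = l) blocks of
                    SOME (_, [Jmp next]) => SOME next
                  | _ => NONE
          fun go (l, 0) = l
            | go (l, fuel) =
                (case target l of
                     SOME next => go (next, fuel - 1)
                   | NONE => l)
      in go (label, length blocks)
      end

(* Retarget every jump in every block to the end of its chain. *)
fun eliminate (blocks : block list) : block list =
      let fun fix (Jmp l) = Jmp (resolve blocks l)
            | fix (Ja l) = Ja (resolve blocks l)
            | fix i = i
      in map (fn (l, instrs) => (l, map fix instrs)) blocks
      end

val input = [("f", [Other "testgc", Ja "L1"]),
             ("L1", [Jmp "L2"]),
             ("L2", [Other "invoke gc"])]
val output = eliminate input
```

In output, the conditional jump in f targets L2 directly. Whether that is a win depends on the target, as noted above: a conditional branch to a distant label may need the long form.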
194    
195    
196    ----------------------------------------------------------------------
197    Name: Matthias Blume
198    Date: 2002/03/11 13:30:00 EST
199    Tag: blume-20020311-mltreeeval
200    Description:
201    
202    A functor parameter was missing.
203    
204    ----------------------------------------------------------------------
205    Name: Allen Leung
206    Date: 2002/03/11 10:30:00 EST
207    Tag: leunga-20020311-runtime-string0
208    Description:
209    
210       The representation of the empty string now points to a
211    legal null terminated C string instead of unit.  It is now possible
212    to convert an ML string into a C string with InlineT.CharVector.getData.
213    This compiles into one single machine instruction.
214    
215    ----------------------------------------------------------------------
216    Name: Allen Leung
217    Date: 2002/03/10 23:55:00 EST
218    Tag: leunga-20020310-x86-call
219    Description:
220    
221       Added machine code generation for CALL instruction (relative displacement mode)
222    
223    ----------------------------------------------------------------------
224    Name: Matthias Blume
225    Date: 2002/03/08 16:05:00
226    Tag: blume-20020308-entrypoints
227    Description:
228    
229    Version number bumped to 110.39.1.  NEW BOOTFILES!
230    
231    Entrypoints: non-zero offset into a code object where execution should begin.
232    
233    - Added the notion of an entrypoint to CodeObj.
234    - Added reading/writing of entrypoint info to Binfile.
235    - Made runtime system bootloader aware of entrypoints.
236    - Use the address of the label of the first function given to mlriscGen
237      as the entrypoint.  This address is currently always 0, but it will
238      not be 0 once we turn on block placement.
239    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
240      with entry points) from mlriscGen.
241    
242    ----------------------------------------------------------------------
243    Name: Allen Leung
244    Date: 2002/03/07 20:45:00 EST
245    Tag: leunga-20020307-x86-cmov
246    Description:
247    
248       Bug fixes for CMOVcc on x86.
249    
250       1. Added machine code generation for CMOVcc
251       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
252       3. CMOVcc cannot have an immediate operand as argument.
253    
254    ----------------------------------------------------------------------
255    Name: Matthias Blume
256    Date: 2002/03/07 16:15:00 EST
257    Tag: blume-20020307-controls
258    Description:
259    
260    This is a very large but mostly boring patch which makes (almost)
261    every tuneable compiler knob (i.e., pretty much everything under
262    Control.* plus a few other things) configurable via both the command
263    line and environment variables in the style CM did its configuration
264    until now.
265    
266    Try starting sml with '-h' (or, if you are brave, '-H')
267    
268    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
269    implements the underlying generic mechanism.
270    
271    The interface to some of the existing such facilities has changed somewhat.
272    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
273    (The getFoo interface is still there for backward-compatibility, but its
274    use is deprecated.)
275    
276    The ml-build script passes -Cxxx=yyy command-line arguments through so
277    that one can now twiddle the compiler settings when using this "batch"
278    compiler.
279    
280    TODO items:
281    
282    We should go through and throw out all controls that are no longer
283    connected to anything.  Moreover, we should go through and provide
284    meaningful (and correct!) documentation strings for those controls
285    that still are connected.
286    
287    Currently, multiple calls to Controls.new are accepted (only the first
288    has any effect).  Eventually we should make sure that every control
289    is being made (via Controls.new) exactly once.  Future access can then
290    be done using Controls.acc.
291    
292    Finally, it would probably be a good idea to use the getter-setter
293    interface to controls rather than ref cells.  For the time being, both
294    styles are provided by the Controls module, but getter-setter pairs are
295    better if thread-safety is of any concern because they can be wrapped.
296    
297    *****************************************
298    
299    One bug fix: The function blockPlacement in three of the MLRISC
300    backpatch files used to be hard-wired to one of two possibilities at
301    link time (according to the value of the placementFlag).  But (I
302    think) it should rather sense the flag every time.
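The difference between reading a flag once and reading it on every call can be shown in a few lines. This is a sketch only: placementFlag here is a local stand-in for the real control, and defaultPlacement and weightedPlacement are hypothetical placeholders, not the MLRISC functions.

```sml
val placementFlag = ref false          (* stand-in for the real control *)

fun defaultPlacement xs = xs           (* hypothetical: keep the given order *)
fun weightedPlacement xs = rev xs      (* hypothetical: some other ordering *)

(* Hard-wired: the flag is read once, when this binding is evaluated. *)
val hardWired : int list -> int list =
      if !placementFlag then weightedPlacement else defaultPlacement

(* Flag-sensitive: the flag is read on every call. *)
fun sensing xs =
      if !placementFlag then weightedPlacement xs else defaultPlacement xs

val () = placementFlag := true
val a = hardWired [1, 2, 3]     (* still uses defaultPlacement *)
val b = sensing [1, 2, 3]       (* now uses weightedPlacement *)
```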
303    
304    *****************************************
305    
306    Other assorted changes (by other people who did not supply a HISTORY entry):
307    
308    1. the cross-module inliner now works much better (Monnier)
309    2. representation of weights, frequencies, and probabilities in MLRISC
310       changed in preparation of using those for weighted block placement
311       (Reppy, George)
312    
313    ----------------------------------------------------------------------
314    Name: Lal George
315    Date: 2002/03/07 14:44:24 EST
316    Tag: george-20020307-weighted-block-placement
317    
318    Tested the weighted block placement optimization on all architectures
319    (except the hppa) using AMPL to generate the block and edge frequencies.
320    Changes were required in the machine properties to correctly
321    categorize trap instructions. There is an MLRISC flag
322    "weighted-block-placement" that can be used to enable weighted block
323    placement, but this will be ineffective without block/edge
324    frequencies (coming soon).
325    
326    
327    ----------------------------------------------------------------------
328    Name: Lal George
329    Date: 2002/03/05 17:24:48 EST
330    Tag: george-20020305-linkage-cluster
331    
332    In order to support the block placement optimization, a new cluster
333    is generated as the very first cluster (called the linkage cluster).
334    It contains a single jump to the 'real' entry point for the compilation
335    unit. Block placement has no effect on the linkage cluster itself, but
336    all the other clusters  have full freedom in the manner in which they
337    reorder blocks or functions.
338    
339    On the x86 the typical linkage code that is generated is:
340       ----------------------
341            .align 2
342       L0:
343            addl    $L1-L0, 72(%esp)
344            jmp     L1
345    
346    
347            .align  2
348       L1:
349       ----------------------
350    
351    72(%esp) is the memory location for the stdlink register. This
352    must contain the address of the CPS function being called. In the
353    above example, it contains the address of  L0; before
354    calling L1 (the real entry point for the compilation unit), it
355    must contain the address for L1, and hence
356    
357            addl $L1-L0, 72(%esp)
358    
359    I have tested this on all architectures except the hppa.  The increase
360    in code size is of course negligible.
361    
362    ----------------------------------------------------------------------
363    Name: Allen Leung
364    Date: 2002/03/03 13:20:00 EST
365    Tag: leunga-20020303-mlrisc-tools
366    
367      Added #[ ... ] expressions to mlrisc tools
368    
369    ----------------------------------------------------------------------
370    Name: Matthias Blume
371    Date: 2002/02/27 12:29:00 EST
372    Tag: blume-20020227-cdebug
373    Description:
374    
375    - made the types in structures C and C_Debug equal
376    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
377    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
378    
379    ----------------------------------------------------------------------
380    Name: Matthias Blume
381    Date: 2002/02/26 12:00:00 EST
382    Tag: blume-20020226-ffi
383    Description:
384    
385    1. Fixed a minor bug in CM's "noweb" tool:
386       If numbering is turned off, then truly don't number (i.e., do not
387       supply the -L option to noweb).  The previous behavior was to supply
388       -L'' -- which caused noweb to use the "default" line numbering scheme.
389       Thanks to Chris Richards for pointing this out (and supplying the fix).
390    
391    2. Once again, I reworked some aspects of the FFI:
392    
393       A. The incomplete/complete type business:
394    
395       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
396         gone!
397       - ML types representing an incomplete type are now *equal* to
398         ML types representing their corresponding complete types (just like
399         in C).  This is still safe because ml-nlffigen will not generate
400         RTTI for incomplete types, nor will it generate functions that
401         require access to such RTTI.   But when ML code generated from both
402         incomplete and complete versions of the C type meet, the ML types
403         are trivially interoperable.
404    
405         NOTE:  These changes restore the full generality of the translation
406         (which was previously lost when I eliminated functorization)!
407    
408       B. Enum types:
409    
410       - Structure C now has a type constructor "enum" that is similar to
411         how the "su" constructor works.  However, "enum" is not a phantom
412         type because each "T enum" has values (and is isomorphic to
413         MLRep.Signed.int).
414       - There are generic access operations for enum objects (using
415         MLRep.Signed.int).
416       - ml-nlffigen will generate a structure E_foo for each "enum foo".
417         * The structure contains the definition of type "mlrep" (the ML-side
418         representation type of the enum).  Normally, mlrep is the same
419         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
420         then mlrep will be defined as a datatype -- thus facilitating
421         pattern matching on mlrep values.
422         ("-ec" will be suppressed if there are duplicate values in an
423          enumeration.)
424         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
425         will be generated for each C enum constant xxx.
426         * Conversion functions m2i and i2m convert between mlrep and
427         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
428         * Conversion functions c and ml convert between mlrep and "tag enum".
429         * Access functions (get/set) fetch and store mlrep values.
430       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
431         enumerations are merged into one single enumeration represented by
432         structure E_'.
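As a rough picture of the "-ec" style, the following hand-written sketch shows what a structure for a C declaration such as enum color { RED, GREEN, BLUE } might contain. It is not actual ml-nlffigen output: plain int stands in for MLRep.Signed.int, the conversion functions c and ml and the get/set accessors are omitted, and raising Domain on unknown integers is an assumption of this example.

```sml
(* Hypothetical sketch for:  enum color { RED, GREEN, BLUE };
   plain int stands in for MLRep.Signed.int. *)
structure E_color =
struct
  (* With "-ec", mlrep is a datatype, one constructor per enum constant. *)
  datatype mlrep = e_RED | e_GREEN | e_BLUE

  (* mlrep -> integer representation *)
  fun m2i e_RED = 0
    | m2i e_GREEN = 1
    | m2i e_BLUE = 2

  (* integer representation -> mlrep; this sketch fails on unknown values *)
  fun i2m 0 = e_RED
    | i2m 1 = e_GREEN
    | i2m 2 = e_BLUE
    | i2m _ = raise Domain
end

(* The datatype representation is what makes pattern matching possible. *)
fun describe E_color.e_RED = "red"
  | describe E_color.e_GREEN = "green"
  | describe E_color.e_BLUE = "blue"

val name = describe (E_color.i2m 1)
```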
433    
434    ----------------------------------------------------------------------
435    Name: Allen Leung
436    Date: 2002/02/25 04:45:00 EST
437    Tag: leunga-20020225-cps-spill
438    
439    This is a new implementation of the CPS spill phase.
440    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
441    In case of problems, replace it with the old file spill.sml
442    
443    The current compiler runs into some serious performance problems when
444    constructing a large record.  This can happen when we try to compile a
445    structure with many items.  Even a very simple structure like the following
446    makes the compiler slow down.
447    
448        structure Foo = struct
449           val x_1 = 0w1 : Word32.word
450           val x_2 = 0w2 : Word32.word
451           val x_3 = 0w3 : Word32.word
452           ...
453           val x_N = 0wN : Word32.word
454        end
455    
456    The following table shows the compile time, from N=1000 to N=4000,
457    with the old compiler:
458    
459    N
460    1000   CPS 100 spill                           0.04u  0.00s  0.00g
461           MLRISC ra                               0.06u  0.00s  0.05g
462              (spills = 0 reloads = 0)
463           TOTAL                                   0.63u  0.07s  0.21g
464    
465    1100   CPS 100 spill                           8.25u  0.32s  0.64g
466           MLRISC ra                               5.68u  0.59s  3.93g
467              (spills = 0 reloads = 0)
468           TOTAL                                   14.71u  0.99s  4.81g
469    
470    1500   CPS 100 spill                           58.55u  2.34s  1.74g
471           MLRISC ra                               5.54u  0.65s  3.91g
472              (spills = 543 reloads = 1082)
473           TOTAL                                   65.40u  3.13s  6.00g
474    
475    2000   CPS 100 spill                           126.69u  4.84s  3.08g
476           MLRISC ra                               0.80u  0.10s  0.55g
477              (spills = 42 reloads = 84)
478           TOTAL                                   129.42u  5.10s  4.13g
479    
480    3000   CPS 100 spill                           675.59u  19.03s  11.64g
481           MLRISC ra                               2.69u  0.27s  1.38g
482              (spills = 62 reloads = 124)
483           TOTAL                                   682.48u  19.61s  13.99g
484    
485    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
486           MLRISC ra                               4.96u  0.27s  2.72g
487              (spills = 85 reloads = 170)
488           TOTAL                                   2375.26u  57.21s  48.00g
489    
490    As you can see the old cps spill module suffers from some serious
491    performance problems.  But since I cannot decipher the old code fully,
492    instead of patching the problems up, I'm reimplementing it
493    with a different algorithm.  The new code is more modular,
494    smaller when compiled, and substantially faster
495    (O(n log n) time and O(n) space).  Timing of the new spill module:
496    
497    4000  CPS 100 spill                           0.02u  0.00s  0.00g
498          MLRISC ra                               0.25u  0.02s  0.15g
499             (spills=1 reloads=3)
500          TOTAL                                   7.74u  0.34s  1.62g
501    
502    Implementation details:
503    
504    As far as I can tell, the purpose of the CPS spill module is to make sure the
505    number of live variables at any program point (the bandwidth)
506    does not exceed a certain limit, which is determined by the
507    size of the spill area.
508    
509    When the bandwidth is too large, we decrease the register pressure by
510    packing live variables into spill records.  How we achieve this is
511    completely different than what we did in the old code.
512    
513    First, there is something about the MLRiscGen code generator
514    that we should be aware of:
515    
516    o MLRiscGen performs code motion!
517    
518       In particular, it will move floating point computations and
519       address computations involving only the heap pointer to
520       their use sites (if there is only a single use).
521       What this means is that if we have a CPS record construction
522       statement
523    
524           RECORD(k,vl,w,e)
525    
526       we should never count the new record address w as live if w
527       has only one use (which is often the case).
528    
529       We should do something similar to floating point, but the transformation
530       there is much more complex, so I won't deal with that.
531    
532    Secondly, there are now two new cps primops at our disposal:
533    
534     1. rawrecord of record_kind option
535        This pure operator allocates some uninitialized storage from the heap.
536        There are two forms:
537    
538         rawrecord NONE [INT n]  allocates a tagless record of length n
539         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
540                                     and initializes the tag.
541    
542     2. rawupdate of cty
543          rawupdate cty (v,i,x)
544          Assigns x to the ith component of record v.
545          The storelist is not updated.
546    
547    We use these new primops for both spilling and incremental record construction.
548    
549     1. Spilling.
550    
551        This is implemented with a linear scan algorithm (but generalized
552        to trees).  The algorithm will create a single spill record at the
553        beginning of the cps function and use rawupdate to spill to it,
554        and SELECT or SELp to reload from it.  So both spills and reloads
555        are fine-grain operations.  In contrast, in the old algorithm
556        "spills" have to be bundled together in records.
557    
558        Ideally, we should sink the spill record construction to where
559        it is needed.  We can even split the spill record into multiple ones
560        at the places where they are needed.  But CPS is not a good
561        representation for global code motion, so I'll keep it simple and
562        am not attempting this.
563    
564     2. Incremental record construction (aka record splitting).
565    
566        Long records with many component values which are simultaneously live
567        (recall that single use record addresses are not considered to
568         be live) are constructed with rawrecord and rawupdate.
569        We allocate space on the heap with rawrecord first, then gradually
570        fill it in with rawupdate.  This is the technique suggested to me
571        by Matthias.
572    
573        Some restrictions on when this is applicable:
574        1. It is not a VECTOR record.  The code generator currently does not handle
575           this case. VECTOR record uses double indirection like arrays.
576        2. All the record component values are defined in the same "basic block"
577           as the record constructor.  This is to prevent speculative
578           record construction.
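The allocate-then-fill idea behind incremental record construction can be modelled at the source level. This is only an analogy under stated assumptions: an int array stands in for heap storage, the functions named rawrecord and rawupdate below are ordinary ML functions that imitate the primops (the real rawrecord leaves storage uninitialized), and buildIncrementally is a hypothetical helper.

```sml
(* Model only: an int array stands in for a heap record. *)
fun rawrecord n = Array.array (n, 0)
fun rawupdate (v, i, x) = Array.update (v, i, x)

(* Build a record of n fields one at a time, so that only the record
   address and the field currently being computed need to be live,
   instead of all n field values at once. *)
fun buildIncrementally (n, field : int -> int) =
      let val r = rawrecord n
          fun fill i =
                if i < n
                then (rawupdate (r, i, field i); fill (i + 1))
                else ()
      in fill 0;
         Array.foldr (op ::) [] r      (* read the fields back as a list *)
      end

val fields = buildIncrementally (4, fn i => i * i)
```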
579    
580    ----------------------------------------------------------------------
581    Name: Allen Leung
582    Date: 2002/02/22 01:02:00 EST
583    Tag: leunga-20020222-mlrisc-tools
584    
585    Minor bug fixes in the parser and rewriter
586    
587    ----------------------------------------------------------------------
588  Name: Allen Leung
589  Date: 2002/02/21 20:20:00 EST
590  Tag: leunga-20020221-peephole
