[smlnj] Diff of /sml/trunk/HISTORY
revision 1085, Fri Feb 22 00:15:55 2002 UTC revision 1152, Tue Mar 19 21:36:30 2002 UTC
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/03/19 16:37:00 EST
18    Tag: blume-20020319-witnesses
19    Description:
20    
21    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
22    
23    This implements a generic mechanism for changing constness qualifiers
24    anywhere within big C types without resorting to outright "casts".
25    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
26    modify the constness at the outermost level.)
27    The implementation of "convert" is based on the idea of "witness"
28    values -- values that are not used by the operation but whose types
29    "testify" to their applicability.  On the implementation side, "convert"
30    is simply a projection (returning its second curried argument).  With
31    cross-module inlining, it should not result in any machine code being
32    generated.
33    
34    ----------------------------------------------------------------------
35    Name: Matthias Blume
36    Date: 2002/03/15 16:40:00 EST
37    Tag: blume-20020315-basis
38    Description:
39    
40    Provided (preliminary?) implementations for
41    
42      {String,Substring}.{concatWith,isSuffix,isSubstring}
43    
44    and
45    
46      Substring.full
47    
48    Those are in the Basis spec but they were missing in SML/NJ.
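For readers who do not have the Basis spec at hand, here is a rough Python model of what these functions compute.  It is only a sketch of the specified behavior, not the SML/NJ implementation; the Python names and the tuple used to model a substring are made up for illustration.

```python
# Python models of the Basis-library behavior; NOT the SML/NJ sources.
# All function names below are hypothetical stand-ins.

def concat_with(sep, strings):
    """Model of String.concatWith sep strings: sep goes between the elements."""
    return sep.join(strings)

def is_suffix(s1, s2):
    """Model of String.isSuffix s1 s2: true if s1 is a suffix of s2."""
    return s2.endswith(s1)

def is_substring(s1, s2):
    """Model of String.isSubstring s1 s2: true if s1 occurs somewhere in s2."""
    return s1 in s2

def full(s):
    """Model of Substring.full s: a (base, start, length) triple covering all of s."""
    return (s, 0, len(s))

print(concat_with(", ", ["a", "b", "c"]))
print(is_suffix("bc", "abc"), is_substring("b", "abc"), full("abc"))
```

Note the argument order: in both predicates the first argument is the candidate and the second is the string being searched.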
49    
50    ----------------------------------------------------------------------
51    Name: Matthias Blume
52    Date: 2002/03/14 21:30:00 EST
53    Tag: blume-20020314-controls
54    Description:
55    
56    Controls:
57    ---------
58    
59    1. Factored out the recently-added Controls : CONTROLS stuff and put
60       it into its own library $/controls-lib.cm.  The source tree for
61       this is under src/smlnj-lib/Controls.
62    
63    2. Changed the names of types and functions in this interface, so they
64       make a bit more "sense":
65    
66          module -> registry
67          'a registry -> 'a group
68    
69    3. The interface now deals in ref cells only.  The getter/setter interface
70       is (mostly) gone.
71    
72    4. Added a function that lets one register an already-existing ref cell.
73    
74    5. Made the corresponding modifications to the rest of the code so that
75       everything compiles again.
76    
77    6. Changed the implementation of Controls.MLRISC back to something closer
78       to the original.  In particular, this module (and therefore MLRISC)
79       does not depend on Controls.  There now is some link-time code in
80       int-sys.sml that registers the MLRISC controls with the Controls
81       module.
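The arrangement in items 3, 4 and 6 can be pictured with a small Python sketch.  Everything here (Ref, Registry, register, lookup) is a hypothetical stand-in and not the actual Controls interface; the point is only that a backend can own a plain ref cell, while separate link-time code registers that same cell under a name.

```python
class Ref:
    """Stand-in for an ML ref cell: a mutable box holding one value."""
    def __init__(self, value):
        self.value = value

class Registry:
    """Hypothetical registry that maps control names to ref cells."""
    def __init__(self):
        self._cells = {}

    def register(self, name, cell):
        """Register an already-existing ref cell under a name."""
        self._cells[name] = cell
        return cell

    def lookup(self, name):
        """Return the ref cell registered under a name."""
        return self._cells[name]

# The backend owns its flag and knows nothing about the registry.
weighted_block_placement = Ref(False)

# Link-time glue code (a separate module in spirit) registers the cell.
registry = Registry()
registry.register("weighted-block-placement", weighted_block_placement)

# Setting the control through the registry changes the backend's own cell.
registry.lookup("weighted-block-placement").value = True
print(weighted_block_placement.value)
```

Because the registry stores the cell itself rather than a copy, no getter/setter pair is needed to keep the two views in sync.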
82    
83    CM:
84    ---
85    
86      * One can now specify the lambda-split aggressiveness in init.cmi.
87    
88    ----------------------------------------------------------------------
89    Name: Allen Leung
90    Date: 2002/03/13 17:30:00 EST
91    Tag: leunga-20020313-x86-fp-unary
92    Description:
93    
94    Bug fix for:
95    
96    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
97    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
98    > - fun f(x,(y,z)) = Real.~ y;
99    > [autoloading]
100    > [autoloading done]
101    >       fchsl   (%eax), 184(%esp)
102    > Error: MLRisc bug: X86MCEmitter.emitInstr
103    >
104    > uncaught exception Error
105    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
106    
107    The problem was that the code generator did not generate any fp registers
108    in this case, and the ra didn't know that it needed to run the X86FP phase to
109    translate the pseudo fp instruction.   This only happened with unary fp
110    operators in certain situations.
111    
112    ----------------------------------------------------------------------
113    Name: Matthias Blume
114    Date: 2002/03/13 14:00:00 EST
115    Tag: blume-20020313-overload-etc
116    Description:
117    
118    1. Added _overload as a synonym for overload for backward compatibility.
119       (Control.overloadKW must be true for either version to be accepted.)
120    
121    2. Fixed bug in install script that caused more things to be installed
122       than what was requested in config/targets.
123    
124    3. Made CM aware of the (_)overload construct so that autoloading
125       works.
126    
127    ----------------------------------------------------------------------
128    Name: Matthias Blume
129    Date: 2002/03/12 22:03:00 EST
130    Tag: blume-20020312-url
131    Description:
132    
133    Forgot to update BOOT and srcarchiveurl.
134    
135    ----------------------------------------------------------------------
136    Name: Matthias Blume
137    Date: 2002/03/12 17:30:00 EST
138    Tag: blume-20020312-version110392
139    Description:
140    
141    Yet another version number bump (because of small changes to the
142    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
143    
144    Changes:
145    
146      The new pid generation scheme described a few weeks ago was overly
147      complicated.  I implemented a new mechanism that is simpler and
148      provides a bit more "stability":  Once CM has seen a compilation
149      unit, it keeps its identity constant (as long as you do not delete
150      those crucial CM/GUID/* files).  This means that when you change
151      an interface, compile, then go back to the old interface, and
152      compile again, you arrive at the original pid.
153    
154      There now also is a mechanism that instructs CM to use the plain
155      environment hash as a module's pid (effectively making its GUID
156      the empty string).  For this, "noguid" must be specified as an
157      option to the .sml file in question within its .cm file.
158      This is most useful for code that is being generated by tools such
159      as ml-nlffigen (because during development programmers tend to
160      erase the tool's entire output directory tree including CM's cached
161      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
162      revert to the old, broken behavior of SML/NJ), but in specific cases
163      where there is no danger of interface confusion, its use is ok
164      (I think).
165    
166      ml-nlffigen by default generates "noguid" annotations.  They can be
167      turned off by specifying -guid in its command line.
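  A toy model of the scheme, in Python, may make the "stability" property easier to see.  This is a sketch under assumptions: the way the GUID and the environment hash are combined, the GuidStore class, and the GUID format are all invented for illustration and do not describe CM's real data structures.

```python
import hashlib

class GuidStore:
    """Stand-in for the CM/GUID/* files: one persistent GUID per compilation unit."""
    def __init__(self):
        self._guids = {}

    def guid_for(self, unit, noguid=False):
        if noguid:
            return ""  # "noguid": behave as if the GUID were the empty string
        # Assumption: any per-unit unique value works; a counter keeps this deterministic.
        return self._guids.setdefault(unit, "guid-%d" % (len(self._guids) + 1))

def pid(env_hash, guid):
    """Hypothetical pid: the plain environment hash, or a digest that mixes in the GUID."""
    if guid == "":
        return env_hash
    return hashlib.sha256((guid + "/" + env_hash).encode("utf-8")).hexdigest()

store = GuidStore()
first = pid("hash-of-interface-A", store.guid_for("foo.sml"))
changed = pid("hash-of-interface-B", store.guid_for("foo.sml"))
back = pid("hash-of-interface-A", store.guid_for("foo.sml"))
print(first == back, first != changed)
```

  In this model two different units with identical interfaces get different pids, unless both are marked "noguid", in which case they collide.  That collision is the interface confusion the warning above refers to.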
168    
169    ----------------------------------------------------------------------
170    Name: Lal George
171    Date: 2002/03/12 14:42:36 EST
172    Tag: george-20020312-frequency-computation
173    Description:
174    
175    Integrated jump chaining and static block frequency into the
176    compiler. More details and numbers later.
177    
178    ----------------------------------------------------------------------
179    Name: Lal George
180    Date: 2002/03/11 22:38:53 EST
181    Tag: george-20020311-jump-chain-elim
182    Description:
183    
184    Tested the jump chain elimination on all architectures (except the
185    hppa).  This is on by default right now and is profitable for the
186    alpha and x86; however, it may not be profitable for the sparc and ppc
187    when compiling the compiler.
188    
189    The gc test will typically jump to a label at the end of the cluster,
190    where there is another jump to an external cluster containing the actual
191    code to invoke gc. This is to allow factoring of common gc invocation
192    sequences. That is to say, we generate:
193    
194            f:
195               testgc
196               ja   L1      % jump if above to L1
197    
198            L1:
199               jmp L2
200    
201    
202    After jump chain elimination the 'ja L1' instruction is converted to
203    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
204    up being implemented in their long form (if L2 is far away) using:
205    
206            jbe     L3      % jump if below or equal to L3
207            jmp     L2
208         L3:
209            ...
210    
211    
212    For large compilation units L2  may be far away.
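The retargeting step itself is easy to state.  Below is a minimal Python sketch, not the MLRISC code: it assumes we already know which labels head a block consisting of nothing but an unconditional jump (the hypothetical jump_only map) and rewrites every branch to point at the end of its chain.

```python
def eliminate_jump_chains(jump_only, branches):
    """Retarget branches that point at jump-only blocks.

    jump_only: maps a label whose block is a lone unconditional jump to that
               jump's target, e.g. {"L1": "L2"} for "L1: jmp L2".
    branches:  list of (opcode, target_label) pairs.
    Returns a new list of branches with every target followed to the end
    of its chain.
    """
    def resolve(label):
        seen = set()
        # Follow the chain; the 'seen' set stops us on a cycle of jumps.
        while label in jump_only and label not in seen:
            seen.add(label)
            label = jump_only[label]
        return label

    return [(op, resolve(target)) for op, target in branches]

print(eliminate_jump_chains({"L1": "L2"}, [("ja", "L1")]))
```

Whether the rewritten branch is cheaper then depends on the architecture's branch range, which is the sparc/ppc concern described above; this sketch does not model instruction sizes.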
213    
214    
215    ----------------------------------------------------------------------
216    Name: Matthias Blume
217    Date: 2002/03/11 13:30:00 EST
218    Tag: blume-20020311-mltreeeval
219    Description:
220    
221    A functor parameter was missing.
222    
223    ----------------------------------------------------------------------
224    Name: Allen Leung
225    Date: 2002/03/11 10:30:00 EST
226    Tag: leunga-20020311-runtime-string0
227    Description:
228    
229       The representation of the empty string now points to a
230    legal null terminated C string instead of unit.  It is now possible
231    to convert an ML string into a C string with InlineT.CharVector.getData.
232    This compiles into one single machine instruction.
233    
234    ----------------------------------------------------------------------
235    Name: Allen Leung
236    Date: 2002/03/10 23:55:00 EST
237    Tag: leunga-20020310-x86-call
238    Description:
239    
240       Added machine code generation for the CALL instruction (relative displacement mode).
241    
242    ----------------------------------------------------------------------
243    Name: Matthias Blume
244    Date: 2002/03/08 16:05:00
245    Tag: blume-20020308-entrypoints
246    Description:
247    
248    Version number bumped to 110.39.1.  NEW BOOTFILES!
249    
250    Entrypoints: non-zero offset into a code object where execution should begin.
251    
252    - Added the notion of an entrypoint to CodeObj.
253    - Added reading/writing of entrypoint info to Binfile.
254    - Made runtime system bootloader aware of entrypoints.
255    - Use the address of the label of the first function given to mlriscGen
256      as the entrypoint.  This address is currently always 0, but it will
257      not be 0 once we turn on block placement.
258    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
259      with entry points) from mlriscGen.
260    
261    ----------------------------------------------------------------------
262    Name: Allen Leung
263    Date: 2002/03/07 20:45:00 EST
264    Tag: leunga-20020307-x86-cmov
265    Description:
266    
267       Bug fixes for CMOVcc on x86.
268    
269       1. Added machine code generation for CMOVcc
270       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
271       3. CMOVcc cannot have an immediate operand as argument.
272    
273    ----------------------------------------------------------------------
274    Name: Matthias Blume
275    Date: 2002/03/07 16:15:00 EST
276    Tag: blume-20020307-controls
277    Description:
278    
279    This is a very large but mostly boring patch which makes (almost)
280    every tuneable compiler knob (i.e., pretty much everything under
281    Control.* plus a few other things) configurable via both the command
282    line and environment variables in the style CM did its configuration
283    until now.
284    
285    Try starting sml with '-h' (or, if you are brave, '-H')
286    
287    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
288    implements the underlying generic mechanism.
289    
290    The interface to some of the existing such facilities has changed somewhat.
291    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
292    (The getFoo interface is still there for backward-compatibility, but its
293    use is deprecated.)
294    
295    The ml-build script passes -Cxxx=yyy command-line arguments through so
296    that one can now twiddle the compiler settings when using this "batch"
297    compiler.
298    
299    TODO items:
300    
301    We should go through and throw out all controls that are no longer
302    connected to anything.  Moreover, we should go through and provide
303    meaningful (and correct!) documentation strings for those controls
304    that still are connected.
305    
306    Currently, multiple calls to Controls.new are accepted (only the first
307    has any effect).  Eventually we should make sure that every control
308    is being made (via Controls.new) exactly once.  Future access can then
309    be done using Controls.acc.
310    
311    Finally, it would probably be a good idea to use the getter-setter
312    interface to controls rather than ref cells.  For the time being, both
313    styles are provided by the Controls module, but getter-setter pairs are
314    better if thread-safety is of any concern because they can be wrapped.
315    
316    *****************************************
317    
318    One bug fix: The function blockPlacement in three of the MLRISC
319    backpatch files used to be hard-wired to one of two possibilities at
320    link time (according to the value of the placementFlag).  But (I
321    think) it should rather sense the flag every time.
322    
323    *****************************************
324    
325    Other assorted changes (by other people who did not supply a HISTORY entry):
326    
327    1. the cross-module inliner now works much better (Monnier)
328    2. representation of weights, frequencies, and probabilities in MLRISC
329       changed in preparation for using those for weighted block placement
330       (Reppy, George)
331    
332    ----------------------------------------------------------------------
333    Name: Lal George
334    Date: 2002/03/07 14:44:24 EST
335    Tag: george-20020307-weighted-block-placement
336    
337    Tested the weighted block placement optimization on all architectures
338    (except the hppa) using AMPL to generate the block and edge frequencies.
339    Changes were required in the machine properties to correctly
340    categorize trap instructions. There is an MLRISC flag
341    "weighted-block-placement" that can be used to enable weighted block
342    placement, but this will be ineffective without block/edge
343    frequencies (coming soon).
344    
345    
346    ----------------------------------------------------------------------
347    Name: Lal George
348    Date: 2002/03/05 17:24:48 EST
349    Tag: george-20020305-linkage-cluster
350    
351    In order to support the block placement optimization, a new cluster
352    is generated as the very first cluster (called the linkage cluster).
353    It contains a single jump to the 'real' entry point for the compilation
354    unit. Block placement has no effect on the linkage cluster itself, but
355    all the other clusters  have full freedom in the manner in which they
356    reorder blocks or functions.
357    
358    On the x86 the typical linkage code that is generated is:
359       ----------------------
360            .align 2
361       L0:
362            addl    $L1-L0, 72(%esp)
363            jmp     L1
364    
365    
366            .align  2
367       L1:
368       ----------------------
369    
370    72(%esp) is the memory location for the stdlink register. This
371    must contain the address of the CPS function being called. In the
372    above example, it contains the address of  L0; before
373    calling L1 (the real entry point for the compilation unit), it
374    must contain the address for L1, and hence
375    
376            addl $L1-L0, 72(%esp)
377    
378    I have tested this on all architectures except the hppa.  The increase
379    in code size is of course negligible.
380    
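The stdlink adjustment is plain address arithmetic.  A tiny Python model, with made-up addresses and a hypothetical function name, shows the invariant that the addl instruction maintains:

```python
def enter_linkage_cluster(stdlink, addr_l0, addr_l1):
    """Model of 'addl $L1-L0, 72(%esp)'.

    On entry stdlink holds the address of L0 (the function being called).
    Before control reaches L1, stdlink must hold the address of L1.
    """
    assert stdlink == addr_l0, "stdlink must point at the linkage cluster on entry"
    return stdlink + (addr_l1 - addr_l0)

# Hypothetical addresses, chosen only for the example.
print(hex(enter_linkage_cluster(0x1000, 0x1000, 0x1008)))
```

Adding the difference L1-L0 rather than storing the address of L1 directly means no absolute address needs to appear in the instruction.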
381    ----------------------------------------------------------------------
382    Name: Allen Leung
383    Date: 2002/03/03 13:20:00 EST
384    Tag: leunga-20020303-mlrisc-tools
385    
386      Added #[ ... ] expressions to mlrisc tools
387    
388    ----------------------------------------------------------------------
389    Name: Matthias Blume
390    Date: 2002/02/27 12:29:00 EST
391    Tag: blume-20020227-cdebug
392    Description:
393    
394    - made the types in structures C and C_Debug equal
395    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
396    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
397    
398    ----------------------------------------------------------------------
399    Name: Matthias Blume
400    Date: 2002/02/26 12:00:00 EST
401    Tag: blume-20020226-ffi
402    Description:
403    
404    1. Fixed a minor bug in CM's "noweb" tool:
405       If numbering is turned off, then truly don't number (i.e., do not
406       supply the -L option to noweb).  The previous behavior was to supply
407       -L'' -- which caused noweb to use the "default" line numbering scheme.
408       Thanks to Chris Richards for pointing this out (and supplying the fix).
409    
410    2. Once again, I reworked some aspects of the FFI:
411    
412       A. The incomplete/complete type business:
413    
414       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
415         gone!
416       - ML types representing an incomplete type are now *equal* to
417         ML types representing their corresponding complete types (just like
418         in C).  This is still safe because ml-nlffigen will not generate
419         RTTI for incomplete types, nor will it generate functions that
420         require access to such RTTI.   But when ML code generated from both
421         incomplete and complete versions of the C type meet, the ML types
422         are trivially interoperable.
423    
424         NOTE:  These changes restore the full generality of the translation
425         (which was previously lost when I eliminated functorization)!
426    
427       B. Enum types:
428    
429       - Structure C now has a type constructor "enum" that is similar to
430         how the "su" constructor works.  However, "enum" is not a phantom
431         type because each "T enum" has values (and is isomorphic to
432         MLRep.Signed.int).
433       - There are generic access operations for enum objects (using
434         MLRep.Signed.int).
435       - ml-nlffigen will generate a structure E_foo for each "enum foo".
436         * The structure contains the definition of type "mlrep" (the ML-side
437         representation type of the enum).  Normally, mlrep is the same
438         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
439         then mlrep will be defined as a datatype -- thus facilitating
440         pattern matching on mlrep values.
441         ("-ec" will be suppressed if there are duplicate values in an
442          enumeration.)
443         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
444         will be generated for each C enum constant xxx.
445         * Conversion functions m2i and i2m convert between mlrep and
446         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
447         * Conversion functions c and ml convert between mlrep and "tag enum".
448         * Access functions (get/set) fetch and store mlrep values.
449       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
450         enumerations are merged into one single enumeration represented by
451         structure E_'.
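The relationship between mlrep, m2i and i2m, with and without "-ec", can be mimicked in Python.  This is only an analogy under assumptions: make_mlrep is a hypothetical helper, a Python IntEnum stands in for the generated datatype, and the generated ML code is of course not structured like this.

```python
from enum import IntEnum

def make_mlrep(name, constants, ec=False):
    """Return (mlrep, m2i, i2m) for a C enum given as {constant_name: value}.

    With ec=True and no duplicate values, mlrep is a distinct enumerated type
    whose members are named e_xxx.  Otherwise mlrep is plain int and both
    conversions are identities.
    """
    values = list(constants.values())
    has_duplicates = len(set(values)) != len(values)
    if ec and not has_duplicates:
        mlrep = IntEnum(name, {"e_" + c: v for c, v in constants.items()})
        return mlrep, (lambda m: int(m)), (lambda i: mlrep(i))
    # No "-ec", or "-ec" suppressed because of duplicate values.
    return int, (lambda m: m), (lambda i: i)

color, m2i, i2m = make_mlrep("color", {"RED": 0, "GREEN": 1}, ec=True)
print(m2i(color.e_GREEN), i2m(0) is color.e_RED)
```

The duplicate check mirrors the rule above: two constants with the same value cannot be told apart when converting back from an integer, so the enumerated representation is dropped.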
452    
453    ----------------------------------------------------------------------
454    Name: Allen Leung
455    Date: 2002/02/25 04:45:00 EST
456    Tag: leunga-20020225-cps-spill
457    
458    This is a new implementation of the CPS spill phase.
459    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml.
460    In case of problems, replace it with the old file spill.sml.
461    
462    The current compiler runs into some serious performance problems when
463    constructing a large record.  This can happen when we try to compile a
464    structure with many items.  Even a very simple structure like the following
465    makes the compiler slow down.
466    
467        structure Foo = struct
468           val x_1 = 0w1 : Word32.int
469           val x_2 = 0w2 : Word32.int
470           val x_3 = 0w3 : Word32.int
471           ...
472           val x_N = 0wN : Word32.int
473        end
474    
475    The following table shows the compile time, from N=1000 to N=4000,
476    with the old compiler:
477    
478    N      phase                                   user (u)  system (s)  gc (g)   [seconds]
479    1000   CPS 100 spill                           0.04u  0.00s  0.00g
480           MLRISC ra                               0.06u  0.00s  0.05g
481              (spills = 0 reloads = 0)
482           TOTAL                                   0.63u  0.07s  0.21g
483    
484    1100   CPS 100 spill                           8.25u  0.32s  0.64g
485           MLRISC ra                               5.68u  0.59s  3.93g
486              (spills = 0 reloads = 0)
487           TOTAL                                   14.71u  0.99s  4.81g
488    
489    1500   CPS 100 spill                           58.55u  2.34s  1.74g
490           MLRISC ra                               5.54u  0.65s  3.91g
491              (spills = 543 reloads = 1082)
492           TOTAL                                   65.40u  3.13s  6.00g
493    
494    2000   CPS 100 spill                           126.69u  4.84s  3.08g
495           MLRISC ra                               0.80u  0.10s  0.55g
496              (spills = 42 reloads = 84)
497           TOTAL                                   129.42u  5.10s  4.13g
498    
499    3000   CPS 100 spill                           675.59u  19.03s  11.64g
500           MLRISC ra                               2.69u  0.27s  1.38g
501              (spills = 62 reloads = 124)
502           TOTAL                                   682.48u  19.61s  13.99g
503    
504    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
505           MLRISC ra                               4.96u  0.27s  2.72g
506              (spills = 85 reloads = 170)
507           TOTAL                                   2375.26u  57.21s  48.00g
508    
509    As you can see, the old cps spill module suffers from some serious
510    performance problems.  But since I cannot decipher the old code fully,
511    instead of patching the problems up, I'm reimplementing it
512    with a different algorithm.  The new code is more modular,
513    smaller when compiled, and substantially faster
514    (O(n log n) time and O(n) space).  Timing of the new spill module:
515    
516    4000  CPS 100 spill                           0.02u  0.00s  0.00g
517          MLRISC ra                               0.25u  0.02s  0.15g
518             (spills=1 reloads=3)
519          TOTAL                                   7.74u  0.34s  1.62g
520    
521    Implementation details:
522    
523    As far as I can tell, the purpose of the CPS spill module is to make sure the
524    number of live variables at any program point (the bandwidth)
525    does not exceed a certain limit, which is determined by the
526    size of the spill area.
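To make "bandwidth" concrete, here is a small Python sketch that computes it for straight-line code by a backward liveness scan.  It is an illustration only: the real phase works on CPS trees, and the (defs, uses) statement encoding is an assumption made for this example.

```python
def bandwidth(statements):
    """Largest number of simultaneously live variables in straight-line code.

    statements: list of (defined_vars, used_vars) pairs in execution order.
    Nothing is assumed to be live after the last statement.
    """
    live = set()
    peak = 0
    for defs, uses in reversed(statements):
        live -= set(defs)   # a variable is not live before its definition
        live |= set(uses)   # everything used here is live just before it
        peak = max(peak, len(live))
    return peak

program = [
    (["a"], []),          # a = ...
    (["b"], []),          # b = ...
    (["c"], ["a", "b"]),  # c = f(a, b)
    ([], ["c"]),          # use c
]
print(bandwidth(program))
```

Spilling is needed exactly when this number exceeds the limit; the scan also shows where the pressure is highest, since the peak is reached at a specific program point.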
527    
528    When the bandwidth is too large, we decrease the register pressure by
529    packing live variables into spill records.  How we achieve this is
530    completely different than what we did in the old code.
531    
532    First, there is something about the MLRiscGen code generator
533    that we should be aware of:
534    
535    o MLRiscGen performs code motion!
536    
537       In particular, it will move floating point computations and
538       address computations involving only the heap pointer to
539       their use sites (if there is only a single use).
540       What this means is that if we have a CPS record construction
541       statement
542    
543           RECORD(k,vl,w,e)
544    
545       we should never count the new record address w as live if w
546       has only one use (which is often the case).
547    
548       We should do something similar to floating point, but the transformation
549       there is much more complex, so I won't deal with that.
550    
551    Secondly, there are now two new cps primops at our disposal:
552    
553     1. rawrecord of record_kind option
554        This pure operator allocates some uninitialized storage from the heap.
555        There are two forms:
556    
557         rawrecord NONE [INT n]  allocates a tagless record of length n
558         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
559                                     and initializes the tag.
560    
561     2. rawupdate of cty
562          rawupdate cty (v,i,x)
563          Assigns x to the ith component of record v.
564          The storelist is not updated.
565    
566    We use these new primops for both spilling and incremental record construction.
567    
568     1. Spilling.
569    
570        This is implemented with a linear scan algorithm (but generalized
571        to trees).  The algorithm will create a single spill record at the
572        beginning of the cps function and use rawupdate to spill to it,
573        and SELECT or SELp to reload from it.  So both spills and reloads
574        are fine-grain operations.  In contrast, in the old algorithm
575        "spills" have to be bundled together in records.
576    
577        Ideally, we should sink the spill record construction to where
578        it is needed.  We can even split the spill record into multiple ones
579        at the places where they are needed.  But CPS is not a good
580        representation for global code motion, so I'll keep it simple and
581        am not attempting this.
582    
583     2. Incremental record construction (aka record splitting).
584    
585        Long records with many component values which are simultaneously live
586        (recall that single use record addresses are not considered to
587         be live) are constructed with rawrecord and rawupdate.
588        We allocate space on the heap with rawrecord first, then gradually
589        fill it in with rawupdate.  This is the technique suggested to me
590        by Matthias.
591    
592        Some restrictions on when this is applicable:
593        1. It is not a VECTOR record.  The code generator currently does not handle
594           this case. VECTOR record uses double indirection like arrays.
595        2. All the record component values are defined in the same "basic block"
596           as the record constructor.  This is to prevent speculative
597           record construction.
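The idea behind incremental record construction can be sketched in a few lines of Python.  The functions below are models named after the primops, not the primops themselves: a Python list stands in for heap storage, and build_incrementally is a hypothetical driver.

```python
def rawrecord(n):
    """Model of rawrecord: allocate n uninitialized slots."""
    return [None] * n

def rawupdate(record, i, x):
    """Model of rawupdate: store x into slot i (no store-list bookkeeping)."""
    record[i] = x

def build_incrementally(n, compute):
    """Allocate first, then fill slot by slot.

    Each component value can die right after it is stored, instead of all n
    values staying live until a single RECORD statement.
    """
    record = rawrecord(n)
    for i in range(n):
        rawupdate(record, i, compute(i))
    return record

print(build_incrementally(4, lambda i: i + 1))
```

With a single RECORD statement all n component values are live at once; here only the record pointer and the value currently being stored are live, which is what keeps the bandwidth low for structures like Foo above.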
598    
599    ----------------------------------------------------------------------
600    Name: Allen Leung
601    Date: 2002/02/22 01:02:00 EST
602    Tag: leunga-20020222-mlrisc-tools
603    
604    Minor bug fixes in the parser and rewriter
605    
606    ----------------------------------------------------------------------
607  Name: Allen Leung
608  Date: 2002/02/21 20:20:00 EST
609  Tag: leunga-20020221-peephole
