Diff of /sml/trunk/HISTORY


revision 1086, Fri Feb 22 05:56:29 2002 UTC revision 1155, Wed Mar 20 20:52:51 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/03/20 15:55:00 EST
18    Tag: blume-20020320-kmp
19    Description:
20    
21    Implemented Knuth-Morris-Pratt string matching in PreString and used
22    it for String.isSubstring, Substring.isSubstring, and
23    Substring.position.
24    
25    (Might need some stress-testing.  Simple examples worked fine.)
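For readers unfamiliar with the technique, here is a minimal Python sketch of Knuth-Morris-Pratt matching. It is not the PreString code: the names `failure_table`, `position`, and `is_substring` are illustrative, and this `position` returns an index (or -1) rather than the pair of substrings that Substring.position produces.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def position(pattern, text):
    """Index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = failure_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def is_substring(pattern, text):
    return position(pattern, text) != -1
```

The text index `i` never moves backwards, which is what gives the linear running time; patterns with repeated prefixes (such as "aab" in "aaab") are the cases worth stress-testing.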
26    
27    ----------------------------------------------------------------------
28    Name: Matthias Blume
29    Date: 2002/03/19 16:37:00 EST
30    Tag: blume-20020319-witnesses
31    Description:
32    
33    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
34    
35    This implements a generic mechanism for changing constness qualifiers
36    anywhere within big C types without resorting to outright "casts".
37    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
38    modify the constness at the outermost level.)
39    The implementation of "convert" is based on the idea of "witness"
40    values -- values that are not used by the operation but whose types
41    "testify" to their applicability.  On the implementation side, "convert"
42    is simply a projection (returning its second curried argument).  With
43    cross-module inlining, it should not result in any machine code being
44    generated.
45    
46    ----------------------------------------------------------------------
47    Name: Matthias Blume
48    Date: 2002/03/15 16:40:00 EST
49    Tag: blume-20020315-basis
50    Description:
51    
52    Provided (preliminary?) implementations for
53    
54      {String,Substring}.{concatWith,isSuffix,isSubstring}
55    
56    and
57    
58      Substring.full
59    
60    Those are in the Basis spec but they were missing in SML/NJ.
61    
62    ----------------------------------------------------------------------
63    Name: Matthias Blume
64    Date: 2002/03/14 21:30:00 EST
65    Tag: blume-20020314-controls
66    Description:
67    
68    Controls:
69    ---------
70    
71    1. Factored out the recently-added Controls : CONTROLS stuff and put
72       it into its own library $/controls-lib.cm.  The source tree for
73       this is under src/smlnj-lib/Controls.
74    
75    2. Changed the names of types and functions in this interface, so they
76       make a bit more "sense":
77    
78          module -> registry
79          'a registry -> 'a group
80    
81    3. The interface now deals in ref cells only.  The getter/setter interface
82       is (mostly) gone.
83    
84    4. Added a function that lets one register an already-existing ref cell.
85    
86    5. Made the corresponding modifications to the rest of the code so that
87       everything compiles again.
88    
89    6. Changed the implementation of Controls.MLRISC back to something closer
90       to the original.  In particular, this module (and therefore MLRISC)
91       does not depend on Controls.  There now is some link-time code in
92       int-sys.sml that registers the MLRISC controls with the Controls
93       module.
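The ref-cell style of items 3, 4, and 6 can be sketched in Python as follows. This is only an illustration of the idea, not the CONTROLS signature: `Ref`, `Registry`, `new`, `register`, and `lookup` are hypothetical names chosen for the example.

```python
class Ref:
    """A mutable cell, standing in for an ML ref cell."""
    def __init__(self, value):
        self.value = value


class Registry:
    """Maps control names to ref cells (hypothetical interface)."""
    def __init__(self):
        self._cells = {}

    def new(self, name, default):
        """Create a fresh ref cell and register it under name."""
        return self.register(name, Ref(default))

    def register(self, name, cell):
        """Register an already-existing ref cell under name."""
        self._cells[name] = cell
        return cell

    def lookup(self, name):
        return self._cells[name]


# The flag is owned by a module that knows nothing about the registry;
# separate "link-time" code hooks it up afterwards.
placement_flag = Ref(False)
registry = Registry()
registry.register("weighted-block-placement", placement_flag)
registry.lookup("weighted-block-placement").value = True
```

Because the registry stores the cell itself rather than a copy, an update made through the registry is visible to the owning module, and that module needs no dependency on the registry.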
94    
95    CM:
96    ---
97    
98      * One can now specify the lambda-split aggressiveness in init.cmi.
99    
100    ----------------------------------------------------------------------
101    Name: Allen Leung
102    Date: 2002/03/13 17:30:00 EST
103    Tag: leunga-20020313-x86-fp-unary
104    Description:
105    
106    Bug fix for:
107    
108    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
109    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
110    > - fun f(x,(y,z)) = Real.~ y;
111    > [autoloading]
112    > [autoloading done]
113    >       fchsl   (%eax), 184(%esp)
114    > Error: MLRisc bug: X86MCEmitter.emitInstr
115    >
116    > uncaught exception Error
117    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
118    
119    The problem was that the code generator did not generate any fp registers
120    in this case, and the ra didn't know that it needed to run the X86FP phase to
121    translate the pseudo fp instruction.   This only happened with unary fp
122    operators in certain situations.
123    
124    ----------------------------------------------------------------------
125    Name: Matthias Blume
126    Date: 2002/03/13 14:00:00 EST
127    Tag: blume-20020313-overload-etc
128    Description:
129    
130    1. Added _overload as a synonym for overload for backward compatibility.
131       (Control.overloadKW must be true for either version to be accepted.)
132    
133    2. Fixed bug in install script that caused more things to be installed
134       than what was requested in config/targets.
135    
136    3. Made CM aware of the (_)overload construct so that autoloading
137       works.
138    
139    ----------------------------------------------------------------------
140    Name: Matthias Blume
141    Date: 2002/03/12 22:03:00 EST
142    Tag: blume-20020312-url
143    Description:
144    
145    Forgot to update BOOT and srcarchiveurl.
146    
147    ----------------------------------------------------------------------
148    Name: Matthias Blume
149    Date: 2002/03/12 17:30:00 EST
150    Tag: blume-20020312-version110392
151    Description:
152    
153    Yet another version number bump (because of small changes to the
154    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
155    
156    Changes:
157    
158      The new pid generation scheme described a few weeks ago was overly
159      complicated.  I implemented a new mechanism that is simpler and
160      provides a bit more "stability":  Once CM has seen a compilation
161      unit, it keeps its identity constant (as long as you do not delete
162      those crucial CM/GUID/* files).  This means that when you change
163      an interface, compile, then go back to the old interface, and
164      compile again, you arrive at the original pid.
165    
166      There now also is a mechanism that instructs CM to use the plain
167      environment hash as a module's pid (effectively making its GUID
168      the empty string).  For this, "noguid" must be specified as an
169      option to the .sml file in question within its .cm file.
170      This is most useful for code that is being generated by tools such
171      as ml-nlffigen (because during development programmers tend to
172      erase the tool's entire output directory tree including CM's cached
173      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
174      revert to the old, broken behavior of SML/NJ), but in specific cases
175      where there is no danger of interface confusion, its use is ok
176      (I think).
177    
178      ml-nlffigen by default generates "noguid" annotations.  They can be
179      turned off by specifying -guid in its command line.
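A rough Python sketch of the scheme described above, under stated assumptions: the choice of SHA-1, the way the GUID and the environment hash are combined, and the names `env_hash`, `pid`, and `GuidCache` are all invented for illustration and do not describe CM's actual binfile format.

```python
import hashlib
import uuid


def env_hash(interface_text):
    """Stand-in for the hash of a unit's static environment."""
    return hashlib.sha1(interface_text.encode()).hexdigest()


def pid(interface_text, guid):
    """Combine the environment hash with the unit's GUID.
    With an empty GUID the result is the plain environment hash."""
    h = env_hash(interface_text)
    if guid == "":
        return h
    return hashlib.sha1((guid + h).encode()).hexdigest()


class GuidCache:
    """Stand-in for the cached CM/GUID/* files: one GUID per unit,
    created the first time the unit is seen and then kept constant."""
    def __init__(self):
        self._guids = {}

    def guid_for(self, unit, noguid=False):
        if noguid:
            return ""
        if unit not in self._guids:
            self._guids[unit] = uuid.uuid4().hex
        return self._guids[unit]


cache = GuidCache()
old = pid("val f : int -> int", cache.guid_for("a.sml"))
changed = pid("val f : int -> bool", cache.guid_for("a.sml"))
back = pid("val f : int -> int", cache.guid_for("a.sml"))
```

Here `old` and `back` are equal because the cached GUID did not change, while two different units with identical interfaces still get different pids. Discarding the cache (the analogue of deleting CM/GUID/*) would change every pid, which is the situation "noguid" is meant to sidestep for generated code.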
180    
181    ----------------------------------------------------------------------
182    Name: Lal George
183    Date: 2002/03/12 14:42:36 EST
184    Tag: george-20020312-frequency-computation
185    Description:
186    
187    Integrated jump chaining and static block frequency into the
188    compiler. More details and numbers later.
189    
190    ----------------------------------------------------------------------
191    Name: Lal George
192    Date: 2002/03/11 22:38:53 EST
193    Tag: george-20020311-jump-chain-elim
194    Description:
195    
196    Tested the jump chain elimination on all architectures (except the
197    hppa).  This is on by default right now and is profitable for the
198    alpha and x86; however, it may not be profitable for the sparc and ppc
199    when compiling the compiler.
200    
201    The gc test will typically jump to a label at the end of the cluster,
202    where there is another jump to an external cluster containing the actual
203    code to invoke gc. This is to allow factoring of common gc invocation
204    sequences. That is to say, we generate:
205    
206            f:
207               testgc
208               ja   L1      % jump if above to L1
209    
210            L1:
211               jmp L2
212    
213    
214    After jump chain elimination the 'ja L1' instruction is converted to
215    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
216    up being implemented in their long form (if L2 is far away) using:
217    
218            jbe     L3      % jump if below or equal to L3
219            jmp     L2
220         L3:
221            ...
222    
223    
224    For large compilation units L2 may be far away.
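The retargeting step can be sketched in Python as below. This is a simplified model, not the MLRISC pass: `eliminate_jump_chains` is a hypothetical name, branches are represented only by their target labels, and `trampolines` lists the labels whose block consists of a single unconditional jump.

```python
def eliminate_jump_chains(branch_targets, trampolines):
    """branch_targets: {branch name: label it jumps to}
    trampolines: {label: label} for blocks that contain only a 'jmp'.
    Returns branch_targets with every chain followed to its end."""
    def final(label):
        seen = set()
        # Follow jmp-only blocks; 'seen' guards against a cycle of jumps.
        while label in trampolines and label not in seen:
            seen.add(label)
            label = trampolines[label]
        return label

    return {b: final(t) for b, t in branch_targets.items()}


# The gc test from the example above: 'ja L1', where L1 is just 'jmp L2'.
result = eliminate_jump_chains({"f: ja": "L1"}, {"L1": "L2"})
```

In this model `result` maps the branch to L2 directly. Whether that is a win depends on the target: if L2 is out of range for a short conditional branch, the retargeted branch needs the two-instruction long form shown above.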
225    
226    
227    ----------------------------------------------------------------------
228    Name: Matthias Blume
229    Date: 2002/03/11 13:30:00 EST
230    Tag: blume-20020311-mltreeeval
231    Description:
232    
233    A functor parameter was missing.
234    
235    ----------------------------------------------------------------------
236    Name: Allen Leung
237    Date: 2002/03/11 10:30:00 EST
238    Tag: leunga-20020311-runtime-string0
239    Description:
240    
241       The representation of the empty string now points to a
242    legal null terminated C string instead of unit.  It is now possible
243    to convert an ML string into a C string with InlineT.CharVector.getData.
244    This compiles into one single machine instruction.
245    
246    ----------------------------------------------------------------------
247    Name: Allen Leung
248    Date: 2002/03/10 23:55:00 EST
249    Tag: leunga-20020310-x86-call
250    Description:
251    
252       Added machine code generation for the CALL instruction (relative displacement mode).
253    
254    ----------------------------------------------------------------------
255    Name: Matthias Blume
256    Date: 2002/03/08 16:05:00
257    Tag: blume-20020308-entrypoints
258    Description:
259    
260    Version number bumped to 110.39.1.  NEW BOOTFILES!
261    
262    Entrypoints: non-zero offset into a code object where execution should begin.
263    
264    - Added the notion of an entrypoint to CodeObj.
265    - Added reading/writing of entrypoint info to Binfile.
266    - Made runtime system bootloader aware of entrypoints.
267    - Use the address of the label of the first function given to mlriscGen
268      as the entrypoint.  This address is currently always 0, but it will
269      not be 0 once we turn on block placement.
270    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
271      with entry points) from mlriscGen.
272    
273    ----------------------------------------------------------------------
274    Name: Allen Leung
275    Date: 2002/03/07 20:45:00 EST
276    Tag: leunga-20020307-x86-cmov
277    Description:
278    
279       Bug fixes for CMOVcc on x86.
280    
281       1. Added machine code generation for CMOVcc
282       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
283       3. CMOVcc cannot have an immediate operand as argument.
284    
285    ----------------------------------------------------------------------
286    Name: Matthias Blume
287    Date: 2002/03/07 16:15:00 EST
288    Tag: blume-20020307-controls
289    Description:
290    
291    This is a very large but mostly boring patch which makes (almost)
292    every tuneable compiler knob (i.e., pretty much everything under
293    Control.* plus a few other things) configurable via both the command
294    line and environment variables in the style CM did its configuration
295    until now.
296    
297    Try starting sml with '-h' (or, if you are brave, '-H')
298    
299    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
300    implements the underlying generic mechanism.
301    
302    The interface to some of the existing such facilities has changed somewhat.
303    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
304    (The getFoo interface is still there for backward-compatibility, but its
305    use is deprecated.)
306    
307    The ml-build script passes -Cxxx=yyy command-line arguments through so
308    that one can now twiddle the compiler settings when using this "batch"
309    compiler.
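As an illustration of the -Cxxx=yyy form only, here is a small Python sketch that separates such arguments from the rest of a command line. The function name `parse_control_args` and the control name used in the usage note are hypothetical; how sml itself parses values or maps environment variables is not shown here.

```python
def parse_control_args(argv):
    """Split argv into ({name: value}, remaining args).
    Only arguments of the form -Cname=value are treated as settings."""
    settings, rest = {}, []
    for arg in argv:
        if arg.startswith("-C") and "=" in arg:
            # Split on the first '=' only, so values may contain '='.
            name, value = arg[2:].split("=", 1)
            settings[name] = value
        else:
            rest.append(arg)
    return settings, rest
```

For example, `parse_control_args(["-Csome.knob=true", "main.cm"])` yields the setting `some.knob` with the string value `"true"` and leaves `main.cm` for the ordinary argument handling.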
310    
311    TODO items:
312    
313    We should go through and throw out all controls that are no longer
314    connected to anything.  Moreover, we should go through and provide
315    meaningful (and correct!) documentation strings for those controls
316    that still are connected.
317    
318    Currently, multiple calls to Controls.new are accepted (only the first
319    has any effect).  Eventually we should make sure that every control
320    is being made (via Controls.new) exactly once.  Future access can then
321    be done using Controls.acc.
322    
323    Finally, it would probably be a good idea to use the getter-setter
324    interface to controls rather than ref cells.  For the time being, both
325    styles are provided by the Controls module, but getter-setter pairs are
326    better if thread-safety is of any concern because they can be wrapped.
327    
328    *****************************************
329    
330    One bug fix: The function blockPlacement in three of the MLRISC
331    backpatch files used to be hard-wired to one of two possibilities at
332    link time (according to the value of the placementFlag).  But (I
333    think) it should rather sense the flag every time.
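The difference between the two behaviors can be shown with a short Python sketch. Everything here is a stand-in: the dictionary plays the role of the flag's ref cell, and sorting by weight is only a placeholder for a real placement algorithm.

```python
placement_flag = {"weighted": False}   # stand-in for a ref cell


def naive_placement(blocks):
    return list(blocks)


def weighted_placement(blocks):
    # Placeholder policy: heaviest block first.
    return sorted(blocks, key=lambda b: -b[1])


# Hard-wired: the flag is read once, when this statement runs ("link time").
block_placement_hardwired = (
    weighted_placement if placement_flag["weighted"] else naive_placement
)


# Fixed: the flag is read on every call.
def block_placement(blocks):
    if placement_flag["weighted"]:
        return weighted_placement(blocks)
    return naive_placement(blocks)


blocks = [("a", 1), ("b", 5)]
placement_flag["weighted"] = True      # changed after "link time"
```

After the flag is flipped, `block_placement_hardwired` still uses the naive order, while `block_placement` picks up the new setting.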
334    
335    *****************************************
336    
337    Other assorted changes (by other people who did not supply a HISTORY entry):
338    
339    1. the cross-module inliner now works much better (Monnier)
340    2. representation of weights, frequencies, and probabilities in MLRISC
341       changed in preparation for using those for weighted block placement
342       (Reppy, George)
343    
344    ----------------------------------------------------------------------
345    Name: Lal George
346    Date: 2002/03/07 14:44:24 EST
347    Tag: george-20020307-weighted-block-placement
348    
349    Tested the weighted block placement optimization on all architectures
350    (except the hppa) using AMPL to generate the block and edge frequencies.
351    Changes were required in the machine properties to correctly
352    categorize trap instructions. There is an MLRISC flag
353    "weighted-block-placement" that can be used to enable weighted block
354    placement, but this will be ineffective without block/edge
355    frequencies (coming soon).
356    
357    
358    ----------------------------------------------------------------------
359    Name: Lal George
360    Date: 2002/03/05 17:24:48 EST
361    Tag: george-20020305-linkage-cluster
362    
363    In order to support the block placement optimization, a new cluster
364    is generated as the very first cluster (called the linkage cluster).
365    It contains a single jump to the 'real' entry point for the compilation
366    unit. Block placement has no effect on the linkage cluster itself, but
367    all the other clusters have full freedom in the manner in which they
368    reorder blocks or functions.
369    
370    On the x86 the typical linkage code that is generated is:
371       ----------------------
372            .align 2
373       L0:
374            addl    $L1-L0, 72(%esp)
375            jmp     L1
376    
377    
378            .align  2
379       L1:
380       ----------------------
381    
382    72(%esp) is the memory location for the stdlink register. This
383    must contain the address of the CPS function being called. In the
384    above example, it contains the address of L0; before
385    calling L1 (the real entry point for the compilation unit), it
386    must contain the address for L1, and hence
387    
388            addl $L1-L0, 72(%esp)
389    
390    I have tested this on all architectures except the hppa.  The increase
391    in code size is of course negligible.
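The stdlink adjustment is plain address arithmetic, which the following Python sketch spells out. The addresses are made up, and `enter_linkage_cluster` is a hypothetical name for what the addl instruction does.

```python
def enter_linkage_cluster(stdlink, offset):
    """Model of 'addl $L1-L0, 72(%esp)': add a fixed offset to stdlink."""
    return stdlink + offset


L0 = 0x1000            # hypothetical address of the linkage cluster
L1 = 0x1010            # hypothetical address of the real entry point
offset = L1 - L0       # a constant, known once the code is laid out

stdlink = enter_linkage_cluster(L0, offset)

# If the whole code object is loaded somewhere else, both labels move by
# the same amount, so the same constant still works.
base_shift = 0x40000
relocated = enter_linkage_cluster(L0 + base_shift, offset)
```

Here `stdlink` ends up equal to L1, and `relocated` equals L1 plus the shift, which is why the offset can be emitted as an immediate without knowing the final load address.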
392    
393    ----------------------------------------------------------------------
394    Name: Allen Leung
395    Date: 2002/03/03 13:20:00 EST
396    Tag: leunga-20020303-mlrisc-tools
397    
398      Added #[ ... ] expressions to mlrisc tools
399    
400    ----------------------------------------------------------------------
401    Name: Matthias Blume
402    Date: 2002/02/27 12:29:00 EST
403    Tag: blume-20020227-cdebug
404    Description:
405    
406    - made the types in structures C and C_Debug equal
407    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
408    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
409    
410    ----------------------------------------------------------------------
411    Name: Matthias Blume
412    Date: 2002/02/26 12:00:00 EST
413    Tag: blume-20020226-ffi
414    Description:
415    
416    1. Fixed a minor bug in CM's "noweb" tool:
417       If numbering is turned off, then truly don't number (i.e., do not
418       supply the -L option to noweb).  The previous behavior was to supply
419       -L'' -- which caused noweb to use the "default" line numbering scheme.
420       Thanks to Chris Richards for pointing this out (and supplying the fix).
421    
422    2. Once again, I reworked some aspects of the FFI:
423    
424       A. The incomplete/complete type business:
425    
426       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
427         gone!
428       - ML types representing an incomplete type are now *equal* to
429         ML types representing their corresponding complete types (just like
430         in C).  This is still safe because ml-nlffigen will not generate
431         RTTI for incomplete types, nor will it generate functions that
432         require access to such RTTI.   But when ML code generated from both
433         incomplete and complete versions of the C type meet, the ML types
434         are trivially interoperable.
435    
436         NOTE:  These changes restore the full generality of the translation
437         (which was previously lost when I eliminated functorization)!
438    
439       B. Enum types:
440    
441       - Structure C now has a type constructor "enum" that is similar to
442         how the "su" constructor works.  However, "enum" is not a phantom
443         type because each "T enum" has values (and is isomorphic to
444         MLRep.Signed.int).
445       - There are generic access operations for enum objects (using
446         MLRep.Signed.int).
447       - ml-nlffigen will generate a structure E_foo for each "enum foo".
448         * The structure contains the definition of type "mlrep" (the ML-side
449         representation type of the enum).  Normally, mlrep is the same
450         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
451         then mlrep will be defined as a datatype -- thus facilitating
452         pattern matching on mlrep values.
453         ("-ec" will be suppressed if there are duplicate values in an
454          enumeration.)
455         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
456         will be generated for each C enum constant xxx.
457         * Conversion functions m2i and i2m convert between mlrep and
458         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
459         * Conversion functions c and ml convert between mlrep and "tag enum".
460         * Access functions (get/set) fetch and store mlrep values.
461       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
462         enumerations are merged into one single enumeration represented by
463         structure E_'.
464    
465    ----------------------------------------------------------------------
466    Name: Allen Leung
467    Date: 2002/02/25 04:45:00 EST
468    Tag: leunga-20020225-cps-spill
469    
470    This is a new implementation of the CPS spill phase.
471    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
472    In case of problems, replace it with the old file spill.sml
473    
474    The current compiler runs into some serious performance problems when
475    constructing a large record.  This can happen when we try to compile a
476    structure with many items.  Even a very simple structure like the following
477    makes the compiler slow down.
478    
479        structure Foo = struct
480           val x_1 = 0w1 : Word32.word
481           val x_2 = 0w2 : Word32.word
482           val x_3 = 0w3 : Word32.word
483           ...
484           val x_N = 0wN : Word32.word
485        end
486    
487    The following table shows the compile time, from N=1000 to N=4000,
488    with the old compiler:
489    
490    N
491    1000   CPS 100 spill                           0.04u  0.00s  0.00g
492           MLRISC ra                               0.06u  0.00s  0.05g
493              (spills = 0 reloads = 0)
494           TOTAL                                   0.63u  0.07s  0.21g
495    
496    1100   CPS 100 spill                           8.25u  0.32s  0.64g
497           MLRISC ra                               5.68u  0.59s  3.93g
498              (spills = 0 reloads = 0)
499           TOTAL                                   14.71u  0.99s  4.81g
500    
501    1500   CPS 100 spill                           58.55u  2.34s  1.74g
502           MLRISC ra                               5.54u  0.65s  3.91g
503              (spills = 543 reloads = 1082)
504           TOTAL                                   65.40u  3.13s  6.00g
505    
506    2000   CPS 100 spill                           126.69u  4.84s  3.08g
507           MLRISC ra                               0.80u  0.10s  0.55g
508              (spills = 42 reloads = 84)
509           TOTAL                                   129.42u  5.10s  4.13g
510    
511    3000   CPS 100 spill                           675.59u  19.03s  11.64g
512           MLRISC ra                               2.69u  0.27s  1.38g
513              (spills = 62 reloads = 124)
514           TOTAL                                   682.48u  19.61s  13.99g
515    
516    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
517           MLRISC ra                               4.96u  0.27s  2.72g
518              (spills = 85 reloads = 170)
519           TOTAL                                   2375.26u  57.21s  48.00g
520    
521    As you can see, the old cps spill module suffers from some serious
522    performance problems.  But since I cannot decipher the old code fully,
523    instead of patching the problems up, I'm reimplementing it
524    with a different algorithm.  The new code is more modular,
525    smaller when compiled, and substantially faster
526    (O(n log n) time and O(n) space).  Timing of the new spill module:
527    
528    4000  CPS 100 spill                           0.02u  0.00s  0.00g
529          MLRISC ra                               0.25u  0.02s  0.15g
530             (spills=1 reloads=3)
531          TOTAL                                   7.74u  0.34s  1.62g
532    
533    Implementation details:
534    
535    As far as I can tell, the purpose of the CPS spill module is to make sure the
536    number of live variables at any program point (the bandwidth)
537    does not exceed a certain limit, which is determined by the
538    size of the spill area.
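To make the notion of bandwidth concrete, here is a Python sketch that computes it for straight-line code by a backward liveness scan. It is a simplification: real CPS code is a tree rather than a list, the representation of instructions as (defined variable, used variables) pairs is invented for the example, and `bandwidth` is a hypothetical name.

```python
def bandwidth(instrs, live_out=()):
    """instrs: list of (defined_var, [used_vars]) in program order.
    Returns the largest number of simultaneously live variables."""
    live = set(live_out)
    widest = len(live)
    # Walk backwards: a definition ends a live range, a use starts one.
    for defined, used in reversed(instrs):
        live.discard(defined)
        live.update(used)
        widest = max(widest, len(live))
    return widest


program = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),   # a and b are both live just before this point
    ("d", ["c"]),
]
needs_spill = bandwidth(program, live_out=["d"]) > 1   # limit of 1, for illustration
```

With the limit set to 1 for the sake of the example, `needs_spill` is true because `a` and `b` are live at the same time.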
539    
540    When the bandwidth is too large, we decrease the register pressure by
541    packing live variables into spill records.  How we achieve this is
542    completely different than what we did in the old code.
543    
544    First, there is something about the MLRiscGen code generator
545    that we should be aware of:
546    
547    o MLRiscGen performs code motion!
548    
549       In particular, it will move floating point computations and
550       address computations involving only the heap pointer to
551       their use sites (if there is only a single use).
552       What this means is that if we have a CPS record construction
553       statement
554    
555           RECORD(k,vl,w,e)
556    
557       we should never count the new record address w as live if w
558       has only one use (which is often the case).
559    
560       We should do something similar to floating point, but the transformation
561       there is much more complex, so I won't deal with that.
562    
563    Secondly, there are now two new cps primops at our disposal:
564    
565     1. rawrecord of record_kind option
566        This pure operator allocates some uninitialized storage from the heap.
567        There are two forms:
568    
569         rawrecord NONE [INT n]  allocates a tagless record of length n
570         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
571                                     and initializes the tag.
572    
573     2. rawupdate of cty
574          rawupdate cty (v,i,x)
575          Assigns x to the ith component of record v.
576          The storelist is not updated.
577    
578    We use these new primops for both spilling and incremental record construction.
579    
580     1. Spilling.
581    
582        This is implemented with a linear scan algorithm (but generalized
583        to trees).  The algorithm will create a single spill record at the
584        beginning of the cps function and use rawupdate to spill to it,
585        and SELECT or SELp to reload from it.  So both spills and reloads
586        are fine-grain operations.  In contrast, in the old algorithm
587        "spills" have to be bundled together in records.
588    
589        Ideally, we should sink the spill record construction to where
590        it is needed.  We can even split the spill record into multiple ones
591        at the places where they are needed.  But CPS is not a good
592        representation for global code motion, so I'll keep it simple and
593        am not attempting this.
594    
595     2. Incremental record construction (aka record splitting).
596    
597        Long records with many component values which are simultaneously live
598        (recall that single use record addresses are not considered to
599         be live) are constructed with rawrecord and rawupdate.
600        We allocate space on the heap with rawrecord first, then gradually
601        fill it in with rawupdate.  This is the technique suggested to me
602        by Matthias.
603    
604        Some restrictions on when this is applicable:
605        1. It is not a VECTOR record.  The code generator currently does not handle
606           this case.  VECTOR records use double indirection like arrays.
607        2. All the record component values are defined in the same "basic block"
608           as the record constructor.  This is to prevent speculative
609           record construction.
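Incremental record construction can be modelled in Python as below. The functions `rawrecord` and `rawupdate` borrow the primop names but are only analogues operating on a Python list; tags, the cty argument, and heap allocation are not modelled, and `build_incrementally` is a hypothetical helper.

```python
def rawrecord(n):
    """Allocate n uninitialized slots (None marks 'not yet written')."""
    return [None] * n


def rawupdate(record, i, x):
    """Store x into slot i of record."""
    record[i] = x


def build_incrementally(n, compute):
    """Fill an n-slot record one component at a time, so that each
    component value is live only until it has been stored."""
    record = rawrecord(n)
    for i in range(n):
        rawupdate(record, i, compute(i))
    return tuple(record)


# Analogue of the structure Foo above with N = 4.
foo = build_incrementally(4, lambda i: i + 1)
```

Building the record in one step would require all N component values to be live at the construction point; filling it slot by slot keeps only the record address and one value live at a time.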
610    
611    ----------------------------------------------------------------------
612  Name: Allen Leung  Name: Allen Leung
613  Date: 2002/02/22 01:02:00 EST  Date: 2002/02/22 01:02:00 EST
614  Tag: leunga-20020222-mlrisc-tools  Tag: leunga-20020222-mlrisc-tools
