Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1063, Wed Feb 13 21:41:53 2002 UTC revision 1179, Tue Mar 26 03:04:46 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/03/25 22:06:00 EST
18    Tag: blume-20020325-cprotobug
19    Description:
20    
21    Fixed a bug in cproto (c prototype decoder).
22    
23    ----------------------------------------------------------------------
24    Name: Matthias Blume
25    Date: 2002/03/25 16:00:00 EST
26    Tag: blume-20020325-raw-primops
27    Description:
28    
29    I did some cleanup to Allen's new primop code and
30    replaced yesterday's bootfiles with new ones.
31    (But they are stored in the same place.)
32    
33    ----------------------------------------------------------------------
34    Name: Matthias Blume
35    Date: 2002/03/24 22:40:00 EST
36    Tag: blume-20020324-bootfiles
37    Description:
38    
39    Made the bootfiles that Allen asked for.
40    
41    ----------------------------------------------------------------------
42    Name: Allen Leung
43    Date: 2002/03/23 15:50:00 EST
44    Tag: leunga-20020323-flint-cps-rcc-primops
45    Description:
46    
47      1. Changes to FLINT primops:
48    
49        (* make a call to a C-function;
50         * The primop carries C function prototype information and specifies
51         * which of its (ML-) arguments are floating point. C prototype
52         * information is for use by the backend, ML information is for
53         * use by the CPS converter. *)
54      | RAW_CCALL of { c_proto: CTypes.c_proto,
55                       ml_args: ccall_type list,
56                       ml_res_opt: ccall_type option,
57                       reentrant : bool
58                     } option
59       (* Allocate uninitialized storage on the heap.
60        * The record is meant to hold short-lived C objects, i.e., they
61        * are not ML pointers.  With the tag, the representation is
62        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
63        *)
64      | RAW_RECORD of {tag:bool,sz:int}
65      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
66    
67      2.  These CPS primops are now overloaded:
68    
69           rawload of {kind:numkind}
70           rawstore of {kind:numkind}
71    
72          The one argument form is:
73    
74             rawload {kind} address
75    
76          The two argument form is:
77    
78             rawload {kind} [ml object, byte-offset]
79    
80      3. RAW_CCALL/RCC now takes two extra arguments:
81    
82         a. The first is whether the C call is reentrant, i.e., whether
83            ML state should be saved and restored.
84         b. The second argument is a string argument specifying the name of
85            library and the C function.
86    
87         These things are currently not handled in the code generator, yet.
88    
89      4. In CProto,
90    
91         An encoding type of "bool" means "ml object" and is mapped into
92         C prototype of PTR.  Note that "bool" is different than "string",
93         even though "string" is also mapped into PTR, because "bool"
94         is assigned an CPS type of BOGt, while "string" is assigned INT32t.
95    
96      5. Pickler/unpicker
97    
98         Changed to handle RAW_RECORD and newest RAW_CCALL
99    
100      6. MLRiscGen,
101    
102         1. Changed to handle the new rawload/rawstore/rawrecord operators.
103         2. Code for handling C Calls has been moved to a new module CPSCCalls,
104            in the file CodeGen/cpscompile/cps-c-calls.sml
105    
106      7. Added the conditional move operator
107    
108             condmove of branch
109    
110         to cps.  Generation of this is still buggy so it is currently
111         disabled.
112    
113    ----------------------------------------------------------------------
114    Name: Lal George
115    Date: 2002/03/22 14:18:25 EST
116    Tag: george-20020322-cps-branch-prob
117    Description:
118    
119    Implemented the Ball-Larus branch prediction-heuristics, and
120    incorporated graphical viewers for control flow graphs.
121    
122    Ball-Larus Heuristics:
123    ---------------------
124    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
125    
126    By design it uses the Dempster-Shafer theory for combining
127    probabilities.  For example, in the function:
128    
129        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
130    
131    the ball-larus heuristics predicts that the n=0 is unlikely
132    (OH-heuristic), and the 'then' branch is unlikely because of the
133    RH-heuristic -- giving the 'then' branch an even lower combined
134    probability using the Dempster-Shafer theory.
135    
136    Finally, John Reppy's loop analysis in MLRISC, further lowers the
137    probability of the 'then' branch because of the loop in the else
138    branch.
139    
140    
141    Graphical Viewing:
142    ------------------
143    I merely plugged in Allen's graphical viewers into the compiler. The
144    additional code is not much. At the top level, saying:
145    
146            Control.MLRISC.getFlag "cfg-graphical-view" := true;
147    
148    will display the graphical view of the control flow graph just before
149    back-patching.  daVinci must be in your path for this to work. If
150    daVinci is not available, then the default viewer can be changed
151    using:
152    
153            Control.MLRISC.getString "viewer"
154    
155    which can be set to "dot" or "vcg" for the corresponding viewers. Of
156    course, these viewers must be in your path.
157    
158    The above will display the compilation unit at the level of clusters,
159    many of which are small, boring, and un-interesting. Also setting:
160    
161            Control.MLRISC.getInt "cfg-graphical-view_size"
162    
163    will display clusters that are larger than the value set by the above.
164    
165    
166    ----------------------------------------------------------------------
167    Name: Matthias Blume
168    Date: 2002/03/21 22:20:00 EST
169    Tag: blume-20020321-kmp-bugfix
170    Description:
171    
172    Changed the interface to the KMP routine in PreString and fixed
173    a minor bug in one place where it was used.
174    
175    ----------------------------------------------------------------------
176    Name: Allen Leung
177    Date: 2002/03/21 20:30:00 EST
178    Tag: leunga-20020321-cfg
179    Description:
180    
181      Fixed a potential problem in cfg edge splitting.
182    
183    ----------------------------------------------------------------------
184    Name: Allen Leung
185    Date: 2002/03/21 17:15:00 EST
186    Tag: leunga-20020321-x86-fp-cfg
187    Description:
188    
189      1. Recoded the buggy parts of x86-fp.
190    
191         a. All the block reordering code has been removed.
192            We now depend on the block placement phases to do this work.
193    
194         b. Critical edge splitting code has been simplified and moved into the
195            CFG modules, as where they belong.
196    
197         Both of these were quite buggy and complex.  The code is now much, much
198         simpler.
199    
200      2. X86 backend.
201    
202         a. Added instructions for 64-bit support.  Instruction selection for
203            64-bit has not been committed, however, since that
204            requires changes to MLTREE which haven't been approved by
205            Lal and John.
206    
207         b. Added support for FUCOMI and FUCOMIP when generating code for
208            PentiumPro and above.  We only generate these instructions in
209            the fast-fp mode.
210    
211         c. Added cases for JP and JNP in X86FreqProps.
212    
213      3. CFG
214    
215         CFG now has a bunch of methods for edge splitting and merging.
216    
217      4. Machine description.
218    
219         John's simplification of MLTREE_BASIS.fcond broke a few machine
220         description things:
221    
222         rtl-build.{sig,sml} and hppa.mdl fixed.
223    
224         NOTE: the machine description stuff in the repository is still broken.
225               Again, I can't put my fixes in because that involves
226               changes to MLTREE.
227    
228    ----------------------------------------------------------------------
229    Name: Matthias Blume
230    Date: 2002/03/20 15:55:00 EST
231    Tag: blume-20020320-kmp
232    Description:
233    
234    Implemented Knuth-Morris-Pratt string matching in PreString and used
235    it for String.isSubstring, Substring.isSubstring, and
236    Substring.position.
237    
238    (Might need some stress-testing.  Simple examples worked fine.)
239    
240    ----------------------------------------------------------------------
241    Name: Matthias Blume
242    Date: 2002/03/19 16:37:00 EST
243    Tag: blume-20020319-witnesses
244    Description:
245    
246    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
247    
248    This implements a generic mechanism for changing constness qualifiers
249    anywhere within big C types without resorting to outright "casts".
250    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
251    modify the constness at the outermost level.)
252    The implementation of "convert" is based on the idea of "witness"
253    values -- values that are not used by the operation but whose types
254    "testify" to their applicability.  On the implementation side, "convert"
255    is simply a projection (returning its second curried argument).  With
256    cross-module inlining, it should not result in any machine code being
257    generated.
258    
259    ----------------------------------------------------------------------
260    Name: Matthias Blume
261    Date: 2002/03/15 16:40:00 EST
262    Tag: blume-20020315-basis
263    Description:
264    
265    Provided (preliminary?) implementations for
266    
267      {String,Substring}.{concatWith,isSuffix,isSubstring}
268    
269    and
270    
271      Substring.full
272    
273    Those are in the Basis spec but they were missing in SML/NJ.
274    
275    ----------------------------------------------------------------------
276    Name: Matthias Blume
277    Date: 2002/03/14 21:30:00 EST
278    Tag: blume-20020314-controls
279    Description:
280    
281    Controls:
282    ---------
283    
284    1. Factored out the recently-added Controls : CONTROLS stuff and put
285       it into its own library $/controls-lib.cm.  The source tree for
286       this is under src/smlnj-lib/Controls.
287    
288    2. Changed the names of types and functions in this interface, so they
289       make a bit more "sense":
290    
291          module -> registry
292          'a registry -> 'a group
293    
294    3. The interface now deals in ref cells only.  The getter/setter interface
295       is (mostly) gone.
296    
297    4. Added a function that lets one register an already-existing ref cell.
298    
299    5. Made the corresponding modifications to the rest of the code so that
300       everything compiles again.
301    
302    6. Changed the implementation of Controls.MLRISC back to something closer
303       to the original.  In particular, this module (and therefore MLRISC)
304       does not depend on Controls.  There now is some link-time code in
305       int-sys.sml that registers the MLRISC controls with the Controls
306       module.
307    
308    CM:
309    ---
310    
311      * One can now specify the lambda-split aggressiveness in init.cmi.
312    
313    ----------------------------------------------------------------------
314    Name: Allen Leung
315    Date: 2002/03/13 17:30:00 EST
316    Tag: leunga-20020313-x86-fp-unary
317    Description:
318    
319    Bug fix for:
320    
321    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
322    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
323    > - fun f(x,(y,z)) = Real.~ y;
324    > [autoloading]
325    > [autoloading done]
326    >       fchsl   (%eax), 184(%esp)
327    > Error: MLRisc bug: X86MCEmitter.emitInstr
328    >
329    > uncaught exception Error
330    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
331    
332    The problem was that the code generator did not generate any fp registers
333    in this case, and the ra didn't know that it needed to run the X86FP phase to
334    translate the pseudo fp instruction.   This only happened with unary fp
335    operators in certain situations.
336    
337    ----------------------------------------------------------------------
338    Name: Matthias Blume
339    Date: 2002/03/13 14:00:00 EST
340    Tag: blume-20020313-overload-etc
341    Description:
342    
343    1. Added _overload as a synonym for overload for backward compatibility.
344       (Control.overloadKW must be true for either version to be accepted.)
345    
346    2. Fixed bug in install script that caused more things to be installed
347       than what was requested in config/targets.
348    
349    3. Made CM aware of the (_)overload construct so that autoloading
350       works.
351    
352    ----------------------------------------------------------------------
353    Name: Matthias Blume
354    Date: 2002/03/12 22:03:00 EST
355    Tag: blume-20020312-url
356    Description:
357    
358    Forgot to update BOOT and srcarchiveurl.
359    
360    ----------------------------------------------------------------------
361    Name: Matthias Blume
362    Date: 2002/03/12 17:30:00 EST
363    Tag: blume-20020312-version110392
364    Description:
365    
366    Yet another version number bump (because of small changes to the
367    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
368    
369    Changes:
370    
371      The new pid generation scheme described a few weeks ago was overly
372      complicated.  I implemented a new mechanism that is simpler and
373      provides a bit more "stability":  Once CM has seen a compilation
374      unit, it keeps its identity constant (as long as you do not delete
375      those crucial CM/GUID/* files).  This means that when you change
376      an interface, compile, then go back to the old interface, and
377      compile again, you arrive at the original pid.
378    
379      There now also is a mechanism that instructs CM to use the plain
380      environment hash as a module's pid (effectively making its GUID
381      the empty string).  For this, "noguid" must be specified as an
382      option to the .sml file in question within its .cm file.
383      This is most useful for code that is being generated by tools such
384      as ml-nlffigen (because during development programmers tend to
385      erase the tool's entire output directory tree including CM's cached
386      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
387      revert to the old, broken behavior of SML/NJ, but in specific cases
388      where there is no danger of interface confusion, its use is ok
389      (I think).
390    
391      ml-nlffigen by default generates "noguid" annotations.  They can be
392      turned off by specifying -guid in its command line.
393    
394    ----------------------------------------------------------------------
395    Name: Lal George
396    Date: 2002/03/12 12 14:42:36 EST
397    Tag: george-20020312-frequency-computation
398    Description:
399    
400    Integrated jump chaining and static block frequency into the
401    compiler. More details and numbers later.
402    
403    ----------------------------------------------------------------------
404    Name: Lal George
405    Date: 2002/03/11 11 22:38:53 EST
406    Tag: george-20020311-jump-chain-elim
407    Description:
408    
409    Tested the jump chain elimination on all architectures (except the
410    hppa).  This is on by default right now and is profitable for the
411    alpha and x86, however, it may not be profitable for the sparc and ppc
412    when compiling the compiler.
413    
414    The gc test will typically jump to a label at the end of the cluster,
415    where there is another jump to an external cluster containing the actual
416    code to invoke gc. This is to allow factoring of common gc invocation
417    sequences. That is to say, we generate:
418    
419            f:
420               testgc
421               ja   L1      % jump if above to L1
422    
423            L1:
424               jmp L2
425    
426    
427    After jump chain elimination the 'ja L1' instructions is converted to
428    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
429    up being implemented in their long form (if L2 is far away) using:
430    
431            jbe     L3      % jump if below or equal to L3
432            jmp     L2
433         L3:
434            ...
435    
436    
437    For large compilation units L2  may be far away.
438    
439    
440    ----------------------------------------------------------------------
441    Name: Matthias Blume
442    Date: 2002/03/11 13:30:00 EST
443    Tag: blume-20020311-mltreeeval
444    Description:
445    
446    A functor parameter was missing.
447    
448    ----------------------------------------------------------------------
449    Name: Allen Leung
450    Date: 2002/03/11 10:30:00 EST
451    Tag: leunga-20020311-runtime-string0
452    Description:
453    
454       The representation of the empty string now points to a
455    legal null terminated C string instead of unit.  It is now possible
456    to convert an ML string into C string with InlineT.CharVector.getData.
457    This compiles into one single machine instruction.
458    
459    ----------------------------------------------------------------------
460    Name: Allen Leung
461    Date: 2002/03/10 23:55:00 EST
462    Tag: leunga-20020310-x86-call
463    Description:
464    
465       Added machine generation for CALL instruction (relative displacement mode)
466    
467    ----------------------------------------------------------------------
468    Name: Matthias Blume
469    Date: 2002/03/08 16:05:00
470    Tag: blume-20020308-entrypoints
471    Description:
472    
473    Version number bumped to 110.39.1.  NEW BOOTFILES!
474    
475    Entrypoints: non-zero offset into a code object where execution should begin.
476    
477    - Added the notion of an entrypoint to CodeObj.
478    - Added reading/writing of entrypoint info to Binfile.
479    - Made runtime system bootloader aware of entrypoints.
480    - Use the address of the label of the first function given to mlriscGen
481      as the entrypoint.  This address is currently always 0, but it will
482      not be 0 once we turn on block placement.
483    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
484      with entry points) from mlriscGen.
485    
486    ----------------------------------------------------------------------
487    Name: Allen Leung
488    Date: 2002/03/07 20:45:00 EST
489    Tag: leunga-20020307-x86-cmov
490    Description:
491    
492       Bug fixes for CMOVcc on x86.
493    
494       1. Added machine code generation for CMOVcc
495       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
496       3. CMOVcc cannot have an immediate operand as argument.
497    
498    ----------------------------------------------------------------------
499    Name: Matthias Blume
500    Date: 2002/03/07 16:15:00 EST
501    Tag: blume-20020307-controls
502    Description:
503    
504    This is a very large but mostly boring patch which makes (almost)
505    every tuneable compiler knob (i.e., pretty much everything under
506    Control.* plus a few other things) configurable via both the command
507    line and environment variables in the style CM did its configuration
508    until now.
509    
510    Try starting sml with '-h' (or, if you are brave, '-H')
511    
512    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
513    implements the underlying generic mechanism.
514    
515    The interface to some of the existing such facilities has changed somewhat.
516    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
517    (The getFoo interface is still there for backward-compatibility, but its
518    use is deprecated.)
519    
520    The ml-build script passes -Cxxx=yyy command-line arguments through so
521    that one can now twiddle the compiler settings when using this "batch"
522    compiler.
523    
524    TODO items:
525    
526    We should go through and throw out all controls that are no longer
527    connected to anything.  Moreover, we should go through and provide
528    meaningful (and correct!) documentation strings for those controls
529    that still are connected.
530    
531    Currently, multiple calls to Controls.new are accepted (only the first
532    has any effect).  Eventually we should make sure that every control
533    is being made (via Controls.new) exactly once.  Future access can then
534    be done using Controls.acc.
535    
536    Finally, it would probably be a good idea to use the getter-setter
537    interface to controls rather than ref cells.  For the time being, both
538    styles are provided by the Controls module, but getter-setter pairs are
539    better if thread-safety is of any concern because they can be wrapped.
540    
541    *****************************************
542    
543    One bug fix: The function blockPlacement in three of the MLRISC
544    backpatch files used to be hard-wired to one of two possibilities at
545    link time (according to the value of the placementFlag).  But (I
546    think) it should rather sense the flag every time.
547    
548    *****************************************
549    
550    Other assorted changes (by other people who did not supply a HISTORY entry):
551    
552    1. the cross-module inliner now works much better (Monnier)
553    2. representation of weights, frequencies, and probabilities in MLRISC
554       changed in preparation of using those for weighted block placement
555       (Reppy, George)
556    
557    ----------------------------------------------------------------------
558    Name: Lal George
559    Date: 2002/03/07 14:44:24 EST 2002
560    Tag: george-20020307-weighted-block-placement
561    
562    Tested the weighted block placement optimization on all architectures
563    (except the hppa) using AMPL to generate the block and edge frequencies.
564    Changes were required in the machine properties to correctly
565    categorize trap instructions. There is an MLRISC flag
566    "weighted-block-placement" that can be used to enable weighted block
567    placement, but this will be ineffective without block/edge
568    frequencies (coming soon).
569    
570    
571    ----------------------------------------------------------------------
572    Name: Lal George
573    Date: 2002/03/05 17:24:48 EST
574    Tag: george-20020305-linkage-cluster
575    
576    In order to support the block placement optimization, a new cluster
577    is generated as the very first cluster (called the linkage cluster).
578    It contains a single jump to the 'real' entry point for the compilation
579    unit. Block placement has no effect on the linkage cluster itself, but
580    all the other clusters  have full freedom in the manner in which they
581    reorder blocks or functions.
582    
583    On the x86 the typical linkage code that is generated is:
584       ----------------------
585            .align 2
586       L0:
587            addl    $L1-L0, 72(%esp)
588            jmp     L1
589    
590    
591            .align  2
592       L1:
593       ----------------------
594    
595    72(%esp) is the memory location for the stdlink register. This
596    must contain the address of the CPS function being called. In the
597    above example, it contains the address of  L0; before
598    calling L1 (the real entry point for the compilation unit), it
599    must contain the address for L1, and hence
600    
601            addl $L1-L0, 72(%esp)
602    
603    I have tested this on all architectures except the hppa.The increase
604    in code size is of course negligible
605    
606    ----------------------------------------------------------------------
607    Name: Allen Leung
608    Date: 2002/03/03 13:20:00 EST
609    Tag: leunga-20020303-mlrisc-tools
610    
611      Added #[ ... ] expressions to mlrisc tools
612    
613    ----------------------------------------------------------------------
614    Name: Matthias Blume
615    Date: 2002/02/27 12:29:00 EST
616    Tag: blume-20020227-cdebug
617    Description:
618    
619    - made types in structure C and C_Debug to be equal
620    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
621    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
622    
623    ----------------------------------------------------------------------
624    Name: Matthias Blume
625    Date: 2002/02/26 12:00:00 EST
626    Tag: blume-20020226-ffi
627    Description:
628    
629    1. Fixed a minor bug in CM's "noweb" tool:
630       If numbering is turned off, then truly don't number (i.e., do not
631       supply the -L option to noweb).  The previous behavior was to supply
632       -L'' -- which caused noweb to use the "default" line numbering scheme.
633       Thanks to Chris Richards for pointing this out (and supplying the fix).
634    
635    2. Once again, I reworked some aspects of the FFI:
636    
637       A. The incomplete/complete type business:
638    
639       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
640         gone!
641       - ML types representing an incomplete type are now *equal* to
642         ML types representing their corresponding complete types (just like
643         in C).  This is still safe because ml-nlffigen will not generate
644         RTTI for incomplete types, nor will it generate functions that
645         require access to such RTTI.   But when ML code generated from both
646         incomplete and complete versions of the C type meet, the ML types
647         are trivially interoperable.
648    
649         NOTE:  These changes restore the full generality of the translation
650         (which was previously lost when I eliminated functorization)!
651    
652       B. Enum types:
653    
654       - Structure C now has a type constructor "enum" that is similar to
655         how the "su" constructor works.  However, "enum" is not a phantom
656         type because each "T enum" has values (and is isomorphic to
657         MLRep.Signed.int).
658       - There are generic access operations for enum objects (using
659         MLRep.Signed.int).
660       - ml-nlffigen will generate a structure E_foo for each "enum foo".
661         * The structure contains the definition of type "mlrep" (the ML-side
662         representation type of the enum).  Normally, mlrep is the same
663         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
664         then mlrep will be defined as a datatype -- thus facilitating
665         pattern matching on mlrep values.
666         ("-ec" will be suppressed if there are duplicate values in an
667          enumeration.)
668         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
669         will be generated for each C enum constant xxx.
670         * Conversion functions m2i and i2m convert between mlrep and
671         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
672         * Coversion functions c and ml convert between mlrep and "tag enum".
673         * Access functions (get/set) fetch and store mlrep values.
674       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
675         enumerations are merged into one single enumeration represented by
676         structure E_'.
677    
678    ----------------------------------------------------------------------
679    Name: Allen Leung
680    Date: 2002/02/25 04:45:00 EST
681    Tag: leunga-20020225-cps-spill
682    
683    This is a new implementation of the CPS spill phase.
684    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
685    In case of problems, replace it with the old file spill.sml
686    
687    The current compiler runs into some serious performance problems when
688    constructing a large record.  This can happen when we try to compile a
689    structure with many items.  Even a very simple structure like the following
690    makes the compiler slow down.
691    
692        structure Foo = struct
693           val x_1 = 0w1 : Word32.int
694           val x_2 = 0w2 : Word32.int
695           val x_3 = 0w3 : Word32.int
696           ...
697           val x_N = 0wN : Word32.int
698        end
699    
700    The following table shows the compile time, from N=1000 to N=4000,
701    with the old compiler:
702    
703    N
704    1000   CPS 100 spill                           0.04u  0.00s  0.00g
705           MLRISC ra                               0.06u  0.00s  0.05g
706              (spills = 0 reloads = 0)
707           TOTAL                                   0.63u  0.07s  0.21g
708    
709    1100   CPS 100 spill                           8.25u  0.32s  0.64g
710           MLRISC ra                               5.68u  0.59s  3.93g
711              (spills = 0 reloads = 0)
712           TOTAL                                   14.71u  0.99s  4.81g
713    
714    1500   CPS 100 spill                           58.55u  2.34s  1.74g
715           MLRISC ra                               5.54u  0.65s  3.91g
716              (spills = 543 reloads = 1082)
717           TOTAL                                   65.40u  3.13s  6.00g
718    
719    2000   CPS 100 spill                           126.69u  4.84s  3.08g
720           MLRISC ra                               0.80u  0.10s  0.55g
721              (spills = 42 reloads = 84)
722           TOTAL                                   129.42u  5.10s  4.13g
723    
724    3000   CPS 100 spill                           675.59u  19.03s  11.64g
725           MLRISC ra                               2.69u  0.27s  1.38g
726              (spills = 62 reloads = 124)
727           TOTAL                                   682.48u  19.61s  13.99g
728    
729    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
730           MLRISC ra                               4.96u  0.27s  2.72g
731              (spills = 85 reloads = 170)
732           TOTAL                                   2375.26u  57.21s  48.00g
733    
734    As you can see the old cps spill module suffers from some serious
735    performance problem.  But since I cannot decipher the old code fully,
736    instead of patching the problems up, I'm reimplementing it
737    with a different algorithm.  The new code is more modular,
738    smaller when compiled, and substantially faster
739    (O(n log n) time and O(n) space).  Timing of the new spill module:
740    
741    4000  CPS 100 spill                           0.02u  0.00s  0.00g
742          MLRISC ra                               0.25u  0.02s  0.15g
743             (spills=1 reloads=3)
744          TOTAL                                   7.74u  0.34s  1.62g
745    
746    Implementation details:
747    
748    As far as I can tell, the purpose of the CPS spill module is to make sure the
749    number of live variables at any program point (the bandwidth)
750    does not exceed a certain limit, which is determined by the
751    size of the spill area.
752    
753    When the bandwidth is too large, we decrease the register pressure by
754    packing live variables into spill records.  How we achieve this is
755    completely different than what we did in the old code.
756    
757    First, there is something about the MLRiscGen code generator
758    that we should be aware of:
759    
760    o MLRiscGen performs code motion!
761    
762       In particular, it will move floating point computations and
763       address computations involving only the heap pointer to
764       their use sites (if there is only a single use).
765       What this means is that if we have a CPS record construction
766       statement
767    
768           RECORD(k,vl,w,e)
769    
770       we should never count the new record address w as live if w
771       has only one use (which is often the case).
772    
773       We should do something similar to floating point, but the transformation
774       there is much more complex, so I won't deal with that.
775    
776    Secondly, there are now two new cps primops at our disposal:
777    
778     1. rawrecord of record_kind option
779        This pure operator allocates some uninitialized storage from the heap.
780        There are two forms:
781    
782         rawrecord NONE [INT n]  allocates a tagless record of length n
783         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
784                                     and initializes the tag.
785    
786     2. rawupdate of cty
787          rawupdate cty (v,i,x)
788          Assigns to x to the ith component of record v.
789          The storelist is not updated.
790    
791    We use these new primops for both spilling and increment record construction.
792    
793     1. Spilling.
794    
795        This is implemented with a linear scan algorithm (but generalized
796        to trees).  The algorithm will create a single spill record at the
797        beginning of the cps function and use rawupdate to spill to it,
798        and SELECT or SELp to reload from it.  So both spills and reloads
799        are fine-grain operations.  In contrast, in the old algorithm
800        "spills" have to be bundled together in records.
801    
802        Ideally, we should sink the spill record construction to where
803        it is needed.  We can even split the spill record into multiple ones
804        at the places where they are needed.  But CPS is not a good
805        representation for global code motion, so I'll keep it simple and
806        am not attempting this.
807    
808     2. Incremental record construction (aka record splitting).
809    
810        Long records with many component values which are simulatenously live
811        (recall that single use record addresses are not considered to
812         be live) are constructed with rawrecord and rawupdate.
813        We allocate space on the heap with rawrecord first, then gradually
814        fill it in with rawupdate.  This is the technique suggested to me
815        by Matthias.
816    
817        Some restrictions on when this is applicable:
818        1. It is not a VECTOR record.  The code generator currently does not handle
819           this case. VECTOR record uses double indirection like arrays.
820        2. All the record component values are defined in the same "basic block"
821           as the record constructor.  This is to prevent speculative
822           record construction.
823    
824    ----------------------------------------------------------------------
825    Name: Allen Leung
826    Date: 2002/02/22 01:02:00 EST
827    Tag: leunga-20020222-mlrisc-tools
828    
829    Minor bug fixes in the parser and rewriter
830    
831    ----------------------------------------------------------------------
832    Name: Allen Leung
833    Date: 2002/02/21 20:20:00 EST
834    Tag: leunga-20020221-peephole
835    
836    Regenerated the peephole files.  Some contained typos in the specification
837    and some didn't compile because of pretty printing bugs in the old version
838    of 'nowhere'.
839    
840    ----------------------------------------------------------------------
841    Name: Allen Leung
842    Date: 2002/02/19 20:20:00 EST
843    Tag: leunga-20020219-mlrisc-tools
844    Description:
845    
846       Minor bug fixes to the mlrisc-tools library:
847    
848       1.  Fixed up parsing colon suffixed keywords
849       2.  Added the ability to shut the error messages up
850       3.  Reimplemented the pretty printer and fixed up/improved
851           the pretty printing of handle and -> types.
852       4.  Fixed up generation of literal symbols in the nowhere tool.
853       5.  Added some SML keywords to to sml.sty
854    
855    ----------------------------------------------------------------------
856    Name: Matthias Blume
857    Date: 2002/02/19 16:20:00 EST
858    Tag: blume-20020219-cmffi
859    Description:
860    
861    A wild mix of changes, some minor, some major:
862    
863    * All C FFI-related libraries are now anchored under $c:
864        $/c.cm      --> $c/c.cm
865        $/c-int.cm  --> $c/internals/c-int.cm
866        $/memory.cm --> $c/memory/memory.cm
867    
868    * "make" tool (in CM) now treats its argument pathname slightly
869      differently:
870        1. If the native expansion is an absolute name, then before invoking
871           the "make" command on it, CM will apply OS.Path.mkRelative
872           (with relativeTo = OS.FileSys.getDir()) to it.
873        2. The argument will be passed through to subsequent phases of CM
874           processing without "going native".  In particular, if the argument
875           was an anchored path, then "make" will not lose track of that anchor.
876    
877    * Compiler backends now "know" their respective C calling conventions
878      instead of having to be told about it by ml-nlffigen.  This relieves
879      ml-nlffigen from one of its burdens.
880    
881    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
882    
883    * Export C_DEBUG and C_Debug from $c/c.cm.
884    
885    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
886      subtyping relationship between incomplete pointers and their complete
887      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
888      with the convention of instantiating 'o with ('t, 'c) obj whenever
889      the pointer target type is complete.  In the incomplete case, 'o
890      will be instantiated with some "'c iobj" -- a type obtained by
891      using one of the functors PointerToIncompleteType or PointerToCompleteType.
892    
893      Operations that work on both incomplete and complete pointer types are
894      typed as taking an 'o ptr while operations that require the target to
895      be known are typed as taking some ('t, 'c) obj ptr.
896    
897      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
898      where void is an eqtype without any values.  This makes it possible
899      to work on voidptr values using functions meant to operate on light
900      incomplete pointers.
901    
902    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
903      been vastly simplified.
904    
905    ----------------------------------------------------------------------
906    Name: Matthias Blume
907    Date: 2002/02/19 10:48:00 EST
908    Tag: blume-20020219-pqfix
909    Description:
910    
911    Applied Chris Okasaki's bug fix for priority queues.
912    
913    ----------------------------------------------------------------------
914    Name: Matthias Blume
915    Date: 2002/02/15 17:05:00
916    Tag: Release_110_39
917    Description:
918    
919    Last-minute retagging is becoming a tradition... :-(
920    
921    This is the working release 110.39.
922    
923    ----------------------------------------------------------------------
924    Name: Matthias Blume
925    Date: 2002/02/15 16:00:00 EST
926    Tag: Release_110_39-orig
927    Description:
928    
929    Working release 110.39.  New bootfiles.
930    
931    (Update: There was a small bug in the installer so it wouldn't work
932    with all shells.  So I retagged. -Matthias)
933    
934    ----------------------------------------------------------------------
935    Name: Matthias Blume
936    Date: 2002/02/15 14:17:00 EST
937    Tag: blume-20020215-showbindings
938    Description:
939    
940    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
941    the latter can be useful for exploring what bindings are available at
942    the interactive prompt.  (The first function returns only the list
943    of symbols that are really bound, the second prints those but also the
944    ones that CM's autoloading mechanism knows about.)
945    
946    ----------------------------------------------------------------------
947    Name: Matthias Blume
948    Date: 2002/02/15 12:08:00 EST
949    Tag: blume-20020215-iptrs
950    Description:
951    
952    Two improvements to ml-nlffigen:
953    
954      1. Write files only if they do not exist or if their current contents
955         do not coincide with what's being written.  (That is, avoid messing
956         with the time stamps unless absolutely necessary.)
957    
958      2. Implement a "repository" mechanism for generated files related
959         to "incomplete pointer types".   See the README file for details.
960    
961    ----------------------------------------------------------------------
962    Name: Matthias Blume
963    Date: 2002/02/14 11:50:00 EST
964    Tag: blume-20020214-quote
965    Description:
966    
967    Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm).  This is required
968    because of the new and improved tag generation scheme.  (Thanks to Allen
969    Leung for pointing it out.)
970    
971    ----------------------------------------------------------------------
972    Name: Lal George
973    Date: 2002/02/14 09:55:27 EST 2002
974    Tag: george-20020214-isabelle-bug
975    Description:
976    
977    Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation
978    of Isabelle on the x86.
979    
980    From Allen:
981    -----------
982     I've found the problem:
983    
984         in ra-core.sml, I use the counter "blocked" to keep track of the
985         true number of elements in the freeze queue.  When the counter goes
986         to zero, I skip examining the queue.  But I've messed up the
987         bookkeeping in combine():
988    
989             else ();
990             case !ucol of
991               PSEUDO => (if !cntv > 0 then
992                     (if !cntu > 0 then blocked := !blocked - 1 else ();
993                                        ^^^^^^^^^^^^^^^^^^^^^^^
994                      moveu := mergeMoveList(!movev, !moveu)
995                     )
996                  else ();
997    
998         combine() is called to coalesce two nodes u and v.
999         I think I was thinking that if the move counts of u and v are both
1000         greater than zero then after they are coalesced then one node is
1001         removed from the freeze queue.  Apparently I was thinking that
1002         both u and v are of low degree, but that's clearly not necessarily true.
1003    
1004    
1005    02/12/2002:
1006        Here's the patch.  HOL now compiles.
1007    
1008        I don't know how this impact on performance (compile
1009        time or runtime).  This bug caused the RA (especially on the x86)
1010        to go thru the potential spill phase when there are still nodes on the
1011        freeze queue.
1012    
1013    
1014    
1015    
1016    ----------------------------------------------------------------------
1017    Name: Matthias Blume
1018    Date: 2002/02/13 22:40:00 EST
1019    Tag: blume-20020213-fptr-rtti
1020    Description:
1021    
1022    Fixed a bug in ml-nlffigen that was introduced with one of the previous
1023    updates.
1024    
1025    ----------------------------------------------------------------------
1026    Name: Matthias Blume
1027  Date: 2002/02/13 16:41:00 EST  Date: 2002/02/13 16:41:00 EST
1028  Tag: blume-20020213-cmlpq  Tag: blume-20020213-cmlpq
1029  Description:  Description:

Legend:
Removed from v.1063  
changed lines
  Added in v.1179

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0