Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1073, Fri Feb 15 22:07:38 2002 UTC revision 1178, Mon Mar 25 20:51:48 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/03/25 16:00:00 EST
18    Tag: blume-20020325-raw-primops
19    Description:
20    
21    I did some cleanup to Allen's new primop code and
22    replaced yesterday's bootfiles with new ones.
23    (But they are stored in the same place.)
24    
25    ----------------------------------------------------------------------
26    Name: Matthias Blume
27    Date: 2002/03/24 22:40:00 EST
28    Tag: blume-20020324-bootfiles
29    Description:
30    
31    Made the bootfiles that Allen asked for.
32    
33    ----------------------------------------------------------------------
34    Name: Allen Leung
35    Date: 2002/03/23 15:50:00 EST
36    Tag: leunga-20020323-flint-cps-rcc-primops
37    Description:
38    
39      1. Changes to FLINT primops:
40    
41        (* make a call to a C-function;
42         * The primop carries C function prototype information and specifies
43         * which of its (ML-) arguments are floating point. C prototype
44         * information is for use by the backend, ML information is for
45         * use by the CPS converter. *)
46      | RAW_CCALL of { c_proto: CTypes.c_proto,
47                       ml_args: ccall_type list,
48                       ml_res_opt: ccall_type option,
49                       reentrant : bool
50                     } option
51       (* Allocate uninitialized storage on the heap.
52        * The record is meant to hold short-lived C objects, i.e., they
53        * are not ML pointers.  With the tag, the representation is
54        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
55        *)
56      | RAW_RECORD of {tag:bool,sz:int}
57      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
58    
59      2.  These CPS primops are now overloaded:
60    
61           rawload of {kind:numkind}
62           rawstore of {kind:numkind}
63    
64          The one argument form is:
65    
66             rawload {kind} address
67    
68          The two argument form is:
69    
70             rawload {kind} [ml object, byte-offset]
71    
72      3. RAW_CCALL/RCC now takes two extra arguments:
73    
74         a. The first is whether the C call is reentrant, i.e., whether
75            ML state should be saved and restored.
76         b. The second argument is a string argument specifying the name of
77            library and the C function.
78    
79         These things are currently not handled in the code generator, yet.
80    
81      4. In CProto,
82    
83         An encoding type of "bool" means "ml object" and is mapped into
84         C prototype of PTR.  Note that "bool" is different than "string",
85         even though "string" is also mapped into PTR, because "bool"
86         is assigned an CPS type of BOGt, while "string" is assigned INT32t.
87    
88      5. Pickler/unpicker
89    
90         Changed to handle RAW_RECORD and newest RAW_CCALL
91    
92      6. MLRiscGen,
93    
94         1. Changed to handle the new rawload/rawstore/rawrecord operators.
95         2. Code for handling C Calls has been moved to a new module CPSCCalls,
96            in the file CodeGen/cpscompile/cps-c-calls.sml
97    
98      7. Added the conditional move operator
99    
100             condmove of branch
101    
102         to cps.  Generation of this is still buggy so it is currently
103         disabled.
104    
105    ----------------------------------------------------------------------
106    Name: Lal George
107    Date: 2002/03/22 14:18:25 EST
108    Tag: george-20020322-cps-branch-prob
109    Description:
110    
111    Implemented the Ball-Larus branch prediction-heuristics, and
112    incorporated graphical viewers for control flow graphs.
113    
114    Ball-Larus Heuristics:
115    ---------------------
116    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
117    
118    By design it uses the Dempster-Shafer theory for combining
119    probabilities.  For example, in the function:
120    
121        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
122    
123    the ball-larus heuristics predicts that the n=0 is unlikely
124    (OH-heuristic), and the 'then' branch is unlikely because of the
125    RH-heuristic -- giving the 'then' branch an even lower combined
126    probability using the Dempster-Shafer theory.
127    
128    Finally, John Reppy's loop analysis in MLRISC, further lowers the
129    probability of the 'then' branch because of the loop in the else
130    branch.
131    
132    
133    Graphical Viewing:
134    ------------------
135    I merely plugged in Allen's graphical viewers into the compiler. The
136    additional code is not much. At the top level, saying:
137    
138            Control.MLRISC.getFlag "cfg-graphical-view" := true;
139    
140    will display the graphical view of the control flow graph just before
141    back-patching.  daVinci must be in your path for this to work. If
142    daVinci is not available, then the default viewer can be changed
143    using:
144    
145            Control.MLRISC.getString "viewer"
146    
147    which can be set to "dot" or "vcg" for the corresponding viewers. Of
148    course, these viewers must be in your path.
149    
150    The above will display the compilation unit at the level of clusters,
151    many of which are small, boring, and un-interesting. Also setting:
152    
153            Control.MLRISC.getInt "cfg-graphical-view_size"
154    
155    will display clusters that are larger than the value set by the above.
156    
157    
158    ----------------------------------------------------------------------
159    Name: Matthias Blume
160    Date: 2002/03/21 22:20:00 EST
161    Tag: blume-20020321-kmp-bugfix
162    Description:
163    
164    Changed the interface to the KMP routine in PreString and fixed
165    a minor bug in one place where it was used.
166    
167    ----------------------------------------------------------------------
168    Name: Allen Leung
169    Date: 2002/03/21 20:30:00 EST
170    Tag: leunga-20020321-cfg
171    Description:
172    
173      Fixed a potential problem in cfg edge splitting.
174    
175    ----------------------------------------------------------------------
176    Name: Allen Leung
177    Date: 2002/03/21 17:15:00 EST
178    Tag: leunga-20020321-x86-fp-cfg
179    Description:
180    
181      1. Recoded the buggy parts of x86-fp.
182    
183         a. All the block reordering code has been removed.
184            We now depend on the block placement phases to do this work.
185    
186         b. Critical edge splitting code has been simplified and moved into the
187            CFG modules, as where they belong.
188    
189         Both of these were quite buggy and complex.  The code is now much, much
190         simpler.
191    
192      2. X86 backend.
193    
194         a. Added instructions for 64-bit support.  Instruction selection for
195            64-bit has not been committed, however, since that
196            requires changes to MLTREE which haven't been approved by
197            Lal and John.
198    
199         b. Added support for FUCOMI and FUCOMIP when generating code for
200            PentiumPro and above.  We only generate these instructions in
201            the fast-fp mode.
202    
203         c. Added cases for JP and JNP in X86FreqProps.
204    
205      3. CFG
206    
207         CFG now has a bunch of methods for edge splitting and merging.
208    
209      4. Machine description.
210    
211         John's simplification of MLTREE_BASIS.fcond broke a few machine
212         description things:
213    
214         rtl-build.{sig,sml} and hppa.mdl fixed.
215    
216         NOTE: the machine description stuff in the repository is still broken.
217               Again, I can't put my fixes in because that involves
218               changes to MLTREE.
219    
220    ----------------------------------------------------------------------
221    Name: Matthias Blume
222    Date: 2002/03/20 15:55:00 EST
223    Tag: blume-20020320-kmp
224    Description:
225    
226    Implemented Knuth-Morris-Pratt string matching in PreString and used
227    it for String.isSubstring, Substring.isSubstring, and
228    Substring.position.
229    
230    (Might need some stress-testing.  Simple examples worked fine.)
231    
232    ----------------------------------------------------------------------
233    Name: Matthias Blume
234    Date: 2002/03/19 16:37:00 EST
235    Tag: blume-20020319-witnesses
236    Description:
237    
238    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
239    
240    This implements a generic mechanism for changing constness qualifiers
241    anywhere within big C types without resorting to outright "casts".
242    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
243    modify the constness at the outermost level.)
244    The implementation of "convert" is based on the idea of "witness"
245    values -- values that are not used by the operation but whose types
246    "testify" to their applicability.  On the implementation side, "convert"
247    is simply a projection (returning its second curried argument).  With
248    cross-module inlining, it should not result in any machine code being
249    generated.
250    
251    ----------------------------------------------------------------------
252    Name: Matthias Blume
253    Date: 2002/03/15 16:40:00 EST
254    Tag: blume-20020315-basis
255    Description:
256    
257    Provided (preliminary?) implementations for
258    
259      {String,Substring}.{concatWith,isSuffix,isSubstring}
260    
261    and
262    
263      Substring.full
264    
265    Those are in the Basis spec but they were missing in SML/NJ.
266    
267    ----------------------------------------------------------------------
268    Name: Matthias Blume
269    Date: 2002/03/14 21:30:00 EST
270    Tag: blume-20020314-controls
271    Description:
272    
273    Controls:
274    ---------
275    
276    1. Factored out the recently-added Controls : CONTROLS stuff and put
277       it into its own library $/controls-lib.cm.  The source tree for
278       this is under src/smlnj-lib/Controls.
279    
280    2. Changed the names of types and functions in this interface, so they
281       make a bit more "sense":
282    
283          module -> registry
284          'a registry -> 'a group
285    
286    3. The interface now deals in ref cells only.  The getter/setter interface
287       is (mostly) gone.
288    
289    4. Added a function that lets one register an already-existing ref cell.
290    
291    5. Made the corresponding modifications to the rest of the code so that
292       everything compiles again.
293    
294    6. Changed the implementation of Controls.MLRISC back to something closer
295       to the original.  In particular, this module (and therefore MLRISC)
296       does not depend on Controls.  There now is some link-time code in
297       int-sys.sml that registers the MLRISC controls with the Controls
298       module.
299    
300    CM:
301    ---
302    
303      * One can now specify the lambda-split aggressiveness in init.cmi.
304    
305    ----------------------------------------------------------------------
306    Name: Allen Leung
307    Date: 2002/03/13 17:30:00 EST
308    Tag: leunga-20020313-x86-fp-unary
309    Description:
310    
311    Bug fix for:
312    
313    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
314    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
315    > - fun f(x,(y,z)) = Real.~ y;
316    > [autoloading]
317    > [autoloading done]
318    >       fchsl   (%eax), 184(%esp)
319    > Error: MLRisc bug: X86MCEmitter.emitInstr
320    >
321    > uncaught exception Error
322    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
323    
324    The problem was that the code generator did not generate any fp registers
325    in this case, and the ra didn't know that it needed to run the X86FP phase to
326    translate the pseudo fp instruction.   This only happened with unary fp
327    operators in certain situations.
328    
329    ----------------------------------------------------------------------
330    Name: Matthias Blume
331    Date: 2002/03/13 14:00:00 EST
332    Tag: blume-20020313-overload-etc
333    Description:
334    
335    1. Added _overload as a synonym for overload for backward compatibility.
336       (Control.overloadKW must be true for either version to be accepted.)
337    
338    2. Fixed bug in install script that caused more things to be installed
339       than what was requested in config/targets.
340    
341    3. Made CM aware of the (_)overload construct so that autoloading
342       works.
343    
344    ----------------------------------------------------------------------
345    Name: Matthias Blume
346    Date: 2002/03/12 22:03:00 EST
347    Tag: blume-20020312-url
348    Description:
349    
350    Forgot to update BOOT and srcarchiveurl.
351    
352    ----------------------------------------------------------------------
353    Name: Matthias Blume
354    Date: 2002/03/12 17:30:00 EST
355    Tag: blume-20020312-version110392
356    Description:
357    
358    Yet another version number bump (because of small changes to the
359    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
360    
361    Changes:
362    
363      The new pid generation scheme described a few weeks ago was overly
364      complicated.  I implemented a new mechanism that is simpler and
365      provides a bit more "stability":  Once CM has seen a compilation
366      unit, it keeps its identity constant (as long as you do not delete
367      those crucial CM/GUID/* files).  This means that when you change
368      an interface, compile, then go back to the old interface, and
369      compile again, you arrive at the original pid.
370    
371      There now also is a mechanism that instructs CM to use the plain
372      environment hash as a module's pid (effectively making its GUID
373      the empty string).  For this, "noguid" must be specified as an
374      option to the .sml file in question within its .cm file.
375      This is most useful for code that is being generated by tools such
376      as ml-nlffigen (because during development programmers tend to
377      erase the tool's entire output directory tree including CM's cached
378      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
379      revert to the old, broken behavior of SML/NJ, but in specific cases
380      where there is no danger of interface confusion, its use is ok
381      (I think).
382    
383      ml-nlffigen by default generates "noguid" annotations.  They can be
384      turned off by specifying -guid in its command line.
385    
386    ----------------------------------------------------------------------
387    Name: Lal George
388    Date: 2002/03/12 12 14:42:36 EST
389    Tag: george-20020312-frequency-computation
390    Description:
391    
392    Integrated jump chaining and static block frequency into the
393    compiler. More details and numbers later.
394    
395    ----------------------------------------------------------------------
396    Name: Lal George
397    Date: 2002/03/11 11 22:38:53 EST
398    Tag: george-20020311-jump-chain-elim
399    Description:
400    
401    Tested the jump chain elimination on all architectures (except the
402    hppa).  This is on by default right now and is profitable for the
403    alpha and x86, however, it may not be profitable for the sparc and ppc
404    when compiling the compiler.
405    
406    The gc test will typically jump to a label at the end of the cluster,
407    where there is another jump to an external cluster containing the actual
408    code to invoke gc. This is to allow factoring of common gc invocation
409    sequences. That is to say, we generate:
410    
411            f:
412               testgc
413               ja   L1      % jump if above to L1
414    
415            L1:
416               jmp L2
417    
418    
419    After jump chain elimination the 'ja L1' instructions is converted to
420    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
421    up being implemented in their long form (if L2 is far away) using:
422    
423            jbe     L3      % jump if below or equal to L3
424            jmp     L2
425         L3:
426            ...
427    
428    
429    For large compilation units L2  may be far away.
430    
431    
432    ----------------------------------------------------------------------
433    Name: Matthias Blume
434    Date: 2002/03/11 13:30:00 EST
435    Tag: blume-20020311-mltreeeval
436    Description:
437    
438    A functor parameter was missing.
439    
440    ----------------------------------------------------------------------
441    Name: Allen Leung
442    Date: 2002/03/11 10:30:00 EST
443    Tag: leunga-20020311-runtime-string0
444    Description:
445    
446       The representation of the empty string now points to a
447    legal null terminated C string instead of unit.  It is now possible
448    to convert an ML string into C string with InlineT.CharVector.getData.
449    This compiles into one single machine instruction.
450    
451    ----------------------------------------------------------------------
452    Name: Allen Leung
453    Date: 2002/03/10 23:55:00 EST
454    Tag: leunga-20020310-x86-call
455    Description:
456    
457       Added machine generation for CALL instruction (relative displacement mode)
458    
459    ----------------------------------------------------------------------
460    Name: Matthias Blume
461    Date: 2002/03/08 16:05:00
462    Tag: blume-20020308-entrypoints
463    Description:
464    
465    Version number bumped to 110.39.1.  NEW BOOTFILES!
466    
467    Entrypoints: non-zero offset into a code object where execution should begin.
468    
469    - Added the notion of an entrypoint to CodeObj.
470    - Added reading/writing of entrypoint info to Binfile.
471    - Made runtime system bootloader aware of entrypoints.
472    - Use the address of the label of the first function given to mlriscGen
473      as the entrypoint.  This address is currently always 0, but it will
474      not be 0 once we turn on block placement.
475    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
476      with entry points) from mlriscGen.
477    
478    ----------------------------------------------------------------------
479    Name: Allen Leung
480    Date: 2002/03/07 20:45:00 EST
481    Tag: leunga-20020307-x86-cmov
482    Description:
483    
484       Bug fixes for CMOVcc on x86.
485    
486       1. Added machine code generation for CMOVcc
487       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
488       3. CMOVcc cannot have an immediate operand as argument.
489    
490    ----------------------------------------------------------------------
491    Name: Matthias Blume
492    Date: 2002/03/07 16:15:00 EST
493    Tag: blume-20020307-controls
494    Description:
495    
496    This is a very large but mostly boring patch which makes (almost)
497    every tuneable compiler knob (i.e., pretty much everything under
498    Control.* plus a few other things) configurable via both the command
499    line and environment variables in the style CM did its configuration
500    until now.
501    
502    Try starting sml with '-h' (or, if you are brave, '-H')
503    
504    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
505    implements the underlying generic mechanism.
506    
507    The interface to some of the existing such facilities has changed somewhat.
508    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
509    (The getFoo interface is still there for backward-compatibility, but its
510    use is deprecated.)
511    
512    The ml-build script passes -Cxxx=yyy command-line arguments through so
513    that one can now twiddle the compiler settings when using this "batch"
514    compiler.
515    
516    TODO items:
517    
518    We should go through and throw out all controls that are no longer
519    connected to anything.  Moreover, we should go through and provide
520    meaningful (and correct!) documentation strings for those controls
521    that still are connected.
522    
523    Currently, multiple calls to Controls.new are accepted (only the first
524    has any effect).  Eventually we should make sure that every control
525    is being made (via Controls.new) exactly once.  Future access can then
526    be done using Controls.acc.
527    
528    Finally, it would probably be a good idea to use the getter-setter
529    interface to controls rather than ref cells.  For the time being, both
530    styles are provided by the Controls module, but getter-setter pairs are
531    better if thread-safety is of any concern because they can be wrapped.
532    
533    *****************************************
534    
535    One bug fix: The function blockPlacement in three of the MLRISC
536    backpatch files used to be hard-wired to one of two possibilities at
537    link time (according to the value of the placementFlag).  But (I
538    think) it should rather sense the flag every time.
539    
540    *****************************************
541    
542    Other assorted changes (by other people who did not supply a HISTORY entry):
543    
544    1. the cross-module inliner now works much better (Monnier)
545    2. representation of weights, frequencies, and probabilities in MLRISC
546       changed in preparation of using those for weighted block placement
547       (Reppy, George)
548    
549    ----------------------------------------------------------------------
550    Name: Lal George
551    Date: 2002/03/07 14:44:24 EST 2002
552    Tag: george-20020307-weighted-block-placement
553    
554    Tested the weighted block placement optimization on all architectures
555    (except the hppa) using AMPL to generate the block and edge frequencies.
556    Changes were required in the machine properties to correctly
557    categorize trap instructions. There is an MLRISC flag
558    "weighted-block-placement" that can be used to enable weighted block
559    placement, but this will be ineffective without block/edge
560    frequencies (coming soon).
561    
562    
563    ----------------------------------------------------------------------
564    Name: Lal George
565    Date: 2002/03/05 17:24:48 EST
566    Tag: george-20020305-linkage-cluster
567    
568    In order to support the block placement optimization, a new cluster
569    is generated as the very first cluster (called the linkage cluster).
570    It contains a single jump to the 'real' entry point for the compilation
571    unit. Block placement has no effect on the linkage cluster itself, but
572    all the other clusters  have full freedom in the manner in which they
573    reorder blocks or functions.
574    
575    On the x86 the typical linkage code that is generated is:
576       ----------------------
577            .align 2
578       L0:
579            addl    $L1-L0, 72(%esp)
580            jmp     L1
581    
582    
583            .align  2
584       L1:
585       ----------------------
586    
587    72(%esp) is the memory location for the stdlink register. This
588    must contain the address of the CPS function being called. In the
589    above example, it contains the address of  L0; before
590    calling L1 (the real entry point for the compilation unit), it
591    must contain the address for L1, and hence
592    
593            addl $L1-L0, 72(%esp)
594    
595    I have tested this on all architectures except the hppa.The increase
596    in code size is of course negligible
597    
598    ----------------------------------------------------------------------
599    Name: Allen Leung
600    Date: 2002/03/03 13:20:00 EST
601    Tag: leunga-20020303-mlrisc-tools
602    
603      Added #[ ... ] expressions to mlrisc tools
604    
605    ----------------------------------------------------------------------
606    Name: Matthias Blume
607    Date: 2002/02/27 12:29:00 EST
608    Tag: blume-20020227-cdebug
609    Description:
610    
611    - made types in structure C and C_Debug to be equal
612    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
613    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
614    
615    ----------------------------------------------------------------------
616    Name: Matthias Blume
617    Date: 2002/02/26 12:00:00 EST
618    Tag: blume-20020226-ffi
619    Description:
620    
621    1. Fixed a minor bug in CM's "noweb" tool:
622       If numbering is turned off, then truly don't number (i.e., do not
623       supply the -L option to noweb).  The previous behavior was to supply
624       -L'' -- which caused noweb to use the "default" line numbering scheme.
625       Thanks to Chris Richards for pointing this out (and supplying the fix).
626    
627    2. Once again, I reworked some aspects of the FFI:
628    
629       A. The incomplete/complete type business:
630    
631       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
632         gone!
633       - ML types representing an incomplete type are now *equal* to
634         ML types representing their corresponding complete types (just like
635         in C).  This is still safe because ml-nlffigen will not generate
636         RTTI for incomplete types, nor will it generate functions that
637         require access to such RTTI.   But when ML code generated from both
638         incomplete and complete versions of the C type meet, the ML types
639         are trivially interoperable.
640    
641         NOTE:  These changes restore the full generality of the translation
642         (which was previously lost when I eliminated functorization)!
643    
644       B. Enum types:
645    
646       - Structure C now has a type constructor "enum" that is similar to
647         how the "su" constructor works.  However, "enum" is not a phantom
648         type because each "T enum" has values (and is isomorphic to
649         MLRep.Signed.int).
650       - There are generic access operations for enum objects (using
651         MLRep.Signed.int).
652       - ml-nlffigen will generate a structure E_foo for each "enum foo".
653         * The structure contains the definition of type "mlrep" (the ML-side
654         representation type of the enum).  Normally, mlrep is the same
655         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
656         then mlrep will be defined as a datatype -- thus facilitating
657         pattern matching on mlrep values.
658         ("-ec" will be suppressed if there are duplicate values in an
659          enumeration.)
660         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
661         will be generated for each C enum constant xxx.
662         * Conversion functions m2i and i2m convert between mlrep and
663         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
664         * Coversion functions c and ml convert between mlrep and "tag enum".
665         * Access functions (get/set) fetch and store mlrep values.
666       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
667         enumerations are merged into one single enumeration represented by
668         structure E_'.
669    
670    ----------------------------------------------------------------------
671    Name: Allen Leung
672    Date: 2002/02/25 04:45:00 EST
673    Tag: leunga-20020225-cps-spill
674    
675    This is a new implementation of the CPS spill phase.
676    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
677    In case of problems, replace it with the old file spill.sml
678    
679    The current compiler runs into some serious performance problems when
680    constructing a large record.  This can happen when we try to compile a
681    structure with many items.  Even a very simple structure like the following
682    makes the compiler slow down.
683    
684        structure Foo = struct
685           val x_1 = 0w1 : Word32.int
686           val x_2 = 0w2 : Word32.int
687           val x_3 = 0w3 : Word32.int
688           ...
689           val x_N = 0wN : Word32.int
690        end
691    
692    The following table shows the compile time, from N=1000 to N=4000,
693    with the old compiler:
694    
695    N
696    1000   CPS 100 spill                           0.04u  0.00s  0.00g
697           MLRISC ra                               0.06u  0.00s  0.05g
698              (spills = 0 reloads = 0)
699           TOTAL                                   0.63u  0.07s  0.21g
700    
701    1100   CPS 100 spill                           8.25u  0.32s  0.64g
702           MLRISC ra                               5.68u  0.59s  3.93g
703              (spills = 0 reloads = 0)
704           TOTAL                                   14.71u  0.99s  4.81g
705    
706    1500   CPS 100 spill                           58.55u  2.34s  1.74g
707           MLRISC ra                               5.54u  0.65s  3.91g
708              (spills = 543 reloads = 1082)
709           TOTAL                                   65.40u  3.13s  6.00g
710    
711    2000   CPS 100 spill                           126.69u  4.84s  3.08g
712           MLRISC ra                               0.80u  0.10s  0.55g
713              (spills = 42 reloads = 84)
714           TOTAL                                   129.42u  5.10s  4.13g
715    
716    3000   CPS 100 spill                           675.59u  19.03s  11.64g
717           MLRISC ra                               2.69u  0.27s  1.38g
718              (spills = 62 reloads = 124)
719           TOTAL                                   682.48u  19.61s  13.99g
720    
721    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
722           MLRISC ra                               4.96u  0.27s  2.72g
723              (spills = 85 reloads = 170)
724           TOTAL                                   2375.26u  57.21s  48.00g
725    
726    As you can see the old cps spill module suffers from some serious
727    performance problem.  But since I cannot decipher the old code fully,
728    instead of patching the problems up, I'm reimplementing it
729    with a different algorithm.  The new code is more modular,
730    smaller when compiled, and substantially faster
731    (O(n log n) time and O(n) space).  Timing of the new spill module:
732    
733    4000  CPS 100 spill                           0.02u  0.00s  0.00g
734          MLRISC ra                               0.25u  0.02s  0.15g
735             (spills=1 reloads=3)
736          TOTAL                                   7.74u  0.34s  1.62g
737    
738    Implementation details:
739    
740    As far as I can tell, the purpose of the CPS spill module is to make sure the
741    number of live variables at any program point (the bandwidth)
742    does not exceed a certain limit, which is determined by the
743    size of the spill area.
744    
745    When the bandwidth is too large, we decrease the register pressure by
746    packing live variables into spill records.  How we achieve this is
747    completely different than what we did in the old code.
748    
749    First, there is something about the MLRiscGen code generator
750    that we should be aware of:
751    
752    o MLRiscGen performs code motion!
753    
754       In particular, it will move floating point computations and
755       address computations involving only the heap pointer to
756       their use sites (if there is only a single use).
757       What this means is that if we have a CPS record construction
758       statement
759    
760           RECORD(k,vl,w,e)
761    
762       we should never count the new record address w as live if w
763       has only one use (which is often the case).
764    
765       We should do something similar to floating point, but the transformation
766       there is much more complex, so I won't deal with that.
767    
768    Secondly, there are now two new cps primops at our disposal:
769    
770     1. rawrecord of record_kind option
771        This pure operator allocates some uninitialized storage from the heap.
772        There are two forms:
773    
774         rawrecord NONE [INT n]  allocates a tagless record of length n
775         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
776                                     and initializes the tag.
777    
778     2. rawupdate of cty
779          rawupdate cty (v,i,x)
780          Assigns to x to the ith component of record v.
781          The storelist is not updated.
782    
783    We use these new primops for both spilling and increment record construction.
784    
785     1. Spilling.
786    
787        This is implemented with a linear scan algorithm (but generalized
788        to trees).  The algorithm will create a single spill record at the
789        beginning of the cps function and use rawupdate to spill to it,
790        and SELECT or SELp to reload from it.  So both spills and reloads
791        are fine-grain operations.  In contrast, in the old algorithm
792        "spills" have to be bundled together in records.
793    
794        Ideally, we should sink the spill record construction to where
795        it is needed.  We can even split the spill record into multiple ones
796        at the places where they are needed.  But CPS is not a good
797        representation for global code motion, so I'll keep it simple and
798        am not attempting this.
799    
800     2. Incremental record construction (aka record splitting).
801    
802        Long records with many component values which are simulatenously live
803        (recall that single use record addresses are not considered to
804         be live) are constructed with rawrecord and rawupdate.
805        We allocate space on the heap with rawrecord first, then gradually
806        fill it in with rawupdate.  This is the technique suggested to me
807        by Matthias.
808    
809        Some restrictions on when this is applicable:
810        1. It is not a VECTOR record.  The code generator currently does not handle
811           this case. VECTOR record uses double indirection like arrays.
812        2. All the record component values are defined in the same "basic block"
813           as the record constructor.  This is to prevent speculative
814           record construction.
815    
816    ----------------------------------------------------------------------
817    Name: Allen Leung
818    Date: 2002/02/22 01:02:00 EST
819    Tag: leunga-20020222-mlrisc-tools
820    
821    Minor bug fixes in the parser and rewriter
822    
823    ----------------------------------------------------------------------
824    Name: Allen Leung
825    Date: 2002/02/21 20:20:00 EST
826    Tag: leunga-20020221-peephole
827    
828    Regenerated the peephole files.  Some contained typos in the specification
829    and some didn't compile because of pretty printing bugs in the old version
830    of 'nowhere'.
831    
832    ----------------------------------------------------------------------
833    Name: Allen Leung
834    Date: 2002/02/19 20:20:00 EST
835    Tag: leunga-20020219-mlrisc-tools
836    Description:
837    
838       Minor bug fixes to the mlrisc-tools library:
839    
840       1.  Fixed up parsing colon suffixed keywords
841       2.  Added the ability to shut the error messages up
842       3.  Reimplemented the pretty printer and fixed up/improved
843           the pretty printing of handle and -> types.
844       4.  Fixed up generation of literal symbols in the nowhere tool.
845       5.  Added some SML keywords to to sml.sty
846    
847    ----------------------------------------------------------------------
848    Name: Matthias Blume
849    Date: 2002/02/19 16:20:00 EST
850    Tag: blume-20020219-cmffi
851    Description:
852    
853    A wild mix of changes, some minor, some major:
854    
855    * All C FFI-related libraries are now anchored under $c:
856        $/c.cm      --> $c/c.cm
857        $/c-int.cm  --> $c/internals/c-int.cm
858        $/memory.cm --> $c/memory/memory.cm
859    
860    * "make" tool (in CM) now treats its argument pathname slightly
861      differently:
862        1. If the native expansion is an absolute name, then before invoking
863           the "make" command on it, CM will apply OS.Path.mkRelative
864           (with relativeTo = OS.FileSys.getDir()) to it.
865        2. The argument will be passed through to subsequent phases of CM
866           processing without "going native".  In particular, if the argument
867           was an anchored path, then "make" will not lose track of that anchor.
868    
869    * Compiler backends now "know" their respective C calling conventions
870      instead of having to be told about it by ml-nlffigen.  This relieves
871      ml-nlffigen from one of its burdens.
872    
873    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
874    
875    * Export C_DEBUG and C_Debug from $c/c.cm.
876    
877    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
878      subtyping relationship between incomplete pointers and their complete
879      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
880      with the convention of instantiating 'o with ('t, 'c) obj whenever
881      the pointer target type is complete.  In the incomplete case, 'o
882      will be instantiated with some "'c iobj" -- a type obtained by
883      using one of the functors PointerToIncompleteType or PointerToCompleteType.
884    
885      Operations that work on both incomplete and complete pointer types are
886      typed as taking an 'o ptr while operations that require the target to
887      be known are typed as taking some ('t, 'c) obj ptr.
888    
889      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
890      where void is an eqtype without any values.  This makes it possible
891      to work on voidptr values using functions meant to operate on light
892      incomplete pointers.
893    
894    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
895      been vastly simplified.
896    
897    ----------------------------------------------------------------------
898    Name: Matthias Blume
899    Date: 2002/02/19 10:48:00 EST
900    Tag: blume-20020219-pqfix
901    Description:
902    
903    Applied Chris Okasaki's bug fix for priority queues.
904    
905    ----------------------------------------------------------------------
906    Name: Matthias Blume
907  Date: 2002/02/15 17:05:00  Date: 2002/02/15 17:05:00
908  Tag: Release_110_39  Tag: Release_110_39
909  Description:  Description:

Legend:
Removed from v.1073  
changed lines
  Added in v.1178

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0