Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1073, Fri Feb 15 22:07:38 2002 UTC revision 1180, Tue Mar 26 22:24:24 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/03/25 17:25:00 EST
18    Tag: blume-20020325-divmod
19    Description:
20    
21    I improved (hopefully without breaking them) the implementation of Int.div,
22    Int.mod, and Int.rem.   For this, the code in translate.sml now takes
23    advantage of the following observations:
24    
25      Let  q = x quot y      r = x rem y
26           d = x div  y      m = x mod y
27    
28    where "quot" is the round-to-zero version of integer division that
29    hardware usually provides.  Then we have:
30    
31         r = x - q * y        where neither the * nor the - will overflow
32         d = if q >= 0 orelse x = q * y then q else q - 1
33                              where neither the * nor the - will overflow
34         m = if q >= 0 orelse r = 0 then r else r + y
35                              where the + will not overflow
36    
37    This results in substantial simplification of the generated code.
38    The following table shows the number of CFG nodes and edges generated
39    for
40            fun f (x, y) = x OPER y
41            (* with OPER \in div, mod, quot, rem *)
42    
43    
44        OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
45        --------------------------------------------------------
46         div |         24 |         39 |         12 |         16
47         mod |         41 |         71 |         12 |         16
48        quot |          8 |         10 |          8 |         10
49         rem |         10 |         14 |          8 |         10
50    
51    
52    ----------------------------------------------------------------------
53    Name: Matthias Blume
54    Date: 2002/03/25 22:06:00 EST
55    Tag: blume-20020325-cprotobug
56    Description:
57    
58    Fixed a bug in cproto (c prototype decoder).
59    
60    ----------------------------------------------------------------------
61    Name: Matthias Blume
62    Date: 2002/03/25 16:00:00 EST
63    Tag: blume-20020325-raw-primops
64    Description:
65    
66    I did some cleanup to Allen's new primop code and
67    replaced yesterday's bootfiles with new ones.
68    (But they are stored in the same place.)
69    
70    ----------------------------------------------------------------------
71    Name: Matthias Blume
72    Date: 2002/03/24 22:40:00 EST
73    Tag: blume-20020324-bootfiles
74    Description:
75    
76    Made the bootfiles that Allen asked for.
77    
78    ----------------------------------------------------------------------
79    Name: Allen Leung
80    Date: 2002/03/23 15:50:00 EST
81    Tag: leunga-20020323-flint-cps-rcc-primops
82    Description:
83    
84      1. Changes to FLINT primops:
85    
86        (* make a call to a C-function;
87         * The primop carries C function prototype information and specifies
88         * which of its (ML-) arguments are floating point. C prototype
89         * information is for use by the backend, ML information is for
90         * use by the CPS converter. *)
91      | RAW_CCALL of { c_proto: CTypes.c_proto,
92                       ml_args: ccall_type list,
93                       ml_res_opt: ccall_type option,
94                       reentrant : bool
95                     } option
96       (* Allocate uninitialized storage on the heap.
97        * The record is meant to hold short-lived C objects, i.e., they
98        * are not ML pointers.  With the tag, the representation is
99        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
100        *)
101      | RAW_RECORD of {tag:bool,sz:int}
102      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
103    
104      2.  These CPS primops are now overloaded:
105    
106           rawload of {kind:numkind}
107           rawstore of {kind:numkind}
108    
109          The one argument form is:
110    
111             rawload {kind} address
112    
113          The two argument form is:
114    
115             rawload {kind} [ml object, byte-offset]
116    
117      3. RAW_CCALL/RCC now takes two extra arguments:
118    
119         a. The first is whether the C call is reentrant, i.e., whether
120            ML state should be saved and restored.
121         b. The second argument is a string argument specifying the name of
122            library and the C function.
123    
124         These things are currently not handled in the code generator, yet.
125    
126      4. In CProto,
127    
128         An encoding type of "bool" means "ml object" and is mapped into
129         C prototype of PTR.  Note that "bool" is different than "string",
130         even though "string" is also mapped into PTR, because "bool"
131         is assigned an CPS type of BOGt, while "string" is assigned INT32t.
132    
133      5. Pickler/unpicker
134    
135         Changed to handle RAW_RECORD and newest RAW_CCALL
136    
137      6. MLRiscGen,
138    
139         1. Changed to handle the new rawload/rawstore/rawrecord operators.
140         2. Code for handling C Calls has been moved to a new module CPSCCalls,
141            in the file CodeGen/cpscompile/cps-c-calls.sml
142    
143      7. Added the conditional move operator
144    
145             condmove of branch
146    
147         to cps.  Generation of this is still buggy so it is currently
148         disabled.
149    
150    ----------------------------------------------------------------------
151    Name: Lal George
152    Date: 2002/03/22 14:18:25 EST
153    Tag: george-20020322-cps-branch-prob
154    Description:
155    
156    Implemented the Ball-Larus branch prediction-heuristics, and
157    incorporated graphical viewers for control flow graphs.
158    
159    Ball-Larus Heuristics:
160    ---------------------
161    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
162    
163    By design it uses the Dempster-Shafer theory for combining
164    probabilities.  For example, in the function:
165    
166        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
167    
168    the ball-larus heuristics predicts that the n=0 is unlikely
169    (OH-heuristic), and the 'then' branch is unlikely because of the
170    RH-heuristic -- giving the 'then' branch an even lower combined
171    probability using the Dempster-Shafer theory.
172    
173    Finally, John Reppy's loop analysis in MLRISC, further lowers the
174    probability of the 'then' branch because of the loop in the else
175    branch.
176    
177    
178    Graphical Viewing:
179    ------------------
180    I merely plugged in Allen's graphical viewers into the compiler. The
181    additional code is not much. At the top level, saying:
182    
183            Control.MLRISC.getFlag "cfg-graphical-view" := true;
184    
185    will display the graphical view of the control flow graph just before
186    back-patching.  daVinci must be in your path for this to work. If
187    daVinci is not available, then the default viewer can be changed
188    using:
189    
190            Control.MLRISC.getString "viewer"
191    
192    which can be set to "dot" or "vcg" for the corresponding viewers. Of
193    course, these viewers must be in your path.
194    
195    The above will display the compilation unit at the level of clusters,
196    many of which are small, boring, and un-interesting. Also setting:
197    
198            Control.MLRISC.getInt "cfg-graphical-view_size"
199    
200    will display clusters that are larger than the value set by the above.
201    
202    
203    ----------------------------------------------------------------------
204    Name: Matthias Blume
205    Date: 2002/03/21 22:20:00 EST
206    Tag: blume-20020321-kmp-bugfix
207    Description:
208    
209    Changed the interface to the KMP routine in PreString and fixed
210    a minor bug in one place where it was used.
211    
212    ----------------------------------------------------------------------
213    Name: Allen Leung
214    Date: 2002/03/21 20:30:00 EST
215    Tag: leunga-20020321-cfg
216    Description:
217    
218      Fixed a potential problem in cfg edge splitting.
219    
220    ----------------------------------------------------------------------
221    Name: Allen Leung
222    Date: 2002/03/21 17:15:00 EST
223    Tag: leunga-20020321-x86-fp-cfg
224    Description:
225    
226      1. Recoded the buggy parts of x86-fp.
227    
228         a. All the block reordering code has been removed.
229            We now depend on the block placement phases to do this work.
230    
231         b. Critical edge splitting code has been simplified and moved into the
232            CFG modules, as where they belong.
233    
234         Both of these were quite buggy and complex.  The code is now much, much
235         simpler.
236    
237      2. X86 backend.
238    
239         a. Added instructions for 64-bit support.  Instruction selection for
240            64-bit has not been committed, however, since that
241            requires changes to MLTREE which haven't been approved by
242            Lal and John.
243    
244         b. Added support for FUCOMI and FUCOMIP when generating code for
245            PentiumPro and above.  We only generate these instructions in
246            the fast-fp mode.
247    
248         c. Added cases for JP and JNP in X86FreqProps.
249    
250      3. CFG
251    
252         CFG now has a bunch of methods for edge splitting and merging.
253    
254      4. Machine description.
255    
256         John's simplification of MLTREE_BASIS.fcond broke a few machine
257         description things:
258    
259         rtl-build.{sig,sml} and hppa.mdl fixed.
260    
261         NOTE: the machine description stuff in the repository is still broken.
262               Again, I can't put my fixes in because that involves
263               changes to MLTREE.
264    
265    ----------------------------------------------------------------------
266    Name: Matthias Blume
267    Date: 2002/03/20 15:55:00 EST
268    Tag: blume-20020320-kmp
269    Description:
270    
271    Implemented Knuth-Morris-Pratt string matching in PreString and used
272    it for String.isSubstring, Substring.isSubstring, and
273    Substring.position.
274    
275    (Might need some stress-testing.  Simple examples worked fine.)
276    
277    ----------------------------------------------------------------------
278    Name: Matthias Blume
279    Date: 2002/03/19 16:37:00 EST
280    Tag: blume-20020319-witnesses
281    Description:
282    
283    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
284    
285    This implements a generic mechanism for changing constness qualifiers
286    anywhere within big C types without resorting to outright "casts".
287    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
288    modify the constness at the outermost level.)
289    The implementation of "convert" is based on the idea of "witness"
290    values -- values that are not used by the operation but whose types
291    "testify" to their applicability.  On the implementation side, "convert"
292    is simply a projection (returning its second curried argument).  With
293    cross-module inlining, it should not result in any machine code being
294    generated.
295    
296    ----------------------------------------------------------------------
297    Name: Matthias Blume
298    Date: 2002/03/15 16:40:00 EST
299    Tag: blume-20020315-basis
300    Description:
301    
302    Provided (preliminary?) implementations for
303    
304      {String,Substring}.{concatWith,isSuffix,isSubstring}
305    
306    and
307    
308      Substring.full
309    
310    Those are in the Basis spec but they were missing in SML/NJ.
311    
312    ----------------------------------------------------------------------
313    Name: Matthias Blume
314    Date: 2002/03/14 21:30:00 EST
315    Tag: blume-20020314-controls
316    Description:
317    
318    Controls:
319    ---------
320    
321    1. Factored out the recently-added Controls : CONTROLS stuff and put
322       it into its own library $/controls-lib.cm.  The source tree for
323       this is under src/smlnj-lib/Controls.
324    
325    2. Changed the names of types and functions in this interface, so they
326       make a bit more "sense":
327    
328          module -> registry
329          'a registry -> 'a group
330    
331    3. The interface now deals in ref cells only.  The getter/setter interface
332       is (mostly) gone.
333    
334    4. Added a function that lets one register an already-existing ref cell.
335    
336    5. Made the corresponding modifications to the rest of the code so that
337       everything compiles again.
338    
339    6. Changed the implementation of Controls.MLRISC back to something closer
340       to the original.  In particular, this module (and therefore MLRISC)
341       does not depend on Controls.  There now is some link-time code in
342       int-sys.sml that registers the MLRISC controls with the Controls
343       module.
344    
345    CM:
346    ---
347    
348      * One can now specify the lambda-split aggressiveness in init.cmi.
349    
350    ----------------------------------------------------------------------
351    Name: Allen Leung
352    Date: 2002/03/13 17:30:00 EST
353    Tag: leunga-20020313-x86-fp-unary
354    Description:
355    
356    Bug fix for:
357    
358    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
359    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
360    > - fun f(x,(y,z)) = Real.~ y;
361    > [autoloading]
362    > [autoloading done]
363    >       fchsl   (%eax), 184(%esp)
364    > Error: MLRisc bug: X86MCEmitter.emitInstr
365    >
366    > uncaught exception Error
367    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
368    
369    The problem was that the code generator did not generate any fp registers
370    in this case, and the ra didn't know that it needed to run the X86FP phase to
371    translate the pseudo fp instruction.   This only happened with unary fp
372    operators in certain situations.
373    
374    ----------------------------------------------------------------------
375    Name: Matthias Blume
376    Date: 2002/03/13 14:00:00 EST
377    Tag: blume-20020313-overload-etc
378    Description:
379    
380    1. Added _overload as a synonym for overload for backward compatibility.
381       (Control.overloadKW must be true for either version to be accepted.)
382    
383    2. Fixed bug in install script that caused more things to be installed
384       than what was requested in config/targets.
385    
386    3. Made CM aware of the (_)overload construct so that autoloading
387       works.
388    
389    ----------------------------------------------------------------------
390    Name: Matthias Blume
391    Date: 2002/03/12 22:03:00 EST
392    Tag: blume-20020312-url
393    Description:
394    
395    Forgot to update BOOT and srcarchiveurl.
396    
397    ----------------------------------------------------------------------
398    Name: Matthias Blume
399    Date: 2002/03/12 17:30:00 EST
400    Tag: blume-20020312-version110392
401    Description:
402    
403    Yet another version number bump (because of small changes to the
404    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
405    
406    Changes:
407    
408      The new pid generation scheme described a few weeks ago was overly
409      complicated.  I implemented a new mechanism that is simpler and
410      provides a bit more "stability":  Once CM has seen a compilation
411      unit, it keeps its identity constant (as long as you do not delete
412      those crucial CM/GUID/* files).  This means that when you change
413      an interface, compile, then go back to the old interface, and
414      compile again, you arrive at the original pid.
415    
416      There now also is a mechanism that instructs CM to use the plain
417      environment hash as a module's pid (effectively making its GUID
418      the empty string).  For this, "noguid" must be specified as an
419      option to the .sml file in question within its .cm file.
420      This is most useful for code that is being generated by tools such
421      as ml-nlffigen (because during development programmers tend to
422      erase the tool's entire output directory tree including CM's cached
423      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
424      revert to the old, broken behavior of SML/NJ, but in specific cases
425      where there is no danger of interface confusion, its use is ok
426      (I think).
427    
428      ml-nlffigen by default generates "noguid" annotations.  They can be
429      turned off by specifying -guid in its command line.
430    
431    ----------------------------------------------------------------------
432    Name: Lal George
433    Date: 2002/03/12 12 14:42:36 EST
434    Tag: george-20020312-frequency-computation
435    Description:
436    
437    Integrated jump chaining and static block frequency into the
438    compiler. More details and numbers later.
439    
440    ----------------------------------------------------------------------
441    Name: Lal George
442    Date: 2002/03/11 11 22:38:53 EST
443    Tag: george-20020311-jump-chain-elim
444    Description:
445    
446    Tested the jump chain elimination on all architectures (except the
447    hppa).  This is on by default right now and is profitable for the
448    alpha and x86, however, it may not be profitable for the sparc and ppc
449    when compiling the compiler.
450    
451    The gc test will typically jump to a label at the end of the cluster,
452    where there is another jump to an external cluster containing the actual
453    code to invoke gc. This is to allow factoring of common gc invocation
454    sequences. That is to say, we generate:
455    
456            f:
457               testgc
458               ja   L1      % jump if above to L1
459    
460            L1:
461               jmp L2
462    
463    
464    After jump chain elimination the 'ja L1' instructions is converted to
465    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
466    up being implemented in their long form (if L2 is far away) using:
467    
468            jbe     L3      % jump if below or equal to L3
469            jmp     L2
470         L3:
471            ...
472    
473    
474    For large compilation units L2  may be far away.
475    
476    
477    ----------------------------------------------------------------------
478    Name: Matthias Blume
479    Date: 2002/03/11 13:30:00 EST
480    Tag: blume-20020311-mltreeeval
481    Description:
482    
483    A functor parameter was missing.
484    
485    ----------------------------------------------------------------------
486    Name: Allen Leung
487    Date: 2002/03/11 10:30:00 EST
488    Tag: leunga-20020311-runtime-string0
489    Description:
490    
491       The representation of the empty string now points to a
492    legal null terminated C string instead of unit.  It is now possible
493    to convert an ML string into C string with InlineT.CharVector.getData.
494    This compiles into one single machine instruction.
495    
496    ----------------------------------------------------------------------
497    Name: Allen Leung
498    Date: 2002/03/10 23:55:00 EST
499    Tag: leunga-20020310-x86-call
500    Description:
501    
502       Added machine generation for CALL instruction (relative displacement mode)
503    
504    ----------------------------------------------------------------------
505    Name: Matthias Blume
506    Date: 2002/03/08 16:05:00
507    Tag: blume-20020308-entrypoints
508    Description:
509    
510    Version number bumped to 110.39.1.  NEW BOOTFILES!
511    
512    Entrypoints: non-zero offset into a code object where execution should begin.
513    
514    - Added the notion of an entrypoint to CodeObj.
515    - Added reading/writing of entrypoint info to Binfile.
516    - Made runtime system bootloader aware of entrypoints.
517    - Use the address of the label of the first function given to mlriscGen
518      as the entrypoint.  This address is currently always 0, but it will
519      not be 0 once we turn on block placement.
520    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
521      with entry points) from mlriscGen.
522    
523    ----------------------------------------------------------------------
524    Name: Allen Leung
525    Date: 2002/03/07 20:45:00 EST
526    Tag: leunga-20020307-x86-cmov
527    Description:
528    
529       Bug fixes for CMOVcc on x86.
530    
531       1. Added machine code generation for CMOVcc
532       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
533       3. CMOVcc cannot have an immediate operand as argument.
534    
535    ----------------------------------------------------------------------
536    Name: Matthias Blume
537    Date: 2002/03/07 16:15:00 EST
538    Tag: blume-20020307-controls
539    Description:
540    
541    This is a very large but mostly boring patch which makes (almost)
542    every tuneable compiler knob (i.e., pretty much everything under
543    Control.* plus a few other things) configurable via both the command
544    line and environment variables in the style CM did its configuration
545    until now.
546    
547    Try starting sml with '-h' (or, if you are brave, '-H')
548    
549    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
550    implements the underlying generic mechanism.
551    
552    The interface to some of the existing such facilities has changed somewhat.
553    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
554    (The getFoo interface is still there for backward-compatibility, but its
555    use is deprecated.)
556    
557    The ml-build script passes -Cxxx=yyy command-line arguments through so
558    that one can now twiddle the compiler settings when using this "batch"
559    compiler.
560    
561    TODO items:
562    
563    We should go through and throw out all controls that are no longer
564    connected to anything.  Moreover, we should go through and provide
565    meaningful (and correct!) documentation strings for those controls
566    that still are connected.
567    
568    Currently, multiple calls to Controls.new are accepted (only the first
569    has any effect).  Eventually we should make sure that every control
570    is being made (via Controls.new) exactly once.  Future access can then
571    be done using Controls.acc.
572    
573    Finally, it would probably be a good idea to use the getter-setter
574    interface to controls rather than ref cells.  For the time being, both
575    styles are provided by the Controls module, but getter-setter pairs are
576    better if thread-safety is of any concern because they can be wrapped.
577    
578    *****************************************
579    
580    One bug fix: The function blockPlacement in three of the MLRISC
581    backpatch files used to be hard-wired to one of two possibilities at
582    link time (according to the value of the placementFlag).  But (I
583    think) it should rather sense the flag every time.
584    
585    *****************************************
586    
587    Other assorted changes (by other people who did not supply a HISTORY entry):
588    
589    1. the cross-module inliner now works much better (Monnier)
590    2. representation of weights, frequencies, and probabilities in MLRISC
591       changed in preparation of using those for weighted block placement
592       (Reppy, George)
593    
594    ----------------------------------------------------------------------
595    Name: Lal George
596    Date: 2002/03/07 14:44:24 EST 2002
597    Tag: george-20020307-weighted-block-placement
598    
599    Tested the weighted block placement optimization on all architectures
600    (except the hppa) using AMPL to generate the block and edge frequencies.
601    Changes were required in the machine properties to correctly
602    categorize trap instructions. There is an MLRISC flag
603    "weighted-block-placement" that can be used to enable weighted block
604    placement, but this will be ineffective without block/edge
605    frequencies (coming soon).
606    
607    
608    ----------------------------------------------------------------------
609    Name: Lal George
610    Date: 2002/03/05 17:24:48 EST
611    Tag: george-20020305-linkage-cluster
612    
613    In order to support the block placement optimization, a new cluster
614    is generated as the very first cluster (called the linkage cluster).
615    It contains a single jump to the 'real' entry point for the compilation
616    unit. Block placement has no effect on the linkage cluster itself, but
617    all the other clusters  have full freedom in the manner in which they
618    reorder blocks or functions.
619    
620    On the x86 the typical linkage code that is generated is:
621       ----------------------
622            .align 2
623       L0:
624            addl    $L1-L0, 72(%esp)
625            jmp     L1
626    
627    
628            .align  2
629       L1:
630       ----------------------
631    
632    72(%esp) is the memory location for the stdlink register. This
633    must contain the address of the CPS function being called. In the
634    above example, it contains the address of  L0; before
635    calling L1 (the real entry point for the compilation unit), it
636    must contain the address for L1, and hence
637    
638            addl $L1-L0, 72(%esp)
639    
640    I have tested this on all architectures except the hppa.The increase
641    in code size is of course negligible
642    
643    ----------------------------------------------------------------------
644    Name: Allen Leung
645    Date: 2002/03/03 13:20:00 EST
646    Tag: leunga-20020303-mlrisc-tools
647    
648      Added #[ ... ] expressions to mlrisc tools
649    
650    ----------------------------------------------------------------------
651    Name: Matthias Blume
652    Date: 2002/02/27 12:29:00 EST
653    Tag: blume-20020227-cdebug
654    Description:
655    
656    - made types in structure C and C_Debug to be equal
657    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
658    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
659    
660    ----------------------------------------------------------------------
661    Name: Matthias Blume
662    Date: 2002/02/26 12:00:00 EST
663    Tag: blume-20020226-ffi
664    Description:
665    
666    1. Fixed a minor bug in CM's "noweb" tool:
667       If numbering is turned off, then truly don't number (i.e., do not
668       supply the -L option to noweb).  The previous behavior was to supply
669       -L'' -- which caused noweb to use the "default" line numbering scheme.
670       Thanks to Chris Richards for pointing this out (and supplying the fix).
671    
672    2. Once again, I reworked some aspects of the FFI:
673    
674       A. The incomplete/complete type business:
675    
676       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
677         gone!
678       - ML types representing an incomplete type are now *equal* to
679         ML types representing their corresponding complete types (just like
680         in C).  This is still safe because ml-nlffigen will not generate
681         RTTI for incomplete types, nor will it generate functions that
682         require access to such RTTI.   But when ML code generated from both
683         incomplete and complete versions of the C type meet, the ML types
684         are trivially interoperable.
685    
686         NOTE:  These changes restore the full generality of the translation
687         (which was previously lost when I eliminated functorization)!
688    
689       B. Enum types:
690    
691       - Structure C now has a type constructor "enum" that is similar to
692         how the "su" constructor works.  However, "enum" is not a phantom
693         type because each "T enum" has values (and is isomorphic to
694         MLRep.Signed.int).
695       - There are generic access operations for enum objects (using
696         MLRep.Signed.int).
697       - ml-nlffigen will generate a structure E_foo for each "enum foo".
698         * The structure contains the definition of type "mlrep" (the ML-side
699         representation type of the enum).  Normally, mlrep is the same
700         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
701         then mlrep will be defined as a datatype -- thus facilitating
702         pattern matching on mlrep values.
703         ("-ec" will be suppressed if there are duplicate values in an
704          enumeration.)
705         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
706         will be generated for each C enum constant xxx.
707         * Conversion functions m2i and i2m convert between mlrep and
708         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
709         * Coversion functions c and ml convert between mlrep and "tag enum".
710         * Access functions (get/set) fetch and store mlrep values.
711       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
712         enumerations are merged into one single enumeration represented by
713         structure E_'.
714    
715    ----------------------------------------------------------------------
716    Name: Allen Leung
717    Date: 2002/02/25 04:45:00 EST
718    Tag: leunga-20020225-cps-spill
719    
720    This is a new implementation of the CPS spill phase.
721    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
722    In case of problems, replace it with the old file spill.sml
723    
724    The current compiler runs into some serious performance problems when
725    constructing a large record.  This can happen when we try to compile a
726    structure with many items.  Even a very simple structure like the following
727    makes the compiler slow down.
728    
729        structure Foo = struct
730           val x_1 = 0w1 : Word32.int
731           val x_2 = 0w2 : Word32.int
732           val x_3 = 0w3 : Word32.int
733           ...
734           val x_N = 0wN : Word32.int
735        end
736    
737    The following table shows the compile time, from N=1000 to N=4000,
738    with the old compiler:
739    
740    N
741    1000   CPS 100 spill                           0.04u  0.00s  0.00g
742           MLRISC ra                               0.06u  0.00s  0.05g
743              (spills = 0 reloads = 0)
744           TOTAL                                   0.63u  0.07s  0.21g
745    
746    1100   CPS 100 spill                           8.25u  0.32s  0.64g
747           MLRISC ra                               5.68u  0.59s  3.93g
748              (spills = 0 reloads = 0)
749           TOTAL                                   14.71u  0.99s  4.81g
750    
751    1500   CPS 100 spill                           58.55u  2.34s  1.74g
752           MLRISC ra                               5.54u  0.65s  3.91g
753              (spills = 543 reloads = 1082)
754           TOTAL                                   65.40u  3.13s  6.00g
755    
756    2000   CPS 100 spill                           126.69u  4.84s  3.08g
757           MLRISC ra                               0.80u  0.10s  0.55g
758              (spills = 42 reloads = 84)
759           TOTAL                                   129.42u  5.10s  4.13g
760    
761    3000   CPS 100 spill                           675.59u  19.03s  11.64g
762           MLRISC ra                               2.69u  0.27s  1.38g
763              (spills = 62 reloads = 124)
764           TOTAL                                   682.48u  19.61s  13.99g
765    
766    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
767           MLRISC ra                               4.96u  0.27s  2.72g
768              (spills = 85 reloads = 170)
769           TOTAL                                   2375.26u  57.21s  48.00g
770    
771    As you can see the old cps spill module suffers from some serious
772    performance problem.  But since I cannot decipher the old code fully,
773    instead of patching the problems up, I'm reimplementing it
774    with a different algorithm.  The new code is more modular,
775    smaller when compiled, and substantially faster
776    (O(n log n) time and O(n) space).  Timing of the new spill module:
777    
778    4000  CPS 100 spill                           0.02u  0.00s  0.00g
779          MLRISC ra                               0.25u  0.02s  0.15g
780             (spills=1 reloads=3)
781          TOTAL                                   7.74u  0.34s  1.62g
782    
783    Implementation details:
784    
785    As far as I can tell, the purpose of the CPS spill module is to make sure the
786    number of live variables at any program point (the bandwidth)
787    does not exceed a certain limit, which is determined by the
788    size of the spill area.
789    
790    When the bandwidth is too large, we decrease the register pressure by
791    packing live variables into spill records.  How we achieve this is
792    completely different than what we did in the old code.
793    
794    First, there is something about the MLRiscGen code generator
795    that we should be aware of:
796    
797    o MLRiscGen performs code motion!
798    
799       In particular, it will move floating point computations and
800       address computations involving only the heap pointer to
801       their use sites (if there is only a single use).
802       What this means is that if we have a CPS record construction
803       statement
804    
805           RECORD(k,vl,w,e)
806    
807       we should never count the new record address w as live if w
808       has only one use (which is often the case).
809    
810       We should do something similar to floating point, but the transformation
811       there is much more complex, so I won't deal with that.
812    
813    Secondly, there are now two new cps primops at our disposal:
814    
815     1. rawrecord of record_kind option
816        This pure operator allocates some uninitialized storage from the heap.
817        There are two forms:
818    
819         rawrecord NONE [INT n]  allocates a tagless record of length n
820         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
821                                     and initializes the tag.
822    
823     2. rawupdate of cty
824          rawupdate cty (v,i,x)
825          Assigns to x to the ith component of record v.
826          The storelist is not updated.
827    
828    We use these new primops for both spilling and increment record construction.
829    
830     1. Spilling.
831    
832        This is implemented with a linear scan algorithm (but generalized
833        to trees).  The algorithm will create a single spill record at the
834        beginning of the cps function and use rawupdate to spill to it,
835        and SELECT or SELp to reload from it.  So both spills and reloads
836        are fine-grain operations.  In contrast, in the old algorithm
837        "spills" have to be bundled together in records.
838    
839        Ideally, we should sink the spill record construction to where
840        it is needed.  We can even split the spill record into multiple ones
841        at the places where they are needed.  But CPS is not a good
842        representation for global code motion, so I'll keep it simple and
843        am not attempting this.
844    
845     2. Incremental record construction (aka record splitting).
846    
847        Long records with many component values which are simulatenously live
848        (recall that single use record addresses are not considered to
849         be live) are constructed with rawrecord and rawupdate.
850        We allocate space on the heap with rawrecord first, then gradually
851        fill it in with rawupdate.  This is the technique suggested to me
852        by Matthias.
853    
854        Some restrictions on when this is applicable:
855        1. It is not a VECTOR record.  The code generator currently does not handle
856           this case. VECTOR record uses double indirection like arrays.
857        2. All the record component values are defined in the same "basic block"
858           as the record constructor.  This is to prevent speculative
859           record construction.
860    
861    ----------------------------------------------------------------------
862    Name: Allen Leung
863    Date: 2002/02/22 01:02:00 EST
864    Tag: leunga-20020222-mlrisc-tools
865    
866    Minor bug fixes in the parser and rewriter
867    
868    ----------------------------------------------------------------------
869    Name: Allen Leung
870    Date: 2002/02/21 20:20:00 EST
871    Tag: leunga-20020221-peephole
872    
873    Regenerated the peephole files.  Some contained typos in the specification
874    and some didn't compile because of pretty printing bugs in the old version
875    of 'nowhere'.
876    
877    ----------------------------------------------------------------------
878    Name: Allen Leung
879    Date: 2002/02/19 20:20:00 EST
880    Tag: leunga-20020219-mlrisc-tools
881    Description:
882    
883       Minor bug fixes to the mlrisc-tools library:
884    
885       1.  Fixed up parsing colon suffixed keywords
886       2.  Added the ability to shut the error messages up
887       3.  Reimplemented the pretty printer and fixed up/improved
888           the pretty printing of handle and -> types.
889       4.  Fixed up generation of literal symbols in the nowhere tool.
890       5.  Added some SML keywords to to sml.sty
891    
892    ----------------------------------------------------------------------
893    Name: Matthias Blume
894    Date: 2002/02/19 16:20:00 EST
895    Tag: blume-20020219-cmffi
896    Description:
897    
898    A wild mix of changes, some minor, some major:
899    
900    * All C FFI-related libraries are now anchored under $c:
901        $/c.cm      --> $c/c.cm
902        $/c-int.cm  --> $c/internals/c-int.cm
903        $/memory.cm --> $c/memory/memory.cm
904    
905    * "make" tool (in CM) now treats its argument pathname slightly
906      differently:
907        1. If the native expansion is an absolute name, then before invoking
908           the "make" command on it, CM will apply OS.Path.mkRelative
909           (with relativeTo = OS.FileSys.getDir()) to it.
910        2. The argument will be passed through to subsequent phases of CM
911           processing without "going native".  In particular, if the argument
912           was an anchored path, then "make" will not lose track of that anchor.
913    
914    * Compiler backends now "know" their respective C calling conventions
915      instead of having to be told about it by ml-nlffigen.  This relieves
916      ml-nlffigen from one of its burdens.
917    
918    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
919    
920    * Export C_DEBUG and C_Debug from $c/c.cm.
921    
922    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
923      subtyping relationship between incomplete pointers and their complete
924      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
925      with the convention of instantiating 'o with ('t, 'c) obj whenever
926      the pointer target type is complete.  In the incomplete case, 'o
927      will be instantiated with some "'c iobj" -- a type obtained by
928      using one of the functors PointerToIncompleteType or PointerToCompleteType.
929    
930      Operations that work on both incomplete and complete pointer types are
931      typed as taking an 'o ptr while operations that require the target to
932      be known are typed as taking some ('t, 'c) obj ptr.
933    
934      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
935      where void is an eqtype without any values.  This makes it possible
936      to work on voidptr values using functions meant to operate on light
937      incomplete pointers.
938    
939    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
940      been vastly simplified.
941    
942    ----------------------------------------------------------------------
943    Name: Matthias Blume
944    Date: 2002/02/19 10:48:00 EST
945    Tag: blume-20020219-pqfix
946    Description:
947    
948    Applied Chris Okasaki's bug fix for priority queues.
949    
950    ----------------------------------------------------------------------
951    Name: Matthias Blume
952  Date: 2002/02/15 17:05:00  Date: 2002/02/15 17:05:00
953  Tag: Release_110_39  Tag: Release_110_39
954  Description:  Description:

Legend:
Removed from v.1073  
changed lines
  Added in v.1180

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0