[smlnj] Diff of /sml/trunk/HISTORY

revision 1085, Fri Feb 22 00:15:55 2002 UTC revision 1185, Mon Apr 1 22:06:47 2002 UTC
# Line 13  Line 13 
13  Description:
14    
15  ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/04/01 (no joke!) 17:07:00 EST
18    Tag: blume-20020401-x86div
19    Description:
20    
21    Added full support for div/mod/rem/quot on the x86, using the machine
22    instruction's two results (without clumsily recomputing the remainder)
23    directly where appropriate.
24    
25    Some more extensive power-of-two support was added to the x86 instruction
26    selector (avoiding expensive divs, mods, and muls where they can be
27    replaced with cheaper shifts and masks).  However, this sort of thing
28    ought to be done earlier, e.g., within the CPS optimizer so that
29    all architectures benefit from it.
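
As a rough illustration of the power-of-two replacements mentioned above (not the actual x86 instruction selector code), the sketch below checks the identities involved for ML-style round-to-neginf div and mod. All function names are hypothetical. Python's `//` and `%` also round toward negative infinity and its `>>` is an arithmetic shift, so they model the ML operations directly.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def log2_exact(n: int) -> int:
    """Return k such that n == 2**k; n must be a power of two."""
    assert is_power_of_two(n)
    return n.bit_length() - 1


def div_pow2(x: int, n: int) -> int:
    """x div n (round to neginf) as an arithmetic right shift."""
    return x >> log2_exact(n)


def mod_pow2(x: int, n: int) -> int:
    """x mod n (result is non-negative for positive n) as a bit mask."""
    return x & (n - 1)


def mul_pow2(x: int, n: int) -> int:
    """x * n as a left shift."""
    return x << log2_exact(n)


# Check the identities on a small range, including negative dividends.
for x in range(-20, 21):
    for n in (1, 2, 4, 8, 16):
        assert div_pow2(x, n) == x // n
        assert mod_pow2(x, n) == x % n
        assert mul_pow2(x, n) == x * n
```

The plain shift and mask match div and mod only; quot and rem (round to zero) need an extra adjustment when the dividend is negative.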
30    
31    The compiler compiles to a fixed point, but the changes might be somewhat
32    fragile nevertheless.  Please report anything strange that you see
33    wrt. div/mod/quot/rem.
34    
35    ----------------------------------------------------------------------
36    Name: Matthias Blume
37    Date: 2002/03/29 17:22:00
38    Tag: blume-20020329-div
39    Description:
40    
41    Fixed my broken div/mod logic.  Unfortunately, this means that the
42    inline code for div/mod now has one more comparison than before.
43    Fast paths (quotient > 0 or remainder = 0) are not affected, though.
44    The problem was with quotient = 0, because that alone does not tell
45    us which way the rounding went.  One then has to look at whether
46    remainder and divisor have the same sign...  :(
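
One possible reading of the fix described above, as a sketch rather than the code in translate.sml: compute the round-to-zero quotient and remainder first, take the fast path when the quotient is positive or the remainder is zero, and otherwise compare the signs of remainder and divisor. The helper names are hypothetical, and `quot_rem` merely stands in for the hardware division.

```python
from typing import Tuple


def quot_rem(x: int, y: int) -> Tuple[int, int]:
    """Round-to-zero division, as hardware usually provides it."""
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q, x - q * y


def div_mod(x: int, y: int) -> Tuple[int, int]:
    """Round-to-neginf div and mod derived from quot and rem."""
    q, r = quot_rem(x, y)
    # Fast paths: a positive quotient or a zero remainder needs no fix-up.
    if q > 0 or r == 0:
        return q, r
    # Slow path: compare the signs of remainder and divisor.  For q < 0 they
    # always differ here; for q = 0 this comparison is what decides.
    if (r < 0) == (y < 0):
        return q, r
    return q - 1, r + y


# Cross-check against Python's own round-to-neginf operators.
for x in range(-12, 13):
    for y in list(range(-5, 0)) + list(range(1, 6)):
        assert div_mod(x, y) == (x // y, x % y)
```

The case x = ~1, y = 2 shows why quotient = 0 alone is not enough: quot gives 0 and rem gives ~1, yet div must yield ~1 and mod must yield 1.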
47    
48    Anyway, I replaced the bootfiles with fresh ones...
49    
50    ----------------------------------------------------------------------
51    Name: Matthias Blume
52    Date: 2002/03/29 14:10:00 EST
53    Tag: blume-20020329-inlprims
54    Description:
55    
56    NEW BOOTFILES!!!    Version number bumped to 110.39.3.
57    
58    Primops have changed. This means that the bin/boot-file formats have
59    changed as well.
60    
61    To make sure that there is no confusion, I made a new version.
62    
63    
64    CHANGES:
65    
66    * removed REMT from mltree (remainder should never overflow).
67    
68    * added primops to deal with divisions of all flavors to the frontend
69    
70    * handled these primops all the way through so they map to their respective
71      MLRISC support
72    
73    * used these primops in the implementation of Int, Int32, Word, Word32
74    
75    * removed INLDIV, INLMOD, and INLREM as they are no longer necessary
76    
77    * parameterized INLMIN, INLMAX, and INLABS by a numkind
78    
79    * translate.sml now deals with all flavors of INL{MIN,MAX,ABS}, including
80      floating point
81    
82    * used INL{MIN,MAX,ABS} in the implementation of Int, Int32, Word, Word32,
83      and Real (but Real.abs maps to a separate floating-point-only primop)
84    
85    
86    TODO items:
87    
88    * Hacked Alpha32 instruction selection, disabling the selection of REMx
89      instructions because the machine instruction encoder cannot handle
90      them.  (Hppa, PPC, and Sparc instruction selection did not handle
91      REM in the first place, and REM is supported by the x86 machine coder.)
92    
93    * Handle DIV and MOD with DIV_TO_NEGINF directly in the x86 instruction
94      selection phase.  (The two can be streamlined because the hardware
95      delivers both quotient and remainder at the same time anyway.)
96    
97    * Think about what to do with "valOf(Int32.minInt) div ~1" and friends.
98      (Currently the behavior is inconsistent both across architectures and
99      wrt. the draft Basis spec.)
100    
101    * Word8 should eventually be handled natively, too.
102    
103    * There seems to be one serious bug in mltree-gen.sml.  It appears, though,
104      as if there currently is no execution path that could trigger it in
105      SML/NJ.  (The assumptions underlying functions arith and promotable do not
106      hold for things like multiplication and division.)
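
For the TODO item above about "valOf(Int32.minInt) div ~1": the mathematical quotient is 2^31, which is one more than the largest 32-bit signed integer. The sketch below models one consistent policy (rejecting such results) with Python's unbounded integers. `checked_div32` is a hypothetical name, and the policy shown is an assumption for illustration, not a statement of what SML/NJ does on any architecture.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def checked_div32(x: int, y: int) -> int:
    """Round-to-neginf division on 32-bit operands that refuses results
    outside the 32-bit range instead of silently wrapping."""
    if y == 0:
        raise ZeroDivisionError("division by zero")
    q = x // y
    if not INT32_MIN <= q <= INT32_MAX:
        raise OverflowError(f"{x} div {y} does not fit in 32 bits")
    return q


# The only overflowing case for 32-bit operands is minInt div ~1.
assert INT32_MIN // -1 == INT32_MAX + 1
```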
107    
108    ----------------------------------------------------------------------
109    Name: Matthias Blume
110    Date: 2002/03/27 16:27:00 EST
111    Tag: blume-20020327-mlrisc-divisions
112    Description:
113    
114    Added support for all four division operations (ML's div, mod, quot,
115    and rem) to MLRISC.  In the course of doing so, I also rationalized
116    the naming (no more annoying switch-around of DIV and QUOT), by
117    parameterizing the operation by div_rounding_mode (which can be either
118    DIV_TO_ZERO or DIV_TO_NEGINF).
119    
120    The generic MLTreeGen functor takes care of compiling all four
121    operations down to only round-to-zero div.
122    
123    Missing pieces:
124    
125      * Doing something smarter than relying on MLTreeGen on architectures
126        like, e.g., the x86 where hardware division delivers both quotient and
127        remainder at the same time.  With this, the implementation of the
128        round-to-neginf operations could be further streamlined.
129    
130      * Remove inlining support for div/mod/rem from the frontend and replace it
131        with primops that get carried through to the backend.  Do this for all
132        int and word types.
133    
134    ----------------------------------------------------------------------
135    Name: Matthias Blume
136    Date: 2002/03/25 17:25:00 EST
137    Tag: blume-20020325-divmod
138    Description:
139    
140    I improved (hopefully without breaking them) the implementation of Int.div,
141    Int.mod, and Int.rem.   For this, the code in translate.sml now takes
142    advantage of the following observations:
143    
144      Let  q = x quot y      r = x rem y
145           d = x div  y      m = x mod y
146    
147    where "quot" is the round-to-zero version of integer division that
148    hardware usually provides.  Then we have:
149    
150         r = x - q * y        where neither the * nor the - will overflow
151         d = if q >= 0 orelse x = q * y then q else q - 1
152                              where neither the * nor the - will overflow
153         m = if q >= 0 orelse r = 0 then r else r + y
154                              where the + will not overflow
155    
156    This results in substantial simplification of the generated code.
157    The following table shows the number of CFG nodes and edges generated
158    for
159            fun f (x, y) = x OPER y
160            (* with OPER \in div, mod, quot, rem *)
161    
162    
163        OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
164        --------------------------------------------------------
165         div |         24 |         39 |         12 |         16
166         mod |         41 |         71 |         12 |         16
167        quot |          8 |         10 |          8 |         10
168         rem |         10 |         14 |          8 |         10
169    
170    
171    ----------------------------------------------------------------------
172    Name: Matthias Blume
173    Date: 2002/03/25 22:06:00 EST
174    Tag: blume-20020325-cprotobug
175    Description:
176    
177    Fixed a bug in cproto (c prototype decoder).
178    
179    ----------------------------------------------------------------------
180    Name: Matthias Blume
181    Date: 2002/03/25 16:00:00 EST
182    Tag: blume-20020325-raw-primops
183    Description:
184    
185    I did some cleanup to Allen's new primop code and
186    replaced yesterday's bootfiles with new ones.
187    (But they are stored in the same place.)
188    
189    ----------------------------------------------------------------------
190    Name: Matthias Blume
191    Date: 2002/03/24 22:40:00 EST
192    Tag: blume-20020324-bootfiles
193    Description:
194    
195    Made the bootfiles that Allen asked for.
196    
197    ----------------------------------------------------------------------
198    Name: Allen Leung
199    Date: 2002/03/23 15:50:00 EST
200    Tag: leunga-20020323-flint-cps-rcc-primops
201    Description:
202    
203      1. Changes to FLINT primops:
204    
205        (* make a call to a C-function;
206         * The primop carries C function prototype information and specifies
207         * which of its (ML-) arguments are floating point. C prototype
208         * information is for use by the backend, ML information is for
209         * use by the CPS converter. *)
210      | RAW_CCALL of { c_proto: CTypes.c_proto,
211                       ml_args: ccall_type list,
212                       ml_res_opt: ccall_type option,
213                       reentrant : bool
214                     } option
215       (* Allocate uninitialized storage on the heap.
216        * The record is meant to hold short-lived C objects, i.e., they
217        * are not ML pointers.  With the tag, the representation is
218        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
219        *)
220      | RAW_RECORD of {tag:bool,sz:int}
221      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
222    
223      2.  These CPS primops are now overloaded:
224    
225           rawload of {kind:numkind}
226           rawstore of {kind:numkind}
227    
228          The one argument form is:
229    
230             rawload {kind} address
231    
232          The two argument form is:
233    
234             rawload {kind} [ml object, byte-offset]
235    
236      3. RAW_CCALL/RCC now takes two extra arguments:
237    
238         a. The first is whether the C call is reentrant, i.e., whether
239            ML state should be saved and restored.
240         b. The second is a string specifying the name of the library and
241            the C function.
242    
243         These things are not yet handled in the code generator.
244    
245      4. In CProto,
246    
247         An encoding type of "bool" means "ml object" and is mapped into
248         a C prototype of PTR.  Note that "bool" is different from "string",
249         even though "string" is also mapped into PTR, because "bool"
250         is assigned a CPS type of BOGt, while "string" is assigned INT32t.
251    
252      5. Pickler/unpickler
253    
254         Changed to handle RAW_RECORD and newest RAW_CCALL
255    
256      6. MLRiscGen,
257    
258         1. Changed to handle the new rawload/rawstore/rawrecord operators.
259         2. Code for handling C Calls has been moved to a new module CPSCCalls,
260            in the file CodeGen/cpscompile/cps-c-calls.sml
261    
262      7. Added the conditional move operator
263    
264             condmove of branch
265    
266         to cps.  Generation of this is still buggy so it is currently
267         disabled.
268    
269    ----------------------------------------------------------------------
270    Name: Lal George
271    Date: 2002/03/22 14:18:25 EST
272    Tag: george-20020322-cps-branch-prob
273    Description:
274    
275    Implemented the Ball-Larus branch-prediction heuristics and
276    incorporated graphical viewers for control flow graphs.
277    
278    Ball-Larus Heuristics:
279    ---------------------
280    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
281    
282    By design it uses the Dempster-Shafer theory for combining
283    probabilities.  For example, in the function:
284    
285        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
286    
287    the Ball-Larus heuristics predict that n=0 is unlikely
288    (OH-heuristic) and that the 'then' branch is unlikely because of the
289    RH-heuristic, giving the 'then' branch an even lower combined
290    probability under the Dempster-Shafer theory.
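
To make the combination step concrete, here is a minimal sketch of the Dempster-Shafer rule for a two-way branch. The two input probabilities are assumed, illustrative values and are not taken from cpsBranchProb.sml; `combine` is a hypothetical name.

```python
def combine(p1: float, p2: float) -> float:
    """Dempster-Shafer combination of two independent estimates that the
    same branch is taken.  Undefined when one estimate is 0 and the other 1."""
    agree_taken = p1 * p2
    agree_not_taken = (1.0 - p1) * (1.0 - p2)
    return agree_taken / (agree_taken + agree_not_taken)


# Assumed, illustrative probabilities that the 'then' branch is taken.
OH_THEN = 0.16  # opcode heuristic: the comparison n = 0 is unlikely to hold
RH_THEN = 0.28  # return heuristic: the branch that returns is unlikely

combined = combine(OH_THEN, RH_THEN)
print(f"combined probability of 'then': {combined:.3f}")

# Two heuristics that agree reinforce each other: the result is lower than
# either input.
assert combined < min(OH_THEN, RH_THEN)
```

An estimate of 0.5 carries no information under this rule: combining it with any other estimate leaves that estimate unchanged.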
291    
292    Finally, John Reppy's loop analysis in MLRISC further lowers the
293    probability of the 'then' branch because of the loop in the else
294    branch.
295    
296    
297    Graphical Viewing:
298    ------------------
299    I merely plugged Allen's graphical viewers into the compiler. There is
300    not much additional code. At the top level, saying:
301    
302            Control.MLRISC.getFlag "cfg-graphical-view" := true;
303    
304    will display the graphical view of the control flow graph just before
305    back-patching.  daVinci must be in your path for this to work. If
306    daVinci is not available, then the default viewer can be changed
307    using:
308    
309            Control.MLRISC.getString "viewer"
310    
311    which can be set to "dot" or "vcg" for the corresponding viewers. Of
312    course, these viewers must be in your path.
313    
314    The above will display the compilation unit at the level of clusters,
315    many of which are small, boring, and uninteresting. Also setting:
316    
317            Control.MLRISC.getInt "cfg-graphical-view_size"
318    
319    will restrict the display to clusters that are larger than that value.
320    
321    
322    ----------------------------------------------------------------------
323    Name: Matthias Blume
324    Date: 2002/03/21 22:20:00 EST
325    Tag: blume-20020321-kmp-bugfix
326    Description:
327    
328    Changed the interface to the KMP routine in PreString and fixed
329    a minor bug in one place where it was used.
330    
331    ----------------------------------------------------------------------
332    Name: Allen Leung
333    Date: 2002/03/21 20:30:00 EST
334    Tag: leunga-20020321-cfg
335    Description:
336    
337      Fixed a potential problem in cfg edge splitting.
338    
339    ----------------------------------------------------------------------
340    Name: Allen Leung
341    Date: 2002/03/21 17:15:00 EST
342    Tag: leunga-20020321-x86-fp-cfg
343    Description:
344    
345      1. Recoded the buggy parts of x86-fp.
346    
347         a. All the block reordering code has been removed.
348            We now depend on the block placement phases to do this work.
349    
350         b. Critical edge splitting code has been simplified and moved into the
351            CFG modules, where it belongs.
352    
353         Both of these were quite buggy and complex.  The code is now much, much
354         simpler.
355    
356      2. X86 backend.
357    
358         a. Added instructions for 64-bit support.  Instruction selection for
359            64-bit has not been committed, however, since that
360            requires changes to MLTREE which haven't been approved by
361            Lal and John.
362    
363         b. Added support for FUCOMI and FUCOMIP when generating code for
364            PentiumPro and above.  We only generate these instructions in
365            the fast-fp mode.
366    
367         c. Added cases for JP and JNP in X86FreqProps.
368    
369      3. CFG
370    
371         CFG now has a bunch of methods for edge splitting and merging.
372    
373      4. Machine description.
374    
375         John's simplification of MLTREE_BASIS.fcond broke a few machine
376         description things:
377    
378         rtl-build.{sig,sml} and hppa.mdl fixed.
379    
380         NOTE: the machine description stuff in the repository is still broken.
381               Again, I can't put my fixes in because that involves
382               changes to MLTREE.
383    
384    ----------------------------------------------------------------------
385    Name: Matthias Blume
386    Date: 2002/03/20 15:55:00 EST
387    Tag: blume-20020320-kmp
388    Description:
389    
390    Implemented Knuth-Morris-Pratt string matching in PreString and used
391    it for String.isSubstring, Substring.isSubstring, and
392    Substring.position.
393    
394    (Might need some stress-testing.  Simple examples worked fine.)
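
For readers unfamiliar with the technique, a minimal sketch of Knuth-Morris-Pratt matching follows. It is not the PreString code; the function names and the convention of returning -1 when there is no match are choices made for this example only.

```python
from typing import List


def failure_table(pattern: str) -> List[int]:
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def position(pattern: str, text: str) -> int:
    """Index of the first occurrence of pattern in text, or -1 if none."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never
        # moves backwards, which is what makes the scan linear.
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def is_substring(pattern: str, text: str) -> bool:
    """True if pattern occurs somewhere in text."""
    return position(pattern, text) >= 0
```

A few inputs worth stress-testing are patterns with repeated prefixes (such as "aab" in "aaab"), the empty pattern, and patterns longer than the text.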
395    
396    ----------------------------------------------------------------------
397    Name: Matthias Blume
398    Date: 2002/03/19 16:37:00 EST
399    Tag: blume-20020319-witnesses
400    Description:
401    
402    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
403    
404    This implements a generic mechanism for changing constness qualifiers
405    anywhere within big C types without resorting to outright "casts".
406    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
407    modify the constness at the outermost level.)
408    The implementation of "convert" is based on the idea of "witness"
409    values -- values that are not used by the operation but whose types
410    "testify" to their applicability.  On the implementation side, "convert"
411    is simply a projection (returning its second curried argument).  With
412    cross-module inlining, it should not result in any machine code being
413    generated.
414    
415    ----------------------------------------------------------------------
416    Name: Matthias Blume
417    Date: 2002/03/15 16:40:00 EST
418    Tag: blume-20020315-basis
419    Description:
420    
421    Provided (preliminary?) implementations for
422    
423      {String,Substring}.{concatWith,isSuffix,isSubstring}
424    
425    and
426    
427      Substring.full
428    
429    Those are in the Basis spec but they were missing in SML/NJ.
430    
431    ----------------------------------------------------------------------
432    Name: Matthias Blume
433    Date: 2002/03/14 21:30:00 EST
434    Tag: blume-20020314-controls
435    Description:
436    
437    Controls:
438    ---------
439    
440    1. Factored out the recently-added Controls : CONTROLS stuff and put
441       it into its own library $/controls-lib.cm.  The source tree for
442       this is under src/smlnj-lib/Controls.
443    
444    2. Changed the names of types and functions in this interface, so they
445       make a bit more "sense":
446    
447          module -> registry
448          'a registry -> 'a group
449    
450    3. The interface now deals in ref cells only.  The getter/setter interface
451       is (mostly) gone.
452    
453    4. Added a function that lets one register an already-existing ref cell.
454    
455    5. Made the corresponding modifications to the rest of the code so that
456       everything compiles again.
457    
458    6. Changed the implementation of Controls.MLRISC back to something closer
459       to the original.  In particular, this module (and therefore MLRISC)
460       does not depend on Controls.  There now is some link-time code in
461       int-sys.sml that registers the MLRISC controls with the Controls
462       module.
463    
464    CM:
465    ---
466    
467      * One can now specify the lambda-split aggressiveness in init.cmi.
468    
469    ----------------------------------------------------------------------
470    Name: Allen Leung
471    Date: 2002/03/13 17:30:00 EST
472    Tag: leunga-20020313-x86-fp-unary
473    Description:
474    
475    Bug fix for:
476    
477    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
478    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
479    > - fun f(x,(y,z)) = Real.~ y;
480    > [autoloading]
481    > [autoloading done]
482    >       fchsl   (%eax), 184(%esp)
483    > Error: MLRisc bug: X86MCEmitter.emitInstr
484    >
485    > uncaught exception Error
486    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
487    
488    The problem was that the code generator did not generate any fp registers
489    in this case, so the register allocator (ra) did not know that it needed to
490    run the X86FP phase to translate the pseudo fp instruction.  This only
491    happened with unary fp operators in certain situations.
492    
493    ----------------------------------------------------------------------
494    Name: Matthias Blume
495    Date: 2002/03/13 14:00:00 EST
496    Tag: blume-20020313-overload-etc
497    Description:
498    
499    1. Added _overload as a synonym for overload for backward compatibility.
500       (Control.overloadKW must be true for either version to be accepted.)
501    
502    2. Fixed bug in install script that caused more things to be installed
503       than what was requested in config/targets.
504    
505    3. Made CM aware of the (_)overload construct so that autoloading
506       works.
507    
508    ----------------------------------------------------------------------
509    Name: Matthias Blume
510    Date: 2002/03/12 22:03:00 EST
511    Tag: blume-20020312-url
512    Description:
513    
514    Forgot to update BOOT and srcarchiveurl.
515    
516    ----------------------------------------------------------------------
517    Name: Matthias Blume
518    Date: 2002/03/12 17:30:00 EST
519    Tag: blume-20020312-version110392
520    Description:
521    
522    Yet another version number bump (because of small changes to the
523    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
524    
525    Changes:
526    
527      The new pid generation scheme described a few weeks ago was overly
528      complicated.  I implemented a new mechanism that is simpler and
529      provides a bit more "stability":  Once CM has seen a compilation
530      unit, it keeps its identity constant (as long as you do not delete
531      those crucial CM/GUID/* files).  This means that when you change
532      an interface, compile, then go back to the old interface, and
533      compile again, you arrive at the original pid.
534    
535      There now also is a mechanism that instructs CM to use the plain
536      environment hash as a module's pid (effectively making its GUID
537      the empty string).  For this, "noguid" must be specified as an
538      option to the .sml file in question within its .cm file.
539      This is most useful for code that is being generated by tools such
540      as ml-nlffigen (because during development programmers tend to
541      erase the tool's entire output directory tree including CM's cached
542      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
543      revert to the old, broken behavior of SML/NJ), but in specific cases
544      where there is no danger of interface confusion, its use is ok
545      (I think).
546    
547      ml-nlffigen by default generates "noguid" annotations.  They can be
548      turned off by specifying -guid in its command line.
549    
550    ----------------------------------------------------------------------
551    Name: Lal George
552    Date: 2002/03/12 14:42:36 EST
553    Tag: george-20020312-frequency-computation
554    Description:
555    
556    Integrated jump chaining and static block frequency into the
557    compiler. More details and numbers later.
558    
559    ----------------------------------------------------------------------
560    Name: Lal George
561    Date: 2002/03/11 22:38:53 EST
562    Tag: george-20020311-jump-chain-elim
563    Description:
564    
565    Tested the jump chain elimination on all architectures (except the
566    hppa).  This is on by default right now and is profitable for the
567    alpha and x86; however, it may not be profitable for the sparc and ppc
568    when compiling the compiler.
569    
570    The gc test will typically jump to a label at the end of the cluster,
571    where there is another jump to an external cluster containing the actual
572    code to invoke gc. This is to allow factoring of common gc invocation
573    sequences. That is to say, we generate:
574    
575            f:
576               testgc
577               ja   L1      % jump if above to L1
578    
579            L1:
580               jmp L2
581    
582    
583    After jump chain elimination the 'ja L1' instruction is converted to
584    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
585    up being implemented in their long form (if L2 is far away) using:
586    
587            jbe     L3      % jump if below or equal to L3
588            jmp     L2
589         L3:
590            ...
591    
592    
593    For large compilation units L2  may be far away.
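
The retargeting step described above can be sketched as follows. This is a toy model over label names, not the MLRISC implementation; the dictionary representation and the function names are assumptions made for the example.

```python
from typing import Dict


def resolve(label: str, jumps: Dict[str, str]) -> str:
    """Follow a chain of jump-only blocks to its final target.

    `jumps` maps the label of each block that consists of nothing but an
    unconditional jump to that jump's target.
    """
    seen = {label}
    while label in jumps:
        label = jumps[label]
        if label in seen:  # a cycle of jumps: stop rather than loop forever
            break
        seen.add(label)
    return label


def eliminate_jump_chains(branches: Dict[str, str],
                          jumps: Dict[str, str]) -> Dict[str, str]:
    """Retarget every branch past any intermediate jump-only blocks."""
    return {site: resolve(target, jumps) for site, target in branches.items()}


# The example from the text: 'ja L1' in f, where L1 is just 'jmp L2'.
branches = {"f: ja": "L1"}
jumps = {"L1": "L2"}
assert eliminate_jump_chains(branches, jumps) == {"f: ja": "L2"}
```

The sketch ignores branch range entirely; whether the retargeted branch still fits the short form is exactly the sparc/ppc concern raised above.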
594    
595    
596    ----------------------------------------------------------------------
597    Name: Matthias Blume
598    Date: 2002/03/11 13:30:00 EST
599    Tag: blume-20020311-mltreeeval
600    Description:
601    
602    A functor parameter was missing.
603    
604    ----------------------------------------------------------------------
605    Name: Allen Leung
606    Date: 2002/03/11 10:30:00 EST
607    Tag: leunga-20020311-runtime-string0
608    Description:
609    
610       The representation of the empty string now points to a
611    legal null-terminated C string instead of unit.  It is now possible
612    to convert an ML string into a C string with InlineT.CharVector.getData.
613    This compiles into a single machine instruction.
614    
615    ----------------------------------------------------------------------
616    Name: Allen Leung
617    Date: 2002/03/10 23:55:00 EST
618    Tag: leunga-20020310-x86-call
619    Description:
620    
621       Added machine code generation for the CALL instruction (relative displacement mode)
622    
623    ----------------------------------------------------------------------
624    Name: Matthias Blume
625    Date: 2002/03/08 16:05:00
626    Tag: blume-20020308-entrypoints
627    Description:
628    
629    Version number bumped to 110.39.1.  NEW BOOTFILES!
630    
631    Entrypoints: non-zero offset into a code object where execution should begin.
632    
633    - Added the notion of an entrypoint to CodeObj.
634    - Added reading/writing of entrypoint info to Binfile.
635    - Made runtime system bootloader aware of entrypoints.
636    - Use the address of the label of the first function given to mlriscGen
637      as the entrypoint.  This address is currently always 0, but it will
638      not be 0 once we turn on block placement.
639    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
640      with entry points) from mlriscGen.
641    
642    ----------------------------------------------------------------------
643    Name: Allen Leung
644    Date: 2002/03/07 20:45:00 EST
645    Tag: leunga-20020307-x86-cmov
646    Description:
647    
648       Bug fixes for CMOVcc on x86.
649    
650       1. Added machine code generation for CMOVcc
651       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
652       3. CMOVcc cannot have an immediate operand as argument.
653    
654    ----------------------------------------------------------------------
655    Name: Matthias Blume
656    Date: 2002/03/07 16:15:00 EST
657    Tag: blume-20020307-controls
658    Description:
659    
660    This is a very large but mostly boring patch which makes (almost)
661    every tuneable compiler knob (i.e., pretty much everything under
662    Control.* plus a few other things) configurable via both the command
663    line and environment variables in the style CM did its configuration
664    until now.
665    
666    Try starting sml with '-h' (or, if you are brave, '-H')
667    
668    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
669    implements the underlying generic mechanism.
670    
671    The interface to some of the existing such facilities has changed somewhat.
672    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
673    (The getFoo interface is still there for backward-compatibility, but its
674    use is deprecated.)
675    
676    The ml-build script passes -Cxxx=yyy command-line arguments through so
677    that one can now twiddle the compiler settings when using this "batch"
678    compiler.
679    
680    TODO items:
681    
682    We should go through and throw out all controls that are no longer
683    connected to anything.  Moreover, we should go through and provide
684    meaningful (and correct!) documentation strings for those controls
685    that still are connected.
686    
687    Currently, multiple calls to Controls.new are accepted (only the first
688    has any effect).  Eventually we should make sure that every control
689    is being made (via Controls.new) exactly once.  Future access can then
690    be done using Controls.acc.
691    
692    Finally, it would probably be a good idea to use the getter-setter
693    interface to controls rather than ref cells.  For the time being, both
694    styles are provided by the Controls module, but getter-setter pairs are
695    better if thread-safety is of any concern because they can be wrapped.
696    
697    *****************************************
698    
699    One bug fix: The function blockPlacement in three of the MLRISC
700    backpatch files used to be hard-wired to one of two possibilities at
701    link time (according to the value of the placementFlag).  But (I
702    think) it should rather sense the flag every time.
703    
704    *****************************************
705    
706    Other assorted changes (by other people who did not supply a HISTORY entry):
707    
708    1. the cross-module inliner now works much better (Monnier)
709    2. representation of weights, frequencies, and probabilities in MLRISC
710       changed in preparation of using those for weighted block placement
711       (Reppy, George)
712    
713    ----------------------------------------------------------------------
714    Name: Lal George
715    Date: 2002/03/07 14:44:24 EST
716    Tag: george-20020307-weighted-block-placement
717    
718    Tested the weighted block placement optimization on all architectures
719    (except the hppa) using AMPL to generate the block and edge frequencies.
720    Changes were required in the machine properties to correctly
721    categorize trap instructions. There is an MLRISC flag
722    "weighted-block-placement" that can be used to enable weighted block
723    placement, but this will be ineffective without block/edge
724    frequencies (coming soon).
725    
726    
727    ----------------------------------------------------------------------
728    Name: Lal George
729    Date: 2002/03/05 17:24:48 EST
730    Tag: george-20020305-linkage-cluster
731    
732    In order to support the block placement optimization, a new cluster
733    is generated as the very first cluster (called the linkage cluster).
734    It contains a single jump to the 'real' entry point for the compilation
735    unit. Block placement has no effect on the linkage cluster itself, but
736    all the other clusters  have full freedom in the manner in which they
737    reorder blocks or functions.
738    
739    On the x86 the typical linkage code that is generated is:
740       ----------------------
741            .align 2
742       L0:
743            addl    $L1-L0, 72(%esp)
744            jmp     L1
745    
746    
747            .align  2
748       L1:
749       ----------------------
750    
751    72(%esp) is the memory location for the stdlink register. This
752    must contain the address of the CPS function being called. In the
753    above example, it contains the address of  L0; before
754    calling L1 (the real entry point for the compilation unit), it
755    must contain the address for L1, and hence
756    
757            addl $L1-L0, 72(%esp)
758    
759    I have tested this on all architectures except the hppa.  The increase
760    in code size is of course negligible.
761    
762    ----------------------------------------------------------------------
763    Name: Allen Leung
764    Date: 2002/03/03 13:20:00 EST
765    Tag: leunga-20020303-mlrisc-tools
766    
767      Added #[ ... ] expressions to mlrisc tools
768    
769    ----------------------------------------------------------------------
770    Name: Matthias Blume
771    Date: 2002/02/27 12:29:00 EST
772    Tag: blume-20020227-cdebug
773    Description:
774    
775    - made the types in structures C and C_Debug equal
776    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
777    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
778    
779    ----------------------------------------------------------------------
780    Name: Matthias Blume
781    Date: 2002/02/26 12:00:00 EST
782    Tag: blume-20020226-ffi
783    Description:
784    
785    1. Fixed a minor bug in CM's "noweb" tool:
786       If numbering is turned off, then truly don't number (i.e., do not
787       supply the -L option to noweb).  The previous behavior was to supply
788       -L'' -- which caused noweb to use the "default" line numbering scheme.
789       Thanks to Chris Richards for pointing this out (and supplying the fix).
790    
791    2. Once again, I reworked some aspects of the FFI:
792    
793       A. The incomplete/complete type business:
794    
795       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
796         gone!
797       - ML types representing an incomplete type are now *equal* to
798         ML types representing their corresponding complete types (just like
799         in C).  This is still safe because ml-nlffigen will not generate
800         RTTI for incomplete types, nor will it generate functions that
801         require access to such RTTI.   But when ML code generated from both
802         incomplete and complete versions of the C type meet, the ML types
803         are trivially interoperable.
804    
805         NOTE:  These changes restore the full generality of the translation
806         (which was previously lost when I eliminated functorization)!
807    
808       B. Enum types:
809    
810       - Structure C now has a type constructor "enum" that is similar to
811         how the "su" constructor works.  However, "enum" is not a phantom
812         type because each "T enum" has values (and is isomorphic to
813         MLRep.Signed.int).
814       - There are generic access operations for enum objects (using
815         MLRep.Signed.int).
816       - ml-nlffigen will generate a structure E_foo for each "enum foo".
817         * The structure contains the definition of type "mlrep" (the ML-side
818         representation type of the enum).  Normally, mlrep is the same
819         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
820         then mlrep will be defined as a datatype -- thus facilitating
821         pattern matching on mlrep values.
822         ("-ec" will be suppressed if there are duplicate values in an
823          enumeration.)
824         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
825         will be generated for each C enum constant xxx.
826         * Conversion functions m2i and i2m convert between mlrep and
827         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
828         * Conversion functions c and ml convert between mlrep and "tag enum".
829         * Access functions (get/set) fetch and store mlrep values.
830       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
831         enumerations are merged into one single enumeration represented by
832         structure E_'.
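
As an illustration of the E_foo structures described above, here is a
hand-written sketch for a hypothetical C declaration
"enum color { RED, GREEN = 5, BLUE };".  It is not actual ml-nlffigen
output: plain int stands in for MLRep.Signed.int so that the code loads
without the C library, the c/ml conversions and get/set accessors are
omitted, the name E_color_plain is invented, and raising Domain for an
unknown integer is this sketch's own choice.

```sml
(* Roughly what "-ec" provides: mlrep is a datatype. *)
structure E_color = struct
    datatype mlrep = e_RED | e_GREEN | e_BLUE

    (* mlrep -> integer representation *)
    fun m2i e_RED = 0
      | m2i e_GREEN = 5
      | m2i e_BLUE = 6

    (* integer representation -> mlrep; partial, since not every
       integer corresponds to an enum constant *)
    fun i2m 0 = e_RED
      | i2m 5 = e_GREEN
      | i2m 6 = e_BLUE
      | i2m _ = raise Domain
end

(* Roughly what is provided without "-ec": mlrep is the integer type. *)
structure E_color_plain = struct
    type mlrep = int
    val e_RED = 0 : mlrep
    val e_GREEN = 5 : mlrep
    val e_BLUE = 6 : mlrep
    fun m2i (x : mlrep) : int = x     (* identity *)
    fun i2m (x : int) : mlrep = x     (* identity *)
end

(* With the datatype version, clients can pattern match on mlrep values. *)
fun describe E_color.e_RED = "red"
  | describe E_color.e_GREEN = "green"
  | describe E_color.e_BLUE = "blue"
```

If two constants shared a value, i2m could not pick a unique constructor,
which is presumably why "-ec" is suppressed for such enumerations.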
833    
834    ----------------------------------------------------------------------
835    Name: Allen Leung
836    Date: 2002/02/25 04:45:00 EST
837    Tag: leunga-20020225-cps-spill
838    
839    This is a new implementation of the CPS spill phase.
840    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml.
841    In case of problems, replace it with the old file spill.sml.
842    
843    The current compiler runs into some serious performance problems when
844    constructing a large record.  This can happen when we try to compile a
845    structure with many items.  Even a very simple structure like the following
846    makes the compiler slow down.
847    
848        structure Foo = struct
849           val x_1 = 0w1 : Word32.word
850           val x_2 = 0w2 : Word32.word
851           val x_3 = 0w3 : Word32.word
852           ...
853           val x_N = 0wN : Word32.word
854        end
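
A test input of this shape can be generated mechanically for any N.  The
functions below are a minimal sketch (fooSource and writeFoo are
hypothetical helpers, not part of the compiler or its test suite).  The
generated bindings use Word32.word, the type that the Basis Library's
Word32 structure exports.

```sml
(* Produce the text of structure Foo with n bindings x_1 ... x_n. *)
fun fooSource (n : int) : string =
    let
        fun binding i =
            "   val x_" ^ Int.toString i ^ " = 0w" ^ Int.toString i
            ^ " : Word32.word\n"
    in
        "structure Foo = struct\n"
        ^ String.concat (List.tabulate (n, fn i => binding (i + 1)))
        ^ "end\n"
    end

(* Write the generated source to a file; the path is the caller's choice. *)
fun writeFoo (path : string, n : int) : unit =
    let
        val out = TextIO.openOut path
    in
        TextIO.output (out, fooSource n);
        TextIO.closeOut out
    end
```

For example, writeFoo ("foo.sml", 4000) would write an input of the
largest size measured here.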
855    
856    The following table shows the compile times with the old compiler for
857    N=1000 to N=4000 (u = user, s = system, g = GC time, in seconds):
858    
859    N
860    1000   CPS 100 spill                           0.04u  0.00s  0.00g
861           MLRISC ra                               0.06u  0.00s  0.05g
862              (spills = 0 reloads = 0)
863           TOTAL                                   0.63u  0.07s  0.21g
864    
865    1100   CPS 100 spill                           8.25u  0.32s  0.64g
866           MLRISC ra                               5.68u  0.59s  3.93g
867              (spills = 0 reloads = 0)
868           TOTAL                                   14.71u  0.99s  4.81g
869    
870    1500   CPS 100 spill                           58.55u  2.34s  1.74g
871           MLRISC ra                               5.54u  0.65s  3.91g
872              (spills = 543 reloads = 1082)
873           TOTAL                                   65.40u  3.13s  6.00g
874    
875    2000   CPS 100 spill                           126.69u  4.84s  3.08g
876           MLRISC ra                               0.80u  0.10s  0.55g
877              (spills = 42 reloads = 84)
878           TOTAL                                   129.42u  5.10s  4.13g
879    
880    3000   CPS 100 spill                           675.59u  19.03s  11.64g
881           MLRISC ra                               2.69u  0.27s  1.38g
882              (spills = 62 reloads = 124)
883           TOTAL                                   682.48u  19.61s  13.99g
884    
885    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
886           MLRISC ra                               4.96u  0.27s  2.72g
887              (spills = 85 reloads = 170)
888           TOTAL                                   2375.26u  57.21s  48.00g
889    
890    As you can see, the old CPS spill module suffers from serious
891    performance problems.  Since I cannot fully decipher the old code,
892    I am reimplementing it instead of patching the problems up,
893    with a different algorithm.  The new code is more modular,
894    smaller when compiled, and substantially faster
895    (O(n log n) time and O(n) space).  Timing of the new spill module:
896    
897    4000  CPS 100 spill                           0.02u  0.00s  0.00g
898          MLRISC ra                               0.25u  0.02s  0.15g
899             (spills=1 reloads=3)
900          TOTAL                                   7.74u  0.34s  1.62g
901    
902    Implementation details:
903    
904    As far as I can tell, the purpose of the CPS spill module is to make sure the
905    number of live variables at any program point (the bandwidth)
906    does not exceed a certain limit, which is determined by the
907    size of the spill area.
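
To make the notion of bandwidth concrete, the following sketch computes it
for a straight-line sequence of bindings.  It is only an illustration, not
the algorithm used in spill-new.sml: real CPS functions are trees rather
than straight lines, the quadratic loop here makes no attempt at the
O(n log n) bound, and the names interval, liveAt, bandwidth and needsSpill
are hypothetical.

```sml
(* A variable is live from the point that defines it through its last use. *)
type interval = {def : int, lastUse : int}

(* Number of variables live at program point p. *)
fun liveAt (ivs : interval list) (p : int) : int =
    List.length
        (List.filter (fn {def, lastUse} => def <= p andalso p <= lastUse) ivs)

(* Bandwidth: the maximum number of simultaneously live variables.
   The maximum is always attained at some definition point, so only
   those points are examined. *)
fun bandwidth (ivs : interval list) : int =
    List.foldl Int.max 0
        (List.map (fn (iv : interval) => liveAt ivs (#def iv)) ivs)

(* Spilling is needed when the bandwidth exceeds the limit. *)
fun needsSpill (limit : int) (ivs : interval list) : bool =
    bandwidth ivs > limit

(* Three variables; the first two overlap, the third stands alone. *)
val example : interval list =
    [{def = 0, lastUse = 3}, {def = 1, lastUse = 2}, {def = 4, lastUse = 5}]

val exampleBandwidth = bandwidth example   (* 2 *)
```

With a limit of 1 the example would require spilling; with a limit of 2
it would not.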
908    
909    When the bandwidth is too large, we decrease the register pressure by
910    packing live variables into spill records.  How we achieve this is
911    completely different than what we did in the old code.
912    
913    First, there is something about the MLRiscGen code generator
914    that we should be aware of:
915    
916    o MLRiscGen performs code motion!
917    
918       In particular, it will move floating point computations and
919       address computations involving only the heap pointer to
920       their use sites (if there is only a single use).
921       What this means is that if we have a CPS record construction
922       statement
923    
924           RECORD(k,vl,w,e)
925    
926       we should never count the new record address w as live if w
927       has only one use (which is often the case).
928    
929       We should do something similar for floating point, but the transformation
930       there is much more complex, so I won't deal with that.
931    
932    Secondly, there are now two new cps primops at our disposal:
933    
934     1. rawrecord of record_kind option
935        This pure operator allocates some uninitialized storage from the heap.
936        There are two forms:
937    
938         rawrecord NONE [INT n]  allocates a tagless record of length n
939         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
940                                     and initializes the tag.
941    
942     2. rawupdate of cty
943          rawupdate cty (v,i,x)
944          Assigns x to the ith component of record v.
945          The storelist is not updated.
946    
947    We use these new primops for both spilling and incremental record construction.
948    
949     1. Spilling.
950    
951        This is implemented with a linear scan algorithm (but generalized
952        to trees).  The algorithm will create a single spill record at the
953        beginning of the cps function and use rawupdate to spill to it,
954        and SELECT or SELp to reload from it.  So both spills and reloads
955        are fine-grain operations.  In contrast, in the old algorithm
956        "spills" have to be bundled together in records.
957    
958        Ideally, we should sink the spill record construction to where
959        it is needed.  We can even split the spill record into multiple ones
960        at the places where they are needed.  But CPS is not a good
961        representation for global code motion, so I'll keep it simple and
962        am not attempting this.
963    
964     2. Incremental record construction (aka record splitting).
965    
966        Long records with many component values which are simultaneously live
967        (recall that single use record addresses are not considered to
968         be live) are constructed with rawrecord and rawupdate.
969        We allocate space on the heap with rawrecord first, then gradually
970        fill it in with rawupdate.  This is the technique suggested to me
971        by Matthias.
972    
973        Some restrictions on when this is applicable:
974        1. It is not a VECTOR record.  The code generator currently does not handle
975           this case.  A VECTOR record uses double indirection, like arrays.
976        2. All the record component values are defined in the same "basic block"
977           as the record constructor.  This is to prevent speculative
978           record construction.
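
The idea behind incremental record construction can be modelled outside
the compiler.  In the sketch below, which is an analogy rather than CPS
code, an int array plays the role of the heap record, and the functions
named rawrecord and rawupdate are toy stand-ins for the primops
(zero-filling replaces truly uninitialized storage).  buildIncrementally
is a hypothetical name.

```sml
(* Toy stand-in for rawrecord: allocate n slots (zero-filled here, whereas
   the real primop leaves the storage uninitialized). *)
fun rawrecord (n : int) : int array = Array.array (n, 0)

(* Toy stand-in for rawupdate: store x into slot i of v. *)
fun rawupdate (v : int array, i : int, x : int) : unit =
    Array.update (v, i, x)

(* Build an n-field record whose i-th field is f i.  Each value is stored
   as soon as it is computed, so at most one field value is live at a
   time, instead of all n being live just before a single RECORD. *)
fun buildIncrementally (n : int, f : int -> int) : int array =
    let
        val r = rawrecord n
        fun fill i =
            if i < n then (rawupdate (r, i, f i); fill (i + 1)) else ()
    in
        fill 0; r
    end

val squares = buildIncrementally (4, fn i => i * i)
```

Here squares ends up holding the fields 0, 1, 4 and 9, filled in one at
a time after the allocation.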
979    
980    ----------------------------------------------------------------------
981    Name: Allen Leung
982    Date: 2002/02/22 01:02:00 EST
983    Tag: leunga-20020222-mlrisc-tools
984    
985    Minor bug fixes in the parser and rewriter
986    
987    ----------------------------------------------------------------------
988  Name: Allen Leung
989  Date: 2002/02/21 20:20:00 EST
990  Tag: leunga-20020221-peephole
