Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1067, Fri Feb 15 17:08:17 2002 UTC revision 1181, Wed Mar 27 21:27:27 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/03/27 16:27:00 EST
18    Tag: blume-20020327-mlrisc-divisions
19    Description:
20    
21    Added support for all four division operations (ML's div, mod, quot,
22    and rem) to MLRISC.  In the course of doing so, I also rationalized
23    the naming (no more annoying switch-around of DIV and QUOT), by
24    parameterizing the operation by div_rounding_mode (which can be either
25    DIV_TO_ZERO or DIV_TO_NEGINF).
26    
27    The generic MLTreeGen functor takes care of compiling all four
28    operations down to only round-to-zero div.
29    
30    Missing pieces:
31    
32      * Doing something smarter than relying on MLTreeGen on architectures
33        like, e.g., the x86 where hardware division delivers both quotient and
34        remainder at the same time.  With this, the implementation of the
35        round-to-neginf operations could be further streamlined.
36    
37      * Remove inlining support for div/mod/rem from the frontend and replace it
38        with primops that get carried through to the backend.  Do this for all
39        int and word types.
40    
41    ----------------------------------------------------------------------
42    Name: Matthias Blume
43    Date: 2002/03/25 17:25:00 EST
44    Tag: blume-20020325-divmod
45    Description:
46    
47    I improved (hopefully without breaking them) the implementation of Int.div,
48    Int.mod, and Int.rem.   For this, the code in translate.sml now takes
49    advantage of the following observations:
50    
51      Let  q = x quot y      r = x rem y
52           d = x div  y      m = x mod y
53    
54    where "quot" is the round-to-zero version of integer division that
55    hardware usually provides.  Then we have:
56    
57         r = x - q * y        where neither the * nor the - will overflow
58         d = if q >= 0 orelse x = q * y then q else q - 1
59                              where neither the * nor the - will overflow
60         m = if q >= 0 orelse r = 0 then r else r + y
61                              where the + will not overflow
62    
63    This results in substantial simplification of the generated code.
64    The following table shows the number of CFG nodes and edges generated
65    for
66            fun f (x, y) = x OPER y
67            (* with OPER \in div, mod, quot, rem *)
68    
69    
70        OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
71        --------------------------------------------------------
72         div |         24 |         39 |         12 |         16
73         mod |         41 |         71 |         12 |         16
74        quot |          8 |         10 |          8 |         10
75         rem |         10 |         14 |          8 |         10
76    
77    
78    ----------------------------------------------------------------------
79    Name: Matthias Blume
80    Date: 2002/03/25 22:06:00 EST
81    Tag: blume-20020325-cprotobug
82    Description:
83    
84    Fixed a bug in cproto (c prototype decoder).
85    
86    ----------------------------------------------------------------------
87    Name: Matthias Blume
88    Date: 2002/03/25 16:00:00 EST
89    Tag: blume-20020325-raw-primops
90    Description:
91    
92    I did some cleanup to Allen's new primop code and
93    replaced yesterday's bootfiles with new ones.
94    (But they are stored in the same place.)
95    
96    ----------------------------------------------------------------------
97    Name: Matthias Blume
98    Date: 2002/03/24 22:40:00 EST
99    Tag: blume-20020324-bootfiles
100    Description:
101    
102    Made the bootfiles that Allen asked for.
103    
104    ----------------------------------------------------------------------
105    Name: Allen Leung
106    Date: 2002/03/23 15:50:00 EST
107    Tag: leunga-20020323-flint-cps-rcc-primops
108    Description:
109    
110      1. Changes to FLINT primops:
111    
112        (* make a call to a C-function;
113         * The primop carries C function prototype information and specifies
114         * which of its (ML-) arguments are floating point. C prototype
115         * information is for use by the backend, ML information is for
116         * use by the CPS converter. *)
117      | RAW_CCALL of { c_proto: CTypes.c_proto,
118                       ml_args: ccall_type list,
119                       ml_res_opt: ccall_type option,
120                       reentrant : bool
121                     } option
122       (* Allocate uninitialized storage on the heap.
123        * The record is meant to hold short-lived C objects, i.e., they
124        * are not ML pointers.  With the tag, the representation is
125        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
126        *)
127      | RAW_RECORD of {tag:bool,sz:int}
128      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
129    
130      2.  These CPS primops are now overloaded:
131    
132           rawload of {kind:numkind}
133           rawstore of {kind:numkind}
134    
135          The one argument form is:
136    
137             rawload {kind} address
138    
139          The two argument form is:
140    
141             rawload {kind} [ml object, byte-offset]
142    
143      3. RAW_CCALL/RCC now takes two extra arguments:
144    
145         a. The first is whether the C call is reentrant, i.e., whether
146            ML state should be saved and restored.
147         b. The second argument is a string argument specifying the name of
148            library and the C function.
149    
150         These things are currently not handled in the code generator, yet.
151    
152      4. In CProto,
153    
154         An encoding type of "bool" means "ml object" and is mapped into
155         C prototype of PTR.  Note that "bool" is different than "string",
156         even though "string" is also mapped into PTR, because "bool"
157         is assigned an CPS type of BOGt, while "string" is assigned INT32t.
158    
159      5. Pickler/unpicker
160    
161         Changed to handle RAW_RECORD and newest RAW_CCALL
162    
163      6. MLRiscGen,
164    
165         1. Changed to handle the new rawload/rawstore/rawrecord operators.
166         2. Code for handling C Calls has been moved to a new module CPSCCalls,
167            in the file CodeGen/cpscompile/cps-c-calls.sml
168    
169      7. Added the conditional move operator
170    
171             condmove of branch
172    
173         to cps.  Generation of this is still buggy so it is currently
174         disabled.
175    
176    ----------------------------------------------------------------------
177    Name: Lal George
178    Date: 2002/03/22 14:18:25 EST
179    Tag: george-20020322-cps-branch-prob
180    Description:
181    
182    Implemented the Ball-Larus branch prediction-heuristics, and
183    incorporated graphical viewers for control flow graphs.
184    
185    Ball-Larus Heuristics:
186    ---------------------
187    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
188    
189    By design it uses the Dempster-Shafer theory for combining
190    probabilities.  For example, in the function:
191    
192        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
193    
194    the ball-larus heuristics predicts that the n=0 is unlikely
195    (OH-heuristic), and the 'then' branch is unlikely because of the
196    RH-heuristic -- giving the 'then' branch an even lower combined
197    probability using the Dempster-Shafer theory.
198    
199    Finally, John Reppy's loop analysis in MLRISC, further lowers the
200    probability of the 'then' branch because of the loop in the else
201    branch.
202    
203    
204    Graphical Viewing:
205    ------------------
206    I merely plugged in Allen's graphical viewers into the compiler. The
207    additional code is not much. At the top level, saying:
208    
209            Control.MLRISC.getFlag "cfg-graphical-view" := true;
210    
211    will display the graphical view of the control flow graph just before
212    back-patching.  daVinci must be in your path for this to work. If
213    daVinci is not available, then the default viewer can be changed
214    using:
215    
216            Control.MLRISC.getString "viewer"
217    
218    which can be set to "dot" or "vcg" for the corresponding viewers. Of
219    course, these viewers must be in your path.
220    
221    The above will display the compilation unit at the level of clusters,
222    many of which are small, boring, and un-interesting. Also setting:
223    
224            Control.MLRISC.getInt "cfg-graphical-view_size"
225    
226    will display clusters that are larger than the value set by the above.
227    
228    
229    ----------------------------------------------------------------------
230    Name: Matthias Blume
231    Date: 2002/03/21 22:20:00 EST
232    Tag: blume-20020321-kmp-bugfix
233    Description:
234    
235    Changed the interface to the KMP routine in PreString and fixed
236    a minor bug in one place where it was used.
237    
238    ----------------------------------------------------------------------
239    Name: Allen Leung
240    Date: 2002/03/21 20:30:00 EST
241    Tag: leunga-20020321-cfg
242    Description:
243    
244      Fixed a potential problem in cfg edge splitting.
245    
246    ----------------------------------------------------------------------
247    Name: Allen Leung
248    Date: 2002/03/21 17:15:00 EST
249    Tag: leunga-20020321-x86-fp-cfg
250    Description:
251    
252      1. Recoded the buggy parts of x86-fp.
253    
254         a. All the block reordering code has been removed.
255            We now depend on the block placement phases to do this work.
256    
257         b. Critical edge splitting code has been simplified and moved into the
258            CFG modules, as where they belong.
259    
260         Both of these were quite buggy and complex.  The code is now much, much
261         simpler.
262    
263      2. X86 backend.
264    
265         a. Added instructions for 64-bit support.  Instruction selection for
266            64-bit has not been committed, however, since that
267            requires changes to MLTREE which haven't been approved by
268            Lal and John.
269    
270         b. Added support for FUCOMI and FUCOMIP when generating code for
271            PentiumPro and above.  We only generate these instructions in
272            the fast-fp mode.
273    
274         c. Added cases for JP and JNP in X86FreqProps.
275    
276      3. CFG
277    
278         CFG now has a bunch of methods for edge splitting and merging.
279    
280      4. Machine description.
281    
282         John's simplification of MLTREE_BASIS.fcond broke a few machine
283         description things:
284    
285         rtl-build.{sig,sml} and hppa.mdl fixed.
286    
287         NOTE: the machine description stuff in the repository is still broken.
288               Again, I can't put my fixes in because that involves
289               changes to MLTREE.
290    
291    ----------------------------------------------------------------------
292    Name: Matthias Blume
293    Date: 2002/03/20 15:55:00 EST
294    Tag: blume-20020320-kmp
295    Description:
296    
297    Implemented Knuth-Morris-Pratt string matching in PreString and used
298    it for String.isSubstring, Substring.isSubstring, and
299    Substring.position.
300    
301    (Might need some stress-testing.  Simple examples worked fine.)
302    
303    ----------------------------------------------------------------------
304    Name: Matthias Blume
305    Date: 2002/03/19 16:37:00 EST
306    Tag: blume-20020319-witnesses
307    Description:
308    
309    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
310    
311    This implements a generic mechanism for changing constness qualifiers
312    anywhere within big C types without resorting to outright "casts".
313    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
314    modify the constness at the outermost level.)
315    The implementation of "convert" is based on the idea of "witness"
316    values -- values that are not used by the operation but whose types
317    "testify" to their applicability.  On the implementation side, "convert"
318    is simply a projection (returning its second curried argument).  With
319    cross-module inlining, it should not result in any machine code being
320    generated.
321    
322    ----------------------------------------------------------------------
323    Name: Matthias Blume
324    Date: 2002/03/15 16:40:00 EST
325    Tag: blume-20020315-basis
326    Description:
327    
328    Provided (preliminary?) implementations for
329    
330      {String,Substring}.{concatWith,isSuffix,isSubstring}
331    
332    and
333    
334      Substring.full
335    
336    Those are in the Basis spec but they were missing in SML/NJ.
337    
338    ----------------------------------------------------------------------
339    Name: Matthias Blume
340    Date: 2002/03/14 21:30:00 EST
341    Tag: blume-20020314-controls
342    Description:
343    
344    Controls:
345    ---------
346    
347    1. Factored out the recently-added Controls : CONTROLS stuff and put
348       it into its own library $/controls-lib.cm.  The source tree for
349       this is under src/smlnj-lib/Controls.
350    
351    2. Changed the names of types and functions in this interface, so they
352       make a bit more "sense":
353    
354          module -> registry
355          'a registry -> 'a group
356    
357    3. The interface now deals in ref cells only.  The getter/setter interface
358       is (mostly) gone.
359    
360    4. Added a function that lets one register an already-existing ref cell.
361    
362    5. Made the corresponding modifications to the rest of the code so that
363       everything compiles again.
364    
365    6. Changed the implementation of Controls.MLRISC back to something closer
366       to the original.  In particular, this module (and therefore MLRISC)
367       does not depend on Controls.  There now is some link-time code in
368       int-sys.sml that registers the MLRISC controls with the Controls
369       module.
370    
371    CM:
372    ---
373    
374      * One can now specify the lambda-split aggressiveness in init.cmi.
375    
376    ----------------------------------------------------------------------
377    Name: Allen Leung
378    Date: 2002/03/13 17:30:00 EST
379    Tag: leunga-20020313-x86-fp-unary
380    Description:
381    
382    Bug fix for:
383    
384    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
385    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
386    > - fun f(x,(y,z)) = Real.~ y;
387    > [autoloading]
388    > [autoloading done]
389    >       fchsl   (%eax), 184(%esp)
390    > Error: MLRisc bug: X86MCEmitter.emitInstr
391    >
392    > uncaught exception Error
393    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
394    
395    The problem was that the code generator did not generate any fp registers
396    in this case, and the ra didn't know that it needed to run the X86FP phase to
397    translate the pseudo fp instruction.   This only happened with unary fp
398    operators in certain situations.
399    
400    ----------------------------------------------------------------------
401    Name: Matthias Blume
402    Date: 2002/03/13 14:00:00 EST
403    Tag: blume-20020313-overload-etc
404    Description:
405    
406    1. Added _overload as a synonym for overload for backward compatibility.
407       (Control.overloadKW must be true for either version to be accepted.)
408    
409    2. Fixed bug in install script that caused more things to be installed
410       than what was requested in config/targets.
411    
412    3. Made CM aware of the (_)overload construct so that autoloading
413       works.
414    
415    ----------------------------------------------------------------------
416    Name: Matthias Blume
417    Date: 2002/03/12 22:03:00 EST
418    Tag: blume-20020312-url
419    Description:
420    
421    Forgot to update BOOT and srcarchiveurl.
422    
423    ----------------------------------------------------------------------
424    Name: Matthias Blume
425    Date: 2002/03/12 17:30:00 EST
426    Tag: blume-20020312-version110392
427    Description:
428    
429    Yet another version number bump (because of small changes to the
430    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
431    
432    Changes:
433    
434      The new pid generation scheme described a few weeks ago was overly
435      complicated.  I implemented a new mechanism that is simpler and
436      provides a bit more "stability":  Once CM has seen a compilation
437      unit, it keeps its identity constant (as long as you do not delete
438      those crucial CM/GUID/* files).  This means that when you change
439      an interface, compile, then go back to the old interface, and
440      compile again, you arrive at the original pid.
441    
442      There now also is a mechanism that instructs CM to use the plain
443      environment hash as a module's pid (effectively making its GUID
444      the empty string).  For this, "noguid" must be specified as an
445      option to the .sml file in question within its .cm file.
446      This is most useful for code that is being generated by tools such
447      as ml-nlffigen (because during development programmers tend to
448      erase the tool's entire output directory tree including CM's cached
449      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
450      revert to the old, broken behavior of SML/NJ, but in specific cases
451      where there is no danger of interface confusion, its use is ok
452      (I think).
453    
454      ml-nlffigen by default generates "noguid" annotations.  They can be
455      turned off by specifying -guid in its command line.
456    
457    ----------------------------------------------------------------------
458    Name: Lal George
459    Date: 2002/03/12 12 14:42:36 EST
460    Tag: george-20020312-frequency-computation
461    Description:
462    
463    Integrated jump chaining and static block frequency into the
464    compiler. More details and numbers later.
465    
466    ----------------------------------------------------------------------
467    Name: Lal George
468    Date: 2002/03/11 11 22:38:53 EST
469    Tag: george-20020311-jump-chain-elim
470    Description:
471    
472    Tested the jump chain elimination on all architectures (except the
473    hppa).  This is on by default right now and is profitable for the
474    alpha and x86, however, it may not be profitable for the sparc and ppc
475    when compiling the compiler.
476    
477    The gc test will typically jump to a label at the end of the cluster,
478    where there is another jump to an external cluster containing the actual
479    code to invoke gc. This is to allow factoring of common gc invocation
480    sequences. That is to say, we generate:
481    
482            f:
483               testgc
484               ja   L1      % jump if above to L1
485    
486            L1:
487               jmp L2
488    
489    
490    After jump chain elimination the 'ja L1' instructions is converted to
491    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
492    up being implemented in their long form (if L2 is far away) using:
493    
494            jbe     L3      % jump if below or equal to L3
495            jmp     L2
496         L3:
497            ...
498    
499    
500    For large compilation units L2  may be far away.
501    
502    
503    ----------------------------------------------------------------------
504    Name: Matthias Blume
505    Date: 2002/03/11 13:30:00 EST
506    Tag: blume-20020311-mltreeeval
507    Description:
508    
509    A functor parameter was missing.
510    
511    ----------------------------------------------------------------------
512    Name: Allen Leung
513    Date: 2002/03/11 10:30:00 EST
514    Tag: leunga-20020311-runtime-string0
515    Description:
516    
517       The representation of the empty string now points to a
518    legal null terminated C string instead of unit.  It is now possible
519    to convert an ML string into C string with InlineT.CharVector.getData.
520    This compiles into one single machine instruction.
521    
522    ----------------------------------------------------------------------
523    Name: Allen Leung
524    Date: 2002/03/10 23:55:00 EST
525    Tag: leunga-20020310-x86-call
526    Description:
527    
528       Added machine generation for CALL instruction (relative displacement mode)
529    
530    ----------------------------------------------------------------------
531    Name: Matthias Blume
532    Date: 2002/03/08 16:05:00
533    Tag: blume-20020308-entrypoints
534    Description:
535    
536    Version number bumped to 110.39.1.  NEW BOOTFILES!
537    
538    Entrypoints: non-zero offset into a code object where execution should begin.
539    
540    - Added the notion of an entrypoint to CodeObj.
541    - Added reading/writing of entrypoint info to Binfile.
542    - Made runtime system bootloader aware of entrypoints.
543    - Use the address of the label of the first function given to mlriscGen
544      as the entrypoint.  This address is currently always 0, but it will
545      not be 0 once we turn on block placement.
546    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
547      with entry points) from mlriscGen.
548    
549    ----------------------------------------------------------------------
550    Name: Allen Leung
551    Date: 2002/03/07 20:45:00 EST
552    Tag: leunga-20020307-x86-cmov
553    Description:
554    
555       Bug fixes for CMOVcc on x86.
556    
557       1. Added machine code generation for CMOVcc
558       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
559       3. CMOVcc cannot have an immediate operand as argument.
560    
561    ----------------------------------------------------------------------
562    Name: Matthias Blume
563    Date: 2002/03/07 16:15:00 EST
564    Tag: blume-20020307-controls
565    Description:
566    
567    This is a very large but mostly boring patch which makes (almost)
568    every tuneable compiler knob (i.e., pretty much everything under
569    Control.* plus a few other things) configurable via both the command
570    line and environment variables in the style CM did its configuration
571    until now.
572    
573    Try starting sml with '-h' (or, if you are brave, '-H')
574    
575    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
576    implements the underlying generic mechanism.
577    
578    The interface to some of the existing such facilities has changed somewhat.
579    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
580    (The getFoo interface is still there for backward-compatibility, but its
581    use is deprecated.)
582    
583    The ml-build script passes -Cxxx=yyy command-line arguments through so
584    that one can now twiddle the compiler settings when using this "batch"
585    compiler.
586    
587    TODO items:
588    
589    We should go through and throw out all controls that are no longer
590    connected to anything.  Moreover, we should go through and provide
591    meaningful (and correct!) documentation strings for those controls
592    that still are connected.
593    
594    Currently, multiple calls to Controls.new are accepted (only the first
595    has any effect).  Eventually we should make sure that every control
596    is being made (via Controls.new) exactly once.  Future access can then
597    be done using Controls.acc.
598    
599    Finally, it would probably be a good idea to use the getter-setter
600    interface to controls rather than ref cells.  For the time being, both
601    styles are provided by the Controls module, but getter-setter pairs are
602    better if thread-safety is of any concern because they can be wrapped.
603    
604    *****************************************
605    
606    One bug fix: The function blockPlacement in three of the MLRISC
607    backpatch files used to be hard-wired to one of two possibilities at
608    link time (according to the value of the placementFlag).  But (I
609    think) it should rather sense the flag every time.
610    
611    *****************************************
612    
613    Other assorted changes (by other people who did not supply a HISTORY entry):
614    
615    1. the cross-module inliner now works much better (Monnier)
616    2. representation of weights, frequencies, and probabilities in MLRISC
617       changed in preparation of using those for weighted block placement
618       (Reppy, George)
619    
620    ----------------------------------------------------------------------
621    Name: Lal George
622    Date: 2002/03/07 14:44:24 EST 2002
623    Tag: george-20020307-weighted-block-placement
624    
625    Tested the weighted block placement optimization on all architectures
626    (except the hppa) using AMPL to generate the block and edge frequencies.
627    Changes were required in the machine properties to correctly
628    categorize trap instructions. There is an MLRISC flag
629    "weighted-block-placement" that can be used to enable weighted block
630    placement, but this will be ineffective without block/edge
631    frequencies (coming soon).
632    
633    
634    ----------------------------------------------------------------------
635    Name: Lal George
636    Date: 2002/03/05 17:24:48 EST
637    Tag: george-20020305-linkage-cluster
638    
639    In order to support the block placement optimization, a new cluster
640    is generated as the very first cluster (called the linkage cluster).
641    It contains a single jump to the 'real' entry point for the compilation
642    unit. Block placement has no effect on the linkage cluster itself, but
643    all the other clusters  have full freedom in the manner in which they
644    reorder blocks or functions.
645    
646    On the x86 the typical linkage code that is generated is:
647       ----------------------
648            .align 2
649       L0:
650            addl    $L1-L0, 72(%esp)
651            jmp     L1
652    
653    
654            .align  2
655       L1:
656       ----------------------
657    
658    72(%esp) is the memory location for the stdlink register. This
659    must contain the address of the CPS function being called. In the
660    above example, it contains the address of  L0; before
661    calling L1 (the real entry point for the compilation unit), it
662    must contain the address for L1, and hence
663    
664            addl $L1-L0, 72(%esp)
665    
666    I have tested this on all architectures except the hppa.The increase
667    in code size is of course negligible
668    
669    ----------------------------------------------------------------------
670    Name: Allen Leung
671    Date: 2002/03/03 13:20:00 EST
672    Tag: leunga-20020303-mlrisc-tools
673    
674      Added #[ ... ] expressions to mlrisc tools
675    
676    ----------------------------------------------------------------------
677    Name: Matthias Blume
678    Date: 2002/02/27 12:29:00 EST
679    Tag: blume-20020227-cdebug
680    Description:
681    
682    - made types in structure C and C_Debug to be equal
683    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
684    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
685    
686    ----------------------------------------------------------------------
687    Name: Matthias Blume
688    Date: 2002/02/26 12:00:00 EST
689    Tag: blume-20020226-ffi
690    Description:
691    
692    1. Fixed a minor bug in CM's "noweb" tool:
693       If numbering is turned off, then truly don't number (i.e., do not
694       supply the -L option to noweb).  The previous behavior was to supply
695       -L'' -- which caused noweb to use the "default" line numbering scheme.
696       Thanks to Chris Richards for pointing this out (and supplying the fix).
697    
698    2. Once again, I reworked some aspects of the FFI:
699    
700       A. The incomplete/complete type business:
701    
702       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
703         gone!
704       - ML types representing an incomplete type are now *equal* to
705         ML types representing their corresponding complete types (just like
706         in C).  This is still safe because ml-nlffigen will not generate
707         RTTI for incomplete types, nor will it generate functions that
708         require access to such RTTI.   But when ML code generated from both
709         incomplete and complete versions of the C type meet, the ML types
710         are trivially interoperable.
711    
712         NOTE:  These changes restore the full generality of the translation
713         (which was previously lost when I eliminated functorization)!
714    
715       B. Enum types:
716    
717       - Structure C now has a type constructor "enum" that is similar to
718         how the "su" constructor works.  However, "enum" is not a phantom
719         type because each "T enum" has values (and is isomorphic to
720         MLRep.Signed.int).
721       - There are generic access operations for enum objects (using
722         MLRep.Signed.int).
723       - ml-nlffigen will generate a structure E_foo for each "enum foo".
724         * The structure contains the definition of type "mlrep" (the ML-side
725         representation type of the enum).  Normally, mlrep is the same
726         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
727         then mlrep will be defined as a datatype -- thus facilitating
728         pattern matching on mlrep values.
729         ("-ec" will be suppressed if there are duplicate values in an
730          enumeration.)
731         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
732         will be generated for each C enum constant xxx.
733         * Conversion functions m2i and i2m convert between mlrep and
734         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
735         * Coversion functions c and ml convert between mlrep and "tag enum".
736         * Access functions (get/set) fetch and store mlrep values.
737       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
738         enumerations are merged into one single enumeration represented by
739         structure E_'.
740    
741    ----------------------------------------------------------------------
742    Name: Allen Leung
743    Date: 2002/02/25 04:45:00 EST
744    Tag: leunga-20020225-cps-spill
745    
746    This is a new implementation of the CPS spill phase.
747    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
748    In case of problems, replace it with the old file spill.sml
749    
750    The current compiler runs into some serious performance problems when
751    constructing a large record.  This can happen when we try to compile a
752    structure with many items.  Even a very simple structure like the following
753    makes the compiler slow down.
754    
755        structure Foo = struct
756           val x_1 = 0w1 : Word32.int
757           val x_2 = 0w2 : Word32.int
758           val x_3 = 0w3 : Word32.int
759           ...
760           val x_N = 0wN : Word32.int
761        end
762    
763    The following table shows the compile time, from N=1000 to N=4000,
764    with the old compiler:
765    
766    N
767    1000   CPS 100 spill                           0.04u  0.00s  0.00g
768           MLRISC ra                               0.06u  0.00s  0.05g
769              (spills = 0 reloads = 0)
770           TOTAL                                   0.63u  0.07s  0.21g
771    
772    1100   CPS 100 spill                           8.25u  0.32s  0.64g
773           MLRISC ra                               5.68u  0.59s  3.93g
774              (spills = 0 reloads = 0)
775           TOTAL                                   14.71u  0.99s  4.81g
776    
777    1500   CPS 100 spill                           58.55u  2.34s  1.74g
778           MLRISC ra                               5.54u  0.65s  3.91g
779              (spills = 543 reloads = 1082)
780           TOTAL                                   65.40u  3.13s  6.00g
781    
782    2000   CPS 100 spill                           126.69u  4.84s  3.08g
783           MLRISC ra                               0.80u  0.10s  0.55g
784              (spills = 42 reloads = 84)
785           TOTAL                                   129.42u  5.10s  4.13g
786    
787    3000   CPS 100 spill                           675.59u  19.03s  11.64g
788           MLRISC ra                               2.69u  0.27s  1.38g
789              (spills = 62 reloads = 124)
790           TOTAL                                   682.48u  19.61s  13.99g
791    
792    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
793           MLRISC ra                               4.96u  0.27s  2.72g
794              (spills = 85 reloads = 170)
795           TOTAL                                   2375.26u  57.21s  48.00g
796    
797    As you can see the old cps spill module suffers from some serious
798    performance problem.  But since I cannot decipher the old code fully,
799    instead of patching the problems up, I'm reimplementing it
800    with a different algorithm.  The new code is more modular,
801    smaller when compiled, and substantially faster
802    (O(n log n) time and O(n) space).  Timing of the new spill module:
803    
804    4000  CPS 100 spill                           0.02u  0.00s  0.00g
805          MLRISC ra                               0.25u  0.02s  0.15g
806             (spills=1 reloads=3)
807          TOTAL                                   7.74u  0.34s  1.62g
808    
809    Implementation details:
810    
811    As far as I can tell, the purpose of the CPS spill module is to make sure the
812    number of live variables at any program point (the bandwidth)
813    does not exceed a certain limit, which is determined by the
814    size of the spill area.
815    
816    When the bandwidth is too large, we decrease the register pressure by
817    packing live variables into spill records.  How we achieve this is
818    completely different than what we did in the old code.
819    
820    First, there is something about the MLRiscGen code generator
821    that we should be aware of:
822    
823    o MLRiscGen performs code motion!
824    
825       In particular, it will move floating point computations and
826       address computations involving only the heap pointer to
827       their use sites (if there is only a single use).
828       What this means is that if we have a CPS record construction
829       statement
830    
831           RECORD(k,vl,w,e)
832    
833       we should never count the new record address w as live if w
834       has only one use (which is often the case).
835    
836       We should do something similar to floating point, but the transformation
837       there is much more complex, so I won't deal with that.
838    
839    Secondly, there are now two new cps primops at our disposal:
840    
841     1. rawrecord of record_kind option
842        This pure operator allocates some uninitialized storage from the heap.
843        There are two forms:
844    
845         rawrecord NONE [INT n]  allocates a tagless record of length n
846         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
847                                     and initializes the tag.
848    
849     2. rawupdate of cty
850          rawupdate cty (v,i,x)
851          Assigns to x to the ith component of record v.
852          The storelist is not updated.
853    
854    We use these new primops for both spilling and increment record construction.
855    
856     1. Spilling.
857    
858        This is implemented with a linear scan algorithm (but generalized
859        to trees).  The algorithm will create a single spill record at the
860        beginning of the cps function and use rawupdate to spill to it,
861        and SELECT or SELp to reload from it.  So both spills and reloads
862        are fine-grain operations.  In contrast, in the old algorithm
863        "spills" have to be bundled together in records.
864    
865        Ideally, we should sink the spill record construction to where
866        it is needed.  We can even split the spill record into multiple ones
867        at the places where they are needed.  But CPS is not a good
868        representation for global code motion, so I'll keep it simple and
869        am not attempting this.
870    
871     2. Incremental record construction (aka record splitting).
872    
873        Long records with many component values which are simulatenously live
874        (recall that single use record addresses are not considered to
875         be live) are constructed with rawrecord and rawupdate.
876        We allocate space on the heap with rawrecord first, then gradually
877        fill it in with rawupdate.  This is the technique suggested to me
878        by Matthias.
879    
880        Some restrictions on when this is applicable:
881        1. It is not a VECTOR record.  The code generator currently does not handle
882           this case. VECTOR record uses double indirection like arrays.
883        2. All the record component values are defined in the same "basic block"
884           as the record constructor.  This is to prevent speculative
885           record construction.
886    
887    ----------------------------------------------------------------------
888    Name: Allen Leung
889    Date: 2002/02/22 01:02:00 EST
890    Tag: leunga-20020222-mlrisc-tools
891    
892    Minor bug fixes in the parser and rewriter
893    
894    ----------------------------------------------------------------------
895    Name: Allen Leung
896    Date: 2002/02/21 20:20:00 EST
897    Tag: leunga-20020221-peephole
898    
899    Regenerated the peephole files.  Some contained typos in the specification
900    and some didn't compile because of pretty printing bugs in the old version
901    of 'nowhere'.
902    
903    ----------------------------------------------------------------------
904    Name: Allen Leung
905    Date: 2002/02/19 20:20:00 EST
906    Tag: leunga-20020219-mlrisc-tools
907    Description:
908    
909       Minor bug fixes to the mlrisc-tools library:
910    
911       1.  Fixed up parsing colon suffixed keywords
912       2.  Added the ability to shut the error messages up
913       3.  Reimplemented the pretty printer and fixed up/improved
914           the pretty printing of handle and -> types.
915       4.  Fixed up generation of literal symbols in the nowhere tool.
916       5.  Added some SML keywords to to sml.sty
917    
918    ----------------------------------------------------------------------
919    Name: Matthias Blume
920    Date: 2002/02/19 16:20:00 EST
921    Tag: blume-20020219-cmffi
922    Description:
923    
924    A wild mix of changes, some minor, some major:
925    
926    * All C FFI-related libraries are now anchored under $c:
927        $/c.cm      --> $c/c.cm
928        $/c-int.cm  --> $c/internals/c-int.cm
929        $/memory.cm --> $c/memory/memory.cm
930    
931    * "make" tool (in CM) now treats its argument pathname slightly
932      differently:
933        1. If the native expansion is an absolute name, then before invoking
934           the "make" command on it, CM will apply OS.Path.mkRelative
935           (with relativeTo = OS.FileSys.getDir()) to it.
936        2. The argument will be passed through to subsequent phases of CM
937           processing without "going native".  In particular, if the argument
938           was an anchored path, then "make" will not lose track of that anchor.
939    
940    * Compiler backends now "know" their respective C calling conventions
941      instead of having to be told about it by ml-nlffigen.  This relieves
942      ml-nlffigen from one of its burdens.
943    
944    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
945    
946    * Export C_DEBUG and C_Debug from $c/c.cm.
947    
948    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
949      subtyping relationship between incomplete pointers and their complete
950      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
951      with the convention of instantiating 'o with ('t, 'c) obj whenever
952      the pointer target type is complete.  In the incomplete case, 'o
953      will be instantiated with some "'c iobj" -- a type obtained by
954      using one of the functors PointerToIncompleteType or PointerToCompleteType.
955    
956      Operations that work on both incomplete and complete pointer types are
957      typed as taking an 'o ptr while operations that require the target to
958      be known are typed as taking some ('t, 'c) obj ptr.
959    
960      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
961      where void is an eqtype without any values.  This makes it possible
962      to work on voidptr values using functions meant to operate on light
963      incomplete pointers.
964    
965    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
966      been vastly simplified.
967    
968    ----------------------------------------------------------------------
969    Name: Matthias Blume
970    Date: 2002/02/19 10:48:00 EST
971    Tag: blume-20020219-pqfix
972    Description:
973    
974    Applied Chris Okasaki's bug fix for priority queues.
975    
976    ----------------------------------------------------------------------
977    Name: Matthias Blume
978    Date: 2002/02/15 17:05:00
979    Tag: Release_110_39
980    Description:
981    
982    Last-minute retagging is becoming a tradition... :-(
983    
984    This is the working release 110.39.
985    
986    ----------------------------------------------------------------------
987    Name: Matthias Blume
988    Date: 2002/02/15 16:00:00 EST
989    Tag: Release_110_39-orig
990    Description:
991    
992    Working release 110.39.  New bootfiles.
993    
994    (Update: There was a small bug in the installer so it wouldn't work
995    with all shells.  So I retagged. -Matthias)
996    
997    ----------------------------------------------------------------------
998    Name: Matthias Blume
999    Date: 2002/02/15 14:17:00 EST
1000    Tag: blume-20020215-showbindings
1001    Description:
1002    
1003    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
1004    the latter can be useful for exploring what bindings are available at
1005    the interactive prompt.  (The first function returns only the list
1006    of symbols that are really bound, the second prints those but also the
1007    ones that CM's autoloading mechanism knows about.)
1008    
1009    ----------------------------------------------------------------------
1010    Name: Matthias Blume
1011  Date: 2002/02/15 12:08:00 EST  Date: 2002/02/15 12:08:00 EST
1012  Tag: blume-20020215-iptrs  Tag: blume-20020215-iptrs
1013  Description:  Description:

Legend:
Removed from v.1067  
changed lines
  Added in v.1181

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0