Diff of /sml/trunk/HISTORY


revision 1073, Fri Feb 15 22:07:38 2002 UTC revision 1201, Fri May 17 20:48:38 2002 UTC
14    
15    ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/05/17 16:49:00 EDT
18    Tag: blume-20020517-controls
19    Description:
20    
21    0. John Reppy made several modifications to the SML/NJ library.
22       In particular, there is a shiny new controls-lib.cm.
23    
24    1. Pushed new controls interface through compiler so that everything
25       compiles again.
26    
27    2. Added FormatComb and FORMAT_COMB to the CML version of the
28       SML/NJ library (so that CML compiles again).
29    
30    3. Modified init scripts because XXX_DEFAULT environment variables
31       are no longer with us.  (Boot-time initialization is now done
32       using the same environment variables that are also used for
33       startup-time initialization of controls.)
34    
35    ----------------------------------------------------------------------
36    Name: Lal George
37    Date: 2002/05/15 09:20:10 EDT
38    Tag: george-20020515-pseudo-op-decls
39    Description:
40    
41            All pseudo-ops emitted before the first segment declaration
42    (i.e., before the first TEXT, DATA, or BSS directive) are assumed to be
43    global declarations and are emitted first in the assembly file. This is
44    useful in a number of situations where one has pseudo-ops that are not
45    specific to any segment, and also works around the constraint that one
46    cannot have client pseudo-ops in the TEXT segment.
47    
48            Because no segment is associated with these declarations it is
49    an error to allocate any space or objects before the first segment
50    directive and an exception will be raised. However, we cannot make
51    this check for client pseudo-ops.
52    
53            These top level declarations are a field in the CFG graph_info.
54    In theory you can continue to add to this field after the CFG has been
55    built -- provided you know what you are doing;-)
56    
57    ----------------------------------------------------------------------
58    Name: Matthias Blume
59    Date: 2002/05/13 16:40:00 EDT
60    Tag: blume-20020513-pp-etc
61    Description:
62    
63    A few minor bugfixes:
64    
65      - Stopgap measure for bug recently reported by Elsa Gunter (ppDec).
66        (Bogus printouts for redefined bindings still occur.  Compiler
67        bug should no longer occur now.  We need to redo the prettyprinter
68        from scratch.)
69    
70      - CM pathname printer now also adds escape sequences for ( and )
71    
72      - comment and documentation fixes for ml-nlffi
73    
74    ----------------------------------------------------------------------
75    Name: Matthias Blume
76    Date: 2002/05/10 16:40:00 EDT
77    Tag: blume-20020510-erg-textio
78    Description:
79    
80    Applied the following bugfix provided by Emden Gansner:
81    
82        Output is corrupted when outputSubstr is used rather than output.
83    
84        The problem occurs when a substring
85    
86            ss = (s, dataStart, dataLen)
87    
88        where dataStart > 0, fills a stream buffer with avail bytes left.
89        avail bytes of s, starting at index dataStart, are copied into the
90        buffer, the buffer is flushed, and then the remaining dataLen-avail
91        bytes of ss are copied into the beginning of the buffer. Instead of
92        starting this copy at index dataStart+avail in s, the current code
93        starts the copy at index avail.
94    
95        Fix:
96        In text-io-fn.sml, change line 695 from
97             val needsFlush = copyVec(v, avail, dataLen-avail, buf, 0)
98        to
99             val needsFlush = copyVec(v, dataStart+avail, dataLen-avail, buf, 0)
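
The off-by-dataStart error described above can be reproduced with a small model. This is only a sketch in Standard ML, not the code in text-io-fn.sml: the function writeSplit and its restFrom parameter are hypothetical names, and the buffer is modelled by simple string concatenation.

```sml
(* Model of writing the substring (s, dataStart, dataLen) through a buffer
 * with only `avail` free bytes.  The first `avail` bytes leave with the
 * flush; the remaining dataLen-avail bytes are copied to the start of the
 * (now empty) buffer.  `restFrom` picks the index in s at which that second
 * copy starts. *)
fun writeSplit restFrom (s, dataStart, dataLen, avail) =
    let
      val first = String.substring (s, dataStart, avail)
      val rest  = String.substring (s, restFrom (dataStart, avail), dataLen - avail)
    in
      first ^ rest
    end

(* Second copy starts at index avail (the reported bug). *)
val buggy = writeSplit (fn (_, avail) => avail)

(* Second copy starts at index dataStart+avail (the fix). *)
val fixed = writeSplit (fn (dataStart, avail) => dataStart + avail)

val outBuggy = buggy ("0123456789", 4, 5, 2)   (* "45234" *)
val outFixed = fixed ("0123456789", 4, 5, 2)   (* "45678" *)
```

With dataStart = 0 both variants agree, which is consistent with the bug showing up only for outputSubstr.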
100    
101    ----------------------------------------------------------------------
102    Name: Matthias Blume
103    Date: 2002/04/12 13:55:00 EDT
104    Tag: blume-20020412-assyntax
105    Description:
106    
107    1. Grabbed newer assyntax.h from the XFree86 project.
108    2. Fiddled with how to compile X86.prim.asm without warnings.
109    3. (Very) Minor cleanup in CM.
110    
111    ----------------------------------------------------------------------
112    Name: Matthias Blume
113    Date: 2002/04/01 (no joke!) 17:07:00 EST
114    Tag: blume-20020401-x86div
115    Description:
116    
117    Added full support for div/mod/rem/quot on the x86, using the machine
118    instruction's two results (without clumsily recomputing the remainder)
119    directly where appropriate.
120    
121    Some more extensive power-of-two support was added to the x86 instruction
122    selector (avoiding expensive divs, mods, and muls where they can be
123    replaced with cheaper shifts and masks).  However, this sort of thing
124    ought to be done earlier, e.g., within the CPS optimizer so that
125    all architectures benefit from it.
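
As an illustration of the kind of rewrite meant here, the following Standard ML sketch shows the unsigned case only. The function names are hypothetical, and this is not the instruction selector's code; it merely states the identities that let a shift or mask stand in for a division or modulus by 2^k.

```sml
(* For unsigned words and a divisor of the form 2^k:
 *   w div 2^k  =  w >> k
 *   w mod 2^k  =  w andb (2^k - 1)
 *)
fun divPow2 (w : word, k : word) : word = Word.>> (w, k)
fun modPow2 (w : word, k : word) : word =
    Word.andb (w, Word.<< (0w1, k) - 0w1)

val q = divPow2 (0w100, 0w3)   (* 100 div 8 *)
val r = modPow2 (0w100, 0w3)   (* 100 mod 8 *)
```

Signed operands need extra care (rounding direction for negative values), which is not shown in this sketch.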
126    
127    The compiler compiles to a fixed point, but changes might be somewhat
128    fragile nevertheless.  Please report any strange things that you might
129    see wrt. div/mod/quot/rem...
130    
131    ----------------------------------------------------------------------
132    Name: Matthias Blume
133    Date: 2002/03/29 17:22:00
134    Tag: blume-20020329-div
135    Description:
136    
137    Fixed my broken div/mod logic.  Unfortunately, this means that the
138    inline code for div/mod now has one more comparison than before.
139    Fast paths (quotient > 0 or remainder = 0) are not affected, though.
140    The problem was with quotient = 0, because that alone does not tell
141    us which way the rounding went.  One then has to look at whether
142    remainder and divisor have the same sign...  :(
143    
144    Anyway, I replaced the bootfiles with fresh ones...
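
The case analysis described in this entry can be written out as follows. This is a sketch in Standard ML under the stated reading of the entry (fast paths first, then the quotient < 0 case, then the sign test for quotient = 0); divMod is a hypothetical name, and this is not the inline code the compiler generates.

```sml
(* Round-to-neginf div and mod built from round-to-zero quot and rem. *)
fun divMod (x, y) =
    let
      val q = Int.quot (x, y)
      val r = Int.rem (x, y)
    in
      if q > 0 orelse r = 0 then (q, r)        (* fast paths: nothing to adjust *)
      else if q < 0 then (q - 1, r + y)        (* quot rounded up toward zero *)
      else if (r < 0) = (y < 0) then (q, r)    (* q = 0, same signs: true quotient in (0,1) *)
      else (q - 1, r + y)                      (* q = 0, opposite signs: true quotient in (-1,0) *)
    end

val example = divMod (~1, 2)   (* (~1, 1) *)
```

The last two branches are the extra comparison: with q = 0 the quotient alone does not say whether rounding went up or down, so the signs of remainder and divisor decide.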
145    
146    ----------------------------------------------------------------------
147    Name: Matthias Blume
148    Date: 2002/03/29 14:10:00 EST
149    Tag: blume-20020329-inlprims
150    Description:
151    
152    NEW BOOTFILES!!!    Version number bumped to 110.39.3.
153    
154    Primops have changed. This means that the bin/boot-file formats have
155    changed as well.
156    
157    To make sure that there is no confusion, I made a new version.
158    
159    
160    CHANGES:
161    
162    * removed REMT from mltree (remainder should never overflow).
163    
164    * added primops to deal with divisions of all flavors to the frontend
165    
166    * handled these primops all the way through so they map to their respective
167      MLRISC support
168    
169    * used these primops in the implementation of Int, Int32, Word, Word32
170    
171    * removed INLDIV, INLMOD, and INLREM as they are no longer necessary
172    
173    * parameterized INLMIN, INLMAX, and INLABS by a numkind
174    
175    * translate.sml now deals with all flavors of INL{MIN,MAX,ABS}, including
176      floating point
177    
178    * used INL{MIN,MAX,ABS} in the implementation of Int, Int32, Word, Word32,
179      and Real (but Real.abs maps to a separate floating-point-only primop)
180    
181    
182    TODO items:
183    
184    * Hacked Alpha32 instruction selection, disabling the selection of REMx
185      instructions because the machine instruction encoder cannot handle
186      them.  (Hppa, PPC, and Sparc instruction selection did not handle
187      REM in the first place, and REM is supported by the x86 machine coder.)
188    
189    * Handle DIV and MOD with DIV_TO_NEGINF directly in the x86 instruction
190      selection phase.  (The two can be streamlined because the hardware
191      delivers both quotient and remainder at the same time anyway.)
192    
193    * Think about what to do with "valOf(Int32.minInt) div ~1" and friends.
194      (Currently the behavior is inconsistent both across architectures and
195      wrt. the draft Basis spec.)
196    
197    * Word8 should eventually be handled natively, too.
198    
199    * There seems to be one serious bug in mltree-gen.sml.  It appears, though,
200      as if there currently is no execution path that could trigger it in
201      SML/NJ.  (The assumptions underlying functions arith and promotable do not
202      hold for things like multiplication and division.)
203    
204    ----------------------------------------------------------------------
205    Name: Matthias Blume
206    Date: 2002/03/27 16:27:00 EST
207    Tag: blume-20020327-mlrisc-divisions
208    Description:
209    
210    Added support for all four division operations (ML's div, mod, quot,
211    and rem) to MLRISC.  In the course of doing so, I also rationalized
212    the naming (no more annoying switch-around of DIV and QUOT), by
213    parameterizing the operation by div_rounding_mode (which can be either
214    DIV_TO_ZERO or DIV_TO_NEGINF).
215    
216    The generic MLTreeGen functor takes care of compiling all four
217    operations down to only round-to-zero div.
218    
219    Missing pieces:
220    
221      * Doing something smarter than relying on MLTreeGen on architectures
222        like, e.g., the x86 where hardware division delivers both quotient and
223        remainder at the same time.  With this, the implementation of the
224        round-to-neginf operations could be further streamlined.
225    
226      * Remove inlining support for div/mod/rem from the frontend and replace it
227        with primops that get carried through to the backend.  Do this for all
228        int and word types.
229    
230    ----------------------------------------------------------------------
231    Name: Matthias Blume
232    Date: 2002/03/25 17:25:00 EST
233    Tag: blume-20020325-divmod
234    Description:
235    
236    I improved (hopefully without breaking them) the implementation of Int.div,
237    Int.mod, and Int.rem.   For this, the code in translate.sml now takes
238    advantage of the following observations:
239    
240      Let  q = x quot y      r = x rem y
241           d = x div  y      m = x mod y
242    
243    where "quot" is the round-to-zero version of integer division that
244    hardware usually provides.  Then we have:
245    
246         r = x - q * y        where neither the * nor the - will overflow
247         d = if q >= 0 orelse x = q * y then q else q - 1
248                              where neither the * nor the - will overflow
249         m = if q >= 0 orelse r = 0 then r else r + y
250                              where the + will not overflow
251    
252    This results in substantial simplification of the generated code.
253    The following table shows the number of CFG nodes and edges generated
254    for
255            fun f (x, y) = x OPER y
256            (* with OPER \in div, mod, quot, rem *)
257    
258    
259        OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
260        --------------------------------------------------------
261         div |         24 |         39 |         12 |         16
262         mod |         41 |         71 |         12 |         16
263        quot |          8 |         10 |          8 |         10
264         rem |         10 |         14 |          8 |         10
265    
266    
267    ----------------------------------------------------------------------
268    Name: Matthias Blume
269    Date: 2002/03/25 22:06:00 EST
270    Tag: blume-20020325-cprotobug
271    Description:
272    
273    Fixed a bug in cproto (c prototype decoder).
274    
275    ----------------------------------------------------------------------
276    Name: Matthias Blume
277    Date: 2002/03/25 16:00:00 EST
278    Tag: blume-20020325-raw-primops
279    Description:
280    
281    I did some cleanup to Allen's new primop code and
282    replaced yesterday's bootfiles with new ones.
283    (But they are stored in the same place.)
284    
285    ----------------------------------------------------------------------
286    Name: Matthias Blume
287    Date: 2002/03/24 22:40:00 EST
288    Tag: blume-20020324-bootfiles
289    Description:
290    
291    Made the bootfiles that Allen asked for.
292    
293    ----------------------------------------------------------------------
294    Name: Allen Leung
295    Date: 2002/03/23 15:50:00 EST
296    Tag: leunga-20020323-flint-cps-rcc-primops
297    Description:
298    
299      1. Changes to FLINT primops:
300    
301        (* make a call to a C-function;
302         * The primop carries C function prototype information and specifies
303         * which of its (ML-) arguments are floating point. C prototype
304         * information is for use by the backend, ML information is for
305         * use by the CPS converter. *)
306      | RAW_CCALL of { c_proto: CTypes.c_proto,
307                       ml_args: ccall_type list,
308                       ml_res_opt: ccall_type option,
309                       reentrant : bool
310                     } option
311       (* Allocate uninitialized storage on the heap.
312        * The record is meant to hold short-lived C objects, i.e., they
313        * are not ML pointers.  With the tag, the representation is
314        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
315        *)
316      | RAW_RECORD of {tag:bool,sz:int}
317      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
318    
319      2.  These CPS primops are now overloaded:
320    
321           rawload of {kind:numkind}
322           rawstore of {kind:numkind}
323    
324          The one argument form is:
325    
326             rawload {kind} address
327    
328          The two argument form is:
329    
330             rawload {kind} [ml object, byte-offset]
331    
332      3. RAW_CCALL/RCC now takes two extra arguments:
333    
334         a. The first is whether the C call is reentrant, i.e., whether
335            ML state should be saved and restored.
336         b. The second argument is a string argument specifying the name of
337            library and the C function.
338    
339         These things are not yet handled in the code generator.
340    
341      4. In CProto,
342    
343         An encoding type of "bool" means "ml object" and is mapped into
344         C prototype of PTR.  Note that "bool" is different than "string",
345         even though "string" is also mapped into PTR, because "bool"
346         is assigned a CPS type of BOGt, while "string" is assigned INT32t.
347    
348      5. Pickler/unpickler
349    
350         Changed to handle RAW_RECORD and newest RAW_CCALL
351    
352      6. MLRiscGen,
353    
354         1. Changed to handle the new rawload/rawstore/rawrecord operators.
355         2. Code for handling C Calls has been moved to a new module CPSCCalls,
356            in the file CodeGen/cpscompile/cps-c-calls.sml
357    
358      7. Added the conditional move operator
359    
360             condmove of branch
361    
362         to cps.  Generation of this is still buggy so it is currently
363         disabled.
364    
365    ----------------------------------------------------------------------
366    Name: Lal George
367    Date: 2002/03/22 14:18:25 EST
368    Tag: george-20020322-cps-branch-prob
369    Description:
370    
371    Implemented the Ball-Larus branch-prediction heuristics, and
372    incorporated graphical viewers for control flow graphs.
373    
374    Ball-Larus Heuristics:
375    ---------------------
376    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
377    
378    By design it uses the Dempster-Shafer theory for combining
379    probabilities.  For example, in the function:
380    
381        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
382    
383    the Ball-Larus heuristics predict that n = 0 is unlikely
384    (OH-heuristic), and that the 'then' branch is unlikely because of the
385    RH-heuristic -- giving the 'then' branch an even lower combined
386    probability using the Dempster-Shafer theory.
387    
388    Finally, John Reppy's loop analysis in MLRISC further lowers the
389    probability of the 'then' branch because of the loop in the else
390    branch.
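
For readers unfamiliar with how two heuristic estimates are merged, here is a sketch of the standard two-outcome Dempster-Shafer combination rule in Standard ML. The function name and the probabilities 0.3 and 0.4 are illustrative assumptions; they are not the values used in cpsBranchProb.sml.

```sml
(* Combine two independent estimates p1 and p2 that the same branch is taken:
 *     p = p1*p2 / (p1*p2 + (1-p1)*(1-p2))
 *)
fun combine (p1 : real, p2 : real) : real =
    let
      val taken    = p1 * p2
      val notTaken = (1.0 - p1) * (1.0 - p2)
    in
      taken / (taken + notTaken)
    end

(* Two heuristics that each call the 'then' branch unlikely yield a
 * combined estimate lower than either one alone. *)
val both = combine (0.3, 0.4)
```

An estimate of 0.5 carries no information under this rule: combine (0.5, p) is p again.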
391    
392    
393    Graphical Viewing:
394    ------------------
395    I merely plugged in Allen's graphical viewers into the compiler. The
396    additional code is not much. At the top level, saying:
397    
398            Control.MLRISC.getFlag "cfg-graphical-view" := true;
399    
400    will display the graphical view of the control flow graph just before
401    back-patching.  daVinci must be in your path for this to work. If
402    daVinci is not available, then the default viewer can be changed
403    using:
404    
405            Control.MLRISC.getString "viewer"
406    
407    which can be set to "dot" or "vcg" for the corresponding viewers. Of
408    course, these viewers must be in your path.
409    
410    The above will display the compilation unit at the level of clusters,
411    many of which are small and uninteresting.  Additionally setting:
412    
413            Control.MLRISC.getInt "cfg-graphical-view_size"
414    
415    restricts the display to clusters that are larger than the given size.
416    
417    
418    ----------------------------------------------------------------------
419    Name: Matthias Blume
420    Date: 2002/03/21 22:20:00 EST
421    Tag: blume-20020321-kmp-bugfix
422    Description:
423    
424    Changed the interface to the KMP routine in PreString and fixed
425    a minor bug in one place where it was used.
426    
427    ----------------------------------------------------------------------
428    Name: Allen Leung
429    Date: 2002/03/21 20:30:00 EST
430    Tag: leunga-20020321-cfg
431    Description:
432    
433      Fixed a potential problem in cfg edge splitting.
434    
435    ----------------------------------------------------------------------
436    Name: Allen Leung
437    Date: 2002/03/21 17:15:00 EST
438    Tag: leunga-20020321-x86-fp-cfg
439    Description:
440    
441      1. Recoded the buggy parts of x86-fp.
442    
443         a. All the block reordering code has been removed.
444            We now depend on the block placement phases to do this work.
445    
446         b. Critical edge splitting code has been simplified and moved into the
447            CFG modules, where it belongs.
448    
449         Both of these were quite buggy and complex.  The code is now much, much
450         simpler.
451    
452      2. X86 backend.
453    
454         a. Added instructions for 64-bit support.  Instruction selection for
455            64-bit has not been committed, however, since that
456            requires changes to MLTREE which haven't been approved by
457            Lal and John.
458    
459         b. Added support for FUCOMI and FUCOMIP when generating code for
460            PentiumPro and above.  We only generate these instructions in
461            the fast-fp mode.
462    
463         c. Added cases for JP and JNP in X86FreqProps.
464    
465      3. CFG
466    
467         CFG now has a bunch of methods for edge splitting and merging.
468    
469      4. Machine description.
470    
471         John's simplification of MLTREE_BASIS.fcond broke a few machine
472         description things:
473    
474         rtl-build.{sig,sml} and hppa.mdl fixed.
475    
476         NOTE: the machine description stuff in the repository is still broken.
477               Again, I can't put my fixes in because that involves
478               changes to MLTREE.
479    
480    ----------------------------------------------------------------------
481    Name: Matthias Blume
482    Date: 2002/03/20 15:55:00 EST
483    Tag: blume-20020320-kmp
484    Description:
485    
486    Implemented Knuth-Morris-Pratt string matching in PreString and used
487    it for String.isSubstring, Substring.isSubstring, and
488    Substring.position.
489    
490    (Might need some stress-testing.  Simple examples worked fine.)
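
For reference, a compact Knuth-Morris-Pratt matcher in Standard ML is sketched below. It illustrates the technique only: failureTable and kmpFind are hypothetical names, and the interface of the actual routine in PreString is not shown in this entry.

```sml
(* fail[i] = length of the longest proper prefix of pat[0..i] that is also
 * a suffix of pat[0..i]. *)
fun failureTable (pat : string) : int array =
    let
      val m = size pat
      val fail = Array.array (m, 0)
      fun sub i = String.sub (pat, i)
      fun fallback (k, c) =
          if k > 0 andalso sub k <> c
          then fallback (Array.sub (fail, k - 1), c)
          else k
      fun loop (i, k) =
          if i >= m then ()
          else
            let
              val k' = fallback (k, sub i)
              val k'' = if sub k' = sub i then k' + 1 else k'
            in
              Array.update (fail, i, k'');
              loop (i + 1, k'')
            end
    in
      if m > 1 then loop (1, 0) else ();
      fail
    end

(* Index of the first occurrence of pat in text, if any.  The text index
 * never moves backwards; on a mismatch only the pattern state k falls back. *)
fun kmpFind (pat, text) : int option =
    let
      val m = size pat
      val n = size text
      val fail = failureTable pat
      fun step (k, c) =
          if k > 0 andalso String.sub (pat, k) <> c
          then step (Array.sub (fail, k - 1), c)
          else if String.sub (pat, k) = c then k + 1 else k
      fun scan (i, k) =
          if k = m then SOME (i - m)
          else if i >= n then NONE
          else scan (i + 1, step (k, String.sub (text, i)))
    in
      if m = 0 then SOME 0 else scan (0, 0)
    end

val hit = kmpFind ("abab", "abaabab")   (* SOME 3 *)
```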
491    
492    ----------------------------------------------------------------------
493    Name: Matthias Blume
494    Date: 2002/03/19 16:37:00 EST
495    Tag: blume-20020319-witnesses
496    Description:
497    
498    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
499    
500    This implements a generic mechanism for changing constness qualifiers
501    anywhere within big C types without resorting to outright "casts".
502    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
503    modify the constness at the outermost level.)
504    The implementation of "convert" is based on the idea of "witness"
505    values -- values that are not used by the operation but whose types
506    "testify" to their applicability.  On the implementation side, "convert"
507    is simply a projection (returning its second curried argument).  With
508    cross-module inlining, it should not result in any machine code being
509    generated.
510    
511    ----------------------------------------------------------------------
512    Name: Matthias Blume
513    Date: 2002/03/15 16:40:00 EST
514    Tag: blume-20020315-basis
515    Description:
516    
517    Provided (preliminary?) implementations for
518    
519      {String,Substring}.{concatWith,isSuffix,isSubstring}
520    
521    and
522    
523      Substring.full
524    
525    Those are in the Basis spec but they were missing in SML/NJ.
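
A short usage sketch of these functions, following the signatures in the Basis specification (the argument strings are arbitrary examples):

```sml
val joined  = String.concatWith ", " ["a", "b", "c"]      (* "a, b, c" *)
val suffix  = String.isSuffix "fix" "suffix"               (* true *)
val inside  = String.isSubstring "str" "substring"         (* true *)
val missing = String.isSubstring "xyz" "substring"         (* false *)

(* Substring.full turns a whole string into a substring. *)
val whole   = Substring.string (Substring.full "abc")      (* "abc" *)
val ssJoin  = Substring.concatWith "-"
                [Substring.full "x", Substring.full "y"]   (* "x-y" *)
val ssSuf   = Substring.isSuffix "bc" (Substring.full "abc")   (* true *)
```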
526    
527    ----------------------------------------------------------------------
528    Name: Matthias Blume
529    Date: 2002/03/14 21:30:00 EST
530    Tag: blume-20020314-controls
531    Description:
532    
533    Controls:
534    ---------
535    
536    1. Factored out the recently-added Controls : CONTROLS stuff and put
537       it into its own library $/controls-lib.cm.  The source tree for
538       this is under src/smlnj-lib/Controls.
539    
540    2. Changed the names of types and functions in this interface, so they
541       make a bit more "sense":
542    
543          module -> registry
544          'a registry -> 'a group
545    
546    3. The interface now deals in ref cells only.  The getter/setter interface
547       is (mostly) gone.
548    
549    4. Added a function that lets one register an already-existing ref cell.
550    
551    5. Made the corresponding modifications to the rest of the code so that
552       everything compiles again.
553    
554    6. Changed the implementation of Controls.MLRISC back to something closer
555       to the original.  In particular, this module (and therefore MLRISC)
556       does not depend on Controls.  There now is some link-time code in
557       int-sys.sml that registers the MLRISC controls with the Controls
558       module.
559    
560    CM:
561    ---
562    
563      * One can now specify the lambda-split aggressiveness in init.cmi.
564    
565    ----------------------------------------------------------------------
566    Name: Allen Leung
567    Date: 2002/03/13 17:30:00 EST
568    Tag: leunga-20020313-x86-fp-unary
569    Description:
570    
571    Bug fix for:
572    
573    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
574    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
575    > - fun f(x,(y,z)) = Real.~ y;
576    > [autoloading]
577    > [autoloading done]
578    >       fchsl   (%eax), 184(%esp)
579    > Error: MLRisc bug: X86MCEmitter.emitInstr
580    >
581    > uncaught exception Error
582    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
583    
584    The problem was that the code generator did not generate any fp registers
585    in this case, and the register allocator didn't know that it needed to run
586    the X86FP phase to translate the pseudo fp instruction.   This only happened
587    with unary fp operators in certain situations.
588    
589    ----------------------------------------------------------------------
590    Name: Matthias Blume
591    Date: 2002/03/13 14:00:00 EST
592    Tag: blume-20020313-overload-etc
593    Description:
594    
595    1. Added _overload as a synonym for overload for backward compatibility.
596       (Control.overloadKW must be true for either version to be accepted.)
597    
598    2. Fixed bug in install script that caused more things to be installed
599       than what was requested in config/targets.
600    
601    3. Made CM aware of the (_)overload construct so that autoloading
602       works.
603    
604    ----------------------------------------------------------------------
605    Name: Matthias Blume
606    Date: 2002/03/12 22:03:00 EST
607    Tag: blume-20020312-url
608    Description:
609    
610    Forgot to update BOOT and srcarchiveurl.
611    
612    ----------------------------------------------------------------------
613    Name: Matthias Blume
614    Date: 2002/03/12 17:30:00 EST
615    Tag: blume-20020312-version110392
616    Description:
617    
618    Yet another version number bump (because of small changes to the
619    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
620    
621    Changes:
622    
623      The new pid generation scheme described a few weeks ago was overly
624      complicated.  I implemented a new mechanism that is simpler and
625      provides a bit more "stability":  Once CM has seen a compilation
626      unit, it keeps its identity constant (as long as you do not delete
627      those crucial CM/GUID/* files).  This means that when you change
628      an interface, compile, then go back to the old interface, and
629      compile again, you arrive at the original pid.
630    
631      There now also is a mechanism that instructs CM to use the plain
632      environment hash as a module's pid (effectively making its GUID
633      the empty string).  For this, "noguid" must be specified as an
634      option to the .sml file in question within its .cm file.
635      This is most useful for code that is being generated by tools such
636      as ml-nlffigen (because during development programmers tend to
637      erase the tool's entire output directory tree including CM's cached
638      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
639      revert to the old, broken behavior of SML/NJ), but in specific cases
640      where there is no danger of interface confusion, its use is ok
641      (I think).
642    
643      ml-nlffigen by default generates "noguid" annotations.  They can be
644      turned off by specifying -guid in its command line.
645    
646    ----------------------------------------------------------------------
647    Name: Lal George
648    Date: 2002/03/12 14:42:36 EST
649    Tag: george-20020312-frequency-computation
650    Description:
651    
652    Integrated jump chaining and static block frequency into the
653    compiler. More details and numbers later.
654    
655    ----------------------------------------------------------------------
656    Name: Lal George
657    Date: 2002/03/11 22:38:53 EST
658    Tag: george-20020311-jump-chain-elim
659    Description:
660    
661    Tested the jump chain elimination on all architectures (except the
662    hppa).  This is on by default right now and is profitable for the
663    alpha and x86; however, it may not be profitable for the sparc and ppc
664    when compiling the compiler.
665    
666    The gc test will typically jump to a label at the end of the cluster,
667    where there is another jump to an external cluster containing the actual
668    code to invoke gc. This is to allow factoring of common gc invocation
669    sequences. That is to say, we generate:
670    
671            f:
672               testgc
673               ja   L1      % jump if above to L1
674    
675            L1:
676               jmp L2
677    
678    
679    After jump chain elimination the 'ja L1' instruction is converted to
680    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
681    up being implemented in their long form (if L2 is far away) using:
682    
683            jbe     L3      % jump if below or equal to L3
684            jmp     L2
685         L3:
686            ...
687    
688    
689    For large compilation units L2  may be far away.
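
The retargeting step itself can be sketched in a few lines of Standard ML. The representation is an assumption made for illustration: labels are strings, and the list of pairs records which labels head a block consisting of nothing but an unconditional jump. The MLRISC pass works on its own CFG data structures, which are not shown here.

```sml
(* chains: each pair (l, l') means "the block at l is just `jmp l'`".
 * resolve follows such jumps from lab to the final target; the `seen`
 * list stops the walk if the jumps form a cycle. *)
fun resolve chains lab =
    let
      fun next l =
          Option.map (fn (_, dst) => dst)
                     (List.find (fn (src, _) => src = l) chains)
      fun follow (l, seen) =
          case next l of
              SOME l' =>
                if List.exists (fn s => s = l') seen then l
                else follow (l', l' :: seen)
            | NONE => l
    in
      follow (lab, [lab])
    end

(* 'ja L1' becomes 'ja L2' because L1 only contains 'jmp L2'. *)
val target = resolve [("L1", "L2")] "L1"   (* "L2" *)
```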
690    
691    
692    ----------------------------------------------------------------------
693    Name: Matthias Blume
694    Date: 2002/03/11 13:30:00 EST
695    Tag: blume-20020311-mltreeeval
696    Description:
697    
698    A functor parameter was missing.
699    
700    ----------------------------------------------------------------------
701    Name: Allen Leung
702    Date: 2002/03/11 10:30:00 EST
703    Tag: leunga-20020311-runtime-string0
704    Description:
705    
706       The representation of the empty string now points to a
707    legal null terminated C string instead of unit.  It is now possible
708    to convert an ML string into a C string with InlineT.CharVector.getData.
709    This compiles into one single machine instruction.
710    
711    ----------------------------------------------------------------------
712    Name: Allen Leung
713    Date: 2002/03/10 23:55:00 EST
714    Tag: leunga-20020310-x86-call
715    Description:
716    
717       Added machine code generation for the CALL instruction (relative displacement mode)
718    
719    ----------------------------------------------------------------------
720    Name: Matthias Blume
721    Date: 2002/03/08 16:05:00
722    Tag: blume-20020308-entrypoints
723    Description:
724    
725    Version number bumped to 110.39.1.  NEW BOOTFILES!
726    
727    Entrypoints: non-zero offset into a code object where execution should begin.
728    
729    - Added the notion of an entrypoint to CodeObj.
730    - Added reading/writing of entrypoint info to Binfile.
731    - Made runtime system bootloader aware of entrypoints.
732    - Use the address of the label of the first function given to mlriscGen
733      as the entrypoint.  This address is currently always 0, but it will
734      not be 0 once we turn on block placement.
735    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
736      with entry points) from mlriscGen.
737    
738    ----------------------------------------------------------------------
739    Name: Allen Leung
740    Date: 2002/03/07 20:45:00 EST
741    Tag: leunga-20020307-x86-cmov
742    Description:
743    
744       Bug fixes for CMOVcc on x86.
745    
746       1. Added machine code generation for CMOVcc
747       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
748       3. CMOVcc cannot have an immediate operand as argument.
749    
750    ----------------------------------------------------------------------
751    Name: Matthias Blume
752    Date: 2002/03/07 16:15:00 EST
753    Tag: blume-20020307-controls
754    Description:
755    
756    This is a very large but mostly boring patch which makes (almost)
757    every tuneable compiler knob (i.e., pretty much everything under
758    Control.* plus a few other things) configurable via both the command
759    line and environment variables in the style CM did its configuration
760    until now.
761    
762    Try starting sml with '-h' (or, if you are brave, '-H')
763    
764    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
765    implements the underlying generic mechanism.
766    
767    The interface to some of the existing such facilities has changed somewhat.
768    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
769    (The getFoo interface is still there for backward-compatibility, but its
770    use is deprecated.)
771    
772    The ml-build script passes -Cxxx=yyy command-line arguments through so
773    that one can now twiddle the compiler settings when using this "batch"
774    compiler.
775    
776    TODO items:
777    
778    We should go through and throw out all controls that are no longer
779    connected to anything.  Moreover, we should go through and provide
780    meaningful (and correct!) documentation strings for those controls
781    that still are connected.
782    
783    Currently, multiple calls to Controls.new are accepted (only the first
784    has any effect).  Eventually we should make sure that every control
785    is being made (via Controls.new) exactly once.  Future access can then
786    be done using Controls.acc.
787    
788    Finally, it would probably be a good idea to use the getter-setter
789    interface to controls rather than ref cells.  For the time being, both
790    styles are provided by the Controls module, but getter-setter pairs are
791    better if thread-safety is of any concern because they can be wrapped.
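
To illustrate the thread-safety remark, here is a minimal Python sketch.
It is not the interface of the Controls module; the names make_control
and locked are hypothetical.  A getter-setter pair is just two functions,
so a wrapper can add locking without changing any client, whereas a bare
ref cell exposes its storage directly and leaves no place to hook in.

```python
import threading

def make_control(initial):
    """Return a (get, set) pair backed by a private cell (hypothetical helper)."""
    cell = [initial]          # stands in for an ML ref cell

    def get():
        return cell[0]

    def set_(value):
        cell[0] = value

    return get, set_

def locked(get, set_):
    """Wrap an existing getter-setter pair so every access holds a lock."""
    lock = threading.Lock()

    def safe_get():
        with lock:
            return get()

    def safe_set(value):
        with lock:
            set_(value)

    return safe_get, safe_set

# Clients only ever see the pair, so the wrapping is transparent to them.
get_level, set_level = locked(*make_control(0))
set_level(3)
```

Calling get_level() afterwards returns the stored value through the lock.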
792    
793    *****************************************
794    
795    One bug fix: The function blockPlacement in three of the MLRISC
796    backpatch files used to be hard-wired to one of two possibilities at
797    link time (according to the value of the placementFlag).  But (I
798    think) it should rather sense the flag every time.
799    
800    *****************************************
801    
802    Other assorted changes (by other people who did not supply a HISTORY entry):
803    
804    1. the cross-module inliner now works much better (Monnier)
805    2. representation of weights, frequencies, and probabilities in MLRISC
806       changed in preparation of using those for weighted block placement
807       (Reppy, George)
808    
809    ----------------------------------------------------------------------
810    Name: Lal George
811    Date: 2002/03/07 14:44:24 EST 2002
812    Tag: george-20020307-weighted-block-placement
813    
814    Tested the weighted block placement optimization on all architectures
815    (except the hppa) using AMPL to generate the block and edge frequencies.
816    Changes were required in the machine properties to correctly
817    categorize trap instructions. There is an MLRISC flag
818    "weighted-block-placement" that can be used to enable weighted block
819    placement, but this will be ineffective without block/edge
820    frequencies (coming soon).
821    
822    
823    ----------------------------------------------------------------------
824    Name: Lal George
825    Date: 2002/03/05 17:24:48 EST
826    Tag: george-20020305-linkage-cluster
827    
828    In order to support the block placement optimization, a new cluster
829    is generated as the very first cluster (called the linkage cluster).
830    It contains a single jump to the 'real' entry point for the compilation
831    unit. Block placement has no effect on the linkage cluster itself, but
832    all the other clusters  have full freedom in the manner in which they
833    reorder blocks or functions.
834    
835    On the x86 the typical linkage code that is generated is:
836       ----------------------
837            .align 2
838       L0:
839            addl    $L1-L0, 72(%esp)
840            jmp     L1
841    
842    
843            .align  2
844       L1:
845       ----------------------
846    
847    72(%esp) is the memory location for the stdlink register. This
848    must contain the address of the CPS function being called. In the
849    above example, it contains the address of  L0; before
850    calling L1 (the real entry point for the compilation unit), it
851    must contain the address for L1, and hence
852    
853            addl $L1-L0, 72(%esp)
854    
855    I have tested this on all architectures except the hppa.  The increase
856    in code size is of course negligible.
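
The stdlink adjustment above can be checked with a little arithmetic.
The Python sketch below uses made-up addresses (the values of L0 and L1
are assumptions for illustration only) and models the memory slot at
72(%esp) as a plain variable.

```python
# Hypothetical addresses; the real values are chosen by the assembler.
L0 = 0x1000               # start of the linkage cluster
L1 = 0x1040               # real entry point of the compilation unit

stdlink = L0              # on entry, stdlink holds the address of L0

# addl $L1-L0, 72(%esp): add the constant difference L1-L0 to stdlink.
stdlink += L1 - L0

# jmp L1: stdlink now agrees with the address being jumped to.
```

Only the difference L1-L0 appears in the instruction, so the adjustment
does not depend on the absolute address of either label.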
857    
858    ----------------------------------------------------------------------
859    Name: Allen Leung
860    Date: 2002/03/03 13:20:00 EST
861    Tag: leunga-20020303-mlrisc-tools
862    
863      Added #[ ... ] expressions to mlrisc tools
864    
865    ----------------------------------------------------------------------
866    Name: Matthias Blume
867    Date: 2002/02/27 12:29:00 EST
868    Tag: blume-20020227-cdebug
869    Description:
870    
871    - made the types in structures C and C_Debug equal
872    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
873    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
874    
875    ----------------------------------------------------------------------
876    Name: Matthias Blume
877    Date: 2002/02/26 12:00:00 EST
878    Tag: blume-20020226-ffi
879    Description:
880    
881    1. Fixed a minor bug in CM's "noweb" tool:
882       If numbering is turned off, then truly don't number (i.e., do not
883       supply the -L option to noweb).  The previous behavior was to supply
884       -L'' -- which caused noweb to use the "default" line numbering scheme.
885       Thanks to Chris Richards for pointing this out (and supplying the fix).
886    
887    2. Once again, I reworked some aspects of the FFI:
888    
889       A. The incomplete/complete type business:
890    
891       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
892         gone!
893       - ML types representing an incomplete type are now *equal* to
894         ML types representing their corresponding complete types (just like
895         in C).  This is still safe because ml-nlffigen will not generate
896         RTTI for incomplete types, nor will it generate functions that
897         require access to such RTTI.   But when ML code generated from both
898         incomplete and complete versions of the C type meet, the ML types
899         are trivially interoperable.
900    
901         NOTE:  These changes restore the full generality of the translation
902         (which was previously lost when I eliminated functorization)!
903    
904       B. Enum types:
905    
906       - Structure C now has a type constructor "enum" that is similar to
907         how the "su" constructor works.  However, "enum" is not a phantom
908         type because each "T enum" has values (and is isomorphic to
909         MLRep.Signed.int).
910       - There are generic access operations for enum objects (using
911         MLRep.Signed.int).
912       - ml-nlffigen will generate a structure E_foo for each "enum foo".
913         * The structure contains the definition of type "mlrep" (the ML-side
914         representation type of the enum).  Normally, mlrep is the same
915         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
916         then mlrep will be defined as a datatype -- thus facilitating
917         pattern matching on mlrep values.
918         ("-ec" will be suppressed if there are duplicate values in an
919          enumeration.)
920         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
921         will be generated for each C enum constant xxx.
922         * Conversion functions m2i and i2m convert between mlrep and
923         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
924         * Conversion functions c and ml convert between mlrep and "tag enum".
925         * Access functions (get/set) fetch and store mlrep values.
926       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
927         enumerations are merged into one single enumeration represented by
928         structure E_'.
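
As a rough illustration of the m2i/i2m pair, the Python sketch below
builds both directions of the mapping from a table of enum constants.
The helper make_enum_maps and the constant names are hypothetical, and
this is not the code ml-nlffigen generates.  The sketch refuses duplicate
values because the integer-to-constant direction would then be ambiguous;
this may be related to why "-ec" is suppressed for such enumerations,
but the entry above does not state the reason.

```python
def make_enum_maps(constants):
    """Build m2i / i2m tables from {name: int}; refuse duplicate values."""
    m2i = dict(constants)
    i2m = {}
    for name, value in constants.items():
        if value in i2m:
            # Two constants share one integer: no unique inverse exists.
            raise ValueError(
                f"duplicate value {value}: {i2m[value]} and {name}")
        i2m[value] = name
    return m2i, i2m

m2i, i2m = make_enum_maps({"e_red": 0, "e_green": 1, "e_blue": 2})
```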
929    
930    ----------------------------------------------------------------------
931    Name: Allen Leung
932    Date: 2002/02/25 04:45:00 EST
933    Tag: leunga-20020225-cps-spill
934    
935    This is a new implementation of the CPS spill phase.
936    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
937    In case of problems, replace it with the old file spill.sml
938    
939    The current compiler runs into some serious performance problems when
940    constructing a large record.  This can happen when we try to compile a
941    structure with many items.  Even a very simple structure like the following
942    makes the compiler slow down.
943    
944        structure Foo = struct
945           val x_1 = 0w1 : Word32.int
946           val x_2 = 0w2 : Word32.int
947           val x_3 = 0w3 : Word32.int
948           ...
949           val x_N = 0wN : Word32.int
950        end
951    
952    The following table shows the compile time, from N=1000 to N=4000,
953    with the old compiler:
954    
955    N
956    1000   CPS 100 spill                           0.04u  0.00s  0.00g
957           MLRISC ra                               0.06u  0.00s  0.05g
958              (spills = 0 reloads = 0)
959           TOTAL                                   0.63u  0.07s  0.21g
960    
961    1100   CPS 100 spill                           8.25u  0.32s  0.64g
962           MLRISC ra                               5.68u  0.59s  3.93g
963              (spills = 0 reloads = 0)
964           TOTAL                                   14.71u  0.99s  4.81g
965    
966    1500   CPS 100 spill                           58.55u  2.34s  1.74g
967           MLRISC ra                               5.54u  0.65s  3.91g
968              (spills = 543 reloads = 1082)
969           TOTAL                                   65.40u  3.13s  6.00g
970    
971    2000   CPS 100 spill                           126.69u  4.84s  3.08g
972           MLRISC ra                               0.80u  0.10s  0.55g
973              (spills = 42 reloads = 84)
974           TOTAL                                   129.42u  5.10s  4.13g
975    
976    3000   CPS 100 spill                           675.59u  19.03s  11.64g
977           MLRISC ra                               2.69u  0.27s  1.38g
978              (spills = 62 reloads = 124)
979           TOTAL                                   682.48u  19.61s  13.99g
980    
981    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
982           MLRISC ra                               4.96u  0.27s  2.72g
983              (spills = 85 reloads = 170)
984           TOTAL                                   2375.26u  57.21s  48.00g
985    
986    As you can see, the old cps spill module suffers from some serious
987    performance problems.  But since I cannot decipher the old code fully,
988    instead of patching the problems up, I'm reimplementing it
989    with a different algorithm.  The new code is more modular,
990    smaller when compiled, and substantially faster
991    (O(n log n) time and O(n) space).  Timing of the new spill module:
992    
993    4000  CPS 100 spill                           0.02u  0.00s  0.00g
994          MLRISC ra                               0.25u  0.02s  0.15g
995             (spills=1 reloads=3)
996          TOTAL                                   7.74u  0.34s  1.62g
997    
998    Implementation details:
999    
1000    As far as I can tell, the purpose of the CPS spill module is to make sure the
1001    number of live variables at any program point (the bandwidth)
1002    does not exceed a certain limit, which is determined by the
1003    size of the spill area.
1004    
1005    When the bandwidth is too large, we decrease the register pressure by
1006    packing live variables into spill records.  How we achieve this is
1007    completely different than what we did in the old code.
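
The notion of bandwidth can be made concrete with a small Python sketch.
It assumes a flat, straight-line list of instructions, whereas the real
module works on CPS trees; the function name and the program encoding
are hypothetical.

```python
def bandwidth(instrs, live_out=()):
    """Maximum number of simultaneously live variables in straight-line code.

    instrs is a list of (defined, used) pairs, where defined is a variable
    name (or None) and used is a collection of variable names.
    """
    live = set(live_out)
    widest = len(live)
    # Walk backwards: a definition kills its variable, and each use
    # makes a variable live.
    for defined, used in reversed(instrs):
        live.discard(defined)
        live.update(used)
        widest = max(widest, len(live))
    return widest

# x1, x2 and x3 are all still needed when the record r is built.
program = [
    ("x1", []),
    ("x2", []),
    ("x3", []),
    ("r", ["x1", "x2", "x3"]),
]
```

For this program the bandwidth is 3; a record with N components that are
all live at its construction has bandwidth N, which is the situation in
the structure Foo example.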
1008    
1009    First, there is something about the MLRiscGen code generator
1010    that we should be aware of:
1011    
1012    o MLRiscGen performs code motion!
1013    
1014       In particular, it will move floating point computations and
1015       address computations involving only the heap pointer to
1016       their use sites (if there is only a single use).
1017       What this means is that if we have a CPS record construction
1018       statement
1019    
1020           RECORD(k,vl,w,e)
1021    
1022       we should never count the new record address w as live if w
1023       has only one use (which is often the case).
1024    
1025       We should do something similar for floating point, but the transformation
1026       there is much more complex, so I won't deal with that.
1027    
1028    Secondly, there are now two new cps primops at our disposal:
1029    
1030     1. rawrecord of record_kind option
1031        This pure operator allocates some uninitialized storage from the heap.
1032        There are two forms:
1033    
1034         rawrecord NONE [INT n]  allocates a tagless record of length n
1035         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
1036                                     and initializes the tag.
1037    
1038     2. rawupdate of cty
1039          rawupdate cty (v,i,x)
1040          Assigns x to the ith component of record v.
1041          The storelist is not updated.
1042    
1043    We use these new primops for both spilling and incremental record construction.
1044    
1045     1. Spilling.
1046    
1047        This is implemented with a linear scan algorithm (but generalized
1048        to trees).  The algorithm will create a single spill record at the
1049        beginning of the cps function and use rawupdate to spill to it,
1050        and SELECT or SELp to reload from it.  So both spills and reloads
1051        are fine-grain operations.  In contrast, in the old algorithm
1052        "spills" have to be bundled together in records.
1053    
1054        Ideally, we should sink the spill record construction to where
1055        it is needed.  We can even split the spill record into multiple ones
1056        at the places where they are needed.  But CPS is not a good
1057        representation for global code motion, so I'll keep it simple and
1058        am not attempting this.
1059    
1060     2. Incremental record construction (aka record splitting).
1061    
1062        Long records with many component values which are simultaneously live
1063        (recall that single use record addresses are not considered to
1064         be live) are constructed with rawrecord and rawupdate.
1065        We allocate space on the heap with rawrecord first, then gradually
1066        fill it in with rawupdate.  This is the technique suggested to me
1067        by Matthias.
1068    
1069        Some restrictions on when this is applicable:
1070        1. It is not a VECTOR record.  The code generator currently does not handle
1071           this case. VECTOR record uses double indirection like arrays.
1072        2. All the record component values are defined in the same "basic block"
1073           as the record constructor.  This is to prevent speculative
1074           record construction.
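
Incremental record construction can be sketched in Python as follows.
The functions below are toy models named after the two primops; tags,
the storelist and the heap are not modelled, and build_incrementally is
a hypothetical helper.

```python
def rawrecord(n):
    """Model of rawrecord: allocate n uninitialized slots."""
    return [None] * n

def rawupdate(record, i, x):
    """Model of rawupdate: store x into slot i of record."""
    record[i] = x

def build_incrementally(n, compute):
    """Build an n-slot record keeping one component value live at a time."""
    record = rawrecord(n)
    for i in range(n):
        value = compute(i)       # value is live only until the store below
        rawupdate(record, i, value)
    return record

record = build_incrementally(4, lambda i: i + 1)
```

Because each component is stored as soon as it is computed, only the
record address and one component need to be live at any point, instead
of all n components at once.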
1075    
1076    ----------------------------------------------------------------------
1077    Name: Allen Leung
1078    Date: 2002/02/22 01:02:00 EST
1079    Tag: leunga-20020222-mlrisc-tools
1080    
1081    Minor bug fixes in the parser and rewriter
1082    
1083    ----------------------------------------------------------------------
1084    Name: Allen Leung
1085    Date: 2002/02/21 20:20:00 EST
1086    Tag: leunga-20020221-peephole
1087    
1088    Regenerated the peephole files.  Some contained typos in the specification
1089    and some didn't compile because of pretty printing bugs in the old version
1090    of 'nowhere'.
1091    
1092    ----------------------------------------------------------------------
1093    Name: Allen Leung
1094    Date: 2002/02/19 20:20:00 EST
1095    Tag: leunga-20020219-mlrisc-tools
1096    Description:
1097    
1098       Minor bug fixes to the mlrisc-tools library:
1099    
1100       1.  Fixed up parsing of colon-suffixed keywords
1101       2.  Added the ability to shut the error messages up
1102       3.  Reimplemented the pretty printer and fixed up/improved
1103           the pretty printing of handle and -> types.
1104       4.  Fixed up generation of literal symbols in the nowhere tool.
1105       5.  Added some SML keywords to sml.sty
1106    
1107    ----------------------------------------------------------------------
1108    Name: Matthias Blume
1109    Date: 2002/02/19 16:20:00 EST
1110    Tag: blume-20020219-cmffi
1111    Description:
1112    
1113    A wild mix of changes, some minor, some major:
1114    
1115    * All C FFI-related libraries are now anchored under $c:
1116        $/c.cm      --> $c/c.cm
1117        $/c-int.cm  --> $c/internals/c-int.cm
1118        $/memory.cm --> $c/memory/memory.cm
1119    
1120    * "make" tool (in CM) now treats its argument pathname slightly
1121      differently:
1122        1. If the native expansion is an absolute name, then before invoking
1123           the "make" command on it, CM will apply OS.Path.mkRelative
1124           (with relativeTo = OS.FileSys.getDir()) to it.
1125        2. The argument will be passed through to subsequent phases of CM
1126           processing without "going native".  In particular, if the argument
1127           was an anchored path, then "make" will not lose track of that anchor.
1128    
1129    * Compiler backends now "know" their respective C calling conventions
1130      instead of having to be told about it by ml-nlffigen.  This relieves
1131      ml-nlffigen from one of its burdens.
1132    
1133    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
1134    
1135    * Export C_DEBUG and C_Debug from $c/c.cm.
1136    
1137    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
1138      subtyping relationship between incomplete pointers and their complete
1139      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
1140      with the convention of instantiating 'o with ('t, 'c) obj whenever
1141      the pointer target type is complete.  In the incomplete case, 'o
1142      will be instantiated with some "'c iobj" -- a type obtained by
1143      using one of the functors PointerToIncompleteType or PointerToCompleteType.
1144    
1145      Operations that work on both incomplete and complete pointer types are
1146      typed as taking an 'o ptr while operations that require the target to
1147      be known are typed as taking some ('t, 'c) obj ptr.
1148    
1149      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
1150      where void is an eqtype without any values.  This makes it possible
1151      to work on voidptr values using functions meant to operate on light
1152      incomplete pointers.
1153    
1154    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
1155      been vastly simplified.
1156    
1157    ----------------------------------------------------------------------
1158    Name: Matthias Blume
1159    Date: 2002/02/19 10:48:00 EST
1160    Tag: blume-20020219-pqfix
1161    Description:
1162    
1163    Applied Chris Okasaki's bug fix for priority queues.
1164    
1165    ----------------------------------------------------------------------
1166    Name: Matthias Blume
1167    Date: 2002/02/15 17:05:00
1168    Tag: Release_110_39
1169    Description:
