[smlnj] Diff of /sml/trunk/HISTORY
revision 1068 (Fri Feb 15 19:18:00 2002 UTC) vs. revision 1190 (Mon May 13 20:39:05 2002 UTC)
# Line 14  Line 14 
14    
15    ----------------------------------------------------------------------
16    Name: Matthias Blume
17    Date: 2002/05/13 16:40:00 EDT
18    Tag: blume-20020513-pp-etc
19    Description:
20    
21    A few minor bugfixes:
22    
23      - Stopgap measure for bug recently reported by Elsa Gunter (ppDec).
24        (Bogus printouts for redefined bindings still occur.  Compiler
25        bug should no longer occur now.  We need to redo the prettyprinter
26        from scratch.)
27    
28      - CM pathname printer now also adds escape sequences for ( and )
29    
30    - comment and documentation fixes for ml-nlffi
31    
32    ----------------------------------------------------------------------
33    Name: Matthias Blume
34    Date: 2002/05/10 16:40:00 EDT
35    Tag: blume-20020510-erg-textio
36    Description:
37    
38    Applied the following bugfix provided by Emden Gansner:
39    
40        Output is corrupted when outputSubstr is used rather than output.
41    
42        The problem occurs when a substring
43    
44            ss = (s, dataStart, dataLen)
45    
46        where dataStart > 0, fills a stream buffer with avail bytes left.
47        avail bytes of s, starting at index dataStart, are copied into the
48        buffer, the buffer is flushed, and then the remaining dataLen-avail
49        bytes of ss are copied into the beginning of the buffer. Instead of
50        starting this copy at index dataStart+avail in s, the current code
51        starts the copy at index avail.
52    
53        Fix:
54        In text-io-fn.sml, change line 695 from
55             val needsFlush = copyVec(v, avail, dataLen-avail, buf, 0)
56        to
57             val needsFlush = copyVec(v, dataStart+avail, dataLen-avail, buf, 0)
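
The off-by-dataStart error can be reproduced in a small model.  The sketch below is not the code from text-io-fn.sml: bufSize, buf, pos, sink, flush, and contents are hypothetical names, copyVec is a simplified four-argument stand-in, and the model assumes that dataLen never exceeds the buffer size.  It only illustrates why the second copy has to start at dataStart+avail.

```sml
(* Simplified model of a buffered output stream (all names are illustrative). *)
val bufSize = 8
val buf = CharArray.array (bufSize, #" ")
val pos = ref 0                        (* number of bytes currently buffered *)
val sink : string list ref = ref []    (* flushed chunks, newest first *)

fun flush () =
  ( sink := String.substring (CharArray.vector buf, 0, !pos) :: !sink
  ; pos := 0 )

(* Copy len bytes of v, starting at index src, into buf at index dst. *)
fun copyVec (v, src, len, dst) =
  let
    fun loop i =
      if i < len
      then (CharArray.update (buf, dst + i, String.sub (v, src + i)); loop (i + 1))
      else ()
  in
    loop 0
  end

(* Write the substring (s, dataStart, dataLen); assumes dataLen <= bufSize. *)
fun outputSubstr (s, dataStart, dataLen) =
  let
    val avail = bufSize - !pos
  in
    if dataLen <= avail
    then (copyVec (s, dataStart, dataLen, !pos); pos := !pos + dataLen)
    else
      ( copyVec (s, dataStart, avail, !pos)
      ; pos := bufSize
      ; flush ()
        (* The fix: resume at dataStart + avail.  Starting at avail instead
         * would copy the wrong bytes of s whenever dataStart > 0. *)
      ; copyVec (s, dataStart + avail, dataLen - avail, 0)
      ; pos := dataLen - avail )
  end

fun contents () = (flush (); String.concat (List.rev (!sink)))

(* Six bytes leave avail = 2, so the next substring straddles a flush. *)
val result =
  ( outputSubstr ("abcdef", 0, 6)
  ; outputSubstr ("0123456789", 3, 5)
  ; contents () )
```

In this model the second call writes the bytes "34567"; with the old start index the bytes after the flush would have been taken from index 2 of s rather than index 5.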
58    
59    ----------------------------------------------------------------------
60    Name: Matthias Blume
61    Date: 2002/04/12 13:55:00 EDT
62    Tag: blume-20020412-assyntax
63    Description:
64    
65    1. Grabbed newer assyntax.h from the XFree86 project.
66    2. Fiddled with how to compile X86.prim.asm without warnings.
67    3. (Very) Minor cleanup in CM.
68    
69    ----------------------------------------------------------------------
70    Name: Matthias Blume
71    Date: 2002/04/01 (no joke!) 17:07:00 EST
72    Tag: blume-20020401-x86div
73    Description:
74    
75    Added full support for div/mod/rem/quot on the x86, using the machine
76    instruction's two results (without clumsily recomputing the remainder)
77    directly where appropriate.
78    
79    Some more extensive power-of-two support was added to the x86 instruction
80    selector (avoiding expensive divs, mods, and muls where they can be
81    replaced with cheaper shifts and masks).  However, this sort of thing
82    ought to be done earlier, e.g., within the CPS optimizer so that
83    all architectures benefit from it.
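
As an illustration of the kind of rewrite meant here (a sketch at the source level, not the x86 instruction selector itself), division, modulus, and multiplication by a power of two can be expressed with shifts and masks.  The function names are hypothetical, and the signed variant assumes that Word.fromInt and Word.toIntX round-trip every int, as they do when Word is at least as wide as Int.

```sml
(* k is the exponent, i.e., the divisor or factor is 2^k. *)
fun divPow2 (w : word, k : word) = Word.>> (w, k)
fun modPow2 (w : word, k : word) = Word.andb (w, Word.<< (0w1, k) - 0w1)
fun mulPow2 (w : word, k : word) = Word.<< (w, k)

(* Signed round-to-neginf division by 2^k via an arithmetic right shift. *)
fun sdivPow2 (x : int, k : word) = Word.toIntX (Word.~>> (Word.fromInt x, k))

(* Compare each shortcut with the ordinary operation for one sample value. *)
val checks =
  [ divPow2 (0w100, 0w3) = Word.div (0w100, 0w8),
    modPow2 (0w100, 0w3) = Word.mod (0w100, 0w8),
    mulPow2 (0w100, 0w3) = 0w100 * 0w8,
    sdivPow2 (~100, 0w3) = ~100 div 8 ]

val allOk = List.all (fn b => b) checks
```

Note that the arithmetic shift matches div (round to negative infinity), not quot; a round-to-zero division of a negative number needs an extra adjustment.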
84    
85    The compiler compiles to a fixed point, but changes might be somewhat
86    fragile nevertheless.  Please, report any strange things that you might
87    see wrt. div/mod/quot/rem...
88    
89    ----------------------------------------------------------------------
90    Name: Matthias Blume
91    Date: 2002/03/29 17:22:00
92    Tag: blume-20020329-div
93    Description:
94    
95    Fixed my broken div/mod logic.  Unfortunately, this means that the
96    inline code for div/mod now has one more comparison than before.
97    Fast paths (quotient > 0 or remainder = 0) are not affected, though.
98    The problem was with quotient = 0, because that alone does not tell
99    us which way the rounding went.  One then has to look at whether
100    remainder and divisor have the same sign...  :(
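
The quotient = 0 ambiguity can be seen at the source level: 1 quot 2 and ~1 quot 2 are both 0, yet 1 div 2 is 0 while ~1 div 2 is ~1.  The following is a minimal sketch of one way to derive div and mod from quot and rem with the sign test described above.  It is not the inline code emitted by the compiler, and divMod is a hypothetical name.

```sml
(* Round-to-neginf quotient and modulus built from round-to-zero quot/rem. *)
fun divMod (x, y) =
  let
    val q = Int.quot (x, y)
    val r = Int.rem (x, y)
  in
    if q > 0 orelse r = 0 then (q, r)          (* fast paths *)
    else if (r < 0) = (y < 0) then (q, r)      (* same sign: quot already rounded down *)
    else (q - 1, r + y)                        (* opposite signs: quot rounded up *)
  end

(* The two cases that a test on the quotient alone cannot tell apart. *)
val ambiguous = (Int.quot (1, 2), 1 div 2, Int.quot (~1, 2), ~1 div 2)

(* Compare against the built-in operators for all x, y in ~10 .. 10, y <> 0. *)
val range = List.tabulate (21, fn i => i - 10)
val agrees =
  List.all
    (fn x =>
        List.all (fn y => y = 0 orelse divMod (x, y) = (x div y, x mod y)) range)
    range
```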
101    
102    Anyway, I replaced the bootfiles with fresh ones...
103    
104    ----------------------------------------------------------------------
105    Name: Matthias Blume
106    Date: 2002/03/29 14:10:00 EST
107    Tag: blume-20020329-inlprims
108    Description:
109    
110    NEW BOOTFILES!!!    Version number bumped to 110.39.3.
111    
112    Primops have changed. This means that the bin/boot-file formats have
113    changed as well.
114    
115    To make sure that there is no confusion, I made a new version.
116    
117    
118    CHANGES:
119    
120    * removed REMT from mltree (remainder should never overflow).
121    
122    * added primops to deal with divisions of all flavors to the frontend
123    
124    * handled these primops all the way through so they map to their respective
125      MLRISC support
126    
127    * used these primops in the implementation of Int, Int32, Word, Word32
128    
129    * removed INLDIV, INLMOD, and INLREM as they are no longer necessary
130    
131    * parameterized INLMIN, INLMAX, and INLABS by a numkind
132    
133    * translate.sml now deals with all flavors of INL{MIN,MAX,ABS}, including
134      floating point
135    
136    * used INL{MIN,MAX,ABS} in the implementation of Int, Int32, Word, Word32,
137      and Real (but Real.abs maps to a separate floating-point-only primop)
138    
139    
140    TODO items:
141    
142    * Hacked Alpha32 instruction selection, disabling the selection of REMx
143      instructions because the machine instruction encoder cannot handle
144      them.  (Hppa, PPC, and Sparc instruction selection did not handle
145      REM in the first place, and REM is supported by the x86 machine coder.)
146    
147    * Handle DIV and MOD with DIV_TO_NEGINF directly in the x86 instruction
148      selection phase.  (The two can be streamlined because the hardware
149      delivers both quotient and remainder at the same time anyway.)
150    
151    * Think about what to do with "valOf(Int32.minInt) div ~1" and friends.
152      (Currently the behavior is inconsistent both across architectures and
153      wrt. the draft Basis spec.)
154    
155    * Word8 should eventually be handled natively, too.
156    
157    * There seems to be one serious bug in mltree-gen.sml.  It appears, though,
158      as if there currently is no execution path that could trigger it in
159      SML/NJ.  (The assumptions underlying functions arith and promotable do not
160      hold for things like multiplication and division.)
161    
162    ----------------------------------------------------------------------
163    Name: Matthias Blume
164    Date: 2002/03/27 16:27:00 EST
165    Tag: blume-20020327-mlrisc-divisions
166    Description:
167    
168    Added support for all four division operations (ML's div, mod, quot,
169    and rem) to MLRISC.  In the course of doing so, I also rationalized
170    the naming (no more annoying switch-around of DIV and QUOT), by
171    parameterizing the operation by div_rounding_mode (which can be either
172    DIV_TO_ZERO or DIV_TO_NEGINF).
173    
174    The generic MLTreeGen functor takes care of compiling all four
175    operations down to only round-to-zero div.
176    
177    Missing pieces:
178    
179      * Doing something smarter than relying on MLTreeGen on architectures
180        like, e.g., the x86 where hardware division delivers both quotient and
181        remainder at the same time.  With this, the implementation of the
182        round-to-neginf operations could be further streamlined.
183    
184      * Remove inlining support for div/mod/rem from the frontend and replace it
185        with primops that get carried through to the backend.  Do this for all
186        int and word types.
187    
188    ----------------------------------------------------------------------
189    Name: Matthias Blume
190    Date: 2002/03/25 17:25:00 EST
191    Tag: blume-20020325-divmod
192    Description:
193    
194    I improved (hopefully without breaking them) the implementation of Int.div,
195    Int.mod, and Int.rem.   For this, the code in translate.sml now takes
196    advantage of the following observations:
197    
198      Let  q = x quot y      r = x rem y
199           d = x div  y      m = x mod y
200    
201    where "quot" is the round-to-zero version of integer division that
202    hardware usually provides.  Then we have:
203    
204         r = x - q * y        where neither the * nor the - will overflow
205         d = if q >= 0 orelse x = q * y then q else q - 1
206                              where neither the * nor the - will overflow
207         m = if q >= 0 orelse r = 0 then r else r + y
208                              where the + will not overflow
209    
210    This results in substantial simplification of the generated code.
211    The following table shows the number of CFG nodes and edges generated
212    for
213            fun f (x, y) = x OPER y
214            (* with OPER \in div, mod, quot, rem *)
215    
216    
217        OPER | nodes(old) | edges(old) | nodes(new) | edges(new)
218        --------------------------------------------------------
219         div |         24 |         39 |         12 |         16
220         mod |         41 |         71 |         12 |         16
221        quot |          8 |         10 |          8 |         10
222         rem |         10 |         14 |          8 |         10
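
The three identities above can be transcribed directly and compared with the built-in operators.  The sketch below does that over a small range; the function names are hypothetical and the code is not taken from translate.sml.  In this experiment every disagreement has quotient 0, which is the case that the later entry blume-20020329-div reports as broken and fixes.

```sml
(* r = x - q * y *)
fun quotRem (x, y) = let val q = Int.quot (x, y) in (q, x - q * y) end

(* d = if q >= 0 orelse x = q * y then q else q - 1 *)
fun divAsStated (x, y) =
  let val q = Int.quot (x, y)
  in if q >= 0 orelse x = q * y then q else q - 1 end

(* m = if q >= 0 orelse r = 0 then r else r + y *)
fun modAsStated (x, y) =
  let val (q, r) = quotRem (x, y)
  in if q >= 0 orelse r = 0 then r else r + y end

(* All pairs (x, y) with x, y in ~10 .. 10 and y <> 0. *)
val range = List.tabulate (21, fn i => i - 10)
val pairs = List.concat (List.map (fn x => List.map (fn y => (x, y)) range) range)
val nonzero = List.filter (fn (_, y) => y <> 0) pairs

val remOk = List.all (fn (x, y) => #2 (quotRem (x, y)) = Int.rem (x, y)) nonzero

val mismatches =
  List.filter
    (fn (x, y) => divAsStated (x, y) <> x div y orelse modAsStated (x, y) <> x mod y)
    nonzero

val onlyWhenQuotIsZero = List.all (fn (x, y) => Int.quot (x, y) = 0) mismatches
```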
223    
224    
225    ----------------------------------------------------------------------
226    Name: Matthias Blume
227    Date: 2002/03/25 22:06:00 EST
228    Tag: blume-20020325-cprotobug
229    Description:
230    
231    Fixed a bug in cproto (c prototype decoder).
232    
233    ----------------------------------------------------------------------
234    Name: Matthias Blume
235    Date: 2002/03/25 16:00:00 EST
236    Tag: blume-20020325-raw-primops
237    Description:
238    
239    I did some cleanup to Allen's new primop code and
240    replaced yesterday's bootfiles with new ones.
241    (But they are stored in the same place.)
242    
243    ----------------------------------------------------------------------
244    Name: Matthias Blume
245    Date: 2002/03/24 22:40:00 EST
246    Tag: blume-20020324-bootfiles
247    Description:
248    
249    Made the bootfiles that Allen asked for.
250    
251    ----------------------------------------------------------------------
252    Name: Allen Leung
253    Date: 2002/03/23 15:50:00 EST
254    Tag: leunga-20020323-flint-cps-rcc-primops
255    Description:
256    
257      1. Changes to FLINT primops:
258    
259        (* make a call to a C-function;
260         * The primop carries C function prototype information and specifies
261         * which of its (ML-) arguments are floating point. C prototype
262         * information is for use by the backend, ML information is for
263         * use by the CPS converter. *)
264      | RAW_CCALL of { c_proto: CTypes.c_proto,
265                       ml_args: ccall_type list,
266                       ml_res_opt: ccall_type option,
267                       reentrant : bool
268                     } option
269       (* Allocate uninitialized storage on the heap.
270        * The record is meant to hold short-lived C objects, i.e., they
271        * are not ML pointers.  With the tag, the representation is
272        * the same as RECORD with tag tag_raw32 (sz=4), or tag_fblock (sz=8)
273        *)
274      | RAW_RECORD of {tag:bool,sz:int}
275      and ccall_type = CCALL_INT32 | CCALL_REAL64 | CCALL_ML_PTR
276    
277      2.  These CPS primops are now overloaded:
278    
279           rawload of {kind:numkind}
280           rawstore of {kind:numkind}
281    
282          The one argument form is:
283    
284             rawload {kind} address
285    
286          The two argument form is:
287    
288             rawload {kind} [ml object, byte-offset]
289    
290      3. RAW_CCALL/RCC now takes two extra arguments:
291    
292         a. The first is whether the C call is reentrant, i.e., whether
293            ML state should be saved and restored.
294         b. The second argument is a string argument specifying the name of
295            library and the C function.
296    
297         These things are not yet handled in the code generator.
298    
299      4. In CProto,
300    
301         An encoding type of "bool" means "ml object" and is mapped to the
302         C prototype PTR.  Note that "bool" differs from "string",
303         even though "string" is also mapped to PTR, because "bool"
304         is assigned a CPS type of BOGt, while "string" is assigned INT32t.
305    
306      5. Pickler/unpickler
307    
308         Changed to handle RAW_RECORD and newest RAW_CCALL
309    
310      6. MLRiscGen,
311    
312         1. Changed to handle the new rawload/rawstore/rawrecord operators.
313         2. Code for handling C Calls has been moved to a new module CPSCCalls,
314            in the file CodeGen/cpscompile/cps-c-calls.sml
315    
316      7. Added the conditional move operator
317    
318             condmove of branch
319    
320         to cps.  Generation of this is still buggy so it is currently
321         disabled.
322    
323    ----------------------------------------------------------------------
324    Name: Lal George
325    Date: 2002/03/22 14:18:25 EST
326    Tag: george-20020322-cps-branch-prob
327    Description:
328    
329    Implemented the Ball-Larus branch prediction-heuristics, and
330    incorporated graphical viewers for control flow graphs.
331    
332    Ball-Larus Heuristics:
333    ---------------------
334    See the file compiler/CodeGen/cpscompile/cpsBranchProb.sml.
335    
336    By design it uses the Dempster-Shafer theory for combining
337    probabilities.  For example, in the function:
338    
339        fun f(n,acc) = if n = 0 then acc else f(n-1, n*acc)
340    
341    the Ball-Larus heuristics predict that n=0 is unlikely
342    (OH-heuristic), and the 'then' branch is unlikely because of the
343    RH-heuristic -- giving the 'then' branch an even lower combined
344    probability using the Dempster-Shafer theory.
345    
346    Finally, John Reppy's loop analysis in MLRISC further lowers the
347    probability of the 'then' branch because of the loop in the else
348    branch.
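
For a two-way branch, the Dempster-Shafer combination of two independent estimates p1 and p2 for the same outcome is p1*p2 / (p1*p2 + (1-p1)*(1-p2)).  The sketch below shows how two "unlikely" predictions reinforce each other.  The probabilities are made-up illustrative values, not the ones used in cpsBranchProb.sml, and combine is a hypothetical name.

```sml
(* Dempster-Shafer combination of two estimates for the same branch outcome. *)
fun combine (p1 : real, p2 : real) =
  let
    val both = p1 * p2
  in
    both / (both + (1.0 - p1) * (1.0 - p2))
  end

(* Illustrative values only: each heuristic on its own considers the
 * 'then' branch of f unlikely. *)
val ohSaysThen = 0.3      (* OH-heuristic estimate *)
val rhSaysThen = 0.3      (* RH-heuristic estimate *)

(* The combined estimate is lower than either input. *)
val combined = combine (ohSaysThen, rhSaysThen)

(* An estimate of 0.5 carries no information and leaves the other unchanged. *)
val neutral = combine (0.5, rhSaysThen)
```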
349    
350    
351    Graphical Viewing:
352    ------------------
353    I merely plugged in Allen's graphical viewers into the compiler. The
354    additional code is not much. At the top level, saying:
355    
356            Control.MLRISC.getFlag "cfg-graphical-view" := true;
357    
358    will display the graphical view of the control flow graph just before
359    back-patching.  daVinci must be in your path for this to work. If
360    daVinci is not available, then the default viewer can be changed
361    using:
362    
363            Control.MLRISC.getString "viewer"
364    
365    which can be set to "dot" or "vcg" for the corresponding viewers. Of
366    course, these viewers must be in your path.
367    
368    The above will display the compilation unit at the level of clusters,
369    many of which are small, boring, and uninteresting. Additionally setting:
370    
371            Control.MLRISC.getInt "cfg-graphical-view_size"
372    
373    will display only those clusters that are larger than the given value.
374    
375    
376    ----------------------------------------------------------------------
377    Name: Matthias Blume
378    Date: 2002/03/21 22:20:00 EST
379    Tag: blume-20020321-kmp-bugfix
380    Description:
381    
382    Changed the interface to the KMP routine in PreString and fixed
383    a minor bug in one place where it was used.
384    
385    ----------------------------------------------------------------------
386    Name: Allen Leung
387    Date: 2002/03/21 20:30:00 EST
388    Tag: leunga-20020321-cfg
389    Description:
390    
391      Fixed a potential problem in cfg edge splitting.
392    
393    ----------------------------------------------------------------------
394    Name: Allen Leung
395    Date: 2002/03/21 17:15:00 EST
396    Tag: leunga-20020321-x86-fp-cfg
397    Description:
398    
399      1. Recoded the buggy parts of x86-fp.
400    
401         a. All the block reordering code has been removed.
402            We now depend on the block placement phases to do this work.
403    
404         b. Critical edge splitting code has been simplified and moved into the
405            CFG modules, where it belongs.
406    
407         Both of these were quite buggy and complex.  The code is now much, much
408         simpler.
409    
410      2. X86 backend.
411    
412         a. Added instructions for 64-bit support.  Instruction selection for
413            64-bit has not been committed, however, since that
414            requires changes to MLTREE which haven't been approved by
415            Lal and John.
416    
417         b. Added support for FUCOMI and FUCOMIP when generating code for
418            PentiumPro and above.  We only generate these instructions in
419            the fast-fp mode.
420    
421         c. Added cases for JP and JNP in X86FreqProps.
422    
423      3. CFG
424    
425         CFG now has a bunch of methods for edge splitting and merging.
426    
427      4. Machine description.
428    
429         John's simplification of MLTREE_BASIS.fcond broke a few machine
430         description things:
431    
432         rtl-build.{sig,sml} and hppa.mdl fixed.
433    
434         NOTE: the machine description stuff in the repository is still broken.
435               Again, I can't put my fixes in because that involves
436               changes to MLTREE.
437    
438    ----------------------------------------------------------------------
439    Name: Matthias Blume
440    Date: 2002/03/20 15:55:00 EST
441    Tag: blume-20020320-kmp
442    Description:
443    
444    Implemented Knuth-Morris-Pratt string matching in PreString and used
445    it for String.isSubstring, Substring.isSubstring, and
446    Substring.position.
447    
448    (Might need some stress-testing.  Simple examples worked fine.)
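
For readers unfamiliar with the algorithm, here is a minimal, self-contained sketch of Knuth-Morris-Pratt matching.  It is not the PreString implementation and does not reflect its interface; fallBack, failureTable, position, and isSubstring are names chosen for this example.

```sml
(* Shrink the matched prefix length k until pat[k] = c or k = 0. *)
fun fallBack (pat, fail) (k, c) =
  if k > 0 andalso String.sub (pat, k) <> c
  then fallBack (pat, fail) (Array.sub (fail, k - 1), c)
  else k

(* fail[i] = length of the longest proper prefix of pat[0..i] that is
 * also a suffix of pat[0..i]. *)
fun failureTable (pat : string) : int Array.array =
  let
    val m = size pat
    val fail = Array.array (m, 0)
    fun fill (i, k) =
      if i >= m then ()
      else
        let
          val c = String.sub (pat, i)
          val k' = fallBack (pat, fail) (k, c)
          val k'' = if String.sub (pat, k') = c then k' + 1 else k'
        in
          Array.update (fail, i, k''); fill (i + 1, k'')
        end
  in
    fill (1, 0); fail
  end

(* Index of the first occurrence of pat in text, if any. *)
fun position (pat : string) (text : string) : int option =
  let
    val m = size pat
    val n = size text
    val fail = failureTable pat
    fun scan (i, k) =
      if k = m then SOME (i - m)
      else if i >= n then NONE
      else
        let
          val c = String.sub (text, i)
          val k' = fallBack (pat, fail) (k, c)
        in
          scan (i + 1, if String.sub (pat, k') = c then k' + 1 else k')
        end
  in
    scan (0, 0)
  end

fun isSubstring pat text = Option.isSome (position pat text)
```

The text index i never moves backwards, so the scan runs in time linear in the combined length of pattern and text, which is the point of using KMP over the naive search.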
449    
450    ----------------------------------------------------------------------
451    Name: Matthias Blume
452    Date: 2002/03/19 16:37:00 EST
453    Tag: blume-20020319-witnesses
454    Description:
455    
456    Added a structure C.W and functions convert/Ptr.convert to ml-nlffi-lib.
457    
458    This implements a generic mechanism for changing constness qualifiers
459    anywhere within big C types without resorting to outright "casts".
460    (So far, functions such as C.rw/C.ro or C.Ptr.rw/C.Ptr.ro only let you
461    modify the constness at the outermost level.)
462    The implementation of "convert" is based on the idea of "witness"
463    values -- values that are not used by the operation but whose types
464    "testify" to their applicability.  On the implementation side, "convert"
465    is simply a projection (returning its second curried argument).  With
466    cross-module inlining, it should not result in any machine code being
467    generated.
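
The witness idea can be sketched independently of ml-nlffi-lib.  The toy below is an assumption-laden illustration, not the C.W interface: the signature CELL, the structure Cell, and all of their members are hypothetical, and the example only changes constness at a single level.  What it does share with the description above is that convert ignores its witness argument and simply returns its second curried argument; only the witness's type decides which conversions are allowed.

```sml
signature CELL =
sig
  type ro                                (* phantom tag: read-only *)
  type rw                                (* phantom tag: read-write *)
  type 'c cell                           (* an int cell with constness 'c *)
  type ('a, 'b) witness                  (* evidence that 'a cell may become 'b cell *)

  val new     : int -> rw cell
  val get     : 'c cell -> int
  val set     : rw cell * int -> unit

  val same    : ('c, 'c) witness         (* no change *)
  val freeze  : (rw, ro) witness         (* adding constness is allowed *)
  val convert : ('a, 'b) witness -> 'a cell -> 'b cell
end

structure Cell :> CELL =
struct
  datatype ro = RO
  datatype rw = RW
  type 'c cell = int ref
  type ('a, 'b) witness = unit           (* carries no run-time information *)

  fun new i = ref i
  fun get c = !c
  fun set (c, i) = c := i

  val same = ()
  val freeze = ()
  fun convert _ x = x                    (* projection onto the second argument *)
end

val c = Cell.new 1
val () = Cell.set (c, 42)
val frozen = Cell.convert Cell.freeze c  (* : Cell.ro Cell.cell *)
val seen = Cell.get frozen
(* Cell.set (frozen, 0) would be rejected by the type checker, and there is
 * no witness of type (Cell.ro, Cell.rw) Cell.witness to undo the freeze. *)
```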
468    
469    ----------------------------------------------------------------------
470    Name: Matthias Blume
471    Date: 2002/03/15 16:40:00 EST
472    Tag: blume-20020315-basis
473    Description:
474    
475    Provided (preliminary?) implementations for
476    
477      {String,Substring}.{concatWith,isSuffix,isSubstring}
478    
479    and
480    
481      Substring.full
482    
483    Those are in the Basis spec but they were missing in SML/NJ.
484    
485    ----------------------------------------------------------------------
486    Name: Matthias Blume
487    Date: 2002/03/14 21:30:00 EST
488    Tag: blume-20020314-controls
489    Description:
490    
491    Controls:
492    ---------
493    
494    1. Factored out the recently-added Controls : CONTROLS stuff and put
495       it into its own library $/controls-lib.cm.  The source tree for
496       this is under src/smlnj-lib/Controls.
497    
498    2. Changed the names of types and functions in this interface, so they
499       make a bit more "sense":
500    
501          module -> registry
502          'a registry -> 'a group
503    
504    3. The interface now deals in ref cells only.  The getter/setter interface
505       is (mostly) gone.
506    
507    4. Added a function that lets one register an already-existing ref cell.
508    
509    5. Made the corresponding modifications to the rest of the code so that
510       everything compiles again.
511    
512    6. Changed the implementation of Controls.MLRISC back to something closer
513       to the original.  In particular, this module (and therefore MLRISC)
514       does not depend on Controls.  There now is some link-time code in
515       int-sys.sml that registers the MLRISC controls with the Controls
516       module.
517    
518    CM:
519    ---
520    
521      * One can now specify the lambda-split aggressiveness in init.cmi.
522    
523    ----------------------------------------------------------------------
524    Name: Allen Leung
525    Date: 2002/03/13 17:30:00 EST
526    Tag: leunga-20020313-x86-fp-unary
527    Description:
528    
529    Bug fix for:
530    
531    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
532    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
533    > - fun f(x,(y,z)) = Real.~ y;
534    > [autoloading]
535    > [autoloading done]
536    >       fchsl   (%eax), 184(%esp)
537    > Error: MLRisc bug: X86MCEmitter.emitInstr
538    >
539    > uncaught exception Error
540    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
541    
542    The problem was that the code generator did not generate any fp registers
543    in this case, and the register allocator didn't know that it needed to run
544    the X86FP phase to translate the pseudo fp instruction.   This only happened
545    with unary fp operators in certain situations.
546    
547    ----------------------------------------------------------------------
548    Name: Matthias Blume
549    Date: 2002/03/13 14:00:00 EST
550    Tag: blume-20020313-overload-etc
551    Description:
552    
553    1. Added _overload as a synonym for overload for backward compatibility.
554       (Control.overloadKW must be true for either version to be accepted.)
555    
556    2. Fixed bug in install script that caused more things to be installed
557       than what was requested in config/targets.
558    
559    3. Made CM aware of the (_)overload construct so that autoloading
560       works.
561    
562    ----------------------------------------------------------------------
563    Name: Matthias Blume
564    Date: 2002/03/12 22:03:00 EST
565    Tag: blume-20020312-url
566    Description:
567    
568    Forgot to update BOOT and srcarchiveurl.
569    
570    ----------------------------------------------------------------------
571    Name: Matthias Blume
572    Date: 2002/03/12 17:30:00 EST
573    Tag: blume-20020312-version110392
574    Description:
575    
576    Yet another version number bump (because of small changes to the
577    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
578    
579    Changes:
580    
581      The new pid generation scheme described a few weeks ago was overly
582      complicated.  I implemented a new mechanism that is simpler and
583      provides a bit more "stability":  Once CM has seen a compilation
584      unit, it keeps its identity constant (as long as you do not delete
585      those crucial CM/GUID/* files).  This means that when you change
586      an interface, compile, then go back to the old interface, and
587      compile again, you arrive at the original pid.
588    
589      There now also is a mechanism that instructs CM to use the plain
590      environment hash as a module's pid (effectively making its GUID
591      the empty string).  For this, "noguid" must be specified as an
592      option to the .sml file in question within its .cm file.
593      This is most useful for code that is being generated by tools such
594      as ml-nlffigen (because during development programmers tend to
595      erase the tool's entire output directory tree including CM's cached
596      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
597      revert to the old, broken behavior of SML/NJ), but in specific cases
598      where there is no danger of interface confusion, its use is ok
599      (I think).
600    
601      ml-nlffigen by default generates "noguid" annotations.  They can be
602      turned off by specifying -guid in its command line.
603    
604    ----------------------------------------------------------------------
605    Name: Lal George
606    Date: 2002/03/12 14:42:36 EST
607    Tag: george-20020312-frequency-computation
608    Description:
609    
610    Integrated jump chaining and static block frequency into the
611    compiler. More details and numbers later.
612    
613    ----------------------------------------------------------------------
614    Name: Lal George
615    Date: 2002/03/11 22:38:53 EST
616    Tag: george-20020311-jump-chain-elim
617    Description:
618    
619    Tested the jump chain elimination on all architectures (except the
620    hppa).  This is on by default right now and is profitable for the
621    alpha and x86; however, it may not be profitable for the sparc and ppc
622    when compiling the compiler.
623    
624    The gc test will typically jump to a label at the end of the cluster,
625    where there is another jump to an external cluster containing the actual
626    code to invoke gc. This is to allow factoring of common gc invocation
627    sequences. That is to say, we generate:
628    
629            f:
630               testgc
631               ja   L1      % jump if above to L1
632    
633            L1:
634               jmp L2
635    
636    
637    After jump chain elimination the 'ja L1' instruction is converted to
638    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
639    up being implemented in their long form (if L2 is far away) using:
640    
641            jbe     L3      % jump if below or equal to L3
642            jmp     L2
643         L3:
644            ...
645    
646    
647    For large compilation units L2 may be far away.
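
The transformation itself can be sketched on a toy instruction representation.  This is not the MLRISC implementation: the instr datatype, the block type, and the functions below are hypothetical, and a block counts as a link in a jump chain only if its entire body is a single unconditional jump.

```sml
datatype instr
  = JMP of string                  (* unconditional jump to a label *)
  | JA of string                   (* conditional jump, as in the listing above *)
  | OTHER of string                (* any other instruction *)

type block = string * instr list   (* label and body *)

(* If the block labelled l is nothing but "jmp l'", return SOME l'. *)
fun trampoline (blocks : block list) l =
  case List.find (fn (l', _) => l' = l) blocks of
      SOME (_, [JMP target]) => SOME target
    | _ => NONE

(* Follow a chain of such blocks to its final target.  The fuel counter
 * guards against cycles such as "L: jmp L". *)
fun resolve blocks l =
  let
    fun go (l, 0) = l
      | go (l, fuel) =
          (case trampoline blocks l of
               SOME next => go (next, fuel - 1)
             | NONE => l)
  in
    go (l, List.length blocks)
  end

fun retarget blocks (JMP l) = JMP (resolve blocks l)
  | retarget blocks (JA l) = JA (resolve blocks l)
  | retarget _ i = i

fun eliminate (blocks : block list) : block list =
  List.map (fn (l, body) => (l, List.map (retarget blocks) body)) blocks

(* The gc-test example from above; "invoke-gc" stands in for the shared code. *)
val original =
  [ ("f",  [OTHER "testgc", JA "L1"]),
    ("L1", [JMP "L2"]),
    ("L2", [OTHER "invoke-gc"]) ]

val rewritten = eliminate original
```

In the rewritten program the conditional jump in f targets L2 directly; whether that is a win depends, as described above, on how far away L2 ends up.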
648    
649    
650    ----------------------------------------------------------------------
651    Name: Matthias Blume
652    Date: 2002/03/11 13:30:00 EST
653    Tag: blume-20020311-mltreeeval
654    Description:
655    
656    A functor parameter was missing.
657    
658    ----------------------------------------------------------------------
659    Name: Allen Leung
660    Date: 2002/03/11 10:30:00 EST
661    Tag: leunga-20020311-runtime-string0
662    Description:
663    
664       The representation of the empty string now points to a
665    legal null-terminated C string instead of unit.  It is now possible
666    to convert an ML string into a C string with InlineT.CharVector.getData.
667    This compiles into a single machine instruction.
668    
669    ----------------------------------------------------------------------
670    Name: Allen Leung
671    Date: 2002/03/10 23:55:00 EST
672    Tag: leunga-20020310-x86-call
673    Description:
674    
675       Added machine code generation for the CALL instruction (relative displacement mode)
676    
677    ----------------------------------------------------------------------
678    Name: Matthias Blume
679    Date: 2002/03/08 16:05:00
680    Tag: blume-20020308-entrypoints
681    Description:
682    
683    Version number bumped to 110.39.1.  NEW BOOTFILES!
684    
685    Entrypoints: non-zero offset into a code object where execution should begin.
686    
687    - Added the notion of an entrypoint to CodeObj.
688    - Added reading/writing of entrypoint info to Binfile.
689    - Made runtime system bootloader aware of entrypoints.
690    - Use the address of the label of the first function given to mlriscGen
691      as the entrypoint.  This address is currently always 0, but it will
692      not be 0 once we turn on block placement.
693    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
694      with entry points) from mlriscGen.
695    
696    ----------------------------------------------------------------------
697    Name: Allen Leung
698    Date: 2002/03/07 20:45:00 EST
699    Tag: leunga-20020307-x86-cmov
700    Description:
701    
702       Bug fixes for CMOVcc on x86.
703    
704       1. Added machine code generation for CMOVcc
705       2. CMOVcc is now generated in preference to SETcc on PentiumPro or above.
706       3. CMOVcc cannot have an immediate operand as argument.
707    
708    ----------------------------------------------------------------------
709    Name: Matthias Blume
710    Date: 2002/03/07 16:15:00 EST
711    Tag: blume-20020307-controls
712    Description:
713    
714    This is a very large but mostly boring patch which makes (almost)
715    every tuneable compiler knob (i.e., pretty much everything under
716    Control.* plus a few other things) configurable via both the command
717    line and environment variables in the style CM did its configuration
718    until now.
719    
720    Try starting sml with '-h' (or, if you are brave, '-H')
721    
722    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
723    implements the underlying generic mechanism.
724    
725    The interface to some of the existing such facilities has changed somewhat.
726    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
727    (The getFoo interface is still there for backward-compatibility, but its
728    use is deprecated.)
729    
730    The ml-build script passes -Cxxx=yyy command-line arguments through so
731    that one can now twiddle the compiler settings when using this "batch"
732    compiler.
733    
734    TODO items:
735    
736    We should go through and throw out all controls that are no longer
737    connected to anything.  Moreover, we should go through and provide
738    meaningful (and correct!) documentation strings for those controls
739    that still are connected.
740    
741    Currently, multiple calls to Controls.new are accepted (only the first
742    has any effect).  Eventually we should make sure that every control
743    is being made (via Controls.new) exactly once.  Future access can then
744    be done using Controls.acc.
745    
746    Finally, it would probably be a good idea to use the getter-setter
747    interface to controls rather than ref cells.  For the time being, both
748    styles are provided by the Controls module, but getter-setter pairs are
749    better if thread-safety is of any concern because they can be wrapped.
750    
751    *****************************************
752    
753    One bug fix: The function blockPlacement in three of the MLRISC
754    backpatch files used to be hard-wired to one of two possibilities at
755    link time (according to the value of the placementFlag).  But (I
756    think) it should rather sense the flag every time.
757    
758    *****************************************
759    
760    Other assorted changes (by other people who did not supply a HISTORY entry):
761    
762    1. the cross-module inliner now works much better (Monnier)
763    2. representation of weights, frequencies, and probabilities in MLRISC
764       changed in preparation for using those for weighted block placement
765       (Reppy, George)
766    
767    ----------------------------------------------------------------------
768    Name: Lal George
769    Date: 2002/03/07 14:44:24 EST 2002
770    Tag: george-20020307-weighted-block-placement
771    
772    Tested the weighted block placement optimization on all architectures
773    (except the hppa) using AMPL to generate the block and edge frequencies.
774    Changes were required in the machine properties to correctly
775    categorize trap instructions. There is an MLRISC flag
776    "weighted-block-placement" that can be used to enable weighted block
777    placement, but this will be ineffective without block/edge
778    frequencies (coming soon).
779    
780    
781    ----------------------------------------------------------------------
782    Name: Lal George
783    Date: 2002/03/05 17:24:48 EST
784    Tag: george-20020305-linkage-cluster
785    
786    In order to support the block placement optimization, a new cluster
787    is generated as the very first cluster (called the linkage cluster).
788    It contains a single jump to the 'real' entry point for the compilation
789    unit. Block placement has no effect on the linkage cluster itself, but
790    all the other clusters  have full freedom in the manner in which they
791    reorder blocks or functions.
792    
793    On the x86 the typical linkage code that is generated is:
794       ----------------------
795            .align 2
796       L0:
797            addl    $L1-L0, 72(%esp)
798            jmp     L1
799    
800    
801            .align  2
802       L1:
803       ----------------------
804    
805    72(%esp) is the memory location for the stdlink register. This
806    must contain the address of the CPS function being called. In the
807    above example, it contains the address of  L0; before
808    calling L1 (the real entry point for the compilation unit), it
809    must contain the address for L1, and hence
810    
811            addl $L1-L0, 72(%esp)
812    
813    I have tested this on all architectures except the hppa.  The increase
814    in code size is of course negligible.
815    
816    ----------------------------------------------------------------------
817    Name: Allen Leung
818    Date: 2002/03/03 13:20:00 EST
819    Tag: leunga-20020303-mlrisc-tools
820    
821      Added #[ ... ] expressions to mlrisc tools
822    
823    ----------------------------------------------------------------------
824    Name: Matthias Blume
825    Date: 2002/02/27 12:29:00 EST
826    Tag: blume-20020227-cdebug
827    Description:
828    
829    - made the types in structures C and C_Debug equal
830    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
831    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
832    
833    ----------------------------------------------------------------------
834    Name: Matthias Blume
835    Date: 2002/02/26 12:00:00 EST
836    Tag: blume-20020226-ffi
837    Description:
838    
839    1. Fixed a minor bug in CM's "noweb" tool:
840       If numbering is turned off, then truly don't number (i.e., do not
841       supply the -L option to noweb).  The previous behavior was to supply
842       -L'' -- which caused noweb to use the "default" line numbering scheme.
843       Thanks to Chris Richards for pointing this out (and supplying the fix).
844    
845    2. Once again, I reworked some aspects of the FFI:
846    
847       A. The incomplete/complete type business:
848    
849       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
850         gone!
851       - ML types representing an incomplete type are now *equal* to
852         ML types representing their corresponding complete types (just like
853         in C).  This is still safe because ml-nlffigen will not generate
854         RTTI for incomplete types, nor will it generate functions that
855         require access to such RTTI.   But when ML code generated from both
856         incomplete and complete versions of the C type meet, the ML types
857         are trivially interoperable.
858    
859         NOTE:  These changes restore the full generality of the translation
860         (which was previously lost when I eliminated functorization)!
861    
862       B. Enum types:
863    
864       - Structure C now has a type constructor "enum" that is similar to
865         how the "su" constructor works.  However, "enum" is not a phantom
866         type because each "T enum" has values (and is isomorphic to
867         MLRep.Signed.int).
868       - There are generic access operations for enum objects (using
869         MLRep.Signed.int).
870       - ml-nlffigen will generate a structure E_foo for each "enum foo".
871         * The structure contains the definition of type "mlrep" (the ML-side
872         representation type of the enum).  Normally, mlrep is the same
873         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
874         then mlrep will be defined as a datatype -- thus facilitating
875         pattern matching on mlrep values.
876         ("-ec" will be suppressed if there are duplicate values in an
877          enumeration.)
878         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
879         will be generated for each C enum constant xxx.
880         * Conversion functions m2i and i2m convert between mlrep and
881         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
882         * Conversion functions c and ml convert between mlrep and "tag enum".
883         * Access functions (get/set) fetch and store mlrep values.
884       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
885         enumerations are merged into one single enumeration represented by
886         structure E_'.
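As a rough illustration of the "-ec" case, the structure for a C declaration such as "enum color { RED, GREEN, BLUE };" might look like the sketch below. This is not actual ml-nlffigen output: plain int stands in for MLRep.Signed.int, the c/ml conversions and get/set accessors are omitted, and raising Domain for out-of-range integers is an assumption made here for illustration.

```sml
(* Hypothetical sketch for:  enum color { RED, GREEN, BLUE };
 * Not actual ml-nlffigen output; int stands in for MLRep.Signed.int. *)
structure E_color =
struct
  (* With "-ec", mlrep is a datatype, so clients can pattern match. *)
  datatype mlrep = e_RED | e_GREEN | e_BLUE

  (* mlrep -> integer representation *)
  fun m2i e_RED = 0
    | m2i e_GREEN = 1
    | m2i e_BLUE = 2

  (* integer representation -> mlrep; the out-of-range behavior is an
   * assumption of this sketch. *)
  fun i2m 0 = e_RED
    | i2m 1 = e_GREEN
    | i2m 2 = e_BLUE
    | i2m _ = raise Domain
end

(* Pattern matching on enum values, which "-ec" makes possible. *)
fun colorName E_color.e_RED = "red"
  | colorName E_color.e_GREEN = "green"
  | colorName E_color.e_BLUE = "blue"
```

Without "-ec", mlrep would simply be the integer type, e_RED and friends would be integer values, and m2i and i2m would be identity functions.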
887    
888    ----------------------------------------------------------------------
889    Name: Allen Leung
890    Date: 2002/02/25 04:45:00 EST
891    Tag: leunga-20020225-cps-spill
892    
893    This is a new implementation of the CPS spill phase.
894    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml.
895    In case of problems, replace it with the old file spill.sml.
896    
897    The current compiler runs into some serious performance problems when
898    constructing a large record.  This can happen when we try to compile a
899    structure with many items.  Even a very simple structure like the following
900    makes the compiler slow down.
901    
902        structure Foo = struct
903           val x_1 = 0w1 : Word32.int
904           val x_2 = 0w2 : Word32.int
905           val x_3 = 0w3 : Word32.int
906           ...
907           val x_N = 0wN : Word32.int
908        end
909    
910    The following table shows the compile time, from N=1000 to N=4000,
911    with the old compiler:
912    
913    N
914    1000   CPS 100 spill                           0.04u  0.00s  0.00g
915           MLRISC ra                               0.06u  0.00s  0.05g
916              (spills = 0 reloads = 0)
917           TOTAL                                   0.63u  0.07s  0.21g
918    
919    1100   CPS 100 spill                           8.25u  0.32s  0.64g
920           MLRISC ra                               5.68u  0.59s  3.93g
921              (spills = 0 reloads = 0)
922           TOTAL                                   14.71u  0.99s  4.81g
923    
924    1500   CPS 100 spill                           58.55u  2.34s  1.74g
925           MLRISC ra                               5.54u  0.65s  3.91g
926              (spills = 543 reloads = 1082)
927           TOTAL                                   65.40u  3.13s  6.00g
928    
929    2000   CPS 100 spill                           126.69u  4.84s  3.08g
930           MLRISC ra                               0.80u  0.10s  0.55g
931              (spills = 42 reloads = 84)
932           TOTAL                                   129.42u  5.10s  4.13g
933    
934    3000   CPS 100 spill                           675.59u  19.03s  11.64g
935           MLRISC ra                               2.69u  0.27s  1.38g
936              (spills = 62 reloads = 124)
937           TOTAL                                   682.48u  19.61s  13.99g
938    
939    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
940           MLRISC ra                               4.96u  0.27s  2.72g
941              (spills = 85 reloads = 170)
942           TOTAL                                   2375.26u  57.21s  48.00g
943    
944    As you can see, the old CPS spill module suffers from some serious
945    performance problems.  But since I cannot decipher the old code fully,
946    instead of patching the problems up, I'm reimplementing it
947    with a different algorithm.  The new code is more modular,
948    smaller when compiled, and substantially faster
949    (O(n log n) time and O(n) space).  Timing of the new spill module:
950    
951    4000  CPS 100 spill                           0.02u  0.00s  0.00g
952          MLRISC ra                               0.25u  0.02s  0.15g
953             (spills=1 reloads=3)
954          TOTAL                                   7.74u  0.34s  1.62g
955    
956    Implementation details:
957    
958    As far as I can tell, the purpose of the CPS spill module is to make sure the
959    number of live variables at any program point (the bandwidth)
960    does not exceed a certain limit, which is determined by the
961    size of the spill area.
962    
963    When the bandwidth is too large, we decrease the register pressure by
964    packing live variables into spill records.  How we achieve this is
965    completely different than what we did in the old code.
966    
967    First, there is something about the MLRiscGen code generator
968    that we should be aware of:
969    
970    o MLRiscGen performs code motion!
971    
972       In particular, it will move floating point computations and
973       address computations involving only the heap pointer to
974       their use sites (if there is only a single use).
975       What this means is that if we have a CPS record construction
976       statement
977    
978           RECORD(k,vl,w,e)
979    
980       we should never count the new record address w as live if w
981       has only one use (which is often the case).
982    
983       We should do something similar to floating point, but the transformation
984       there is much more complex, so I won't deal with that.
985    
986    Secondly, there are now two new cps primops at our disposal:
987    
988     1. rawrecord of record_kind option
989        This pure operator allocates some uninitialized storage from the heap.
990        There are two forms:
991    
992         rawrecord NONE [INT n]  allocates a tagless record of length n
993         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
994                                     and initializes the tag.
995    
996     2. rawupdate of cty
997          rawupdate cty (v,i,x)
998          Assigns x to the ith component of record v.
999          The storelist is not updated.
1000    
1001    We use these new primops for both spilling and incremental record construction.
1002    
1003     1. Spilling.
1004    
1005        This is implemented with a linear scan algorithm (but generalized
1006        to trees).  The algorithm will create a single spill record at the
1007        beginning of the cps function and use rawupdate to spill to it,
1008        and SELECT or SELp to reload from it.  So both spills and reloads
1009        are fine-grain operations.  In contrast, in the old algorithm
1010        "spills" have to be bundled together in records.
1011    
1012        Ideally, we should sink the spill record construction to where
1013        it is needed.  We can even split the spill record into multiple ones
1014        at the places where they are needed.  But CPS is not a good
1015        representation for global code motion, so I'll keep it simple and
1016        am not attempting this.
1017    
1018     2. Incremental record construction (aka record splitting).
1019    
1020        Long records with many component values which are simultaneously live
1021        (recall that single use record addresses are not considered to
1022         be live) are constructed with rawrecord and rawupdate.
1023        We allocate space on the heap with rawrecord first, then gradually
1024        fill it in with rawupdate.  This is the technique suggested to me
1025        by Matthias.
1026    
1027        Some restrictions on when this is applicable:
1028        1. The record is not a VECTOR record.  The code generator currently does
1029           not handle this case; VECTOR records use double indirection, like arrays.
1030        2. All the record component values are defined in the same "basic block"
1031           as the record constructor.  This is to prevent speculative
1032           record construction.
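The idea behind incremental record construction can be modeled in plain SML. This is a toy model only: an int array stands in for a heap record, and the helper functions below merely imitate the CPS primops of the same name; they are not the compiler's implementation.

```sml
(* Toy model only: an int array stands in for a heap record, and these
 * helpers merely imitate the CPS primops rawrecord and rawupdate. *)
fun rawrecord (n : int) : int array = Array.array (n, 0)  (* "uninitialized" *)
fun rawupdate (r : int array, i : int, x : int) : unit = Array.update (r, i, x)

(* All at once: every component value is computed and held (live)
 * before the record is built. *)
fun buildAtOnce (n : int, f : int -> int) : int array =
    Array.fromList (List.tabulate (n, f))

(* Incremental: allocate first, then compute and store one component at
 * a time, so each value is dead right after its rawupdate. *)
fun buildIncrementally (n : int, f : int -> int) : int array =
    let
      val r = rawrecord n
      fun fill i =
          if i >= n then r
          else (rawupdate (r, i, f i); fill (i + 1))
    in
      fill 0
    end
```

Both functions produce the same contents; the difference is how many component values must be held at once, which is what drives register pressure in the real setting.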
1033    
1034    ----------------------------------------------------------------------
1035    Name: Allen Leung
1036    Date: 2002/02/22 01:02:00 EST
1037    Tag: leunga-20020222-mlrisc-tools
1038    
1039    Minor bug fixes in the parser and rewriter
1040    
1041    ----------------------------------------------------------------------
1042    Name: Allen Leung
1043    Date: 2002/02/21 20:20:00 EST
1044    Tag: leunga-20020221-peephole
1045    
1046    Regenerated the peephole files.  Some contained typos in the specification
1047    and some didn't compile because of pretty printing bugs in the old version
1048    of 'nowhere'.
1049    
1050    ----------------------------------------------------------------------
1051    Name: Allen Leung
1052    Date: 2002/02/19 20:20:00 EST
1053    Tag: leunga-20020219-mlrisc-tools
1054    Description:
1055    
1056       Minor bug fixes to the mlrisc-tools library:
1057    
1058       1.  Fixed up parsing colon suffixed keywords
1059       2.  Added the ability to shut the error messages up
1060       3.  Reimplemented the pretty printer and fixed up/improved
1061           the pretty printing of handle and -> types.
1062       4.  Fixed up generation of literal symbols in the nowhere tool.
1063       5.  Added some SML keywords to sml.sty
1064    
1065    ----------------------------------------------------------------------
1066    Name: Matthias Blume
1067    Date: 2002/02/19 16:20:00 EST
1068    Tag: blume-20020219-cmffi
1069    Description:
1070    
1071    A wild mix of changes, some minor, some major:
1072    
1073    * All C FFI-related libraries are now anchored under $c:
1074        $/c.cm      --> $c/c.cm
1075        $/c-int.cm  --> $c/internals/c-int.cm
1076        $/memory.cm --> $c/memory/memory.cm
1077    
1078    * "make" tool (in CM) now treats its argument pathname slightly
1079      differently:
1080        1. If the native expansion is an absolute name, then before invoking
1081           the "make" command on it, CM will apply OS.Path.mkRelative
1082           (with relativeTo = OS.FileSys.getDir()) to it.
1083        2. The argument will be passed through to subsequent phases of CM
1084           processing without "going native".  In particular, if the argument
1085           was an anchored path, then "make" will not lose track of that anchor.
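Step 1 above can be sketched with the Basis Library function OS.Path.mkRelative. The helper name relativize is hypothetical, and the working directory is passed in explicitly so that the example is deterministic; CM itself uses OS.FileSys.getDir () as described.

```sml
(* Sketch only: relativize is a hypothetical helper.  The working
 * directory is a parameter here; CM uses OS.FileSys.getDir (). *)
fun relativize (cwd : string) (native : string) : string =
    if OS.Path.isAbsolute native
    then OS.Path.mkRelative { path = native, relativeTo = cwd }
    else native  (* relative names are passed through unchanged *)
```

On a Unix system, relativize "/home/user/project" "/home/user/project/lib/Makefile" yields "lib/Makefile", while a name that is already relative is returned as is.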
1086    
1087    * Compiler backends now "know" their respective C calling conventions
1088      instead of having to be told about it by ml-nlffigen.  This relieves
1089      ml-nlffigen from one of its burdens.
1090    
1091    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
1092    
1093    * Export C_DEBUG and C_Debug from $c/c.cm.
1094    
1095    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
1096      subtyping relationship between incomplete pointers and their complete
1097      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
1098      with the convention of instantiating 'o with ('t, 'c) obj whenever
1099      the pointer target type is complete.  In the incomplete case, 'o
1100      will be instantiated with some "'c iobj" -- a type obtained by
1101      using one of the functors PointerToIncompleteType or PointerToCompleteType.
1102    
1103      Operations that work on both incomplete and complete pointer types are
1104      typed as taking an 'o ptr while operations that require the target to
1105      be known are typed as taking some ('t, 'c) obj ptr.
1106    
1107      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
1108      where void is an eqtype without any values.  This makes it possible
1109      to work on voidptr values using functions meant to operate on light
1110      incomplete pointers.
1111    
1112    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
1113      been vastly simplified.
1114    
1115    ----------------------------------------------------------------------
1116    Name: Matthias Blume
1117    Date: 2002/02/19 10:48:00 EST
1118    Tag: blume-20020219-pqfix
1119    Description:
1120    
1121    Applied Chris Okasaki's bug fix for priority queues.
1122    
1123    ----------------------------------------------------------------------
1124    Name: Matthias Blume
1125    Date: 2002/02/15 17:05:00
1126    Tag: Release_110_39
1127    Description:
1128    
1129    Last-minute retagging is becoming a tradition... :-(
1130    
1131    This is the working release 110.39.
1132    
1133    ----------------------------------------------------------------------
1134    Name: Matthias Blume
1135    Date: 2002/02/15 16:00:00 EST
1136    Tag: Release_110_39-orig
1137    Description:
1138    
1139    Working release 110.39.  New bootfiles.
1140    
1141    (Update: There was a small bug in the installer so it wouldn't work
1142    with all shells.  So I retagged. -Matthias)
1143    
1144    ----------------------------------------------------------------------
1145    Name: Matthias Blume
1146    Date: 2002/02/15 14:17:00 EST
1147    Tag: blume-20020215-showbindings
1148    Description:
