[smlnj] Diff of /sml/trunk/HISTORY

revision 1064, Thu Feb 14 03:40:24 2002 UTC  ->  revision 1130, Mon Mar 11 04:49:41 2002 UTC
# Line 13  Line 13 
13  Description:
14
15  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/03/10 23:55:00 EST
18    Tag: leunga-20020310-x86-call
19    Description:
20    
21       Added machine code generation for the CALL instruction (relative displacement mode)
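
Illustration (not part of the HISTORY entry): on x86, a near CALL with a
relative displacement is opcode 0xE8 followed by a 32-bit little-endian
displacement measured from the end of the instruction.  The sketch below is
in Python purely for illustration; the MLRISC code is Standard ML, and the
function name is hypothetical.

```python
import struct

CALL_REL32_OPCODE = 0xE8   # x86 near call, 32-bit relative displacement
CALL_REL32_LENGTH = 5      # one opcode byte plus four displacement bytes


def encode_call_rel32(call_addr: int, target_addr: int) -> bytes:
    """Encode `call target_addr` for an instruction placed at `call_addr`.

    Hypothetical helper for illustration; not the MLRISC emitter.
    """
    # The displacement is relative to the address of the *next* instruction.
    disp = target_addr - (call_addr + CALL_REL32_LENGTH)
    return bytes([CALL_REL32_OPCODE]) + struct.pack("<i", disp)
```

A call to the instruction immediately following has displacement 0; a call
back to its own address has displacement -5.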
22    
23    ----------------------------------------------------------------------
24    Name: Matthias Blume
25    Date: 2002/03/08 16:05:00
26    Tag: blume-20020308-entrypoints
27    Description:
28    
29    Version number bumped to 110.39.1.  NEW BOOTFILES!
30    
31    Entrypoints: non-zero offset into a code object where execution should begin.
32    
33    - Added the notion of an entrypoint to CodeObj.
34    - Added reading/writing of entrypoint info to Binfile.
35    - Made runtime system bootloader aware of entrypoints.
36    - Use the address of the label of the first function given to mlriscGen
37      as the entrypoint.  This address is currently always 0, but it will
38      not be 0 once we turn on block placement.
39    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
40      with entry points) from mlriscGen.
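
Illustration (not part of the HISTORY entry): a minimal model of a code
object that carries an entrypoint offset and survives a write/read round
trip.  The byte layout and all names here are assumptions made for the
sketch; they are not the actual Binfile format or the CodeObj interface.

```python
from dataclasses import dataclass
import struct


@dataclass(frozen=True)
class CodeObject:
    code: bytes
    entrypoint: int = 0   # byte offset into `code` where execution starts

    def __post_init__(self):
        # An entrypoint outside the code bytes can never be executed.
        if not 0 <= self.entrypoint < max(len(self.code), 1):
            raise ValueError("entrypoint must lie inside the code object")


def write_code_object(obj: CodeObject) -> bytes:
    # Hypothetical layout: 4-byte size, 4-byte entrypoint, then the code.
    return struct.pack(">II", len(obj.code), obj.entrypoint) + obj.code


def read_code_object(blob: bytes) -> CodeObject:
    size, entry = struct.unpack_from(">II", blob, 0)
    return CodeObject(blob[8:8 + size], entry)
```

With the default of 0 the model behaves like the pre-entrypoint world, which
matches the remark that the address is currently always 0.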
41    
42    ----------------------------------------------------------------------
43    Name: Allen Leung
44    Date: 2002/03/07 20:45:00 EST
45    Tag: leunga-20020307-x86-cmov
46    Description:
47    
48       Bug fixes for CMOVcc on x86.
49    
50       1. Added machine code generation for CMOVcc
51       2. CMOVcc is now generated in preference to SETcc on PentiumPro or above.
52       3. CMOVcc cannot have an immediate operand as argument.
53    
54    ----------------------------------------------------------------------
55    Name: Matthias Blume
56    Date: 2002/03/07 16:15:00 EST
57    Tag: blume-20020307-controls
58    Description:
59    
60    This is a very large but mostly boring patch which makes (almost)
61    every tunable compiler knob (i.e., pretty much everything under
62    Control.* plus a few other things) configurable via both the command
63    line and environment variables, in the same style that CM has used
64    for its own configuration until now.
65    
66    Try starting sml with '-h' (or, if you are brave, '-H')
67    
68    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
69    implements the underlying generic mechanism.
70    
71    The interface to some of the existing such facilities has changed somewhat.
72    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
73    (The getFoo interface is still there for backward-compatibility, but its
74    use is deprecated.)
75    
76    The ml-build script passes -Cxxx=yyy command-line arguments through so
77    that one can now twiddle the compiler settings when using this "batch"
78    compiler.
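
Illustration (not part of the HISTORY entry): one way a driver could peel
-Cxxx=yyy settings off an argument list before handing the rest on.  This is
a sketch in Python, not the ml-build script or the Controls module; the
function name and the control names used in the test are hypothetical.

```python
def split_control_args(argv):
    """Separate -Cname=value settings from the remaining arguments.

    Returns (settings, rest); later settings for the same name win.
    """
    settings, rest = {}, []
    for arg in argv:
        if arg.startswith("-C") and "=" in arg:
            # Split on the first '=' only, so values may contain '='.
            name, value = arg[2:].split("=", 1)
            settings[name] = value
        else:
            rest.append(arg)
    return settings, rest
```

Values are kept as strings here; converting them to the type of each control
would be the job of the control itself.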
79    
80    TODO items:
81    
82    We should go through and throw out all controls that are no longer
83    connected to anything.  Moreover, we should go through and provide
84    meaningful (and correct!) documentation strings for those controls
85    that still are connected.
86    
87    Currently, multiple calls to Controls.new are accepted (only the first
88    has any effect).  Eventually we should make sure that every control
89    is being made (via Controls.new) exactly once.  Future access can then
90    be done using Controls.acc.
91    
92    Finally, it would probably be a good idea to use the getter-setter
93    interface to controls rather than ref cells.  For the time being, both
94    styles are provided by the Controls module, but getter-setter pairs are
95    better if thread-safety is of any concern because they can be wrapped.
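
Illustration (not part of the HISTORY entry): why getter-setter pairs are
easier to make thread-safe than bare ref cells.  The registry below only
models the behaviour described above (a repeated "new" has no effect, "acc"
gives later access); its names and signatures are assumptions and do not
reproduce the real Controls interface.

```python
import threading


class Registry:
    def __init__(self):
        self._cells = {}

    def new(self, name, default):
        # A second `new` for the same name leaves the first value in place.
        self._cells.setdefault(name, [default])
        return self.acc(name)

    def acc(self, name):
        cell = self._cells[name]

        def get():
            return cell[0]

        def set_(value):
            cell[0] = value

        return get, set_


def synchronized(get, set_, lock=None):
    """Wrap a getter-setter pair so that every access takes a lock."""
    lock = lock or threading.Lock()

    def safe_get():
        with lock:
            return get()

    def safe_set(value):
        with lock:
            set_(value)

    return safe_get, safe_set
```

A raw ref cell offers no place to hang the lock, whereas the pair can be
wrapped without changing its callers.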
96    
97    *****************************************
98    
99    One bug fix: The function blockPlacement in three of the MLRISC
100    backpatch files used to be hard-wired to one of two possibilities at
101    link time (according to the value of the placementFlag).  But (I
102    think) it should rather sense the flag every time.
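
Illustration (not part of the HISTORY entry): the difference between
choosing a function once and sensing a flag on every call.  Both placement
functions are stand-ins invented for this sketch, as is the flag
representation.

```python
placement_flag = {"weighted": False}   # stand-in for the real flag


def default_placement(blocks):
    return list(blocks)


def weighted_placement(blocks):
    # Stand-in policy: heaviest block first.
    return sorted(blocks, key=lambda b: -b[1])


# Hard-wired: the choice is made once, when this assignment runs.
block_placement_fixed = (
    weighted_placement if placement_flag["weighted"] else default_placement
)


def block_placement(blocks):
    # Flag-sensing: the choice is made again on every call.
    if placement_flag["weighted"]:
        return weighted_placement(blocks)
    return default_placement(blocks)
```

Flipping the flag afterwards changes the behaviour of block_placement but
not of block_placement_fixed.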
103    
104    *****************************************
105    
106    Other assorted changes (by other people who did not supply a HISTORY entry):
107    
108    1. the cross-module inliner now works much better (Monnier)
109    2. representation of weights, frequencies, and probabilities in MLRISC
110       changed in preparation for using those for weighted block placement
111       (Reppy, George)
112    
113    ----------------------------------------------------------------------
114    Name: Lal George
115    Date: 2002/03/07 14:44:24 EST
116    Tag: george-20020307-weighted-block-placement
117    
118    Tested the weighted block placement optimization on all architectures
119    (except the hppa) using AMPL to generate the block and edge frequencies.
120    Changes were required in the machine properties to correctly
121    categorize trap instructions. There is an MLRISC flag
122    "weighted-block-placement" that can be used to enable weighted block
123    placement, but this will be ineffective without block/edge
124    frequencies (coming soon).
125    
126    
127    ----------------------------------------------------------------------
128    Name: Lal George
129    Date: 2002/03/05 17:24:48 EST
130    Tag: george-20020305-linkage-cluster
131    
132    In order to support the block placement optimization, a new cluster
133    is generated as the very first cluster (called the linkage cluster).
134    It contains a single jump to the 'real' entry point for the compilation
135    unit. Block placement has no effect on the linkage cluster itself, but
136    all the other clusters have full freedom in the manner in which they
137    reorder blocks or functions.
138    
139    On the x86 the typical linkage code that is generated is:
140       ----------------------
141            .align 2
142       L0:
143            addl    $L1-L0, 72(%esp)
144            jmp     L1
145    
146    
147            .align  2
148       L1:
149       ----------------------
150    
151    72(%esp) is the memory location for the stdlink register. This
152    must contain the address of the CPS function being called. In the
153    above example, it contains the address of L0; before
154    calling L1 (the real entry point for the compilation unit), it
155    must contain the address for L1, and hence
156    
157            addl $L1-L0, 72(%esp)
158    
159    I have tested this on all architectures except the hppa.  The increase
160    in code size is of course negligible.
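
Illustration (not part of the HISTORY entry): the stdlink adjustment above
as plain arithmetic.  The addresses are arbitrary and the function name is
hypothetical.

```python
def enter_through_linkage(l0_addr, l1_addr):
    """Model the two instructions of the linkage cluster.

    Returns (program counter, stdlink) after the jump.
    """
    stdlink = l0_addr               # on entry stdlink holds the address of L0
    stdlink += l1_addr - l0_addr    # addl $L1-L0, 72(%esp)
    pc = l1_addr                    # jmp L1
    return pc, stdlink
```

After the jump, stdlink equals the address of L1, as the real entry point
expects.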
161    
162    ----------------------------------------------------------------------
163    Name: Allen Leung
164    Date: 2002/03/03 13:20:00 EST
165    Tag: leunga-20020303-mlrisc-tools
166    
167      Added #[ ... ] expressions to mlrisc tools
168    
169    ----------------------------------------------------------------------
170    Name: Matthias Blume
171    Date: 2002/02/27 12:29:00 EST
172    Tag: blume-20020227-cdebug
173    Description:
174    
175    - made the types in structures C and C_Debug equal
176    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
177    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
178    
179    ----------------------------------------------------------------------
180    Name: Matthias Blume
181    Date: 2002/02/26 12:00:00 EST
182    Tag: blume-20020226-ffi
183    Description:
184    
185    1. Fixed a minor bug in CM's "noweb" tool:
186       If numbering is turned off, then truly don't number (i.e., do not
187       supply the -L option to noweb).  The previous behavior was to supply
188       -L'' -- which caused noweb to use the "default" line numbering scheme.
189       Thanks to Chris Richards for pointing this out (and supplying the fix).
190    
191    2. Once again, I reworked some aspects of the FFI:
192    
193       A. The incomplete/complete type business:
194    
195       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
196         gone!
197       - ML types representing an incomplete type are now *equal* to
198         ML types representing their corresponding complete types (just like
199         in C).  This is still safe because ml-nlffigen will not generate
200         RTTI for incomplete types, nor will it generate functions that
201         require access to such RTTI.   But when ML code generated from both
202         incomplete and complete versions of the C type meet, the ML types
203         are trivially interoperable.
204    
205         NOTE:  These changes restore the full generality of the translation
206         (which was previously lost when I eliminated functorization)!
207    
208       B. Enum types:
209    
210       - Structure C now has a type constructor "enum" that is similar to
211         how the "su" constructor works.  However, "enum" is not a phantom
212         type because each "T enum" has values (and is isomorphic to
213         MLRep.Signed.int).
214       - There are generic access operations for enum objects (using
215         MLRep.Signed.int).
216       - ml-nlffigen will generate a structure E_foo for each "enum foo".
217         * The structure contains the definition of type "mlrep" (the ML-side
218         representation type of the enum).  Normally, mlrep is the same
219         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
220         then mlrep will be defined as a datatype -- thus facilitating
221         pattern matching on mlrep values.
222         ("-ec" will be suppressed if there are duplicate values in an
223          enumeration.)
224         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
225         will be generated for each C enum constant xxx.
226         * Conversion functions m2i and i2m convert between mlrep and
227         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
228         * Conversion functions c and ml convert between mlrep and "tag enum".
229         * Access functions (get/set) fetch and store mlrep values.
230       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
231         enumerations are merged into one single enumeration represented by
232         structure E_'.
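
Illustration (not part of the HISTORY entry): why duplicate values rule out
the datatype representation, and what m2i/i2m amount to when it is used.
This is a Python model of the idea only; ml-nlffigen generates ML code, and
the helper names and the sample enum are hypothetical.

```python
def can_use_datatype(constants):
    """A one-constructor-per-constant datatype needs distinct values,
    otherwise the integer-to-constructor direction is ambiguous."""
    values = list(constants.values())
    return len(values) == len(set(values))


def make_conversions(constants):
    """Return (m2i, i2m) tables for constructors named e_<constant>."""
    m2i = {"e_" + name: value for name, value in constants.items()}
    i2m = {value: ctor for ctor, value in m2i.items()}
    return m2i, i2m
```

For enum color { RED, GREEN } this yields e_RED <-> 0 and e_GREEN <-> 1; for
an enum in which two constants share a value, can_use_datatype reports False.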
233    
234    ----------------------------------------------------------------------
235    Name: Allen Leung
236    Date: 2002/02/25 04:45:00 EST
237    Tag: leunga-20020225-cps-spill
238    
239    This is a new implementation of the CPS spill phase.
240    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
241    In case of problems, replace it with the old file spill.sml
242    
243    The current compiler runs into some serious performance problems when
244    constructing a large record.  This can happen when we try to compile a
245    structure with many items.  Even a very simple structure like the following
246    makes the compiler slow down.
247    
248        structure Foo = struct
249           val x_1 = 0w1 : Word32.int
250           val x_2 = 0w2 : Word32.int
251           val x_3 = 0w3 : Word32.int
252           ...
253           val x_N = 0wN : Word32.int
254        end
255    
256    The following table shows the compile time, from N=1000 to N=4000,
257    with the old compiler:
258    
259    N
260    1000   CPS 100 spill                    0.04u   0.00s   0.00g
261           MLRISC ra                        0.06u   0.00s   0.05g
262              (spills = 0 reloads = 0)
263           TOTAL                            0.63u   0.07s   0.21g
264
265    1100   CPS 100 spill                    8.25u   0.32s   0.64g
266           MLRISC ra                        5.68u   0.59s   3.93g
267              (spills = 0 reloads = 0)
268           TOTAL                           14.71u   0.99s   4.81g
269
270    1500   CPS 100 spill                   58.55u   2.34s   1.74g
271           MLRISC ra                        5.54u   0.65s   3.91g
272              (spills = 543 reloads = 1082)
273           TOTAL                           65.40u   3.13s   6.00g
274
275    2000   CPS 100 spill                  126.69u   4.84s   3.08g
276           MLRISC ra                        0.80u   0.10s   0.55g
277              (spills = 42 reloads = 84)
278           TOTAL                          129.42u   5.10s   4.13g
279
280    3000   CPS 100 spill                  675.59u  19.03s  11.64g
281           MLRISC ra                        2.69u   0.27s   1.38g
282              (spills = 62 reloads = 124)
283           TOTAL                          682.48u  19.61s  13.99g
284
285    4000   CPS 100 spill                 2362.82u  56.28s  43.60g
286           MLRISC ra                        4.96u   0.27s   2.72g
287              (spills = 85 reloads = 170)
288           TOTAL                         2375.26u  57.21s  48.00g
289    
290    As you can see, the old cps spill module suffers from some serious
291    performance problems.  But since I cannot decipher the old code fully,
292    instead of patching the problems up, I'm reimplementing it
293    with a different algorithm.  The new code is more modular,
294    smaller when compiled, and substantially faster
295    (O(n log n) time and O(n) space).  Timing of the new spill module:
296    
297    4000  CPS 100 spill                           0.02u  0.00s  0.00g
298          MLRISC ra                               0.25u  0.02s  0.15g
299             (spills=1 reloads=3)
300          TOTAL                                   7.74u  0.34s  1.62g
301    
302    Implementation details:
303    
304    As far as I can tell, the purpose of the CPS spill module is to make sure the
305    number of live variables at any program point (the bandwidth)
306    does not exceed a certain limit, which is determined by the
307    size of the spill area.
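
Illustration (not part of the HISTORY entry): the bandwidth of a
straight-line sequence, computed from definition and last-use points.  The
real phase works on CPS trees; the interval representation and the function
name are simplifications assumed for this sketch.

```python
def bandwidth(intervals):
    """Maximum number of simultaneously live variables.

    intervals: {var: (def_point, last_use_point)} with integer points.
    A variable counts as live from its definition up to and including
    its last use.
    """
    events = []
    for start, end in intervals.values():
        events.append((start, 1))       # becomes live
        events.append((end + 1, -1))    # dead just after its last use
    live = peak = 0
    # At equal points, -1 sorts before +1, so deaths are processed first.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak
```

Spilling is needed exactly when this number exceeds the limit set by the
size of the spill area.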
308    
309    When the bandwidth is too large, we decrease the register pressure by
310    packing live variables into spill records.  How we achieve this is
311    completely different than what we did in the old code.
312    
313    First, there is something about the MLRiscGen code generator
314    that we should be aware of:
315    
316    o MLRiscGen performs code motion!
317    
318       In particular, it will move floating point computations and
319       address computations involving only the heap pointer to
320       their use sites (if there is only a single use).
321       What this means is that if we have a CPS record construction
322       statement
323    
324           RECORD(k,vl,w,e)
325    
326       we should never count the new record address w as live if w
327       has only one use (which is often the case).
328    
329       We should do something similar for floating point, but the transformation
330       there is much more complex, so I won't deal with that.
331    
332    Secondly, there are now two new cps primops at our disposal:
333    
334     1. rawrecord of record_kind option
335        This pure operator allocates some uninitialized storage from the heap.
336        There are two forms:
337    
338         rawrecord NONE [INT n]  allocates a tagless record of length n
339         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
340                                     and initializes the tag.
341    
342     2. rawupdate of cty
343          rawupdate cty (v,i,x)
344          Assigns x to the ith component of record v.
345          The storelist is not updated.
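
Illustration (not part of the HISTORY entry): a toy model of the two primops
and of filling a record piece by piece.  Records are modelled as Python
dictionaries; the tag value "RK_RECORD" in the test is only a label, and none
of this reflects the actual CPS data types or the code generator.

```python
def rawrecord(kind, n):
    """Allocate n uninitialized slots; kind=None models the tagless form."""
    return {"tag": kind, "slots": [None] * n}


def rawupdate(record, i, x):
    """Store x into component i; no store-list bookkeeping is modelled."""
    record["slots"][i] = x


def build_incrementally(kind, producers):
    """Allocate first, then store each component as soon as it is computed,
    so the component values never need to be live all at once."""
    rec = rawrecord(kind, len(producers))
    for i, produce in enumerate(producers):
        rawupdate(rec, i, produce())
    return rec
```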
346    
347    We use these new primops for both spilling and incremental record construction.
348    
349     1. Spilling.
350    
351        This is implemented with a linear scan algorithm (but generalized
352        to trees).  The algorithm will create a single spill record at the
353        beginning of the cps function and use rawupdate to spill to it,
354        and SELECT or SELp to reload from it.  So both spills and reloads
355        are fine-grain operations.  In contrast, in the old algorithm
356        "spills" have to be bundled together in records.
357    
358        Ideally, we should sink the spill record construction to where
359        it is needed.  We can even split the spill record into multiple ones
360        at the places where they are needed.  But CPS is not a good
361        representation for global code motion, so I'll keep it simple and
362        am not attempting this.
363    
364     2. Incremental record construction (aka record splitting).
365    
366        Long records with many component values which are simultaneously live
367        (recall that single use record addresses are not considered to
368         be live) are constructed with rawrecord and rawupdate.
369        We allocate space on the heap with rawrecord first, then gradually
370        fill it in with rawupdate.  This is the technique suggested to me
371        by Matthias.
372    
373        Some restrictions on when this is applicable:
374        1. It is not a VECTOR record.  The code generator currently does not handle
375           this case. VECTOR record uses double indirection like arrays.
376        2. All the record component values are defined in the same "basic block"
377           as the record constructor.  This is to prevent speculative
378           record construction.
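
Illustration (not part of the HISTORY entry): a linear-scan pass over a
straight-line sequence that assigns spilled variables to slots of a single
spill record.  The choice of victim (furthest last use) and the interval
representation are assumptions of this sketch; the entry does not say which
heuristic spill-new.sml uses, and the real algorithm is generalized to trees.

```python
def linear_scan_spill(intervals, limit):
    """Pick variables to spill so that at most `limit` stay unspilled.

    intervals: {var: (def_point, last_use_point)}.
    Returns {var: slot index in the single spill record}.
    """
    active = []        # (last_use, var) pairs currently kept in registers
    spill_slots = {}
    by_start = sorted(intervals.items(), key=lambda kv: kv[1][0])
    for var, (start, end) in by_start:
        # Expire variables whose last use lies before this definition.
        active = [(e, v) for e, v in active if e >= start]
        active.append((end, var))
        if len(active) > limit:
            active.sort()
            _, victim = active.pop()      # assumed heuristic: furthest use
            spill_slots[victim] = len(spill_slots)
    return spill_slots
```

Each spilled variable would be written once with rawupdate and read back
with SELECT at its uses, which is what makes spills and reloads fine-grained.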
379    
380    ----------------------------------------------------------------------
381    Name: Allen Leung
382    Date: 2002/02/22 01:02:00 EST
383    Tag: leunga-20020222-mlrisc-tools
384    
385    Minor bug fixes in the parser and rewriter
386    
387    ----------------------------------------------------------------------
388    Name: Allen Leung
389    Date: 2002/02/21 20:20:00 EST
390    Tag: leunga-20020221-peephole
391    
392    Regenerated the peephole files.  Some contained typos in the specification
393    and some didn't compile because of pretty printing bugs in the old version
394    of 'nowhere'.
395    
396    ----------------------------------------------------------------------
397    Name: Allen Leung
398    Date: 2002/02/19 20:20:00 EST
399    Tag: leunga-20020219-mlrisc-tools
400    Description:
401    
402       Minor bug fixes to the mlrisc-tools library:
403    
404       1.  Fixed up parsing colon suffixed keywords
405       2.  Added the ability to shut the error messages up
406       3.  Reimplemented the pretty printer and fixed up/improved
407           the pretty printing of handle and -> types.
408       4.  Fixed up generation of literal symbols in the nowhere tool.
409       5.  Added some SML keywords to sml.sty
410    
411    ----------------------------------------------------------------------
412    Name: Matthias Blume
413    Date: 2002/02/19 16:20:00 EST
414    Tag: blume-20020219-cmffi
415    Description:
416    
417    A wild mix of changes, some minor, some major:
418    
419    * All C FFI-related libraries are now anchored under $c:
420        $/c.cm      --> $c/c.cm
421        $/c-int.cm  --> $c/internals/c-int.cm
422        $/memory.cm --> $c/memory/memory.cm
423    
424    * "make" tool (in CM) now treats its argument pathname slightly
425      differently:
426        1. If the native expansion is an absolute name, then before invoking
427           the "make" command on it, CM will apply OS.Path.mkRelative
428           (with relativeTo = OS.FileSys.getDir()) to it.
429        2. The argument will be passed through to subsequent phases of CM
430           processing without "going native".  In particular, if the argument
431           was an anchored path, then "make" will not lose track of that anchor.
432    
433    * Compiler backends now "know" their respective C calling conventions
434      instead of having to be told about it by ml-nlffigen.  This relieves
435      ml-nlffigen from one of its burdens.
436    
437    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
438    
439    * Export C_DEBUG and C_Debug from $c/c.cm.
440    
441    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
442      subtyping relationship between incomplete pointers and their complete
443      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
444      with the convention of instantiating 'o with ('t, 'c) obj whenever
445      the pointer target type is complete.  In the incomplete case, 'o
446      will be instantiated with some "'c iobj" -- a type obtained by
447      using one of the functors PointerToIncompleteType or PointerToCompleteType.
448    
449      Operations that work on both incomplete and complete pointer types are
450      typed as taking an 'o ptr while operations that require the target to
451      be known are typed as taking some ('t, 'c) obj ptr.
452    
453      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
454      where void is an eqtype without any values.  This makes it possible
455      to work on voidptr values using functions meant to operate on light
456      incomplete pointers.
457    
458    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
459      been vastly simplified.
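
Illustration (not part of the HISTORY entry): the path rule described for
the "make" tool above, with Python's posixpath standing in for
OS.Path.mkRelative and an explicit directory standing in for
OS.FileSys.getDir().  The function name is hypothetical.

```python
import posixpath


def make_argument(native_path, cwd):
    """Relativize absolute paths against cwd; leave relative ones alone."""
    if posixpath.isabs(native_path):
        return posixpath.relpath(native_path, start=cwd)
    return native_path
```

For example, /home/u/proj/lib/Makefile seen from /home/u/proj becomes
lib/Makefile.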
460    
461    ----------------------------------------------------------------------
462    Name: Matthias Blume
463    Date: 2002/02/19 10:48:00 EST
464    Tag: blume-20020219-pqfix
465    Description:
466    
467    Applied Chris Okasaki's bug fix for priority queues.
468    
469    ----------------------------------------------------------------------
470    Name: Matthias Blume
471    Date: 2002/02/15 17:05:00
472    Tag: Release_110_39
473    Description:
474    
475    Last-minute retagging is becoming a tradition... :-(
476    
477    This is the working release 110.39.
478    
479    ----------------------------------------------------------------------
480    Name: Matthias Blume
481    Date: 2002/02/15 16:00:00 EST
482    Tag: Release_110_39-orig
483    Description:
484    
485    Working release 110.39.  New bootfiles.
486    
487    (Update: There was a small bug in the installer so it wouldn't work
488    with all shells.  So I retagged. -Matthias)
489    
490    ----------------------------------------------------------------------
491    Name: Matthias Blume
492    Date: 2002/02/15 14:17:00 EST
493    Tag: blume-20020215-showbindings
494    Description:
495    
496    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
497    the latter can be useful for exploring what bindings are available at
498    the interactive prompt.  (The first function returns only the list
499    of symbols that are really bound, the second prints those but also the
500    ones that CM's autoloading mechanism knows about.)
501    
502    ----------------------------------------------------------------------
503    Name: Matthias Blume
504    Date: 2002/02/15 12:08:00 EST
505    Tag: blume-20020215-iptrs
506    Description:
507    
508    Two improvements to ml-nlffigen:
509    
510      1. Write files only if they do not exist or if their current contents
511         do not coincide with what's being written.  (That is, avoid messing
512         with the time stamps unless absolutely necessary.)
513    
514      2. Implement a "repository" mechanism for generated files related
515         to "incomplete pointer types".   See the README file for details.
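
Illustration (not part of the HISTORY entry): the "write only if different"
rule from item 1, sketched in Python.  ml-nlffigen itself is written in ML;
the function name is hypothetical.

```python
def write_if_changed(path, contents: str) -> bool:
    """Write `contents` to `path` unless the file already holds that text.

    Returns True when the file was written, False when it was left alone
    (and its time stamp therefore untouched).
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == contents:
                return False
    except FileNotFoundError:
        pass   # no existing file: fall through and create it
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents)
    return True
```

Leaving the time stamp alone keeps time-stamp-based build tools from
recompiling files whose generated contents did not change.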
516    
517    ----------------------------------------------------------------------
518    Name: Matthias Blume
519    Date: 2002/02/14 11:50:00 EST
520    Tag: blume-20020214-quote
521    Description:
522    
523    Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm).  This is required
524    because of the new and improved tag generation scheme.  (Thanks to Allen
525    Leung for pointing it out.)
526    
527    ----------------------------------------------------------------------
528    Name: Lal George
529    Date: 2002/02/14 09:55:27 EST
530    Tag: george-20020214-isabelle-bug
531    Description:
532    
533    Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation
534    of Isabelle on the x86.
535    
536    From Allen:
537    -----------
538     I've found the problem:
539    
540         in ra-core.sml, I use the counter "blocked" to keep track of the
541         true number of elements in the freeze queue.  When the counter goes
542         to zero, I skip examining the queue.  But I've messed up the
543         bookkeeping in combine():
544    
545             else ();
546             case !ucol of
547               PSEUDO => (if !cntv > 0 then
548                     (if !cntu > 0 then blocked := !blocked - 1 else ();
549                                        ^^^^^^^^^^^^^^^^^^^^^^^
550                      moveu := mergeMoveList(!movev, !moveu)
551                     )
552                  else ();
553    
554         combine() is called to coalesce two nodes u and v.
555         I think I was thinking that if the move counts of u and v are both
556         greater than zero then after they are coalesced then one node is
557         removed from the freeze queue.  Apparently I was thinking that
558         both u and v are of low degree, but that's clearly not necessarily true.
559    
560    
561    02/12/2002:
562        Here's the patch.  HOL now compiles.
563    
564        I don't know how this impacts performance (compile
565        time or runtime).  This bug caused the RA (especially on the x86)
566        to go thru the potential spill phase when there are still nodes on the
567        freeze queue.
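
Illustration (not part of the HISTORY entry): how a cached counter can drift
away from the queue it is meant to describe.  This is a toy model, not
ra-core.sml; the class, the guarded "remove" method, and the node names are
invented for the sketch, and the guard is not claimed to be the actual patch.

```python
class FreezeQueue:
    def __init__(self, nodes):
        self.queue = set(nodes)
        self.blocked = len(self.queue)   # cached number of queue entries

    def remove(self, node):
        # Adjust the counter only when an entry really leaves the queue.
        if node in self.queue:
            self.queue.remove(node)
            self.blocked -= 1

    def has_work(self):
        # The queue is skipped entirely when this returns False.
        return self.blocked > 0


def buggy_combine(fq, cntu, cntv):
    # Mirrors the flagged line: the decrement depends only on move counts,
    # not on whether either node is actually on the freeze queue.
    if cntv > 0 and cntu > 0:
        fq.blocked -= 1
```

With one node on the queue and two unrelated nodes being combined, the buggy
version drives the counter to zero while the queue is still non-empty, which
is the symptom described above.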
568    
569    
570    
571    
572    ----------------------------------------------------------------------
573  Name: Matthias Blume
574  Date: 2002/02/13 22:40:00 EST
575  Tag: blume-20020213-fptr-rtti
