Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1085, Fri Feb 22 00:15:55 2002 UTC revision 1133, Tue Mar 12 03:56:23 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Lal George
17    Date: 2002/03/11 11 22:38:53 EST
18    Tag: george-20020311-jump-chain-elim
19    Description:
20    
21    Tested the jump chain elimination on all architectures (except the
22    hppa).  This is on by default right now and is profitable for the
23    alpha and x86, however, it may not be profitable for the sparc and ppc
24    when compiling the compiler.
25    
26    The gc test will typically jump to a label at the end of the cluster,
27    where there is another jump to an external cluster containing the actual
28    code to invoke gc. This is to allow factoring of common gc invocation
29    sequences. That is to say, we generate:
30    
31            f:
32               testgc
33               ja   L1      % jump if above to L1
34    
35            L1:
36               jmp L2
37    
38    
39    After jump chain elimination the 'ja L1' instructions is converted to
40    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
41    up being implemented in their long form (if L2 is far away) using:
42    
43            jbe     L3      % jump if below or equal to L3
44            jmp     L2
45         L3:
46            ...
47    
48    
49    For large compilation units L2  may be far away.
50    
51    
52    ----------------------------------------------------------------------
53    Name: Matthias Blume
54    Date: 2002/03/11 13:30:00 EST
55    Tag: blume-20020311-mltreeeval
56    Description:
57    
58    A functor parameter was missing.
59    
60    ----------------------------------------------------------------------
61    Name: Allen Leung
62    Date: 2002/03/11 10:30:00 EST
63    Tag: leunga-20020310-runtime-string0
64    Description:
65    
66       The representation of the empty string now points to a
67    legal null terminated C string instead of unit.  It is now possible
68    to convert an ML string into C string with InlineT.CharVector.getData.
69    This compiles into one single machine instruction.
70    
71    ----------------------------------------------------------------------
72    Name: Allen Leung
73    Date: 2002/03/10 23:55:00 EST
74    Tag: leunga-20020310-x86-call
75    Description:
76    
77       Added machine generation for CALL instruction (relative displacement mode)
78    
79    ----------------------------------------------------------------------
80    Name: Matthias Blume
81    Date: 2002/03/08 16:05:00
82    Tag: blume-20020308-entrypoints
83    Description:
84    
85    Version number bumped to 110.39.1.  NEW BOOTFILES!
86    
87    Entrypoints: non-zero offset into a code object where execution should begin.
88    
89    - Added the notion of an entrypoint to CodeObj.
90    - Added reading/writing of entrypoint info to Binfile.
91    - Made runtime system bootloader aware of entrypoints.
92    - Use the address of the label of the first function given to mlriscGen
93      as the entrypoint.  This address is currently always 0, but it will
94      not be 0 once we turn on block placement.
95    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
96      with entry points) from mlriscGen.
97    
98    ----------------------------------------------------------------------
99    Name: Allen Leung
100    Date: 2002/03/07 20:45:00 EST
101    Tag: leunga-20020307-x86-cmov
102    Description:
103    
104       Bug fixes for CMOVcc on x86.
105    
106       1. Added machine code generation for CMOVcc
107       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
108       3. CMOVcc cannot have an immediate operand as argument.
109    
110    ----------------------------------------------------------------------
111    Name: Matthias Blume
112    Date: 2002/03/07 16:15:00 EST
113    Tag: blume-20020307-controls
114    Description:
115    
116    This is a very large but mostly boring patch which makes (almost)
117    every tuneable compiler knob (i.e., pretty much everything under
118    Control.* plus a few other things) configurable via both the command
119    line and environment variables in the style CM did its configuration
120    until now.
121    
122    Try starting sml with '-h' (or, if you are brave, '-H')
123    
124    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
125    implements the underlying generic mechanism.
126    
127    The interface to some of the existing such facilities has changed somewhat.
128    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
129    (The getFoo interface is still there for backward-compatibility, but its
130    use is deprecated.)
131    
132    The ml-build script passes -Cxxx=yyy command-line arguments through so
133    that one can now twiddle the compiler settings when using this "batch"
134    compiler.
135    
136    TODO items:
137    
138    We should go through and throw out all controls that are no longer
139    connected to anything.  Moreover, we should go through and provide
140    meaningful (and correct!) documentation strings for those controls
141    that still are connected.
142    
143    Currently, multiple calls to Controls.new are accepted (only the first
144    has any effect).  Eventually we should make sure that every control
145    is being made (via Controls.new) exactly once.  Future access can then
146    be done using Controls.acc.
147    
148    Finally, it would probably be a good idea to use the getter-setter
149    interface to controls rather than ref cells.  For the time being, both
150    styles are provided by the Controls module, but getter-setter pairs are
151    better if thread-safety is of any concern because they can be wrapped.
152    
153    *****************************************
154    
155    One bug fix: The function blockPlacement in three of the MLRISC
156    backpatch files used to be hard-wired to one of two possibilities at
157    link time (according to the value of the placementFlag).  But (I
158    think) it should rather sense the flag every time.
159    
160    *****************************************
161    
162    Other assorted changes (by other people who did not supply a HISTORY entry):
163    
164    1. the cross-module inliner now works much better (Monnier)
165    2. representation of weights, frequencies, and probabilities in MLRISC
166       changed in preparation of using those for weighted block placement
167       (Reppy, George)
168    
169    ----------------------------------------------------------------------
170    Name: Lal George
171    Date: 2002/03/07 14:44:24 EST 2002
172    Tag: george-20020307-weighted-block-placement
173    
174    Tested the weighted block placement optimization on all architectures
175    (except the hppa) using AMPL to generate the block and edge frequencies.
176    Changes were required in the machine properties to correctly
177    categorize trap instructions. There is an MLRISC flag
178    "weighted-block-placement" that can be used to enable weighted block
179    placement, but this will be ineffective without block/edge
180    frequencies (coming soon).
181    
182    
183    ----------------------------------------------------------------------
184    Name: Lal George
185    Date: 2002/03/05 17:24:48 EST
186    Tag: george-20020305-linkage-cluster
187    
188    In order to support the block placement optimization, a new cluster
189    is generated as the very first cluster (called the linkage cluster).
190    It contains a single jump to the 'real' entry point for the compilation
191    unit. Block placement has no effect on the linkage cluster itself, but
192    all the other clusters  have full freedom in the manner in which they
193    reorder blocks or functions.
194    
195    On the x86 the typical linkage code that is generated is:
196       ----------------------
197            .align 2
198       L0:
199            addl    $L1-L0, 72(%esp)
200            jmp     L1
201    
202    
203            .align  2
204       L1:
205       ----------------------
206    
207    72(%esp) is the memory location for the stdlink register. This
208    must contain the address of the CPS function being called. In the
209    above example, it contains the address of  L0; before
210    calling L1 (the real entry point for the compilation unit), it
211    must contain the address for L1, and hence
212    
213            addl $L1-L0, 72(%esp)
214    
215    I have tested this on all architectures except the hppa.The increase
216    in code size is of course negligible
217    
218    ----------------------------------------------------------------------
219    Name: Allen Leung
220    Date: 2002/03/03 13:20:00 EST
221    Tag: leunga-20020303-mlrisc-tools
222    
223      Added #[ ... ] expressions to mlrisc tools
224    
225    ----------------------------------------------------------------------
226    Name: Matthias Blume
227    Date: 2002/02/27 12:29:00 EST
228    Tag: blume-20020227-cdebug
229    Description:
230    
231    - made types in structure C and C_Debug to be equal
232    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
233    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
234    
235    ----------------------------------------------------------------------
236    Name: Matthias Blume
237    Date: 2002/02/26 12:00:00 EST
238    Tag: blume-20020226-ffi
239    Description:
240    
241    1. Fixed a minor bug in CM's "noweb" tool:
242       If numbering is turned off, then truly don't number (i.e., do not
243       supply the -L option to noweb).  The previous behavior was to supply
244       -L'' -- which caused noweb to use the "default" line numbering scheme.
245       Thanks to Chris Richards for pointing this out (and supplying the fix).
246    
247    2. Once again, I reworked some aspects of the FFI:
248    
249       A. The incomplete/complete type business:
250    
251       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
252         gone!
253       - ML types representing an incomplete type are now *equal* to
254         ML types representing their corresponding complete types (just like
255         in C).  This is still safe because ml-nlffigen will not generate
256         RTTI for incomplete types, nor will it generate functions that
257         require access to such RTTI.   But when ML code generated from both
258         incomplete and complete versions of the C type meet, the ML types
259         are trivially interoperable.
260    
261         NOTE:  These changes restore the full generality of the translation
262         (which was previously lost when I eliminated functorization)!
263    
264       B. Enum types:
265    
266       - Structure C now has a type constructor "enum" that is similar to
267         how the "su" constructor works.  However, "enum" is not a phantom
268         type because each "T enum" has values (and is isomorphic to
269         MLRep.Signed.int).
270       - There are generic access operations for enum objects (using
271         MLRep.Signed.int).
272       - ml-nlffigen will generate a structure E_foo for each "enum foo".
273         * The structure contains the definition of type "mlrep" (the ML-side
274         representation type of the enum).  Normally, mlrep is the same
275         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
276         then mlrep will be defined as a datatype -- thus facilitating
277         pattern matching on mlrep values.
278         ("-ec" will be suppressed if there are duplicate values in an
279          enumeration.)
280         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
281         will be generated for each C enum constant xxx.
282         * Conversion functions m2i and i2m convert between mlrep and
283         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
284         * Coversion functions c and ml convert between mlrep and "tag enum".
285         * Access functions (get/set) fetch and store mlrep values.
286       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
287         enumerations are merged into one single enumeration represented by
288         structure E_'.
289    
290    ----------------------------------------------------------------------
291    Name: Allen Leung
292    Date: 2002/02/25 04:45:00 EST
293    Tag: leunga-20020225-cps-spill
294    
295    This is a new implementation of the CPS spill phase.
296    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
297    In case of problems, replace it with the old file spill.sml
298    
299    The current compiler runs into some serious performance problems when
300    constructing a large record.  This can happen when we try to compile a
301    structure with many items.  Even a very simple structure like the following
302    makes the compiler slow down.
303    
304        structure Foo = struct
305           val x_1 = 0w1 : Word32.int
306           val x_2 = 0w2 : Word32.int
307           val x_3 = 0w3 : Word32.int
308           ...
309           val x_N = 0wN : Word32.int
310        end
311    
312    The following table shows the compile time, from N=1000 to N=4000,
313    with the old compiler:
314    
315    N
316    1000   CPS 100 spill                           0.04u  0.00s  0.00g
317           MLRISC ra                               0.06u  0.00s  0.05g
318              (spills = 0 reloads = 0)
319           TOTAL                                   0.63u  0.07s  0.21g
320    
321    1100   CPS 100 spill                           8.25u  0.32s  0.64g
322           MLRISC ra                               5.68u  0.59s  3.93g
323              (spills = 0 reloads = 0)
324           TOTAL                                   14.71u  0.99s  4.81g
325    
326    1500   CPS 100 spill                           58.55u  2.34s  1.74g
327           MLRISC ra                               5.54u  0.65s  3.91g
328              (spills = 543 reloads = 1082)
329           TOTAL                                   65.40u  3.13s  6.00g
330    
331    2000   CPS 100 spill                           126.69u  4.84s  3.08g
332           MLRISC ra                               0.80u  0.10s  0.55g
333              (spills = 42 reloads = 84)
334           TOTAL                                   129.42u  5.10s  4.13g
335    
336    3000   CPS 100 spill                           675.59u  19.03s  11.64g
337           MLRISC ra                               2.69u  0.27s  1.38g
338              (spills = 62 reloads = 124)
339           TOTAL                                   682.48u  19.61s  13.99g
340    
341    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
342           MLRISC ra                               4.96u  0.27s  2.72g
343              (spills = 85 reloads = 170)
344           TOTAL                                   2375.26u  57.21s  48.00g
345    
346    As you can see the old cps spill module suffers from some serious
347    performance problem.  But since I cannot decipher the old code fully,
348    instead of patching the problems up, I'm reimplementing it
349    with a different algorithm.  The new code is more modular,
350    smaller when compiled, and substantially faster
351    (O(n log n) time and O(n) space).  Timing of the new spill module:
352    
353    4000  CPS 100 spill                           0.02u  0.00s  0.00g
354          MLRISC ra                               0.25u  0.02s  0.15g
355             (spills=1 reloads=3)
356          TOTAL                                   7.74u  0.34s  1.62g
357    
358    Implementation details:
359    
360    As far as I can tell, the purpose of the CPS spill module is to make sure the
361    number of live variables at any program point (the bandwidth)
362    does not exceed a certain limit, which is determined by the
363    size of the spill area.
364    
365    When the bandwidth is too large, we decrease the register pressure by
366    packing live variables into spill records.  How we achieve this is
367    completely different than what we did in the old code.
368    
369    First, there is something about the MLRiscGen code generator
370    that we should be aware of:
371    
372    o MLRiscGen performs code motion!
373    
374       In particular, it will move floating point computations and
375       address computations involving only the heap pointer to
376       their use sites (if there is only a single use).
377       What this means is that if we have a CPS record construction
378       statement
379    
380           RECORD(k,vl,w,e)
381    
382       we should never count the new record address w as live if w
383       has only one use (which is often the case).
384    
385       We should do something similar to floating point, but the transformation
386       there is much more complex, so I won't deal with that.
387    
388    Secondly, there are now two new cps primops at our disposal:
389    
390     1. rawrecord of record_kind option
391        This pure operator allocates some uninitialized storage from the heap.
392        There are two forms:
393    
394         rawrecord NONE [INT n]  allocates a tagless record of length n
395         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
396                                     and initializes the tag.
397    
398     2. rawupdate of cty
399          rawupdate cty (v,i,x)
400          Assigns to x to the ith component of record v.
401          The storelist is not updated.
402    
403    We use these new primops for both spilling and increment record construction.
404    
405     1. Spilling.
406    
407        This is implemented with a linear scan algorithm (but generalized
408        to trees).  The algorithm will create a single spill record at the
409        beginning of the cps function and use rawupdate to spill to it,
410        and SELECT or SELp to reload from it.  So both spills and reloads
411        are fine-grain operations.  In contrast, in the old algorithm
412        "spills" have to be bundled together in records.
413    
414        Ideally, we should sink the spill record construction to where
415        it is needed.  We can even split the spill record into multiple ones
416        at the places where they are needed.  But CPS is not a good
417        representation for global code motion, so I'll keep it simple and
418        am not attempting this.
419    
420     2. Incremental record construction (aka record splitting).
421    
422        Long records with many component values which are simulatenously live
423        (recall that single use record addresses are not considered to
424         be live) are constructed with rawrecord and rawupdate.
425        We allocate space on the heap with rawrecord first, then gradually
426        fill it in with rawupdate.  This is the technique suggested to me
427        by Matthias.
428    
429        Some restrictions on when this is applicable:
430        1. It is not a VECTOR record.  The code generator currently does not handle
431           this case. VECTOR record uses double indirection like arrays.
432        2. All the record component values are defined in the same "basic block"
433           as the record constructor.  This is to prevent speculative
434           record construction.
435    
436    ----------------------------------------------------------------------
437    Name: Allen Leung
438    Date: 2002/02/22 01:02:00 EST
439    Tag: leunga-20020222-mlrisc-tools
440    
441    Minor bug fixes in the parser and rewriter
442    
443    ----------------------------------------------------------------------
444  Name: Allen Leung  Name: Allen Leung
445  Date: 2002/02/21 20:20:00 EST  Date: 2002/02/21 20:20:00 EST
446  Tag: leunga-20020221-peephole  Tag: leunga-20020221-peephole

Legend:
Removed from v.1085  
changed lines
  Added in v.1133

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0