[smlnj] Diff of /sml/trunk/HISTORY

revision 1078, Tue Feb 19 21:26:48 2002 UTC revision 1136, Tue Mar 12 19:44:02 2002 UTC
# Line 11
11  Date: yyyy/mm/dd
12  Tag: <post-commit CVS tag>
13  Description:
14    ----------------------------------------------------------------------
15    Name: Lal George
16    Date: 2002/03/12 14:42:36 EST
17    Tag: george-20020312-frequency-computation
18    Description:
19    
20    Integrated jump chaining and static block frequency into the
21    compiler. More details and numbers later.
22    
23    ----------------------------------------------------------------------
24    Name: Lal George
25    Date: 2002/03/11 22:38:53 EST
26    Tag: george-20020311-jump-chain-elim
27    Description:
28    
29    Tested the jump chain elimination on all architectures (except the
30    hppa).  This is on by default right now and is profitable for the
31    alpha and x86; however, it may not be profitable for the sparc and ppc
32    when compiling the compiler.
33    
34    The gc test will typically jump to a label at the end of the cluster,
35    where there is another jump to an external cluster containing the actual
36    code to invoke gc. This is to allow factoring of common gc invocation
37    sequences. That is to say, we generate:
38    
39            f:
40               testgc
41               ja   L1      % jump if above to L1
42    
43            L1:
44               jmp L2
45    
46    
47    After jump chain elimination the 'ja L1' instruction is converted to
48    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
49    up being implemented in their long form (if L2 is far away) using:
50    
51            jbe     L3      % jump if below or equal to L3
52            jmp     L2
53         L3:
54            ...
55    
56    
57    For large compilation units L2  may be far away.
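The rewrite described in this entry can be sketched in a few lines of Python. This is only an illustration, not the MLRISC implementation: the block representation (a dict from label to a list of `(opcode, target)` pairs) and the function name `eliminate_jump_chains` are assumptions made for the example.

```python
def eliminate_jump_chains(blocks):
    """Redirect branches that land on a block holding a single 'jmp'.

    blocks maps a label to a list of (opcode, target) pairs; this
    representation is hypothetical and chosen only for the sketch.
    """
    # A label is a trampoline if its block is exactly one unconditional jump.
    forward = {
        label: insns[0][1]
        for label, insns in blocks.items()
        if len(insns) == 1 and insns[0][0] == "jmp"
    }

    def resolve(label):
        seen = set()
        # Follow the chain; 'seen' guards against cycles of jumps.
        while label in forward and label not in seen:
            seen.add(label)
            label = forward[label]
        return label

    branches = {"jmp", "ja", "jbe"}
    return {
        label: [
            (op, resolve(arg)) if op in branches else (op, arg)
            for op, arg in insns
        ]
        for label, insns in blocks.items()
    }


cluster = {
    "f": [("testgc", None), ("ja", "L1")],
    "L1": [("jmp", "L2")],
    "L2": [("invoke_gc", None)],
}
print(eliminate_jump_chains(cluster)["f"])
```

The sketch does not model the span-dependent cost discussed above: whether 'ja L2' needs the long form depends on the final distance to L2, which is only known after layout.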
58    
59    
60    ----------------------------------------------------------------------
61    Name: Matthias Blume
62    Date: 2002/03/11 13:30:00 EST
63    Tag: blume-20020311-mltreeeval
64    Description:
65    
66    A functor parameter was missing.
67    
68    ----------------------------------------------------------------------
69    Name: Allen Leung
70    Date: 2002/03/11 10:30:00 EST
71    Tag: leunga-20020310-runtime-string0
72    Description:
73    
74       The representation of the empty string now points to a
75    legal null terminated C string instead of unit.  It is now possible
76    to convert an ML string into a C string with InlineT.CharVector.getData.
77    This compiles into one single machine instruction.
78    
79    ----------------------------------------------------------------------
80    Name: Allen Leung
81    Date: 2002/03/10 23:55:00 EST
82    Tag: leunga-20020310-x86-call
83    Description:
84    
85       Added machine code generation for the CALL instruction (relative displacement mode).
86    
87    ----------------------------------------------------------------------
88    Name: Matthias Blume
89    Date: 2002/03/08 16:05:00
90    Tag: blume-20020308-entrypoints
91    Description:
92    
93    Version number bumped to 110.39.1.  NEW BOOTFILES!
94    
95    Entrypoints: non-zero offset into a code object where execution should begin.
96    
97    - Added the notion of an entrypoint to CodeObj.
98    - Added reading/writing of entrypoint info to Binfile.
99    - Made runtime system bootloader aware of entrypoints.
100    - Use the address of the label of the first function given to mlriscGen
101      as the entrypoint.  This address is currently always 0, but it will
102      not be 0 once we turn on block placement.
103    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
104      with entry points) from mlriscGen.
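As a rough illustration of the entrypoint idea, the Python sketch below stores an offset next to the code bytes and has a loader start execution at base + offset. The header layout, the class `CodeObj` as written here, and the helper names are all hypothetical; the real CodeObj and Binfile formats are not described in this entry.

```python
import struct
from dataclasses import dataclass


@dataclass
class CodeObj:
    code: bytes
    entrypoint: int = 0  # byte offset into 'code' where execution begins


# Hypothetical on-disk layout: code size, then entrypoint, both 32-bit.
HEADER = struct.Struct("<II")


def write_code(obj):
    return HEADER.pack(len(obj.code), obj.entrypoint) + obj.code


def read_code(blob):
    size, entry = HEADER.unpack_from(blob, 0)
    code = blob[HEADER.size:HEADER.size + size]
    if entry >= max(size, 1):
        raise ValueError("entrypoint lies outside the code object")
    return CodeObj(code, entry)


def start_address(load_base, obj):
    # What a bootloader would jump to once the object is loaded at load_base.
    return load_base + obj.entrypoint


obj = read_code(write_code(CodeObj(b"\x90" * 8, entrypoint=4)))
print(hex(start_address(0x1000, obj)))
```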
105    
106    ----------------------------------------------------------------------
107    Name: Allen Leung
108    Date: 2002/03/07 20:45:00 EST
109    Tag: leunga-20020307-x86-cmov
110    Description:
111    
112       Bug fixes for CMOVcc on x86.
113    
114       1. Added machine code generation for CMOVcc
115       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
116       3. CMOVcc cannot have an immediate operand as argument.
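Points 2 and 3 can be pictured with a small instruction-selection sketch in Python. The operand encoding, the function `lower_cmov`, and the temporary register name are assumptions made for this example; the entry does not say how the x86 backend actually handles an immediate source.

```python
def lower_cmov(cond, dst, src, has_cmov):
    """Return pseudo-instructions for 'if cond then dst := src'.

    Operands are ("reg", name) or ("imm", value) pairs (hypothetical).
    Returns None when CMOVcc is unavailable, so that the caller can
    fall back to its SETcc-based sequence.
    """
    if not has_cmov:
        return None
    insns = []
    if src[0] == "imm":
        # CMOVcc cannot take an immediate, so load it into a register first.
        tmp = ("reg", "tmp")
        insns.append(("mov", tmp, src))
        src = tmp
    insns.append(("cmov" + cond, dst, src))
    return insns


print(lower_cmov("ne", ("reg", "eax"), ("imm", 1), has_cmov=True))
```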
117    
118    ----------------------------------------------------------------------
119    Name: Matthias Blume
120    Date: 2002/03/07 16:15:00 EST
121    Tag: blume-20020307-controls
122    Description:
123    
124    This is a very large but mostly boring patch which makes (almost)
125    every tuneable compiler knob (i.e., pretty much everything under
126    Control.* plus a few other things) configurable via both the command
127    line and environment variables in the style CM did its configuration
128    until now.
129    
130    Try starting sml with '-h' (or, if you are brave, '-H')
131    
132    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
133    implements the underlying generic mechanism.
134    
135    The interface to some of the existing such facilities has changed somewhat.
136    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
137    (The getFoo interface is still there for backward-compatibility, but its
138    use is deprecated.)
139    
140    The ml-build script passes -Cxxx=yyy command-line arguments through so
141    that one can now twiddle the compiler settings when using this "batch"
142    compiler.
143    
144    TODO items:
145    
146    We should go through and throw out all controls that are no longer
147    connected to anything.  Moreover, we should go through and provide
148    meaningful (and correct!) documentation strings for those controls
149    that still are connected.
150    
151    Currently, multiple calls to Controls.new are accepted (only the first
152    has any effect).  Eventually we should make sure that every control
153    is being made (via Controls.new) exactly once.  Future access can then
154    be done using Controls.acc.
155    
156    Finally, it would probably be a good idea to use the getter-setter
157    interface to controls rather than ref cells.  For the time being, both
158    styles are provided by the Controls module, but getter-setter pairs are
159    better if thread-safety is of any concern because they can be wrapped.
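The behaviour described in this entry (registration where only the first call has an effect, getter-setter access, and configuration from -Cxxx=yyy arguments and environment variables) can be modelled roughly as follows. This Python sketch is not the Controls module's interface: the method signatures, the `DEMO_` environment prefix, and the rule that the command line overrides the environment are all assumptions.

```python
class Controls:
    def __init__(self):
        self._cells = {}

    def new(self, name, default, parse=str):
        # Repeated registration is accepted; only the first has any effect.
        if name not in self._cells:
            self._cells[name] = {"value": default, "parse": parse}
        return self.acc(name)

    def acc(self, name):
        # Hand out a getter-setter pair instead of the underlying cell,
        # so that access could later be wrapped (e.g. with a lock).
        cell = self._cells[name]

        def get():
            return cell["value"]

        def set_(value):
            cell["value"] = value

        return get, set_

    def configure(self, argv, environ):
        """Apply environment settings, then -Cname=value arguments.

        Returns the arguments that were not consumed.
        """
        for name, cell in self._cells.items():
            key = "DEMO_" + name.upper().replace("-", "_")  # hypothetical
            if key in environ:
                cell["value"] = cell["parse"](environ[key])
        rest = []
        for arg in argv:
            if arg.startswith("-C") and "=" in arg:
                name, text = arg[2:].split("=", 1)
                if name in self._cells:
                    cell = self._cells[name]
                    cell["value"] = cell["parse"](text)
                    continue
            rest.append(arg)
        return rest


controls = Controls()
get_flag, set_flag = controls.new("jump-chain", True, parse=lambda s: s == "true")
print(controls.configure(["-Cjump-chain=false", "foo.cm"], {}), get_flag())
```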
160    
161    *****************************************
162    
163    One bug fix: The function blockPlacement in three of the MLRISC
164    backpatch files used to be hard-wired to one of two possibilities at
165    link time (according to the value of the placementFlag).  But (I
166    think) it should rather sense the flag every time.
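The difference between the two behaviours is the usual one between binding a choice once and consulting the flag on each call. A minimal Python sketch, with made-up placement functions and a dict standing in for the placementFlag:

```python
placement_flag = {"weighted": False}  # stand-in for the real flag


def default_placement(blocks):
    return list(blocks)


def weighted_placement(blocks):
    # Hypothetical: hottest block first; blocks are (name, frequency) pairs.
    return sorted(blocks, key=lambda b: -b[1])


# Old behaviour: the choice is made once, when this line is evaluated.
block_placement_early = (
    weighted_placement if placement_flag["weighted"] else default_placement
)


# Fixed behaviour: the flag is sensed on every call.
def block_placement(blocks):
    choose = weighted_placement if placement_flag["weighted"] else default_placement
    return choose(blocks)


blocks = [("a", 1), ("b", 5)]
placement_flag["weighted"] = True
print(block_placement_early(blocks), block_placement(blocks))
```

After the flag is flipped, only the second variant picks up the change.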
167    
168    *****************************************
169    
170    Other assorted changes (by other people who did not supply a HISTORY entry):
171    
172    1. the cross-module inliner now works much better (Monnier)
173    2. representation of weights, frequencies, and probabilities in MLRISC
174       changed in preparation of using those for weighted block placement
175       (Reppy, George)
176    
177    ----------------------------------------------------------------------
178    Name: Lal George
179    Date: 2002/03/07 14:44:24 EST
180    Tag: george-20020307-weighted-block-placement
181    
182    Tested the weighted block placement optimization on all architectures
183    (except the hppa) using AMPL to generate the block and edge frequencies.
184    Changes were required in the machine properties to correctly
185    categorize trap instructions. There is an MLRISC flag
186    "weighted-block-placement" that can be used to enable weighted block
187    placement, but this will be ineffective without block/edge
188    frequencies (coming soon).
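The entry does not say which placement algorithm is used. To show why edge frequencies matter, here is a common greedy chaining heuristic in Python (heaviest edges first, gluing blocks so that hot edges become fall-throughs). It is a swapped-in textbook technique under an assumed data layout, not a description of the MLRISC code.

```python
def place_blocks(entry, blocks, edges):
    """Order blocks so that frequent edges become fall-throughs.

    edges maps (src, dst) to a frequency. Greedy chaining sketch only.
    """
    chain_of = {b: [b] for b in blocks}
    for (src, dst), _freq in sorted(edges.items(), key=lambda e: -e[1]):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends its chain and dst starts a different one;
        # the entry block must stay at the head of its chain.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    order, seen = [], set()
    for blk in [entry] + list(blocks):
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


edges = {("A", "B"): 10, ("A", "C"): 90, ("C", "D"): 90, ("B", "D"): 10}
print(place_blocks("A", ["A", "B", "C", "D"], edges))
```

Without frequencies every edge looks equally important, which is why the flag is described above as ineffective until block and edge frequencies are available.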
189    
190    
191    ----------------------------------------------------------------------
192    Name: Lal George
193    Date: 2002/03/05 17:24:48 EST
194    Tag: george-20020305-linkage-cluster
195    
196    In order to support the block placement optimization, a new cluster
197    is generated as the very first cluster (called the linkage cluster).
198    It contains a single jump to the 'real' entry point for the compilation
199    unit. Block placement has no effect on the linkage cluster itself, but
200    all the other clusters  have full freedom in the manner in which they
201    reorder blocks or functions.
202    
203    On the x86 the typical linkage code that is generated is:
204       ----------------------
205            .align 2
206       L0:
207            addl    $L1-L0, 72(%esp)
208            jmp     L1
209    
210    
211            .align  2
212       L1:
213       ----------------------
214    
215    72(%esp) is the memory location for the stdlink register. This
216    must contain the address of the CPS function being called. In the
217    above example, it contains the address of  L0; before
218    calling L1 (the real entry point for the compilation unit), it
219    must contain the address for L1, and hence
220    
221            addl $L1-L0, 72(%esp)
222    
223    I have tested this on all architectures except the hppa. The increase
224    in code size is of course negligible.
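The stdlink adjustment is plain address arithmetic. The Python lines below mimic it with made-up addresses, to show that adding the constant L1-L0 works wherever the code happens to be loaded; the function name is hypothetical.

```python
def enter_via_linkage(stdlink, addr_l0, addr_l1):
    # On entry stdlink holds the address of L0, the function being called.
    assert stdlink == addr_l0
    # addl $L1-L0, 72(%esp): the difference is a link-time constant.
    stdlink += addr_l1 - addr_l0
    return stdlink  # now the address of L1, as the real entry point expects


for base in (0x1000, 0x7F000):  # two arbitrary load addresses
    print(hex(enter_via_linkage(base, base, base + 8)))
```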
225    
226    ----------------------------------------------------------------------
227    Name: Allen Leung
228    Date: 2002/03/03 13:20:00 EST
229    Tag: leunga-20020303-mlrisc-tools
230    
231      Added #[ ... ] expressions to mlrisc tools
232    
233    ----------------------------------------------------------------------
234    Name: Matthias Blume
235    Date: 2002/02/27 12:29:00 EST
236    Tag: blume-20020227-cdebug
237    Description:
238    
239    - made the types in structures C and C_Debug equal
240    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
241    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
242    
243    ----------------------------------------------------------------------
244    Name: Matthias Blume
245    Date: 2002/02/26 12:00:00 EST
246    Tag: blume-20020226-ffi
247    Description:
248    
249    1. Fixed a minor bug in CM's "noweb" tool:
250       If numbering is turned off, then truly don't number (i.e., do not
251       supply the -L option to noweb).  The previous behavior was to supply
252       -L'' -- which caused noweb to use the "default" line numbering scheme.
253       Thanks to Chris Richards for pointing this out (and supplying the fix).
254    
255    2. Once again, I reworked some aspects of the FFI:
256    
257       A. The incomplete/complete type business:
258    
259       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
260         gone!
261       - ML types representing an incomplete type are now *equal* to
262         ML types representing their corresponding complete types (just like
263         in C).  This is still safe because ml-nlffigen will not generate
264         RTTI for incomplete types, nor will it generate functions that
265         require access to such RTTI.   But when ML code generated from both
266         incomplete and complete versions of the C type meet, the ML types
267         are trivially interoperable.
268    
269         NOTE:  These changes restore the full generality of the translation
270         (which was previously lost when I eliminated functorization)!
271    
272       B. Enum types:
273    
274       - Structure C now has a type constructor "enum" that is similar to
275         how the "su" constructor works.  However, "enum" is not a phantom
276         type because each "T enum" has values (and is isomorphic to
277         MLRep.Signed.int).
278       - There are generic access operations for enum objects (using
279         MLRep.Signed.int).
280       - ml-nlffigen will generate a structure E_foo for each "enum foo".
281         * The structure contains the definition of type "mlrep" (the ML-side
282         representation type of the enum).  Normally, mlrep is the same
283         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
284         then mlrep will be defined as a datatype -- thus facilitating
285         pattern matching on mlrep values.
286         ("-ec" will be suppressed if there are duplicate values in an
287          enumeration.)
288         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
289         will be generated for each C enum constant xxx.
290         * Conversion functions m2i and i2m convert between mlrep and
291         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
292         * Conversion functions c and ml convert between mlrep and "tag enum".
293         * Access functions (get/set) fetch and store mlrep values.
294       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
295         enumerations are merged into one single enumeration represented by
296         structure E_'.
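The relationship between mlrep, m2i and i2m under "-ec" can be modelled in a few lines of Python. This is a loose analogy, not generated ml-nlffigen output: the function `make_enum` and the use of strings for constructors are assumptions for illustration.

```python
def make_enum(constants, want_datatype):
    """constants is a list of (name, int) pairs from a C enum.

    Returns (m2i, i2m, used_datatype). With duplicate values the
    datatype form is suppressed and both conversions are identities.
    """
    values = [v for _, v in constants]
    use_datatype = want_datatype and len(set(values)) == len(values)
    if not use_datatype:
        def identity(x):
            return x
        return identity, identity, False
    # Constructors e_xxx stand in for the datatype's constructors.
    to_int = {"e_" + name: value for name, value in constants}
    from_int = {value: ctor for ctor, value in to_int.items()}
    return to_int.__getitem__, from_int.__getitem__, True


m2i, i2m, is_datatype = make_enum([("RED", 0), ("GREEN", 1)], want_datatype=True)
print(is_datatype, m2i("e_GREEN"), i2m(0))
```

The duplicate check reflects why "-ec" has to be suppressed: with two constants sharing a value, i2m would have no unique constructor to return.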
297    
298    ----------------------------------------------------------------------
299    Name: Allen Leung
300    Date: 2002/02/25 04:45:00 EST
301    Tag: leunga-20020225-cps-spill
302    
303    This is a new implementation of the CPS spill phase.
304    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
305    In case of problems, replace it with the old file spill.sml
306    
307    The current compiler runs into some serious performance problems when
308    constructing a large record.  This can happen when we try to compile a
309    structure with many items.  Even a very simple structure like the following
310    makes the compiler slow down.
311    
312        structure Foo = struct
313           val x_1 = 0w1 : Word32.word
314           val x_2 = 0w2 : Word32.word
315           val x_3 = 0w3 : Word32.word
316           ...
317           val x_N = 0wN : Word32.word
318        end
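A test input of this shape is easy to generate for a given N. The helper below is a hypothetical Python script, not part of the SML/NJ tree; it annotates each binding with Word32.word, the Basis Library name for the 32-bit word type.

```python
def make_benchmark(n):
    """Return SML source for a structure with n word-valued bindings."""
    lines = ["structure Foo = struct"]
    lines += [f"   val x_{i} = 0w{i} : Word32.word" for i in range(1, n + 1)]
    lines.append("end")
    return "\n".join(lines) + "\n"


print(make_benchmark(3), end="")
```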
319    
320    The following table shows the compile time, from N=1000 to N=4000,
321    with the old compiler:
322    
323    N
324    1000   CPS 100 spill                           0.04u  0.00s  0.00g
325           MLRISC ra                               0.06u  0.00s  0.05g
326              (spills = 0 reloads = 0)
327           TOTAL                                   0.63u  0.07s  0.21g
328    
329    1100   CPS 100 spill                           8.25u  0.32s  0.64g
330           MLRISC ra                               5.68u  0.59s  3.93g
331              (spills = 0 reloads = 0)
332           TOTAL                                   14.71u  0.99s  4.81g
333    
334    1500   CPS 100 spill                           58.55u  2.34s  1.74g
335           MLRISC ra                               5.54u  0.65s  3.91g
336              (spills = 543 reloads = 1082)
337           TOTAL                                   65.40u  3.13s  6.00g
338    
339    2000   CPS 100 spill                           126.69u  4.84s  3.08g
340           MLRISC ra                               0.80u  0.10s  0.55g
341              (spills = 42 reloads = 84)
342           TOTAL                                   129.42u  5.10s  4.13g
343    
344    3000   CPS 100 spill                           675.59u  19.03s  11.64g
345           MLRISC ra                               2.69u  0.27s  1.38g
346              (spills = 62 reloads = 124)
347           TOTAL                                   682.48u  19.61s  13.99g
348    
349    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
350           MLRISC ra                               4.96u  0.27s  2.72g
351              (spills = 85 reloads = 170)
352           TOTAL                                   2375.26u  57.21s  48.00g
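A quick arithmetic check on the "CPS 100 spill" user times in this table gives a feel for the growth rate. The Python below only fits an exponent between pairs of rows (time proportional to N^k); it is a back-of-the-envelope estimate from three data points, not a complexity analysis of the old code.

```python
import math

# User times in seconds for "CPS 100 spill", copied from the table.
spill_seconds = {2000: 126.69, 3000: 675.59, 4000: 2362.82}


def growth_exponent(n1, n2):
    """Estimate k in time ~ N**k from two rows of the table."""
    ratio = spill_seconds[n2] / spill_seconds[n1]
    return math.log(ratio) / math.log(n2 / n1)


for pair in [(2000, 3000), (3000, 4000), (2000, 4000)]:
    print(pair, round(growth_exponent(*pair), 2))
```

Each pair yields an exponent between 4 and 4.5, so on these inputs the old spill phase scales considerably worse than quadratically.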
353    
354    As you can see the old cps spill module suffers from some serious
355    performance problems.  But since I cannot decipher the old code fully,
356    instead of patching the problems up, I'm reimplementing it
357    with a different algorithm.  The new code is more modular,
358    smaller when compiled, and substantially faster
359    (O(n log n) time and O(n) space).  Timing of the new spill module:
360    
361    4000  CPS 100 spill                           0.02u  0.00s  0.00g
362          MLRISC ra                               0.25u  0.02s  0.15g
363             (spills=1 reloads=3)
364          TOTAL                                   7.74u  0.34s  1.62g
365    
366    Implementation details:
367    
368    As far as I can tell, the purpose of the CPS spill module is to make sure the
369    number of live variables at any program point (the bandwidth)
370    does not exceed a certain limit, which is determined by the
371    size of the spill area.
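For straight-line code the bandwidth can be computed with one backward pass over the statements. The Python sketch below assumes a simplified form in which each statement defines one variable and uses some others; real CPS expressions are trees, and the function name `bandwidth` is chosen only for this example.

```python
def bandwidth(stmts, live_out=()):
    """Maximum number of simultaneously live variables.

    stmts is a list of (defined_var, used_vars) in program order;
    live_out names the variables still needed after the last statement.
    """
    live = set(live_out)
    widest = len(live)
    for defined, used in reversed(stmts):
        live.discard(defined)  # not live before its definition
        live.update(used)      # operands are live before the statement
        widest = max(widest, len(live))
    return widest


program = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])]
limit = 1  # hypothetical limit derived from the spill area size
print(bandwidth(program, live_out=["d"]) > limit)
```

Spilling is needed exactly when this number exceeds the limit.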
372    
373    When the bandwidth is too large, we decrease the register pressure by
374    packing live variables into spill records.  How we achieve this is
375    completely different than what we did in the old code.
376    
377    First, there is something about the MLRiscGen code generator
378    that we should be aware of:
379    
380    o MLRiscGen performs code motion!
381    
382       In particular, it will move floating point computations and
383       address computations involving only the heap pointer to
384       their use sites (if there is only a single use).
385       What this means is that if we have a CPS record construction
386       statement
387    
388           RECORD(k,vl,w,e)
389    
390       we should never count the new record address w as live if w
391       has only one use (which is often the case).
392    
393       We should do something similar to floating point, but the transformation
394       there is much more complex, so I won't deal with that.
395    
396    Secondly, there are now two new cps primops at our disposal:
397    
398     1. rawrecord of record_kind option
399        This pure operator allocates some uninitialized storage from the heap.
400        There are two forms:
401    
402         rawrecord NONE [INT n]  allocates a tagless record of length n
403         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
404                                     and initializes the tag.
405    
406     2. rawupdate of cty
407          rawupdate cty (v,i,x)
408          Assigns x to the ith component of record v.
409          The storelist is not updated.
410    
411    We use these new primops for both spilling and incremental record construction.
412    
413     1. Spilling.
414    
415        This is implemented with a linear scan algorithm (but generalized
416        to trees).  The algorithm will create a single spill record at the
417        beginning of the cps function and use rawupdate to spill to it,
418        and SELECT or SELp to reload from it.  So both spills and reloads
419        are fine-grain operations.  In contrast, in the old algorithm
420        "spills" have to be bundled together in records.
421    
422        Ideally, we should sink the spill record construction to where
423        it is needed.  We can even split the spill record into multiple ones
424        at the places where they are needed.  But CPS is not a good
425        representation for global code motion, so I'll keep it simple and
426        am not attempting this.
427    
428     2. Incremental record construction (aka record splitting).
429    
430        Long records with many component values which are simultaneously live
431        (recall that single use record addresses are not considered to
432         be live) are constructed with rawrecord and rawupdate.
433        We allocate space on the heap with rawrecord first, then gradually
434        fill it in with rawupdate.  This is the technique suggested to me
435        by Matthias.
436    
437        Some restrictions on when this is applicable:
438        1. It is not a VECTOR record.  The code generator currently does not handle
439           this case. VECTOR records use double indirection, like arrays.
440        2. All the record component values are defined in the same "basic block"
441           as the record constructor.  This is to prevent speculative
442           record construction.
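The two uses of rawrecord and rawupdate can be mimicked with Python lists. This is a toy model under obvious assumptions (a list stands for heap storage, `select` stands for SELECT, and there are no tags or storelist), meant only to show why incremental construction keeps few values live at once.

```python
def rawrecord(n):
    return [None] * n  # uninitialized storage of length n


def rawupdate(record, i, x):
    record[i] = x      # assigns x to the ith component


def select(record, i):
    return record[i]   # reload, in the role of SELECT


def build_incrementally(n, compute):
    """Fill a record one field at a time.

    Only the record address and the current field value are live at any
    point, instead of all n component values at once.
    """
    rec = rawrecord(n)
    for i in range(n):
        rawupdate(rec, i, compute(i))
    return rec


# Spilling uses the same primitives: one spill record per function,
# a store per spill and a select per reload.
spill_area = rawrecord(2)
rawupdate(spill_area, 0, 42)
print(build_incrementally(4, lambda i: i * i), select(spill_area, 0))
```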
443    
444    ----------------------------------------------------------------------
445    Name: Allen Leung
446    Date: 2002/02/22 01:02:00 EST
447    Tag: leunga-20020222-mlrisc-tools
448    
449    Minor bug fixes in the parser and rewriter
450    
451    ----------------------------------------------------------------------
452    Name: Allen Leung
453    Date: 2002/02/21 20:20:00 EST
454    Tag: leunga-20020221-peephole
455    
456    Regenerated the peephole files.  Some contained typos in the specification
457    and some didn't compile because of pretty printing bugs in the old version
458    of 'nowhere'.
459    
460    ----------------------------------------------------------------------
461    Name: Allen Leung
462    Date: 2002/02/19 20:20:00 EST
463    Tag: leunga-20020219-mlrisc-tools
464    Description:
465    
466       Minor bug fixes to the mlrisc-tools library:
467    
468       1.  Fixed up parsing colon suffixed keywords
469       2.  Added the ability to shut the error messages up
470       3.  Reimplemented the pretty printer and fixed up/improved
471           the pretty printing of handle and -> types.
472       4.  Fixed up generation of literal symbols in the nowhere tool.
473       5.  Added some SML keywords to sml.sty
474    
475  ----------------------------------------------------------------------
476  Name: Matthias Blume
