Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1086, Fri Feb 22 05:56:29 2002 UTC revision 1094, Mon Feb 25 09:58:56 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Allen Leung  Name: Allen Leung
17    Date: 2002/02/25 04:45:00 EST
18    Tag: leunga-20020225-cps-spill
19    
20    This is a new implementation of the CPS spill phase.
21    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
22    In case of problems, replace it with the old file spill.sml
23    
24    The current compiler runs into some serious performance problems when
25    constructing a large record.  This can happen when we try to compile a
26    structure with many items.  Even a very simple structure like the following
27    makes the compiler slow down.
28    
29        structure Foo = struct
30           val x_1 = 0w1 : Word32.int
31           val x_2 = 0w2 : Word32.int
32           val x_3 = 0w3 : Word32.int
33           ...
34           val x_N = 0wN : Word32.int
35        end
36    
37    The following table shows the compile time, from N=1000 to N=4000,
38    with the old compiler:
39    
40    N
41    1000   CPS 100 spill                           0.04u  0.00s  0.00g
42           MLRISC ra                               0.06u  0.00s  0.05g
43              (spills = 0 reloads = 0)
44           TOTAL                                   0.63u  0.07s  0.21g
45    
46    1100   CPS 100 spill                           8.25u  0.32s  0.64g
47           MLRISC ra                               5.68u  0.59s  3.93g
48              (spills = 0 reloads = 0)
49           TOTAL                                   14.71u  0.99s  4.81g
50    
51    1500   CPS 100 spill                           58.55u  2.34s  1.74g
52           MLRISC ra                               5.54u  0.65s  3.91g
53              (spills = 543 reloads = 1082)
54           TOTAL                                   65.40u  3.13s  6.00g
55    
56    2000   CPS 100 spill                           126.69u  4.84s  3.08g
57           MLRISC ra                               0.80u  0.10s  0.55g
58              (spills = 42 reloads = 84)
59           TOTAL                                   129.42u  5.10s  4.13g
60    
61    3000   CPS 100 spill                           675.59u  19.03s  11.64g
62           MLRISC ra                               2.69u  0.27s  1.38g
63              (spills = 62 reloads = 124)
64           TOTAL                                   682.48u  19.61s  13.99g
65    
66    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
67           MLRISC ra                               4.96u  0.27s  2.72g
68              (spills = 85 reloads = 170)
69           TOTAL                                   2375.26u  57.21s  48.00g
70    
71    As you can see the old cps spill module suffers from some serious
72    performance problem but since I cannot decipher the old code fully,
73    innstead of patching the problems up, I'm reimplementing it
74    with a different algorithm.  The new code is more modular,
75    smaller when compiled, and substantially faster
76    (O(n log n) time and O(n) space).  Timing of the new spill module:
77    
78    4000  CPS 100 spill                           0.02u  0.00s  0.00g
79          MLRISC ra                               0.25u  0.02s  0.15g
80             (spills=1 reloads=3)
81          TOTAL                                   7.74u  0.34s  1.62g
82    
83    Implementation details:
84    
85    As far as I can tell, the purpose of the CPS spill module is to make sure the
86    number of live variables at any program point (the bandwidth)
87    does not exceed a certain limit, which is determined by the
88    size of the spill area.
89    
90    When the bandwidth is too large, we decrease the register pressure by
91    packing live variables into spill records.  How we achieve this is
92    completely different than what we did in the old code.
93    
94    First, there is something about the MLRiscGen code generator
95    that we should be aware of:
96    
97    o MLRiscGen performs code motion!
98    
99       In particular, it will move floating point computations and
100       address computations involving only the heap pointer to
101       their use sites (if there is only a single use).
102       What this means is that if we have a CPS record construction
103       statement
104    
105           RECORD(k,vl,w,e)
106    
107       we should never count the new record address w as live if w
108       has only one use (which is often the case).
109    
110       We should do something similar to floating point, but the transformation
111       there is much more complex, so I won't deal with that.
112    
113    Secondly, there are now two new cps primops at our disposal:
114    
115     1. rawrecord of record_kind option
116        This pure operator allocates some uninitialized storage from the heap.
117        There are two forms:
118    
119         rawrecord NONE [INT n]  allocates a tagless record of length n
120         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
121                                     and initializes the tag.
122    
123     2. rawupdate of cty
124          rawupdate cty (v,i,x)
125          Assigns to x to the ith component of record v.
126          The storelist is not updated.
127    
128    We use these new primops for both spilling and increment record construction.
129    
130     1. Spilling.
131    
132        This is implemented with a linear scan algorithm (but generalized
133        to trees).  The algorithm will create a single spill record at the
134        beginning of the cps function and use rawupdate to spill to it,
135        and SELECT or SELp to reload from it.  So both spills and reloads
136        are fine-grain operations.  In contrast, in the old algorithm
137        "spills" have to be bundled together in records.
138    
139        Ideally, we should sink the spill record construction to where
140        it is needed.  We can even split the spill record into multiple ones
141        at the places where they are needed.  But CPS is not a good
142        representation for global code motion, so I'll keep it simple and
143        am not attempting this.
144    
145     2. Incremental record construction (aka record splitting).
146    
147        Long records with many component values which are simulatenously live
148        (recall that single use record addresses are not considered to
149         be live) are constructed with rawrecord and rawupdate.
150        We allocate space on the heap with rawrecord first, then gradually
151        fill it in with rawupdate.  This is the technique suggested to me
152        by Matthias.
153    
154        Some restrictions on when this is applicable:
155        1. It is not a VECTOR record.  The code generator currently does not handle
156           this case. VECTOR record uses double indirection like arrays.
157        2. All the record component values are defined in the same "basic block"
158           as the record constructor.  This is to prevent speculative
159           record construction.
160    
161    ----------------------------------------------------------------------
162    Name: Allen Leung
163  Date: 2002/02/22 01:02:00 EST  Date: 2002/02/22 01:02:00 EST
164  Tag: leunga-20020222-mlrisc-tools  Tag: leunga-20020222-mlrisc-tools
165    

Legend:
Removed from v.1086  
changed lines
  Added in v.1094

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0