Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1078, Tue Feb 19 21:26:48 2002 UTC revision 1096, Tue Feb 26 16:59:02 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/02/26 12:00:00 EST
18    Tag: blume-20020226-ffi
19    Description:
20    
21    1. Fixed a minor bug in CM's "noweb" tool:
22       If numbering is turned off, then truly don't number (i.e., do not
23       supply the -L option to noweb).  The previous behavior was to supply
24       -L'' -- which caused noweb to use the "default" line numbering scheme.
25       Thanks to Chris Richards for pointing this out (and supplying the fix).
26    
27    2. Once again, I reworked some aspects of the FFI:
28    
29       A. The incomplete/complete type business:
30    
31       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
32         gone!
33       - ML types representing an incomplete type are now *equal* to
34         ML types representing their corresponding complete types (just like
35         in C).  This is still safe because ml-nlffigen will not generate
36         RTTI for incomplete types, nor will it generate functions that
37         require access to such RTTI.   But when ML code generated from both
38         incomplete and complete versions of the C type meet, the ML types
39         are trivially interoperable.
40    
41         NOTE:  These changes restore the full generality of the translation
42         (which was previously lost when I eliminated functorization)!
43    
44       B. Enum types:
45    
46       - Structure C now has a type constructor "enum" that is similar to
47         how the "su" constructor works.  However, "enum" is not a phantom
48         type because each "T enum" has values (and is isomorphic to
49         MLRep.Signed.int).
50       - There are generic access operations for enum objects (using
51         MLRep.Signed.int).
52       - ml-nlffigen will generate a structure E_foo for each "enum foo".
53         * The structure contains the definition of type "mlrep" (the ML-side
54         representation type of the enum).  Normally, mlrep is the same
55         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
56         then mlrep will be defined as a datatype -- thus facilitating
57         pattern matching on mlrep values.
58         ("-ec" will be suppressed if there are duplicate values in an
59          enumeration.)
60         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
61         will be generated for each C enum constant xxx.
62         * Conversion functions m2i and i2m convert between mlrep and
63         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
64         * Coversion functions c and ml convert between mlrep and "tag enum".
65         * Access functions (get/set) fetch and store mlrep values.
66       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
67         enumerations are merged into one single enumeration represented by
68         structure E_'.
69    
70    ----------------------------------------------------------------------
71    Name: Allen Leung
72    Date: 2002/02/25 04:45:00 EST
73    Tag: leunga-20020225-cps-spill
74    
75    This is a new implementation of the CPS spill phase.
76    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
77    In case of problems, replace it with the old file spill.sml
78    
79    The current compiler runs into some serious performance problems when
80    constructing a large record.  This can happen when we try to compile a
81    structure with many items.  Even a very simple structure like the following
82    makes the compiler slow down.
83    
84        structure Foo = struct
85           val x_1 = 0w1 : Word32.int
86           val x_2 = 0w2 : Word32.int
87           val x_3 = 0w3 : Word32.int
88           ...
89           val x_N = 0wN : Word32.int
90        end
91    
92    The following table shows the compile time, from N=1000 to N=4000,
93    with the old compiler:
94    
95    N
96    1000   CPS 100 spill                           0.04u  0.00s  0.00g
97           MLRISC ra                               0.06u  0.00s  0.05g
98              (spills = 0 reloads = 0)
99           TOTAL                                   0.63u  0.07s  0.21g
100    
101    1100   CPS 100 spill                           8.25u  0.32s  0.64g
102           MLRISC ra                               5.68u  0.59s  3.93g
103              (spills = 0 reloads = 0)
104           TOTAL                                   14.71u  0.99s  4.81g
105    
106    1500   CPS 100 spill                           58.55u  2.34s  1.74g
107           MLRISC ra                               5.54u  0.65s  3.91g
108              (spills = 543 reloads = 1082)
109           TOTAL                                   65.40u  3.13s  6.00g
110    
111    2000   CPS 100 spill                           126.69u  4.84s  3.08g
112           MLRISC ra                               0.80u  0.10s  0.55g
113              (spills = 42 reloads = 84)
114           TOTAL                                   129.42u  5.10s  4.13g
115    
116    3000   CPS 100 spill                           675.59u  19.03s  11.64g
117           MLRISC ra                               2.69u  0.27s  1.38g
118              (spills = 62 reloads = 124)
119           TOTAL                                   682.48u  19.61s  13.99g
120    
121    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
122           MLRISC ra                               4.96u  0.27s  2.72g
123              (spills = 85 reloads = 170)
124           TOTAL                                   2375.26u  57.21s  48.00g
125    
126    As you can see the old cps spill module suffers from some serious
127    performance problem but since I cannot decipher the old code fully,
128    innstead of patching the problems up, I'm reimplementing it
129    with a different algorithm.  The new code is more modular,
130    smaller when compiled, and substantially faster
131    (O(n log n) time and O(n) space).  Timing of the new spill module:
132    
133    4000  CPS 100 spill                           0.02u  0.00s  0.00g
134          MLRISC ra                               0.25u  0.02s  0.15g
135             (spills=1 reloads=3)
136          TOTAL                                   7.74u  0.34s  1.62g
137    
138    Implementation details:
139    
140    As far as I can tell, the purpose of the CPS spill module is to make sure the
141    number of live variables at any program point (the bandwidth)
142    does not exceed a certain limit, which is determined by the
143    size of the spill area.
144    
145    When the bandwidth is too large, we decrease the register pressure by
146    packing live variables into spill records.  How we achieve this is
147    completely different than what we did in the old code.
148    
149    First, there is something about the MLRiscGen code generator
150    that we should be aware of:
151    
152    o MLRiscGen performs code motion!
153    
154       In particular, it will move floating point computations and
155       address computations involving only the heap pointer to
156       their use sites (if there is only a single use).
157       What this means is that if we have a CPS record construction
158       statement
159    
160           RECORD(k,vl,w,e)
161    
162       we should never count the new record address w as live if w
163       has only one use (which is often the case).
164    
165       We should do something similar to floating point, but the transformation
166       there is much more complex, so I won't deal with that.
167    
168    Secondly, there are now two new cps primops at our disposal:
169    
170     1. rawrecord of record_kind option
171        This pure operator allocates some uninitialized storage from the heap.
172        There are two forms:
173    
174         rawrecord NONE [INT n]  allocates a tagless record of length n
175         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
176                                     and initializes the tag.
177    
178     2. rawupdate of cty
179          rawupdate cty (v,i,x)
180          Assigns to x to the ith component of record v.
181          The storelist is not updated.
182    
183    We use these new primops for both spilling and increment record construction.
184    
185     1. Spilling.
186    
187        This is implemented with a linear scan algorithm (but generalized
188        to trees).  The algorithm will create a single spill record at the
189        beginning of the cps function and use rawupdate to spill to it,
190        and SELECT or SELp to reload from it.  So both spills and reloads
191        are fine-grain operations.  In contrast, in the old algorithm
192        "spills" have to be bundled together in records.
193    
194        Ideally, we should sink the spill record construction to where
195        it is needed.  We can even split the spill record into multiple ones
196        at the places where they are needed.  But CPS is not a good
197        representation for global code motion, so I'll keep it simple and
198        am not attempting this.
199    
200     2. Incremental record construction (aka record splitting).
201    
202        Long records with many component values which are simulatenously live
203        (recall that single use record addresses are not considered to
204         be live) are constructed with rawrecord and rawupdate.
205        We allocate space on the heap with rawrecord first, then gradually
206        fill it in with rawupdate.  This is the technique suggested to me
207        by Matthias.
208    
209        Some restrictions on when this is applicable:
210        1. It is not a VECTOR record.  The code generator currently does not handle
211           this case. VECTOR record uses double indirection like arrays.
212        2. All the record component values are defined in the same "basic block"
213           as the record constructor.  This is to prevent speculative
214           record construction.
215    
216    ----------------------------------------------------------------------
217    Name: Allen Leung
218    Date: 2002/02/22 01:02:00 EST
219    Tag: leunga-20020222-mlrisc-tools
220    
221    Minor bug fixes in the parser and rewriter
222    
223    ----------------------------------------------------------------------
224    Name: Allen Leung
225    Date: 2002/02/21 20:20:00 EST
226    Tag: leunga-20020221-peephole
227    
228    Regenerated the peephole files.  Some contained typos in the specification
229    and some didn't compile because of pretty printing bugs in the old version
230    of 'nowhere'.
231    
232    ----------------------------------------------------------------------
233    Name: Allen Leung
234    Date: 2002/02/19 20:20:00 EST
235    Tag: leunga-20020219-mlrisc-tools
236    Description:
237    
238       Minor bug fixes to the mlrisc-tools library:
239    
240       1.  Fixed up parsing colon suffixed keywords
241       2.  Added the ability to shut the error messages up
242       3.  Reimplemented the pretty printer and fixed up/improved
243           the pretty printing of handle and -> types.
244       4.  Fixed up generation of literal symbols in the nowhere tool.
245       5.  Added some SML keywords to to sml.sty
246    
247    ----------------------------------------------------------------------
248    Name: Matthias Blume
249  Date: 2002/02/19 16:20:00 EST  Date: 2002/02/19 16:20:00 EST
250  Tag: blume-20020219-cmffi  Tag: blume-20020219-cmffi
251  Description:  Description:

Legend:
Removed from v.1078  
changed lines
  Added in v.1096

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0