Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1076, Tue Feb 19 15:47:18 2002 UTC revision 1115, Sun Mar 3 18:16:31 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/03/03 13:20:00 EST
18    Tag: leunga-20020303-mlrisc-tools
19    
20      Added #[ ... ] expressions to mlrisc tools
21    
22    ----------------------------------------------------------------------
23    Name: Matthias Blume
24    Date: 2002/02/27 12:29:00 EST
25    Tag: blume-20020227-cdebug
26    Description:
27    
28    - made types in structure C and C_Debug to be equal
29    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
30    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
31    
32    ----------------------------------------------------------------------
33    Name: Matthias Blume
34    Date: 2002/02/26 12:00:00 EST
35    Tag: blume-20020226-ffi
36    Description:
37    
38    1. Fixed a minor bug in CM's "noweb" tool:
39       If numbering is turned off, then truly don't number (i.e., do not
40       supply the -L option to noweb).  The previous behavior was to supply
41       -L'' -- which caused noweb to use the "default" line numbering scheme.
42       Thanks to Chris Richards for pointing this out (and supplying the fix).
43    
44    2. Once again, I reworked some aspects of the FFI:
45    
46       A. The incomplete/complete type business:
47    
48       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
49         gone!
50       - ML types representing an incomplete type are now *equal* to
51         ML types representing their corresponding complete types (just like
52         in C).  This is still safe because ml-nlffigen will not generate
53         RTTI for incomplete types, nor will it generate functions that
54         require access to such RTTI.   But when ML code generated from both
55         incomplete and complete versions of the C type meet, the ML types
56         are trivially interoperable.
57    
58         NOTE:  These changes restore the full generality of the translation
59         (which was previously lost when I eliminated functorization)!
60    
61       B. Enum types:
62    
63       - Structure C now has a type constructor "enum" that is similar to
64         how the "su" constructor works.  However, "enum" is not a phantom
65         type because each "T enum" has values (and is isomorphic to
66         MLRep.Signed.int).
67       - There are generic access operations for enum objects (using
68         MLRep.Signed.int).
69       - ml-nlffigen will generate a structure E_foo for each "enum foo".
70         * The structure contains the definition of type "mlrep" (the ML-side
71         representation type of the enum).  Normally, mlrep is the same
72         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
73         then mlrep will be defined as a datatype -- thus facilitating
74         pattern matching on mlrep values.
75         ("-ec" will be suppressed if there are duplicate values in an
76          enumeration.)
77         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
78         will be generated for each C enum constant xxx.
79         * Conversion functions m2i and i2m convert between mlrep and
80         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
81         * Coversion functions c and ml convert between mlrep and "tag enum".
82         * Access functions (get/set) fetch and store mlrep values.
83       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
84         enumerations are merged into one single enumeration represented by
85         structure E_'.
86    
87    ----------------------------------------------------------------------
88    Name: Allen Leung
89    Date: 2002/02/25 04:45:00 EST
90    Tag: leunga-20020225-cps-spill
91    
92    This is a new implementation of the CPS spill phase.
93    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
94    In case of problems, replace it with the old file spill.sml
95    
96    The current compiler runs into some serious performance problems when
97    constructing a large record.  This can happen when we try to compile a
98    structure with many items.  Even a very simple structure like the following
99    makes the compiler slow down.
100    
101        structure Foo = struct
102           val x_1 = 0w1 : Word32.int
103           val x_2 = 0w2 : Word32.int
104           val x_3 = 0w3 : Word32.int
105           ...
106           val x_N = 0wN : Word32.int
107        end
108    
109    The following table shows the compile time, from N=1000 to N=4000,
110    with the old compiler:
111    
112    N
113    1000   CPS 100 spill                           0.04u  0.00s  0.00g
114           MLRISC ra                               0.06u  0.00s  0.05g
115              (spills = 0 reloads = 0)
116           TOTAL                                   0.63u  0.07s  0.21g
117    
118    1100   CPS 100 spill                           8.25u  0.32s  0.64g
119           MLRISC ra                               5.68u  0.59s  3.93g
120              (spills = 0 reloads = 0)
121           TOTAL                                   14.71u  0.99s  4.81g
122    
123    1500   CPS 100 spill                           58.55u  2.34s  1.74g
124           MLRISC ra                               5.54u  0.65s  3.91g
125              (spills = 543 reloads = 1082)
126           TOTAL                                   65.40u  3.13s  6.00g
127    
128    2000   CPS 100 spill                           126.69u  4.84s  3.08g
129           MLRISC ra                               0.80u  0.10s  0.55g
130              (spills = 42 reloads = 84)
131           TOTAL                                   129.42u  5.10s  4.13g
132    
133    3000   CPS 100 spill                           675.59u  19.03s  11.64g
134           MLRISC ra                               2.69u  0.27s  1.38g
135              (spills = 62 reloads = 124)
136           TOTAL                                   682.48u  19.61s  13.99g
137    
138    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
139           MLRISC ra                               4.96u  0.27s  2.72g
140              (spills = 85 reloads = 170)
141           TOTAL                                   2375.26u  57.21s  48.00g
142    
143    As you can see the old cps spill module suffers from some serious
144    performance problem.  But since I cannot decipher the old code fully,
145    instead of patching the problems up, I'm reimplementing it
146    with a different algorithm.  The new code is more modular,
147    smaller when compiled, and substantially faster
148    (O(n log n) time and O(n) space).  Timing of the new spill module:
149    
150    4000  CPS 100 spill                           0.02u  0.00s  0.00g
151          MLRISC ra                               0.25u  0.02s  0.15g
152             (spills=1 reloads=3)
153          TOTAL                                   7.74u  0.34s  1.62g
154    
155    Implementation details:
156    
157    As far as I can tell, the purpose of the CPS spill module is to make sure the
158    number of live variables at any program point (the bandwidth)
159    does not exceed a certain limit, which is determined by the
160    size of the spill area.
161    
162    When the bandwidth is too large, we decrease the register pressure by
163    packing live variables into spill records.  How we achieve this is
164    completely different than what we did in the old code.
165    
166    First, there is something about the MLRiscGen code generator
167    that we should be aware of:
168    
169    o MLRiscGen performs code motion!
170    
171       In particular, it will move floating point computations and
172       address computations involving only the heap pointer to
173       their use sites (if there is only a single use).
174       What this means is that if we have a CPS record construction
175       statement
176    
177           RECORD(k,vl,w,e)
178    
179       we should never count the new record address w as live if w
180       has only one use (which is often the case).
181    
182       We should do something similar to floating point, but the transformation
183       there is much more complex, so I won't deal with that.
184    
185    Secondly, there are now two new cps primops at our disposal:
186    
187     1. rawrecord of record_kind option
188        This pure operator allocates some uninitialized storage from the heap.
189        There are two forms:
190    
191         rawrecord NONE [INT n]  allocates a tagless record of length n
192         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
193                                     and initializes the tag.
194    
195     2. rawupdate of cty
196          rawupdate cty (v,i,x)
197          Assigns to x to the ith component of record v.
198          The storelist is not updated.
199    
200    We use these new primops for both spilling and increment record construction.
201    
202     1. Spilling.
203    
204        This is implemented with a linear scan algorithm (but generalized
205        to trees).  The algorithm will create a single spill record at the
206        beginning of the cps function and use rawupdate to spill to it,
207        and SELECT or SELp to reload from it.  So both spills and reloads
208        are fine-grain operations.  In contrast, in the old algorithm
209        "spills" have to be bundled together in records.
210    
211        Ideally, we should sink the spill record construction to where
212        it is needed.  We can even split the spill record into multiple ones
213        at the places where they are needed.  But CPS is not a good
214        representation for global code motion, so I'll keep it simple and
215        am not attempting this.
216    
217     2. Incremental record construction (aka record splitting).
218    
219        Long records with many component values which are simulatenously live
220        (recall that single use record addresses are not considered to
221         be live) are constructed with rawrecord and rawupdate.
222        We allocate space on the heap with rawrecord first, then gradually
223        fill it in with rawupdate.  This is the technique suggested to me
224        by Matthias.
225    
226        Some restrictions on when this is applicable:
227        1. It is not a VECTOR record.  The code generator currently does not handle
228           this case. VECTOR record uses double indirection like arrays.
229        2. All the record component values are defined in the same "basic block"
230           as the record constructor.  This is to prevent speculative
231           record construction.
232    
233    ----------------------------------------------------------------------
234    Name: Allen Leung
235    Date: 2002/02/22 01:02:00 EST
236    Tag: leunga-20020222-mlrisc-tools
237    
238    Minor bug fixes in the parser and rewriter
239    
240    ----------------------------------------------------------------------
241    Name: Allen Leung
242    Date: 2002/02/21 20:20:00 EST
243    Tag: leunga-20020221-peephole
244    
245    Regenerated the peephole files.  Some contained typos in the specification
246    and some didn't compile because of pretty printing bugs in the old version
247    of 'nowhere'.
248    
249    ----------------------------------------------------------------------
250    Name: Allen Leung
251    Date: 2002/02/19 20:20:00 EST
252    Tag: leunga-20020219-mlrisc-tools
253    Description:
254    
255       Minor bug fixes to the mlrisc-tools library:
256    
257       1.  Fixed up parsing colon suffixed keywords
258       2.  Added the ability to shut the error messages up
259       3.  Reimplemented the pretty printer and fixed up/improved
260           the pretty printing of handle and -> types.
261       4.  Fixed up generation of literal symbols in the nowhere tool.
262       5.  Added some SML keywords to to sml.sty
263    
264    ----------------------------------------------------------------------
265    Name: Matthias Blume
266    Date: 2002/02/19 16:20:00 EST
267    Tag: blume-20020219-cmffi
268    Description:
269    
270    A wild mix of changes, some minor, some major:
271    
272    * All C FFI-related libraries are now anchored under $c:
273        $/c.cm      --> $c/c.cm
274        $/c-int.cm  --> $c/internals/c-int.cm
275        $/memory.cm --> $c/memory/memory.cm
276    
277    * "make" tool (in CM) now treats its argument pathname slightly
278      differently:
279        1. If the native expansion is an absolute name, then before invoking
280           the "make" command on it, CM will apply OS.Path.mkRelative
281           (with relativeTo = OS.FileSys.getDir()) to it.
282        2. The argument will be passed through to subsequent phases of CM
283           processing without "going native".  In particular, if the argument
284           was an anchored path, then "make" will not lose track of that anchor.
285    
286    * Compiler backends now "know" their respective C calling conventions
287      instead of having to be told about it by ml-nlffigen.  This relieves
288      ml-nlffigen from one of its burdens.
289    
290    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
291    
292    * Export C_DEBUG and C_Debug from $c/c.cm.
293    
294    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
295      subtyping relationship between incomplete pointers and their complete
296      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
297      with the convention of instantiating 'o with ('t, 'c) obj whenever
298      the pointer target type is complete.  In the incomplete case, 'o
299      will be instantiated with some "'c iobj" -- a type obtained by
300      using one of the functors PointerToIncompleteType or PointerToCompleteType.
301    
302      Operations that work on both incomplete and complete pointer types are
303      typed as taking an 'o ptr while operations that require the target to
304      be known are typed as taking some ('t, 'c) obj ptr.
305    
306      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
307      where void is an eqtype without any values.  This makes it possible
308      to work on voidptr values using functions meant to operate on light
309      incomplete pointers.
310    
311    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
312      been vastly simplified.
313    
314    ----------------------------------------------------------------------
315  Name: Matthias Blume  Name: Matthias Blume
316  Date: 2002/02/19 10:48:00 EST  Date: 2002/02/19 10:48:00 EST
317  Tag: blume-20020219-pqfix  Tag: blume-20020219-pqfix

Legend:
Removed from v.1076  
changed lines
  Added in v.1115

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0