Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1073, Fri Feb 15 22:07:38 2002 UTC revision 1116, Tue Mar 5 23:17:18 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Lal George
17    Date: 2002/03/05 17:24:48 EST
18    Tag: george-20020305-linkage-cluster
19    
20    In order to support the block placement optimization, the first
21    cluster that is generated (called the linkage cluster) contains a jump
22    to the entry point for the compilation unit. The linkage cluster
23    contains only one function, so block placement will have no effect on
24    the linkage cluster itself, but all the other clusters have full
25    freedom in the manner in which they reorder blocks or functions.
26    
27    On the x86 the typical linkage code that is generated is:
28       ----------------------
29            .align 2
30       L0:
31            addl    $L1-L0, 72(%esp)
32            jmp     L0
33    
34    
35            .align  2
36       L1:
37       ----------------------
38    
39    72(%esp) is the memory location for the stdlink register. This
40    must contain the address of the CPS function being called. In the
41    above example, it contains the address of memory for  L0; before
42    calling L1 (the real entry point for the compilation unit), it
43    must contain the address for L1, and hence
44    
45            addl $L1-L0, 72(%esp)
46    
47    I have tested this on all architectures except the hppa.
48    
49    ----------------------------------------------------------------------
50    Name: Allen Leung
51    Date: 2002/03/03 13:20:00 EST
52    Tag: leunga-20020303-mlrisc-tools
53    
54      Added #[ ... ] expressions to mlrisc tools
55    
56    ----------------------------------------------------------------------
57    Name: Matthias Blume
58    Date: 2002/02/27 12:29:00 EST
59    Tag: blume-20020227-cdebug
60    Description:
61    
62    - made types in structure C and C_Debug to be equal
63    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
64    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
65    
66    ----------------------------------------------------------------------
67    Name: Matthias Blume
68    Date: 2002/02/26 12:00:00 EST
69    Tag: blume-20020226-ffi
70    Description:
71    
72    1. Fixed a minor bug in CM's "noweb" tool:
73       If numbering is turned off, then truly don't number (i.e., do not
74       supply the -L option to noweb).  The previous behavior was to supply
75       -L'' -- which caused noweb to use the "default" line numbering scheme.
76       Thanks to Chris Richards for pointing this out (and supplying the fix).
77    
78    2. Once again, I reworked some aspects of the FFI:
79    
80       A. The incomplete/complete type business:
81    
82       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
83         gone!
84       - ML types representing an incomplete type are now *equal* to
85         ML types representing their corresponding complete types (just like
86         in C).  This is still safe because ml-nlffigen will not generate
87         RTTI for incomplete types, nor will it generate functions that
88         require access to such RTTI.   But when ML code generated from both
89         incomplete and complete versions of the C type meet, the ML types
90         are trivially interoperable.
91    
92         NOTE:  These changes restore the full generality of the translation
93         (which was previously lost when I eliminated functorization)!
94    
95       B. Enum types:
96    
97       - Structure C now has a type constructor "enum" that is similar to
98         how the "su" constructor works.  However, "enum" is not a phantom
99         type because each "T enum" has values (and is isomorphic to
100         MLRep.Signed.int).
101       - There are generic access operations for enum objects (using
102         MLRep.Signed.int).
103       - ml-nlffigen will generate a structure E_foo for each "enum foo".
104         * The structure contains the definition of type "mlrep" (the ML-side
105         representation type of the enum).  Normally, mlrep is the same
106         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
107         then mlrep will be defined as a datatype -- thus facilitating
108         pattern matching on mlrep values.
109         ("-ec" will be suppressed if there are duplicate values in an
110          enumeration.)
111         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
112         will be generated for each C enum constant xxx.
113         * Conversion functions m2i and i2m convert between mlrep and
114         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
115         * Coversion functions c and ml convert between mlrep and "tag enum".
116         * Access functions (get/set) fetch and store mlrep values.
117       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
118         enumerations are merged into one single enumeration represented by
119         structure E_'.
120    
121    ----------------------------------------------------------------------
122    Name: Allen Leung
123    Date: 2002/02/25 04:45:00 EST
124    Tag: leunga-20020225-cps-spill
125    
126    This is a new implementation of the CPS spill phase.
127    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
128    In case of problems, replace it with the old file spill.sml
129    
130    The current compiler runs into some serious performance problems when
131    constructing a large record.  This can happen when we try to compile a
132    structure with many items.  Even a very simple structure like the following
133    makes the compiler slow down.
134    
135        structure Foo = struct
136           val x_1 = 0w1 : Word32.int
137           val x_2 = 0w2 : Word32.int
138           val x_3 = 0w3 : Word32.int
139           ...
140           val x_N = 0wN : Word32.int
141        end
142    
143    The following table shows the compile time, from N=1000 to N=4000,
144    with the old compiler:
145    
146    N
147    1000   CPS 100 spill                           0.04u  0.00s  0.00g
148           MLRISC ra                               0.06u  0.00s  0.05g
149              (spills = 0 reloads = 0)
150           TOTAL                                   0.63u  0.07s  0.21g
151    
152    1100   CPS 100 spill                           8.25u  0.32s  0.64g
153           MLRISC ra                               5.68u  0.59s  3.93g
154              (spills = 0 reloads = 0)
155           TOTAL                                   14.71u  0.99s  4.81g
156    
157    1500   CPS 100 spill                           58.55u  2.34s  1.74g
158           MLRISC ra                               5.54u  0.65s  3.91g
159              (spills = 543 reloads = 1082)
160           TOTAL                                   65.40u  3.13s  6.00g
161    
162    2000   CPS 100 spill                           126.69u  4.84s  3.08g
163           MLRISC ra                               0.80u  0.10s  0.55g
164              (spills = 42 reloads = 84)
165           TOTAL                                   129.42u  5.10s  4.13g
166    
167    3000   CPS 100 spill                           675.59u  19.03s  11.64g
168           MLRISC ra                               2.69u  0.27s  1.38g
169              (spills = 62 reloads = 124)
170           TOTAL                                   682.48u  19.61s  13.99g
171    
172    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
173           MLRISC ra                               4.96u  0.27s  2.72g
174              (spills = 85 reloads = 170)
175           TOTAL                                   2375.26u  57.21s  48.00g
176    
177    As you can see the old cps spill module suffers from some serious
178    performance problem.  But since I cannot decipher the old code fully,
179    instead of patching the problems up, I'm reimplementing it
180    with a different algorithm.  The new code is more modular,
181    smaller when compiled, and substantially faster
182    (O(n log n) time and O(n) space).  Timing of the new spill module:
183    
184    4000  CPS 100 spill                           0.02u  0.00s  0.00g
185          MLRISC ra                               0.25u  0.02s  0.15g
186             (spills=1 reloads=3)
187          TOTAL                                   7.74u  0.34s  1.62g
188    
189    Implementation details:
190    
191    As far as I can tell, the purpose of the CPS spill module is to make sure the
192    number of live variables at any program point (the bandwidth)
193    does not exceed a certain limit, which is determined by the
194    size of the spill area.
195    
196    When the bandwidth is too large, we decrease the register pressure by
197    packing live variables into spill records.  How we achieve this is
198    completely different than what we did in the old code.
199    
200    First, there is something about the MLRiscGen code generator
201    that we should be aware of:
202    
203    o MLRiscGen performs code motion!
204    
205       In particular, it will move floating point computations and
206       address computations involving only the heap pointer to
207       their use sites (if there is only a single use).
208       What this means is that if we have a CPS record construction
209       statement
210    
211           RECORD(k,vl,w,e)
212    
213       we should never count the new record address w as live if w
214       has only one use (which is often the case).
215    
216       We should do something similar to floating point, but the transformation
217       there is much more complex, so I won't deal with that.
218    
219    Secondly, there are now two new cps primops at our disposal:
220    
221     1. rawrecord of record_kind option
222        This pure operator allocates some uninitialized storage from the heap.
223        There are two forms:
224    
225         rawrecord NONE [INT n]  allocates a tagless record of length n
226         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
227                                     and initializes the tag.
228    
229     2. rawupdate of cty
230          rawupdate cty (v,i,x)
231          Assigns to x to the ith component of record v.
232          The storelist is not updated.
233    
234    We use these new primops for both spilling and increment record construction.
235    
236     1. Spilling.
237    
238        This is implemented with a linear scan algorithm (but generalized
239        to trees).  The algorithm will create a single spill record at the
240        beginning of the cps function and use rawupdate to spill to it,
241        and SELECT or SELp to reload from it.  So both spills and reloads
242        are fine-grain operations.  In contrast, in the old algorithm
243        "spills" have to be bundled together in records.
244    
245        Ideally, we should sink the spill record construction to where
246        it is needed.  We can even split the spill record into multiple ones
247        at the places where they are needed.  But CPS is not a good
248        representation for global code motion, so I'll keep it simple and
249        am not attempting this.
250    
251     2. Incremental record construction (aka record splitting).
252    
253        Long records with many component values which are simulatenously live
254        (recall that single use record addresses are not considered to
255         be live) are constructed with rawrecord and rawupdate.
256        We allocate space on the heap with rawrecord first, then gradually
257        fill it in with rawupdate.  This is the technique suggested to me
258        by Matthias.
259    
260        Some restrictions on when this is applicable:
261        1. It is not a VECTOR record.  The code generator currently does not handle
262           this case. VECTOR record uses double indirection like arrays.
263        2. All the record component values are defined in the same "basic block"
264           as the record constructor.  This is to prevent speculative
265           record construction.
266    
267    ----------------------------------------------------------------------
268    Name: Allen Leung
269    Date: 2002/02/22 01:02:00 EST
270    Tag: leunga-20020222-mlrisc-tools
271    
272    Minor bug fixes in the parser and rewriter
273    
274    ----------------------------------------------------------------------
275    Name: Allen Leung
276    Date: 2002/02/21 20:20:00 EST
277    Tag: leunga-20020221-peephole
278    
279    Regenerated the peephole files.  Some contained typos in the specification
280    and some didn't compile because of pretty printing bugs in the old version
281    of 'nowhere'.
282    
283    ----------------------------------------------------------------------
284    Name: Allen Leung
285    Date: 2002/02/19 20:20:00 EST
286    Tag: leunga-20020219-mlrisc-tools
287    Description:
288    
289       Minor bug fixes to the mlrisc-tools library:
290    
291       1.  Fixed up parsing colon suffixed keywords
292       2.  Added the ability to shut the error messages up
293       3.  Reimplemented the pretty printer and fixed up/improved
294           the pretty printing of handle and -> types.
295       4.  Fixed up generation of literal symbols in the nowhere tool.
296       5.  Added some SML keywords to to sml.sty
297    
298    ----------------------------------------------------------------------
299    Name: Matthias Blume
300    Date: 2002/02/19 16:20:00 EST
301    Tag: blume-20020219-cmffi
302    Description:
303    
304    A wild mix of changes, some minor, some major:
305    
306    * All C FFI-related libraries are now anchored under $c:
307        $/c.cm      --> $c/c.cm
308        $/c-int.cm  --> $c/internals/c-int.cm
309        $/memory.cm --> $c/memory/memory.cm
310    
311    * "make" tool (in CM) now treats its argument pathname slightly
312      differently:
313        1. If the native expansion is an absolute name, then before invoking
314           the "make" command on it, CM will apply OS.Path.mkRelative
315           (with relativeTo = OS.FileSys.getDir()) to it.
316        2. The argument will be passed through to subsequent phases of CM
317           processing without "going native".  In particular, if the argument
318           was an anchored path, then "make" will not lose track of that anchor.
319    
320    * Compiler backends now "know" their respective C calling conventions
321      instead of having to be told about it by ml-nlffigen.  This relieves
322      ml-nlffigen from one of its burdens.
323    
324    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
325    
326    * Export C_DEBUG and C_Debug from $c/c.cm.
327    
328    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
329      subtyping relationship between incomplete pointers and their complete
330      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
331      with the convention of instantiating 'o with ('t, 'c) obj whenever
332      the pointer target type is complete.  In the incomplete case, 'o
333      will be instantiated with some "'c iobj" -- a type obtained by
334      using one of the functors PointerToIncompleteType or PointerToCompleteType.
335    
336      Operations that work on both incomplete and complete pointer types are
337      typed as taking an 'o ptr while operations that require the target to
338      be known are typed as taking some ('t, 'c) obj ptr.
339    
340      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
341      where void is an eqtype without any values.  This makes it possible
342      to work on voidptr values using functions meant to operate on light
343      incomplete pointers.
344    
345    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
346      been vastly simplified.
347    
348    ----------------------------------------------------------------------
349    Name: Matthias Blume
350    Date: 2002/02/19 10:48:00 EST
351    Tag: blume-20020219-pqfix
352    Description:
353    
354    Applied Chris Okasaki's bug fix for priority queues.
355    
356    ----------------------------------------------------------------------
357  Name: Matthias Blume  Name: Matthias Blume
358  Date: 2002/02/15 17:05:00  Date: 2002/02/15 17:05:00
359  Tag: Release_110_39  Tag: Release_110_39

Legend:
Removed from v.1073  
changed lines
  Added in v.1116

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0