Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1062, Wed Feb 13 21:15:14 2002 UTC revision 1098, Wed Feb 27 17:29:47 2002 UTC
# Line 14  Line 14 
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16  Name: Matthias Blume  Name: Matthias Blume
17    Date: 2002/02/27 12:29:00 EST
18    Tag: blume-20020227-cdebug
19    Description:
20    
21    - made types in structure C and C_Debug to be equal
22    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
23    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
24    
25    ----------------------------------------------------------------------
26    Name: Matthias Blume
27    Date: 2002/02/26 12:00:00 EST
28    Tag: blume-20020226-ffi
29    Description:
30    
31    1. Fixed a minor bug in CM's "noweb" tool:
32       If numbering is turned off, then truly don't number (i.e., do not
33       supply the -L option to noweb).  The previous behavior was to supply
34       -L'' -- which caused noweb to use the "default" line numbering scheme.
35       Thanks to Chris Richards for pointing this out (and supplying the fix).
36    
37    2. Once again, I reworked some aspects of the FFI:
38    
39       A. The incomplete/complete type business:
40    
41       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
42         gone!
43       - ML types representing an incomplete type are now *equal* to
44         ML types representing their corresponding complete types (just like
45         in C).  This is still safe because ml-nlffigen will not generate
46         RTTI for incomplete types, nor will it generate functions that
47         require access to such RTTI.   But when ML code generated from both
48         incomplete and complete versions of the C type meet, the ML types
49         are trivially interoperable.
50    
51         NOTE:  These changes restore the full generality of the translation
52         (which was previously lost when I eliminated functorization)!
53    
54       B. Enum types:
55    
56       - Structure C now has a type constructor "enum" that is similar to
57         how the "su" constructor works.  However, "enum" is not a phantom
58         type because each "T enum" has values (and is isomorphic to
59         MLRep.Signed.int).
60       - There are generic access operations for enum objects (using
61         MLRep.Signed.int).
62       - ml-nlffigen will generate a structure E_foo for each "enum foo".
63         * The structure contains the definition of type "mlrep" (the ML-side
64         representation type of the enum).  Normally, mlrep is the same
65         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
66         then mlrep will be defined as a datatype -- thus facilitating
67         pattern matching on mlrep values.
68         ("-ec" will be suppressed if there are duplicate values in an
69          enumeration.)
70         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
71         will be generated for each C enum constant xxx.
72         * Conversion functions m2i and i2m convert between mlrep and
73         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
74         * Coversion functions c and ml convert between mlrep and "tag enum".
75         * Access functions (get/set) fetch and store mlrep values.
76       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
77         enumerations are merged into one single enumeration represented by
78         structure E_'.
79    
80    ----------------------------------------------------------------------
81    Name: Allen Leung
82    Date: 2002/02/25 04:45:00 EST
83    Tag: leunga-20020225-cps-spill
84    
85    This is a new implementation of the CPS spill phase.
86    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
87    In case of problems, replace it with the old file spill.sml
88    
89    The current compiler runs into some serious performance problems when
90    constructing a large record.  This can happen when we try to compile a
91    structure with many items.  Even a very simple structure like the following
92    makes the compiler slow down.
93    
94        structure Foo = struct
95           val x_1 = 0w1 : Word32.int
96           val x_2 = 0w2 : Word32.int
97           val x_3 = 0w3 : Word32.int
98           ...
99           val x_N = 0wN : Word32.int
100        end
101    
102    The following table shows the compile time, from N=1000 to N=4000,
103    with the old compiler:
104    
105    N
106    1000   CPS 100 spill                           0.04u  0.00s  0.00g
107           MLRISC ra                               0.06u  0.00s  0.05g
108              (spills = 0 reloads = 0)
109           TOTAL                                   0.63u  0.07s  0.21g
110    
111    1100   CPS 100 spill                           8.25u  0.32s  0.64g
112           MLRISC ra                               5.68u  0.59s  3.93g
113              (spills = 0 reloads = 0)
114           TOTAL                                   14.71u  0.99s  4.81g
115    
116    1500   CPS 100 spill                           58.55u  2.34s  1.74g
117           MLRISC ra                               5.54u  0.65s  3.91g
118              (spills = 543 reloads = 1082)
119           TOTAL                                   65.40u  3.13s  6.00g
120    
121    2000   CPS 100 spill                           126.69u  4.84s  3.08g
122           MLRISC ra                               0.80u  0.10s  0.55g
123              (spills = 42 reloads = 84)
124           TOTAL                                   129.42u  5.10s  4.13g
125    
126    3000   CPS 100 spill                           675.59u  19.03s  11.64g
127           MLRISC ra                               2.69u  0.27s  1.38g
128              (spills = 62 reloads = 124)
129           TOTAL                                   682.48u  19.61s  13.99g
130    
131    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
132           MLRISC ra                               4.96u  0.27s  2.72g
133              (spills = 85 reloads = 170)
134           TOTAL                                   2375.26u  57.21s  48.00g
135    
136    As you can see the old cps spill module suffers from some serious
137    performance problem but since I cannot decipher the old code fully,
138    innstead of patching the problems up, I'm reimplementing it
139    with a different algorithm.  The new code is more modular,
140    smaller when compiled, and substantially faster
141    (O(n log n) time and O(n) space).  Timing of the new spill module:
142    
143    4000  CPS 100 spill                           0.02u  0.00s  0.00g
144          MLRISC ra                               0.25u  0.02s  0.15g
145             (spills=1 reloads=3)
146          TOTAL                                   7.74u  0.34s  1.62g
147    
148    Implementation details:
149    
150    As far as I can tell, the purpose of the CPS spill module is to make sure the
151    number of live variables at any program point (the bandwidth)
152    does not exceed a certain limit, which is determined by the
153    size of the spill area.
154    
155    When the bandwidth is too large, we decrease the register pressure by
156    packing live variables into spill records.  How we achieve this is
157    completely different than what we did in the old code.
158    
159    First, there is something about the MLRiscGen code generator
160    that we should be aware of:
161    
162    o MLRiscGen performs code motion!
163    
164       In particular, it will move floating point computations and
165       address computations involving only the heap pointer to
166       their use sites (if there is only a single use).
167       What this means is that if we have a CPS record construction
168       statement
169    
170           RECORD(k,vl,w,e)
171    
172       we should never count the new record address w as live if w
173       has only one use (which is often the case).
174    
175       We should do something similar to floating point, but the transformation
176       there is much more complex, so I won't deal with that.
177    
178    Secondly, there are now two new cps primops at our disposal:
179    
180     1. rawrecord of record_kind option
181        This pure operator allocates some uninitialized storage from the heap.
182        There are two forms:
183    
184         rawrecord NONE [INT n]  allocates a tagless record of length n
185         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
186                                     and initializes the tag.
187    
188     2. rawupdate of cty
189          rawupdate cty (v,i,x)
190          Assigns to x to the ith component of record v.
191          The storelist is not updated.
192    
193    We use these new primops for both spilling and increment record construction.
194    
195     1. Spilling.
196    
197        This is implemented with a linear scan algorithm (but generalized
198        to trees).  The algorithm will create a single spill record at the
199        beginning of the cps function and use rawupdate to spill to it,
200        and SELECT or SELp to reload from it.  So both spills and reloads
201        are fine-grain operations.  In contrast, in the old algorithm
202        "spills" have to be bundled together in records.
203    
204        Ideally, we should sink the spill record construction to where
205        it is needed.  We can even split the spill record into multiple ones
206        at the places where they are needed.  But CPS is not a good
207        representation for global code motion, so I'll keep it simple and
208        am not attempting this.
209    
210     2. Incremental record construction (aka record splitting).
211    
212        Long records with many component values which are simulatenously live
213        (recall that single use record addresses are not considered to
214         be live) are constructed with rawrecord and rawupdate.
215        We allocate space on the heap with rawrecord first, then gradually
216        fill it in with rawupdate.  This is the technique suggested to me
217        by Matthias.
218    
219        Some restrictions on when this is applicable:
220        1. It is not a VECTOR record.  The code generator currently does not handle
221           this case. VECTOR record uses double indirection like arrays.
222        2. All the record component values are defined in the same "basic block"
223           as the record constructor.  This is to prevent speculative
224           record construction.
225    
226    ----------------------------------------------------------------------
227    Name: Allen Leung
228    Date: 2002/02/22 01:02:00 EST
229    Tag: leunga-20020222-mlrisc-tools
230    
231    Minor bug fixes in the parser and rewriter
232    
233    ----------------------------------------------------------------------
234    Name: Allen Leung
235    Date: 2002/02/21 20:20:00 EST
236    Tag: leunga-20020221-peephole
237    
238    Regenerated the peephole files.  Some contained typos in the specification
239    and some didn't compile because of pretty printing bugs in the old version
240    of 'nowhere'.
241    
242    ----------------------------------------------------------------------
243    Name: Allen Leung
244    Date: 2002/02/19 20:20:00 EST
245    Tag: leunga-20020219-mlrisc-tools
246    Description:
247    
248       Minor bug fixes to the mlrisc-tools library:
249    
250       1.  Fixed up parsing colon suffixed keywords
251       2.  Added the ability to shut the error messages up
252       3.  Reimplemented the pretty printer and fixed up/improved
253           the pretty printing of handle and -> types.
254       4.  Fixed up generation of literal symbols in the nowhere tool.
255       5.  Added some SML keywords to to sml.sty
256    
257    ----------------------------------------------------------------------
258    Name: Matthias Blume
259    Date: 2002/02/19 16:20:00 EST
260    Tag: blume-20020219-cmffi
261    Description:
262    
263    A wild mix of changes, some minor, some major:
264    
265    * All C FFI-related libraries are now anchored under $c:
266        $/c.cm      --> $c/c.cm
267        $/c-int.cm  --> $c/internals/c-int.cm
268        $/memory.cm --> $c/memory/memory.cm
269    
270    * "make" tool (in CM) now treats its argument pathname slightly
271      differently:
272        1. If the native expansion is an absolute name, then before invoking
273           the "make" command on it, CM will apply OS.Path.mkRelative
274           (with relativeTo = OS.FileSys.getDir()) to it.
275        2. The argument will be passed through to subsequent phases of CM
276           processing without "going native".  In particular, if the argument
277           was an anchored path, then "make" will not lose track of that anchor.
278    
279    * Compiler backends now "know" their respective C calling conventions
280      instead of having to be told about it by ml-nlffigen.  This relieves
281      ml-nlffigen from one of its burdens.
282    
283    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
284    
285    * Export C_DEBUG and C_Debug from $c/c.cm.
286    
287    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
288      subtyping relationship between incomplete pointers and their complete
289      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
290      with the convention of instantiating 'o with ('t, 'c) obj whenever
291      the pointer target type is complete.  In the incomplete case, 'o
292      will be instantiated with some "'c iobj" -- a type obtained by
293      using one of the functors PointerToIncompleteType or PointerToCompleteType.
294    
295      Operations that work on both incomplete and complete pointer types are
296      typed as taking an 'o ptr while operations that require the target to
297      be known are typed as taking some ('t, 'c) obj ptr.
298    
299      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
300      where void is an eqtype without any values.  This makes it possible
301      to work on voidptr values using functions meant to operate on light
302      incomplete pointers.
303    
304    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
305      been vastly simplified.
306    
307    ----------------------------------------------------------------------
308    Name: Matthias Blume
309    Date: 2002/02/19 10:48:00 EST
310    Tag: blume-20020219-pqfix
311    Description:
312    
313    Applied Chris Okasaki's bug fix for priority queues.
314    
315    ----------------------------------------------------------------------
316    Name: Matthias Blume
317    Date: 2002/02/15 17:05:00
318    Tag: Release_110_39
319    Description:
320    
321    Last-minute retagging is becoming a tradition... :-(
322    
323    This is the working release 110.39.
324    
325    ----------------------------------------------------------------------
326    Name: Matthias Blume
327    Date: 2002/02/15 16:00:00 EST
328    Tag: Release_110_39-orig
329    Description:
330    
331    Working release 110.39.  New bootfiles.
332    
333    (Update: There was a small bug in the installer so it wouldn't work
334    with all shells.  So I retagged. -Matthias)
335    
336    ----------------------------------------------------------------------
337    Name: Matthias Blume
338    Date: 2002/02/15 14:17:00 EST
339    Tag: blume-20020215-showbindings
340    Description:
341    
342    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
343    the latter can be useful for exploring what bindings are available at
344    the interactive prompt.  (The first function returns only the list
345    of symbols that are really bound, the second prints those but also the
346    ones that CM's autoloading mechanism knows about.)
347    
348    ----------------------------------------------------------------------
349    Name: Matthias Blume
350    Date: 2002/02/15 12:08:00 EST
351    Tag: blume-20020215-iptrs
352    Description:
353    
354    Two improvements to ml-nlffigen:
355    
356      1. Write files only if they do not exist or if their current contents
357         do not coincide with what's being written.  (That is, avoid messing
358         with the time stamps unless absolutely necessary.)
359    
360      2. Implement a "repository" mechanism for generated files related
361         to "incomplete pointer types".   See the README file for details.
362    
363    ----------------------------------------------------------------------
364    Name: Matthias Blume
365    Date: 2002/02/14 11:50:00 EST
366    Tag: blume-20020214-quote
367    Description:
368    
369    Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm).  This is required
370    because of the new and improved tag generation scheme.  (Thanks to Allen
371    Leung for pointing it out.)
372    
373    ----------------------------------------------------------------------
374    Name: Lal George
375    Date: 2002/02/14 09:55:27 EST 2002
376    Tag: george-20020214-isabelle-bug
377    Description:
378    
379    Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation
380    of Isabelle on the x86.
381    
382    From Allen:
383    -----------
384     I've found the problem:
385    
386         in ra-core.sml, I use the counter "blocked" to keep track of the
387         true number of elements in the freeze queue.  When the counter goes
388         to zero, I skip examining the queue.  But I've messed up the
389         bookkeeping in combine():
390    
391             else ();
392             case !ucol of
393               PSEUDO => (if !cntv > 0 then
394                     (if !cntu > 0 then blocked := !blocked - 1 else ();
395                                        ^^^^^^^^^^^^^^^^^^^^^^^
396                      moveu := mergeMoveList(!movev, !moveu)
397                     )
398                  else ();
399    
400         combine() is called to coalesce two nodes u and v.
401         I think I was thinking that if the move counts of u and v are both
402         greater than zero then after they are coalesced then one node is
403         removed from the freeze queue.  Apparently I was thinking that
404         both u and v are of low degree, but that's clearly not necessarily true.
405    
406    
407    02/12/2002:
408        Here's the patch.  HOL now compiles.
409    
410        I don't know how this impact on performance (compile
411        time or runtime).  This bug caused the RA (especially on the x86)
412        to go thru the potential spill phase when there are still nodes on the
413        freeze queue.
414    
415    
416    
417    
418    ----------------------------------------------------------------------
419    Name: Matthias Blume
420    Date: 2002/02/13 22:40:00 EST
421    Tag: blume-20020213-fptr-rtti
422    Description:
423    
424    Fixed a bug in ml-nlffigen that was introduced with one of the previous
425    updates.
426    
427    ----------------------------------------------------------------------
428    Name: Matthias Blume
429    Date: 2002/02/13 16:41:00 EST
430    Tag: blume-20020213-cmlpq
431    Description:
432    
433    Added new priority queue export symbols (which have just been added to
434    smlnj-lib.cm) to CML's version of smlnj-lib.cm.  (Otherwise CML would
435    not compile and the installer would choke.)
436    
437    ----------------------------------------------------------------------
438    Name: Matthias Blume
439  Date: 2002/02/13 16:15:00 EST  Date: 2002/02/13 16:15:00 EST
440  Tag: blume-20020213-various  Tag: blume-20020213-various
441  Description:  Description:

Legend:
Removed from v.1062  
changed lines
  Added in v.1098

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0