Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1044, Mon Jan 28 21:36:08 2002 UTC revision 1094, Mon Feb 25 09:58:56 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/02/25 04:45:00 EST
18    Tag: leunga-20020225-cps-spill
19    
20    This is a new implementation of the CPS spill phase.
21    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
22    In case of problems, replace it with the old file spill.sml
23    
24    The current compiler runs into some serious performance problems when
25    constructing a large record.  This can happen when we try to compile a
26    structure with many items.  Even a very simple structure like the following
27    makes the compiler slow down.
28    
29        structure Foo = struct
30           val x_1 = 0w1 : Word32.int
31           val x_2 = 0w2 : Word32.int
32           val x_3 = 0w3 : Word32.int
33           ...
34           val x_N = 0wN : Word32.int
35        end
36    
37    The following table shows the compile time, from N=1000 to N=4000,
38    with the old compiler:
39    
40    N
41    1000   CPS 100 spill                           0.04u  0.00s  0.00g
42           MLRISC ra                               0.06u  0.00s  0.05g
43              (spills = 0 reloads = 0)
44           TOTAL                                   0.63u  0.07s  0.21g
45    
46    1100   CPS 100 spill                           8.25u  0.32s  0.64g
47           MLRISC ra                               5.68u  0.59s  3.93g
48              (spills = 0 reloads = 0)
49           TOTAL                                   14.71u  0.99s  4.81g
50    
51    1500   CPS 100 spill                           58.55u  2.34s  1.74g
52           MLRISC ra                               5.54u  0.65s  3.91g
53              (spills = 543 reloads = 1082)
54           TOTAL                                   65.40u  3.13s  6.00g
55    
56    2000   CPS 100 spill                           126.69u  4.84s  3.08g
57           MLRISC ra                               0.80u  0.10s  0.55g
58              (spills = 42 reloads = 84)
59           TOTAL                                   129.42u  5.10s  4.13g
60    
61    3000   CPS 100 spill                           675.59u  19.03s  11.64g
62           MLRISC ra                               2.69u  0.27s  1.38g
63              (spills = 62 reloads = 124)
64           TOTAL                                   682.48u  19.61s  13.99g
65    
66    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
67           MLRISC ra                               4.96u  0.27s  2.72g
68              (spills = 85 reloads = 170)
69           TOTAL                                   2375.26u  57.21s  48.00g
70    
71    As you can see the old cps spill module suffers from some serious
72    performance problem but since I cannot decipher the old code fully,
73    innstead of patching the problems up, I'm reimplementing it
74    with a different algorithm.  The new code is more modular,
75    smaller when compiled, and substantially faster
76    (O(n log n) time and O(n) space).  Timing of the new spill module:
77    
78    4000  CPS 100 spill                           0.02u  0.00s  0.00g
79          MLRISC ra                               0.25u  0.02s  0.15g
80             (spills=1 reloads=3)
81          TOTAL                                   7.74u  0.34s  1.62g
82    
83    Implementation details:
84    
85    As far as I can tell, the purpose of the CPS spill module is to make sure the
86    number of live variables at any program point (the bandwidth)
87    does not exceed a certain limit, which is determined by the
88    size of the spill area.
89    
90    When the bandwidth is too large, we decrease the register pressure by
91    packing live variables into spill records.  How we achieve this is
92    completely different than what we did in the old code.
93    
94    First, there is something about the MLRiscGen code generator
95    that we should be aware of:
96    
97    o MLRiscGen performs code motion!
98    
99       In particular, it will move floating point computations and
100       address computations involving only the heap pointer to
101       their use sites (if there is only a single use).
102       What this means is that if we have a CPS record construction
103       statement
104    
105           RECORD(k,vl,w,e)
106    
107       we should never count the new record address w as live if w
108       has only one use (which is often the case).
109    
110       We should do something similar to floating point, but the transformation
111       there is much more complex, so I won't deal with that.
112    
113    Secondly, there are now two new cps primops at our disposal:
114    
115     1. rawrecord of record_kind option
116        This pure operator allocates some uninitialized storage from the heap.
117        There are two forms:
118    
119         rawrecord NONE [INT n]  allocates a tagless record of length n
120         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
121                                     and initializes the tag.
122    
123     2. rawupdate of cty
124          rawupdate cty (v,i,x)
125          Assigns to x to the ith component of record v.
126          The storelist is not updated.
127    
128    We use these new primops for both spilling and increment record construction.
129    
130     1. Spilling.
131    
132        This is implemented with a linear scan algorithm (but generalized
133        to trees).  The algorithm will create a single spill record at the
134        beginning of the cps function and use rawupdate to spill to it,
135        and SELECT or SELp to reload from it.  So both spills and reloads
136        are fine-grain operations.  In contrast, in the old algorithm
137        "spills" have to be bundled together in records.
138    
139        Ideally, we should sink the spill record construction to where
140        it is needed.  We can even split the spill record into multiple ones
141        at the places where they are needed.  But CPS is not a good
142        representation for global code motion, so I'll keep it simple and
143        am not attempting this.
144    
145     2. Incremental record construction (aka record splitting).
146    
147        Long records with many component values which are simulatenously live
148        (recall that single use record addresses are not considered to
149         be live) are constructed with rawrecord and rawupdate.
150        We allocate space on the heap with rawrecord first, then gradually
151        fill it in with rawupdate.  This is the technique suggested to me
152        by Matthias.
153    
154        Some restrictions on when this is applicable:
155        1. It is not a VECTOR record.  The code generator currently does not handle
156           this case. VECTOR record uses double indirection like arrays.
157        2. All the record component values are defined in the same "basic block"
158           as the record constructor.  This is to prevent speculative
159           record construction.
160    
161    ----------------------------------------------------------------------
162    Name: Allen Leung
163    Date: 2002/02/22 01:02:00 EST
164    Tag: leunga-20020222-mlrisc-tools
165    
166    Minor bug fixes in the parser and rewriter
167    
168    ----------------------------------------------------------------------
169    Name: Allen Leung
170    Date: 2002/02/21 20:20:00 EST
171    Tag: leunga-20020221-peephole
172    
173    Regenerated the peephole files.  Some contained typos in the specification
174    and some didn't compile because of pretty printing bugs in the old version
175    of 'nowhere'.
176    
177    ----------------------------------------------------------------------
178    Name: Allen Leung
179    Date: 2002/02/19 20:20:00 EST
180    Tag: leunga-20020219-mlrisc-tools
181    Description:
182    
183       Minor bug fixes to the mlrisc-tools library:
184    
185       1.  Fixed up parsing colon suffixed keywords
186       2.  Added the ability to shut the error messages up
187       3.  Reimplemented the pretty printer and fixed up/improved
188           the pretty printing of handle and -> types.
189       4.  Fixed up generation of literal symbols in the nowhere tool.
190       5.  Added some SML keywords to to sml.sty
191    
192    ----------------------------------------------------------------------
193    Name: Matthias Blume
194    Date: 2002/02/19 16:20:00 EST
195    Tag: blume-20020219-cmffi
196    Description:
197    
198    A wild mix of changes, some minor, some major:
199    
200    * All C FFI-related libraries are now anchored under $c:
201        $/c.cm      --> $c/c.cm
202        $/c-int.cm  --> $c/internals/c-int.cm
203        $/memory.cm --> $c/memory/memory.cm
204    
205    * "make" tool (in CM) now treats its argument pathname slightly
206      differently:
207        1. If the native expansion is an absolute name, then before invoking
208           the "make" command on it, CM will apply OS.Path.mkRelative
209           (with relativeTo = OS.FileSys.getDir()) to it.
210        2. The argument will be passed through to subsequent phases of CM
211           processing without "going native".  In particular, if the argument
212           was an anchored path, then "make" will not lose track of that anchor.
213    
214    * Compiler backends now "know" their respective C calling conventions
215      instead of having to be told about it by ml-nlffigen.  This relieves
216      ml-nlffigen from one of its burdens.
217    
218    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
219    
220    * Export C_DEBUG and C_Debug from $c/c.cm.
221    
222    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
223      subtyping relationship between incomplete pointers and their complete
224      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
225      with the convention of instantiating 'o with ('t, 'c) obj whenever
226      the pointer target type is complete.  In the incomplete case, 'o
227      will be instantiated with some "'c iobj" -- a type obtained by
228      using one of the functors PointerToIncompleteType or PointerToCompleteType.
229    
230      Operations that work on both incomplete and complete pointer types are
231      typed as taking an 'o ptr while operations that require the target to
232      be known are typed as taking some ('t, 'c) obj ptr.
233    
234      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
235      where void is an eqtype without any values.  This makes it possible
236      to work on voidptr values using functions meant to operate on light
237      incomplete pointers.
238    
239    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
240      been vastly simplified.
241    
242    ----------------------------------------------------------------------
243    Name: Matthias Blume
244    Date: 2002/02/19 10:48:00 EST
245    Tag: blume-20020219-pqfix
246    Description:
247    
248    Applied Chris Okasaki's bug fix for priority queues.
249    
250    ----------------------------------------------------------------------
251    Name: Matthias Blume
252    Date: 2002/02/15 17:05:00
253    Tag: Release_110_39
254    Description:
255    
256    Last-minute retagging is becoming a tradition... :-(
257    
258    This is the working release 110.39.
259    
260    ----------------------------------------------------------------------
261    Name: Matthias Blume
262    Date: 2002/02/15 16:00:00 EST
263    Tag: Release_110_39-orig
264    Description:
265    
266    Working release 110.39.  New bootfiles.
267    
268    (Update: There was a small bug in the installer so it wouldn't work
269    with all shells.  So I retagged. -Matthias)
270    
271    ----------------------------------------------------------------------
272    Name: Matthias Blume
273    Date: 2002/02/15 14:17:00 EST
274    Tag: blume-20020215-showbindings
275    Description:
276    
277    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
278    the latter can be useful for exploring what bindings are available at
279    the interactive prompt.  (The first function returns only the list
280    of symbols that are really bound, the second prints those but also the
281    ones that CM's autoloading mechanism knows about.)
282    
283    ----------------------------------------------------------------------
284    Name: Matthias Blume
285    Date: 2002/02/15 12:08:00 EST
286    Tag: blume-20020215-iptrs
287    Description:
288    
289    Two improvements to ml-nlffigen:
290    
291      1. Write files only if they do not exist or if their current contents
292         do not coincide with what's being written.  (That is, avoid messing
293         with the time stamps unless absolutely necessary.)
294    
295      2. Implement a "repository" mechanism for generated files related
296         to "incomplete pointer types".   See the README file for details.
297    
298    ----------------------------------------------------------------------
299    Name: Matthias Blume
300    Date: 2002/02/14 11:50:00 EST
301    Tag: blume-20020214-quote
302    Description:
303    
304    Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm).  This is required
305    because of the new and improved tag generation scheme.  (Thanks to Allen
306    Leung for pointing it out.)
307    
308    ----------------------------------------------------------------------
309    Name: Lal George
310    Date: 2002/02/14 09:55:27 EST 2002
311    Tag: george-20020214-isabelle-bug
312    Description:
313    
314    Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation
315    of Isabelle on the x86.
316    
317    From Allen:
318    -----------
319     I've found the problem:
320    
321         in ra-core.sml, I use the counter "blocked" to keep track of the
322         true number of elements in the freeze queue.  When the counter goes
323         to zero, I skip examining the queue.  But I've messed up the
324         bookkeeping in combine():
325    
326             else ();
327             case !ucol of
328               PSEUDO => (if !cntv > 0 then
329                     (if !cntu > 0 then blocked := !blocked - 1 else ();
330                                        ^^^^^^^^^^^^^^^^^^^^^^^
331                      moveu := mergeMoveList(!movev, !moveu)
332                     )
333                  else ();
334    
335         combine() is called to coalesce two nodes u and v.
336         I think I was thinking that if the move counts of u and v are both
337         greater than zero then after they are coalesced then one node is
338         removed from the freeze queue.  Apparently I was thinking that
339         both u and v are of low degree, but that's clearly not necessarily true.
340    
341    
342    02/12/2002:
343        Here's the patch.  HOL now compiles.
344    
345        I don't know how this impact on performance (compile
346        time or runtime).  This bug caused the RA (especially on the x86)
347        to go thru the potential spill phase when there are still nodes on the
348        freeze queue.
349    
350    
351    
352    
353    ----------------------------------------------------------------------
354    Name: Matthias Blume
355    Date: 2002/02/13 22:40:00 EST
356    Tag: blume-20020213-fptr-rtti
357    Description:
358    
359    Fixed a bug in ml-nlffigen that was introduced with one of the previous
360    updates.
361    
362    ----------------------------------------------------------------------
363    Name: Matthias Blume
364    Date: 2002/02/13 16:41:00 EST
365    Tag: blume-20020213-cmlpq
366    Description:
367    
368    Added new priority queue export symbols (which have just been added to
369    smlnj-lib.cm) to CML's version of smlnj-lib.cm.  (Otherwise CML would
370    not compile and the installer would choke.)
371    
372    ----------------------------------------------------------------------
373    Name: Matthias Blume
374    Date: 2002/02/13 16:15:00 EST
375    Tag: blume-20020213-various
376    Description:
377    
378    1. More tweaks to ml-nlffigen:
379    
380       - better internal datastructures (resulting in slight speedup)
381       - "-match" option requires exact match
382       - "localized" gensym counters (untagged structs/unions nested within
383         other structs/unions or within typedefs get a fresh counter; their
384         tag will be prefixed by a concatenation of their parents' tags)
385       - bug fixes (related to calculation of transitive closure of types
386         to be included in the output)
387    
388    2. Minor Basis updates:
389    
390       - added implementations for List.collate and Option.app
391    
392    ----------------------------------------------------------------------
393    Name: Matthias Blume
394    Date: 2002/02/11 15:55:00 EST
395    Tag: blume-20020211-gensym
396    Description:
397    
398    Added a "-gensym" option to command line of ml-nlffigen.  This can be
399    used to specify a "stem" -- a string that is inserted in all "gensym'd"
400    names (ML structure names that correspond to unnamed C structs, unions,
401    and enums), so that separate runs of ml-nlffigen do not clash.
402    
403    ----------------------------------------------------------------------
404    Name: Matthias Blume
405    Date: 2002/02/11 12:05:00 EST
406    Tag: blume-20020211-gensml
407    Description:
408    
409    A quick fix for a problem with GenSML (in the pgraph-util library):
410    Make generation of toplevel "local" optional.  (Strictly speaking,
411    signature definitions within "local" are not legal SML.)
412    
413    Other than that: updates to INSTALL and cm/TODO.
414    
415    ----------------------------------------------------------------------
416    Name: Matthias Blume
417    Date: 2002/02/08 15:00:00 EST
418    Tag: blume-20020208-uniquepid
419    Description:
420    
421    0. Version number has been bumped to 110.38.1.  NEW BOOTFILES!!!
422    
423    1. The installer (config/install.sh) has gotten smarter:
424    
425         - Configuration options are a bit easier to specify now
426           (in config/targets).
427         - Bug in recognizing .tar.bz2 files fixed.
428         - Installer automatically resolves dependencies between
429           configuration options (e.g., if you ask for eXene, you will
430           also get cml -- regardless whether you asked for it or not).
431         - Installer can run in "quieter mode" by setting the environment
432           variable INSTALL_QUIETLY to "true".  "Quieter" does not mean
433           "completely silent", though.
434         - Build HashCons library as part of smlnj-lib.
435    
436    2. A new scheme for assigning persistent identifiers to compilation
437       units (and, by extension, to types etc.) has been put into place.
438       This fixes a long-standing bug where types and even dynamic values
439       can get internally confused, thereby compromising type safety
440       (abstraction) and dynamic correctness.  See
441    
442         http://cm.bell-labs.com/cm/cs/who/blume/pid-confusion.tgz
443    
444       for an example of how things could go wrong until now.
445    
446       The downside of the new scheme is that pids are not quite as
447       persistent as they used to be: CM will generate a fresh pid
448       for every compilation unit that it thinks it sees for the first
449       time.  That means that if you compile starting from a clean, fresh
450       source tree at two different times, you end up with different
451       binaries.
452    
453       Cutoff recompilation, however, has not been compromised because
454       CM keeps pid information in special caches between runs.
455    
456    ----------------------------------------------------------------------
457    Name: Lal George
458    Date: 2002/02/07 15:34:13 EST 2002
459    Tag: <none>
460    Description:
461    
462    Compilers that generate assembly code may produce  global labels
463    whose value is resolved at link time. The various peephole optimization
464    modules did not take this in account.
465    
466    TODO. The Labels.addrOf function should really return an option
467    type so that clients are forced to deal with this issue, rather
468    than an exception being raised.
469    
470    ----------------------------------------------------------------------
471    Name: Lal George
472    Date: 2002/02/06 13:55:02 EST
473    Tag: george-20020206-ra-breakup
474    Description:
475    
476    1. A bug fix from Allen.
477    
478        A typo causes extra fstp %st(0)'s to be generated at compensation
479        edges, which might cause stack underflow traps at runtime.  This
480        occurs in fft where there are extraneous fstps right before the 'into'
481        trap instruction (in this case they are harmless since none of the
482        integers overflow.)
483    
484    2. Pulled out various utility modules that were embedded in the modules
485       of the register allocator. I need these modules for other purposes, but
486       they are not complete enough to put into a library (just yet).
487    ----------------------------------------------------------------------
488    Name: Matthias Blume
489    Date: 2002/01/31 16:05:00 EST
490    Tag: blume-20020131-sparc-ccalls
491    Description:
492    
493    1. C-calls on Sparc needlessly allocated a huge chunk (96 bytes)
494       of extra stack space by mistake.  Fixed.
495    
496    2. Bug in logic of handling of command-line options in ml-nlffigen fixed.
497    
498    ----------------------------------------------------------------------
499    Name: Allen Leung
500    Date: 2002/01/30
501    Tag: leunga-20020130-nowhere-bug-fix
502    Description:
503    
504       MLRISC bug fixes:
505       1. Fixed a bindings computation bug in the 'nowhere' program generator tool.
506       2. MachineInt.fromString was negating its value.
507    
508    ----------------------------------------------------------------------
509    Name: Matthias Blume
510    Date: 2002/01/29
511    Tag: blume-20020129-INSTALL
512    Description:
513    
514    - Added somewhat detailed installation instructions (file INSTALL).
515    - Fixed curl-detection bug in config/install.sh.
516    - It is now possible to select the URL getter using the URLGETTER
517      environment variable:
518    
519          not set / "unknown"      --> automatic detection (script tries wget,
520                                       curl, and lynx)
521          "wget" / "curl" / "lynx" --> use the specified program (script "knows"
522                                       how to properly invoke them)
523          other                    --> use $URLGETTER directly, it must take
524                                       precisely two command-line arguments
525                                       (source URL and destination file name)
526    
527    ----------------------------------------------------------------------
528    Name: Matthias Blume
529    Date: 2002/01/28
530    Tag: blume-20020128-sparc-ccalls
531    Description:
532    
533    - Fixed problem with calculation of "used" registers in sparc-c-calls.
534    - Make use of the allocParam argument in sparc-c-calls.
535    
536    ----------------------------------------------------------------------
537  Name: Matthias Blume  Name: Matthias Blume
538  Date: 2002/01/28  Date: 2002/01/28
539  Tag: blume-20020128-allocParam  Tag: blume-20020128-allocParam
# Line 584  Line 1105 
1105    
1106  ----------------------------------------------------------------------  ----------------------------------------------------------------------
1107  Name: Matthias Blume  Name: Matthias Blume
 >>>>>>> 1.169  
1108  Date: 2001/09/18 15:35:00 EDT  Date: 2001/09/18 15:35:00 EDT
1109  Tag: blume-20010918-readme11036  Tag: blume-20010918-readme11036
1110  Description:  Description:

Legend:
Removed from v.1044  
changed lines
  Added in v.1094

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0