Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Diff of /sml/trunk/HISTORY
ViewVC logotype

Diff of /sml/trunk/HISTORY

Parent Directory Parent Directory | Revision Log Revision Log | View Patch Patch

revision 1048, Thu Jan 31 06:36:17 2002 UTC revision 1116, Tue Mar 5 23:17:18 2002 UTC
# Line 13  Line 13 
13  Description:  Description:
14    
15  ----------------------------------------------------------------------  ----------------------------------------------------------------------
16    Name: Lal George
17    Date: 2002/03/05 17:24:48 EST
18    Tag: george-20020305-linkage-cluster
19    
20    In order to support the block placement optimization, the first
21    cluster that is generated (called the linkage cluster) contains a jump
22    to the entry point for the compilation unit. The linkage cluster
23    contains only one function, so block placement will have no effect on
24    the linkage cluster itself, but all the other clusters have full
25    freedom in the manner in which they reorder blocks or functions.
26    
27    On the x86 the typical linkage code that is generated is:
28       ----------------------
29            .align 2
30       L0:
31            addl    $L1-L0, 72(%esp)
32            jmp     L0
33    
34    
35            .align  2
36       L1:
37       ----------------------
38    
39    72(%esp) is the memory location for the stdlink register. This
40    must contain the address of the CPS function being called. In the
41    above example, it contains the address of memory for  L0; before
42    calling L1 (the real entry point for the compilation unit), it
43    must contain the address for L1, and hence
44    
45            addl $L1-L0, 72(%esp)
46    
47    I have tested this on all architectures except the hppa.
48    
49    ----------------------------------------------------------------------
50    Name: Allen Leung
51    Date: 2002/03/03 13:20:00 EST
52    Tag: leunga-20020303-mlrisc-tools
53    
54      Added #[ ... ] expressions to mlrisc tools
55    
56    ----------------------------------------------------------------------
57    Name: Matthias Blume
58    Date: 2002/02/27 12:29:00 EST
59    Tag: blume-20020227-cdebug
60    Description:
61    
62    - made types in structure C and C_Debug to be equal
63    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
64    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
65    
66    ----------------------------------------------------------------------
67    Name: Matthias Blume
68    Date: 2002/02/26 12:00:00 EST
69    Tag: blume-20020226-ffi
70    Description:
71    
72    1. Fixed a minor bug in CM's "noweb" tool:
73       If numbering is turned off, then truly don't number (i.e., do not
74       supply the -L option to noweb).  The previous behavior was to supply
75       -L'' -- which caused noweb to use the "default" line numbering scheme.
76       Thanks to Chris Richards for pointing this out (and supplying the fix).
77    
78    2. Once again, I reworked some aspects of the FFI:
79    
80       A. The incomplete/complete type business:
81    
82       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
83         gone!
84       - ML types representing an incomplete type are now *equal* to
85         ML types representing their corresponding complete types (just like
86         in C).  This is still safe because ml-nlffigen will not generate
87         RTTI for incomplete types, nor will it generate functions that
88         require access to such RTTI.   But when ML code generated from both
89         incomplete and complete versions of the C type meet, the ML types
90         are trivially interoperable.
91    
92         NOTE:  These changes restore the full generality of the translation
93         (which was previously lost when I eliminated functorization)!
94    
95       B. Enum types:
96    
97       - Structure C now has a type constructor "enum" that is similar to
98         how the "su" constructor works.  However, "enum" is not a phantom
99         type because each "T enum" has values (and is isomorphic to
100         MLRep.Signed.int).
101       - There are generic access operations for enum objects (using
102         MLRep.Signed.int).
103       - ml-nlffigen will generate a structure E_foo for each "enum foo".
104         * The structure contains the definition of type "mlrep" (the ML-side
105         representation type of the enum).  Normally, mlrep is the same
106         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
107         then mlrep will be defined as a datatype -- thus facilitating
108         pattern matching on mlrep values.
109         ("-ec" will be suppressed if there are duplicate values in an
110          enumeration.)
111         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
112         will be generated for each C enum constant xxx.
113         * Conversion functions m2i and i2m convert between mlrep and
114         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
115         * Coversion functions c and ml convert between mlrep and "tag enum".
116         * Access functions (get/set) fetch and store mlrep values.
117       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
118         enumerations are merged into one single enumeration represented by
119         structure E_'.
120    
121    ----------------------------------------------------------------------
122    Name: Allen Leung
123    Date: 2002/02/25 04:45:00 EST
124    Tag: leunga-20020225-cps-spill
125    
126    This is a new implementation of the CPS spill phase.
127    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
128    In case of problems, replace it with the old file spill.sml
129    
130    The current compiler runs into some serious performance problems when
131    constructing a large record.  This can happen when we try to compile a
132    structure with many items.  Even a very simple structure like the following
133    makes the compiler slow down.
134    
135        structure Foo = struct
136           val x_1 = 0w1 : Word32.int
137           val x_2 = 0w2 : Word32.int
138           val x_3 = 0w3 : Word32.int
139           ...
140           val x_N = 0wN : Word32.int
141        end
142    
143    The following table shows the compile time, from N=1000 to N=4000,
144    with the old compiler:
145    
146    N
147    1000   CPS 100 spill                           0.04u  0.00s  0.00g
148           MLRISC ra                               0.06u  0.00s  0.05g
149              (spills = 0 reloads = 0)
150           TOTAL                                   0.63u  0.07s  0.21g
151    
152    1100   CPS 100 spill                           8.25u  0.32s  0.64g
153           MLRISC ra                               5.68u  0.59s  3.93g
154              (spills = 0 reloads = 0)
155           TOTAL                                   14.71u  0.99s  4.81g
156    
157    1500   CPS 100 spill                           58.55u  2.34s  1.74g
158           MLRISC ra                               5.54u  0.65s  3.91g
159              (spills = 543 reloads = 1082)
160           TOTAL                                   65.40u  3.13s  6.00g
161    
162    2000   CPS 100 spill                           126.69u  4.84s  3.08g
163           MLRISC ra                               0.80u  0.10s  0.55g
164              (spills = 42 reloads = 84)
165           TOTAL                                   129.42u  5.10s  4.13g
166    
167    3000   CPS 100 spill                           675.59u  19.03s  11.64g
168           MLRISC ra                               2.69u  0.27s  1.38g
169              (spills = 62 reloads = 124)
170           TOTAL                                   682.48u  19.61s  13.99g
171    
172    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
173           MLRISC ra                               4.96u  0.27s  2.72g
174              (spills = 85 reloads = 170)
175           TOTAL                                   2375.26u  57.21s  48.00g
176    
177    As you can see the old cps spill module suffers from some serious
178    performance problem.  But since I cannot decipher the old code fully,
179    instead of patching the problems up, I'm reimplementing it
180    with a different algorithm.  The new code is more modular,
181    smaller when compiled, and substantially faster
182    (O(n log n) time and O(n) space).  Timing of the new spill module:
183    
184    4000  CPS 100 spill                           0.02u  0.00s  0.00g
185          MLRISC ra                               0.25u  0.02s  0.15g
186             (spills=1 reloads=3)
187          TOTAL                                   7.74u  0.34s  1.62g
188    
189    Implementation details:
190    
191    As far as I can tell, the purpose of the CPS spill module is to make sure the
192    number of live variables at any program point (the bandwidth)
193    does not exceed a certain limit, which is determined by the
194    size of the spill area.
195    
196    When the bandwidth is too large, we decrease the register pressure by
197    packing live variables into spill records.  How we achieve this is
198    completely different than what we did in the old code.
199    
200    First, there is something about the MLRiscGen code generator
201    that we should be aware of:
202    
203    o MLRiscGen performs code motion!
204    
205       In particular, it will move floating point computations and
206       address computations involving only the heap pointer to
207       their use sites (if there is only a single use).
208       What this means is that if we have a CPS record construction
209       statement
210    
211           RECORD(k,vl,w,e)
212    
213       we should never count the new record address w as live if w
214       has only one use (which is often the case).
215    
216       We should do something similar to floating point, but the transformation
217       there is much more complex, so I won't deal with that.
218    
219    Secondly, there are now two new cps primops at our disposal:
220    
221     1. rawrecord of record_kind option
222        This pure operator allocates some uninitialized storage from the heap.
223        There are two forms:
224    
225         rawrecord NONE [INT n]  allocates a tagless record of length n
226         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
227                                     and initializes the tag.
228    
229     2. rawupdate of cty
230          rawupdate cty (v,i,x)
231          Assigns to x to the ith component of record v.
232          The storelist is not updated.
233    
234    We use these new primops for both spilling and increment record construction.
235    
236     1. Spilling.
237    
238        This is implemented with a linear scan algorithm (but generalized
239        to trees).  The algorithm will create a single spill record at the
240        beginning of the cps function and use rawupdate to spill to it,
241        and SELECT or SELp to reload from it.  So both spills and reloads
242        are fine-grain operations.  In contrast, in the old algorithm
243        "spills" have to be bundled together in records.
244    
245        Ideally, we should sink the spill record construction to where
246        it is needed.  We can even split the spill record into multiple ones
247        at the places where they are needed.  But CPS is not a good
248        representation for global code motion, so I'll keep it simple and
249        am not attempting this.
250    
251     2. Incremental record construction (aka record splitting).
252    
253        Long records with many component values which are simulatenously live
254        (recall that single use record addresses are not considered to
255         be live) are constructed with rawrecord and rawupdate.
256        We allocate space on the heap with rawrecord first, then gradually
257        fill it in with rawupdate.  This is the technique suggested to me
258        by Matthias.
259    
260        Some restrictions on when this is applicable:
261        1. It is not a VECTOR record.  The code generator currently does not handle
262           this case. VECTOR record uses double indirection like arrays.
263        2. All the record component values are defined in the same "basic block"
264           as the record constructor.  This is to prevent speculative
265           record construction.
266    
267    ----------------------------------------------------------------------
268    Name: Allen Leung
269    Date: 2002/02/22 01:02:00 EST
270    Tag: leunga-20020222-mlrisc-tools
271    
272    Minor bug fixes in the parser and rewriter
273    
274    ----------------------------------------------------------------------
275    Name: Allen Leung
276    Date: 2002/02/21 20:20:00 EST
277    Tag: leunga-20020221-peephole
278    
279    Regenerated the peephole files.  Some contained typos in the specification
280    and some didn't compile because of pretty printing bugs in the old version
281    of 'nowhere'.
282    
283    ----------------------------------------------------------------------
284    Name: Allen Leung
285    Date: 2002/02/19 20:20:00 EST
286    Tag: leunga-20020219-mlrisc-tools
287    Description:
288    
289       Minor bug fixes to the mlrisc-tools library:
290    
291       1.  Fixed up parsing colon suffixed keywords
292       2.  Added the ability to shut the error messages up
293       3.  Reimplemented the pretty printer and fixed up/improved
294           the pretty printing of handle and -> types.
295       4.  Fixed up generation of literal symbols in the nowhere tool.
296       5.  Added some SML keywords to to sml.sty
297    
298    ----------------------------------------------------------------------
299    Name: Matthias Blume
300    Date: 2002/02/19 16:20:00 EST
301    Tag: blume-20020219-cmffi
302    Description:
303    
304    A wild mix of changes, some minor, some major:
305    
306    * All C FFI-related libraries are now anchored under $c:
307        $/c.cm      --> $c/c.cm
308        $/c-int.cm  --> $c/internals/c-int.cm
309        $/memory.cm --> $c/memory/memory.cm
310    
311    * "make" tool (in CM) now treats its argument pathname slightly
312      differently:
313        1. If the native expansion is an absolute name, then before invoking
314           the "make" command on it, CM will apply OS.Path.mkRelative
315           (with relativeTo = OS.FileSys.getDir()) to it.
316        2. The argument will be passed through to subsequent phases of CM
317           processing without "going native".  In particular, if the argument
318           was an anchored path, then "make" will not lose track of that anchor.
319    
320    * Compiler backends now "know" their respective C calling conventions
321      instead of having to be told about it by ml-nlffigen.  This relieves
322      ml-nlffigen from one of its burdens.
323    
324    * The X86Backend has been split into X86CCallBackend and X86StdCallBackend.
325    
326    * Export C_DEBUG and C_Debug from $c/c.cm.
327    
328    * C type encoding in ml-nlffi-lib has been improved to model the conceptual
329      subtyping relationship between incomplete pointers and their complete
330      counterparts.  For this, ('t, 'c) ptr has been changed to 'o ptr --
331      with the convention of instantiating 'o with ('t, 'c) obj whenever
332      the pointer target type is complete.  In the incomplete case, 'o
333      will be instantiated with some "'c iobj" -- a type obtained by
334      using one of the functors PointerToIncompleteType or PointerToCompleteType.
335    
336      Operations that work on both incomplete and complete pointer types are
337      typed as taking an 'o ptr while operations that require the target to
338      be known are typed as taking some ('t, 'c) obj ptr.
339    
340      voidptr is now a bit "more concrete", namely "type voidptr = void ptr'"
341      where void is an eqtype without any values.  This makes it possible
342      to work on voidptr values using functions meant to operate on light
343      incomplete pointers.
344    
345    * As a result of the above, signature POINTER_TO_INCOMPLETE_TYPE has
346      been vastly simplified.
347    
348    ----------------------------------------------------------------------
349    Name: Matthias Blume
350    Date: 2002/02/19 10:48:00 EST
351    Tag: blume-20020219-pqfix
352    Description:
353    
354    Applied Chris Okasaki's bug fix for priority queues.
355    
356    ----------------------------------------------------------------------
357    Name: Matthias Blume
358    Date: 2002/02/15 17:05:00
359    Tag: Release_110_39
360    Description:
361    
362    Last-minute retagging is becoming a tradition... :-(
363    
364    This is the working release 110.39.
365    
366    ----------------------------------------------------------------------
367    Name: Matthias Blume
368    Date: 2002/02/15 16:00:00 EST
369    Tag: Release_110_39-orig
370    Description:
371    
372    Working release 110.39.  New bootfiles.
373    
374    (Update: There was a small bug in the installer so it wouldn't work
375    with all shells.  So I retagged. -Matthias)
376    
377    ----------------------------------------------------------------------
378    Name: Matthias Blume
379    Date: 2002/02/15 14:17:00 EST
380    Tag: blume-20020215-showbindings
381    Description:
382    
383    Added EnvRef.listBoundSymbols and CM.State.showBindings.  Especially
384    the latter can be useful for exploring what bindings are available at
385    the interactive prompt.  (The first function returns only the list
386    of symbols that are really bound, the second prints those but also the
387    ones that CM's autoloading mechanism knows about.)
388    
389    ----------------------------------------------------------------------
390    Name: Matthias Blume
391    Date: 2002/02/15 12:08:00 EST
392    Tag: blume-20020215-iptrs
393    Description:
394    
395    Two improvements to ml-nlffigen:
396    
397      1. Write files only if they do not exist or if their current contents
398         do not coincide with what's being written.  (That is, avoid messing
399         with the time stamps unless absolutely necessary.)
400    
401      2. Implement a "repository" mechanism for generated files related
402         to "incomplete pointer types".   See the README file for details.
403    
404    ----------------------------------------------------------------------
405    Name: Matthias Blume
406    Date: 2002/02/14 11:50:00 EST
407    Tag: blume-20020214-quote
408    Description:
409    
410    Added a type 't t_' to tag.sml (in ml-nlffi-lib.cm).  This is required
411    because of the new and improved tag generation scheme.  (Thanks to Allen
412    Leung for pointing it out.)
413    
414    ----------------------------------------------------------------------
415    Name: Lal George
416    Date: 2002/02/14 09:55:27 EST 2002
417    Tag: george-20020214-isabelle-bug
418    Description:
419    
420    Fixed the MLRISC bug sent by Markus Wenzel regarding the compilation
421    of Isabelle on the x86.
422    
423    From Allen:
424    -----------
425     I've found the problem:
426    
427         in ra-core.sml, I use the counter "blocked" to keep track of the
428         true number of elements in the freeze queue.  When the counter goes
429         to zero, I skip examining the queue.  But I've messed up the
430         bookkeeping in combine():
431    
432             else ();
433             case !ucol of
434               PSEUDO => (if !cntv > 0 then
435                     (if !cntu > 0 then blocked := !blocked - 1 else ();
436                                        ^^^^^^^^^^^^^^^^^^^^^^^
437                      moveu := mergeMoveList(!movev, !moveu)
438                     )
439                  else ();
440    
441         combine() is called to coalesce two nodes u and v.
442         I think I was thinking that if the move counts of u and v are both
443         greater than zero then after they are coalesced then one node is
444         removed from the freeze queue.  Apparently I was thinking that
445         both u and v are of low degree, but that's clearly not necessarily true.
446    
447    
448    02/12/2002:
449        Here's the patch.  HOL now compiles.
450    
451        I don't know how this impact on performance (compile
452        time or runtime).  This bug caused the RA (especially on the x86)
453        to go thru the potential spill phase when there are still nodes on the
454        freeze queue.
455    
456    
457    
458    
459    ----------------------------------------------------------------------
460    Name: Matthias Blume
461    Date: 2002/02/13 22:40:00 EST
462    Tag: blume-20020213-fptr-rtti
463    Description:
464    
465    Fixed a bug in ml-nlffigen that was introduced with one of the previous
466    updates.
467    
468    ----------------------------------------------------------------------
469    Name: Matthias Blume
470    Date: 2002/02/13 16:41:00 EST
471    Tag: blume-20020213-cmlpq
472    Description:
473    
474    Added new priority queue export symbols (which have just been added to
475    smlnj-lib.cm) to CML's version of smlnj-lib.cm.  (Otherwise CML would
476    not compile and the installer would choke.)
477    
478    ----------------------------------------------------------------------
479    Name: Matthias Blume
480    Date: 2002/02/13 16:15:00 EST
481    Tag: blume-20020213-various
482    Description:
483    
484    1. More tweaks to ml-nlffigen:
485    
486       - better internal datastructures (resulting in slight speedup)
487       - "-match" option requires exact match
488       - "localized" gensym counters (untagged structs/unions nested within
489         other structs/unions or within typedefs get a fresh counter; their
490         tag will be prefixed by a concatenation of their parents' tags)
491       - bug fixes (related to calculation of transitive closure of types
492         to be included in the output)
493    
494    2. Minor Basis updates:
495    
496       - added implementations for List.collate and Option.app
497    
498    ----------------------------------------------------------------------
499    Name: Matthias Blume
500    Date: 2002/02/11 15:55:00 EST
501    Tag: blume-20020211-gensym
502    Description:
503    
504    Added a "-gensym" option to command line of ml-nlffigen.  This can be
505    used to specify a "stem" -- a string that is inserted in all "gensym'd"
506    names (ML structure names that correspond to unnamed C structs, unions,
507    and enums), so that separate runs of ml-nlffigen do not clash.
508    
509    ----------------------------------------------------------------------
510    Name: Matthias Blume
511    Date: 2002/02/11 12:05:00 EST
512    Tag: blume-20020211-gensml
513    Description:
514    
515    A quick fix for a problem with GenSML (in the pgraph-util library):
516    Make generation of toplevel "local" optional.  (Strictly speaking,
517    signature definitions within "local" are not legal SML.)
518    
519    Other than that: updates to INSTALL and cm/TODO.
520    
521    ----------------------------------------------------------------------
522    Name: Matthias Blume
523    Date: 2002/02/08 15:00:00 EST
524    Tag: blume-20020208-uniquepid
525    Description:
526    
527    0. Version number has been bumped to 110.38.1.  NEW BOOTFILES!!!
528    
529    1. The installer (config/install.sh) has gotten smarter:
530    
531         - Configuration options are a bit easier to specify now
532           (in config/targets).
533         - Bug in recognizing .tar.bz2 files fixed.
534         - Installer automatically resolves dependencies between
535           configuration options (e.g., if you ask for eXene, you will
536           also get cml -- regardless whether you asked for it or not).
537         - Installer can run in "quieter mode" by setting the environment
538           variable INSTALL_QUIETLY to "true".  "Quieter" does not mean
539           "completely silent", though.
540         - Build HashCons library as part of smlnj-lib.
541    
542    2. A new scheme for assigning persistent identifiers to compilation
543       units (and, by extension, to types etc.) has been put into place.
544       This fixes a long-standing bug where types and even dynamic values
545       can get internally confused, thereby compromising type safety
546       (abstraction) and dynamic correctness.  See
547    
548         http://cm.bell-labs.com/cm/cs/who/blume/pid-confusion.tgz
549    
550       for an example of how things could go wrong until now.
551    
552       The downside of the new scheme is that pids are not quite as
553       persistent as they used to be: CM will generate a fresh pid
554       for every compilation unit that it thinks it sees for the first
555       time.  That means that if you compile starting from a clean, fresh
556       source tree at two different times, you end up with different
557       binaries.
558    
559       Cutoff recompilation, however, has not been compromised because
560       CM keeps pid information in special caches between runs.
561    
562    ----------------------------------------------------------------------
563    Name: Lal George
564    Date: 2002/02/07 15:34:13 EST 2002
565    Tag: <none>
566    Description:
567    
568    Compilers that generate assembly code may produce  global labels
569    whose value is resolved at link time. The various peephole optimization
570    modules did not take this in account.
571    
572    TODO. The Labels.addrOf function should really return an option
573    type so that clients are forced to deal with this issue, rather
574    than an exception being raised.
575    
576    ----------------------------------------------------------------------
577    Name: Lal George
578    Date: 2002/02/06 13:55:02 EST
579    Tag: george-20020206-ra-breakup
580    Description:
581    
582    1. A bug fix from Allen.
583    
584        A typo causes extra fstp %st(0)'s to be generated at compensation
585        edges, which might cause stack underflow traps at runtime.  This
586        occurs in fft where there are extraneous fstps right before the 'into'
587        trap instruction (in this case they are harmless since none of the
588        integers overflow.)
589    
590    2. Pulled out various utility modules that were embedded in the modules
591       of the register allocator. I need these modules for other purposes, but
592       they are not complete enough to put into a library (just yet).
593    ----------------------------------------------------------------------
594    Name: Matthias Blume
595    Date: 2002/01/31 16:05:00 EST
596    Tag: blume-20020131-sparc-ccalls
597    Description:
598    
599    1. C-calls on Sparc needlessly allocated a huge chunk (96 bytes)
600       of extra stack space by mistake.  Fixed.
601    
602    2. Bug in logic of handling of command-line options in ml-nlffigen fixed.
603    
604    ----------------------------------------------------------------------
605  Name: Allen Leung  Name: Allen Leung
606  Date: 2002/01/30  Date: 2002/01/30
607  Tag: leunga-20020130-nowhere-bug-fix  Tag: leunga-20020130-nowhere-bug-fix
# Line 622  Line 1211 
1211    
1212  ----------------------------------------------------------------------  ----------------------------------------------------------------------
1213  Name: Matthias Blume  Name: Matthias Blume
 >>>>>>> 1.169  
1214  Date: 2001/09/18 15:35:00 EDT  Date: 2001/09/18 15:35:00 EDT
1215  Tag: blume-20010918-readme11036  Tag: blume-20010918-readme11036
1216  Description:  Description:

Legend:
Removed from v.1048  
changed lines
  Added in v.1116

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0