Diff of /sml/trunk/HISTORY

revision 1085, Fri Feb 22 00:15:55 2002 UTC revision 1142, Wed Mar 13 22:25:37 2002 UTC
14    
15    ----------------------------------------------------------------------
16    Name: Allen Leung
17    Date: 2002/03/13 17:30:00 EST
18    Tag: leunga-20020313-x86-fp-unary
19    Description:
20    
21    Bug fix for:
22    
23    > leunga@weaselbane:~/Yale/tmp/sml-dist{21} bin/sml
24    > Standard ML of New Jersey v110.39.1 [FLINT v1.5], March 08, 2002
25    > - fun f(x,(y,z)) = Real.~ y;
26    > [autoloading]
27    > [autoloading done]
28    >       fchsl   (%eax), 184(%esp)
29    > Error: MLRisc bug: X86MCEmitter.emitInstr
30    >
31    > uncaught exception Error
32    >   raised at: ../MLRISC/control/mlriscErrormsg.sml:16.14-16.19
33    
34    The problem was that the code generator did not generate any fp registers
35    in this case, and the ra didn't know that it needed to run the X86FP phase to
36    translate the pseudo fp instruction.   This only happened with unary fp
37    operators in certain situations.
38    
39    ----------------------------------------------------------------------
40    Name: Matthias Blume
41    Date: 2002/03/13 14:00:00 EST
42    Tag: blume-20020313-overload-etc
43    Description:
44    
45    1. Added _overload as a synonym for overload for backward compatibility.
46       (Control.overloadKW must be true for either version to be accepted.)
47    
48    2. Fixed bug in install script that caused more things to be installed
49       than what was requested in config/targets.
50    
51    3. Made CM aware of the (_)overload construct so that autoloading
52       works.
53    
54    ----------------------------------------------------------------------
55    Name: Matthias Blume
56    Date: 2002/03/12 22:03:00 EST
57    Tag: blume-20020312-url
58    Description:
59    
60    Forgot to update BOOT and srcarchiveurl.
61    
62    ----------------------------------------------------------------------
63    Name: Matthias Blume
64    Date: 2002/03/12 17:30:00 EST
65    Tag: blume-20020312-version110392
66    Description:
67    
68    Yet another version number bump (because of small changes to the
69    binfile format).  Version number is now 110.39.2.  NEW BOOTFILES!
70    
71    Changes:
72    
73      The new pid generation scheme described a few weeks ago was overly
74      complicated.  I implemented a new mechanism that is simpler and
75      provides a bit more "stability":  Once CM has seen a compilation
76      unit, it keeps its identity constant (as long as you do not delete
77    those crucial CM/GUID/* files).  This means that when you change
78      an interface, compile, then go back to the old interface, and
79      compile again, you arrive at the original pid.
80    
81      There now also is a mechanism that instructs CM to use the plain
82      environment hash as a module's pid (effectively making its GUID
83      the empty string).  For this, "noguid" must be specified as an
84      option to the .sml file in question within its .cm file.
85      This is most useful for code that is being generated by tools such
86      as ml-nlffigen (because during development programmers tend to
87      erase the tool's entire output directory tree including CM's cached
88      GUIDs).  "noguid" is somewhat dangerous (since it can be used to locally
89      revert to the old, broken behavior of SML/NJ), but in specific cases
90      where there is no danger of interface confusion, its use is ok
91      (I think).
92    
93      ml-nlffigen by default generates "noguid" annotations.  They can be
94      turned off by specifying -guid in its command line.
95    
96    ----------------------------------------------------------------------
97    Name: Lal George
98    Date: 2002/03/12 14:42:36 EST
99    Tag: george-20020312-frequency-computation
100    Description:
101    
102    Integrated jump chaining and static block frequency into the
103    compiler. More details and numbers later.
104    
105    ----------------------------------------------------------------------
106    Name: Lal George
107    Date: 2002/03/11 22:38:53 EST
108    Tag: george-20020311-jump-chain-elim
109    Description:
110    
111    Tested the jump chain elimination on all architectures (except the
112    hppa).  This is on by default right now and is profitable for the
113    alpha and x86; however, it may not be profitable for the sparc and ppc
114    when compiling the compiler.
115    
116    The gc test will typically jump to a label at the end of the cluster,
117    where there is another jump to an external cluster containing the actual
118    code to invoke gc. This is to allow factoring of common gc invocation
119    sequences. That is to say, we generate:
120    
121            f:
122               testgc
123               ja   L1      % jump if above to L1
124    
125            L1:
126               jmp L2
127    
128    
129    After jump chain elimination the 'ja L1' instruction is converted to
130    'ja L2'. On the sparc and ppc, many of the 'ja L2' instructions may end
131    up being implemented in their long form (if L2 is far away) using:
132    
133            jbe     L3      % jump if below or equal to L3
134            jmp     L2
135         L3:
136            ...
137    
138    
139    For large compilation units L2  may be far away.
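
The retargeting step described in this entry can be sketched in a few lines of Standard ML.  This is only an illustration under assumptions, not the MLRISC implementation: the `block` and `program` representations and the names `lookup` and `resolve` are made up for the example.

```sml
(* Hypothetical sketch: a label maps to a block that is either nothing
 * but an unconditional jump, or some other code. *)
datatype block
  = JumpOnly of string        (* block consisting solely of "jmp L" *)
  | Other                     (* anything else *)

(* Association list from labels to blocks; assumed for the example. *)
type program = (string * block) list

fun lookup (prog : program, lab : string) : block option =
    case List.find (fn (l, _) => l = lab) prog of
        SOME (_, b) => SOME b
      | NONE => NONE

(* Follow a chain of jump-only blocks to its final target.  The list of
 * visited labels guards against cycles such as "L1: jmp L1". *)
fun resolve (prog : program) (lab : string) : string =
    let
      fun go (l, seen) =
          if List.exists (fn s => s = l) seen then l
          else case lookup (prog, l) of
                   SOME (JumpOnly l') => go (l', l :: seen)
                 | _ => l
    in
      go (lab, [])
    end

(* The situation from the entry: L1 only jumps on to L2. *)
val example : program = [("L1", JumpOnly "L2"), ("L2", Other)]
val target = resolve example "L1"   (* so "ja L1" can become "ja L2" *)
```

Whether the rewritten branch is cheaper still depends on the distance to the final target, which this sketch does not model.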
140    
141    
142    ----------------------------------------------------------------------
143    Name: Matthias Blume
144    Date: 2002/03/11 13:30:00 EST
145    Tag: blume-20020311-mltreeeval
146    Description:
147    
148    A functor parameter was missing.
149    
150    ----------------------------------------------------------------------
151    Name: Allen Leung
152    Date: 2002/03/11 10:30:00 EST
153    Tag: leunga-20020311-runtime-string0
154    Description:
155    
156       The representation of the empty string now points to a
157    legal null terminated C string instead of unit.  It is now possible
158    to convert an ML string into C string with InlineT.CharVector.getData.
159    This compiles into one single machine instruction.
160    
161    ----------------------------------------------------------------------
162    Name: Allen Leung
163    Date: 2002/03/10 23:55:00 EST
164    Tag: leunga-20020310-x86-call
165    Description:
166    
167       Added machine code generation for CALL instruction (relative displacement mode)
168    
169    ----------------------------------------------------------------------
170    Name: Matthias Blume
171    Date: 2002/03/08 16:05:00 EST
172    Tag: blume-20020308-entrypoints
173    Description:
174    
175    Version number bumped to 110.39.1.  NEW BOOTFILES!
176    
177    Entrypoints: non-zero offset into a code object where execution should begin.
178    
179    - Added the notion of an entrypoint to CodeObj.
180    - Added reading/writing of entrypoint info to Binfile.
181    - Made runtime system bootloader aware of entrypoints.
182    - Use the address of the label of the first function given to mlriscGen
183      as the entrypoint.  This address is currently always 0, but it will
184      not be 0 once we turn on block placement.
185    - Removed the linkage cluster code (which was The Other Way(tm) of dealing
186      with entry points) from mlriscGen.
187    
188    ----------------------------------------------------------------------
189    Name: Allen Leung
190    Date: 2002/03/07 20:45:00 EST
191    Tag: leunga-20020307-x86-cmov
192    Description:
193    
194       Bug fixes for CMOVcc on x86.
195    
196       1. Added machine code generation for CMOVcc
197       2. CMOVcc is now generated in preference over SETcc on PentiumPro or above.
198       3. CMOVcc cannot have an immediate operand as argument.
199    
200    ----------------------------------------------------------------------
201    Name: Matthias Blume
202    Date: 2002/03/07 16:15:00 EST
203    Tag: blume-20020307-controls
204    Description:
205    
206    This is a very large but mostly boring patch which makes (almost)
207    every tuneable compiler knob (i.e., pretty much everything under
208    Control.* plus a few other things) configurable via both the command
209    line and environment variables in the style CM did its configuration
210    until now.
211    
212    Try starting sml with '-h' (or, if you are brave, '-H')
213    
214    To this end, I added a structure Controls : CONTROLS to smlnj-lib.cm which
215    implements the underlying generic mechanism.
216    
217    The interface to some of the existing such facilities has changed somewhat.
218    For example, the MLRiscControl module now provides mkFoo instead of getFoo.
219    (The getFoo interface is still there for backward-compatibility, but its
220    use is deprecated.)
221    
222    The ml-build script passes -Cxxx=yyy command-line arguments through so
223    that one can now twiddle the compiler settings when using this "batch"
224    compiler.
225    
226    TODO items:
227    
228    We should go through and throw out all controls that are no longer
229    connected to anything.  Moreover, we should go through and provide
230    meaningful (and correct!) documentation strings for those controls
231    that still are connected.
232    
233    Currently, multiple calls to Controls.new are accepted (only the first
234    has any effect).  Eventually we should make sure that every control
235    is being made (via Controls.new) exactly once.  Future access can then
236    be done using Controls.acc.
237    
238    Finally, it would probably be a good idea to use the getter-setter
239    interface to controls rather than ref cells.  For the time being, both
240    styles are provided by the Controls module, but getter-setter pairs are
241    better if thread-safety is of any concern because they can be wrapped.
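
To illustrate why getter-setter pairs "can be wrapped" while ref cells cannot, here is a minimal sketch.  It is not the actual Controls interface: the type `control` and the functions `fromRef` and `wrap` are hypothetical names chosen for the example, and exception safety is ignored.

```sml
(* Hypothetical sketch, not the real Controls module. *)
type 'a control = { get : unit -> 'a, set : 'a -> unit }

(* Build a getter-setter pair on top of a ref cell. *)
fun fromRef (r : 'a ref) : 'a control =
    { get = fn () => !r, set = fn v => r := v }

(* Wrap every access in caller-supplied enter/leave actions, e.g. to
 * acquire and release a lock.  A bare ref cell offers no such hook. *)
fun wrap (enter : unit -> unit, leave : unit -> unit)
         ({ get, set } : 'a control) : 'a control =
    { get = fn () => (enter (); let val v = get () in leave (); v end),
      set = fn v => (enter (); set v; leave ()) }

(* Usage: count how often the control is touched. *)
val accesses = ref 0
val verbose = wrap (fn () => accesses := !accesses + 1, fn () => ())
                   (fromRef (ref false))
val () = #set verbose true
val now = #get verbose ()
```

Here the enter action merely counts accesses; in a threaded setting it could instead take a lock that the leave action releases.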
242    
243    *****************************************
244    
245    One bug fix: The function blockPlacement in three of the MLRISC
246    backpatch files used to be hard-wired to one of two possibilities at
247    link time (according to the value of the placementFlag).  But (I
248    think) it should rather sense the flag every time.
249    
250    *****************************************
251    
252    Other assorted changes (by other people who did not supply a HISTORY entry):
253    
254    1. the cross-module inliner now works much better (Monnier)
255    2. representation of weights, frequencies, and probabilities in MLRISC
256       changed in preparation of using those for weighted block placement
257       (Reppy, George)
258    
259    ----------------------------------------------------------------------
260    Name: Lal George
261    Date: 2002/03/07 14:44:24 EST
262    Tag: george-20020307-weighted-block-placement
263    
264    Tested the weighted block placement optimization on all architectures
265    (except the hppa) using AMPL to generate the block and edge frequencies.
266    Changes were required in the machine properties to correctly
267    categorize trap instructions. There is an MLRISC flag
268    "weighted-block-placement" that can be used to enable weighted block
269    placement, but this will be ineffective without block/edge
270    frequencies (coming soon).
271    
272    
273    ----------------------------------------------------------------------
274    Name: Lal George
275    Date: 2002/03/05 17:24:48 EST
276    Tag: george-20020305-linkage-cluster
277    
278    In order to support the block placement optimization, a new cluster
279    is generated as the very first cluster (called the linkage cluster).
280    It contains a single jump to the 'real' entry point for the compilation
281    unit. Block placement has no effect on the linkage cluster itself, but
282    all the other clusters  have full freedom in the manner in which they
283    reorder blocks or functions.
284    
285    On the x86 the typical linkage code that is generated is:
286       ----------------------
287            .align 2
288       L0:
289            addl    $L1-L0, 72(%esp)
290            jmp     L1
291    
292    
293            .align  2
294       L1:
295       ----------------------
296    
297    72(%esp) is the memory location for the stdlink register. This
298    must contain the address of the CPS function being called. In the
299    above example, it contains the address of  L0; before
300    calling L1 (the real entry point for the compilation unit), it
301    must contain the address for L1, and hence
302    
303            addl $L1-L0, 72(%esp)
304    
305    I have tested this on all architectures except the hppa.  The increase
306    in code size is of course negligible.
307    
308    ----------------------------------------------------------------------
309    Name: Allen Leung
310    Date: 2002/03/03 13:20:00 EST
311    Tag: leunga-20020303-mlrisc-tools
312    
313      Added #[ ... ] expressions to mlrisc tools
314    
315    ----------------------------------------------------------------------
316    Name: Matthias Blume
317    Date: 2002/02/27 12:29:00 EST
318    Tag: blume-20020227-cdebug
319    Description:
320    
321    - made the types in structures C and C_Debug equal
322    - got rid of code duplication (c-int.sml vs. c-int-debug.sml)
323    - there no longer is a C_Int_Debug (C_Debug is directly derived from C)
324    
325    ----------------------------------------------------------------------
326    Name: Matthias Blume
327    Date: 2002/02/26 12:00:00 EST
328    Tag: blume-20020226-ffi
329    Description:
330    
331    1. Fixed a minor bug in CM's "noweb" tool:
332       If numbering is turned off, then truly don't number (i.e., do not
333       supply the -L option to noweb).  The previous behavior was to supply
334       -L'' -- which caused noweb to use the "default" line numbering scheme.
335       Thanks to Chris Richards for pointing this out (and supplying the fix).
336    
337    2. Once again, I reworked some aspects of the FFI:
338    
339       A. The incomplete/complete type business:
340    
341       - Signatures POINTER_TO_INCOMPLETE_TYPE and accompanying functors are
342         gone!
343       - ML types representing an incomplete type are now *equal* to
344         ML types representing their corresponding complete types (just like
345         in C).  This is still safe because ml-nlffigen will not generate
346         RTTI for incomplete types, nor will it generate functions that
347         require access to such RTTI.   But when ML code generated from both
348         incomplete and complete versions of the C type meet, the ML types
349         are trivially interoperable.
350    
351         NOTE:  These changes restore the full generality of the translation
352         (which was previously lost when I eliminated functorization)!
353    
354       B. Enum types:
355    
356       - Structure C now has a type constructor "enum" that is similar to
357         how the "su" constructor works.  However, "enum" is not a phantom
358         type because each "T enum" has values (and is isomorphic to
359         MLRep.Signed.int).
360       - There are generic access operations for enum objects (using
361         MLRep.Signed.int).
362       - ml-nlffigen will generate a structure E_foo for each "enum foo".
363         * The structure contains the definition of type "mlrep" (the ML-side
364         representation type of the enum).  Normally, mlrep is the same
365         as "MLRep.Signed.int", but if ml-nlffigen was invoked with "-ec",
366         then mlrep will be defined as a datatype -- thus facilitating
367         pattern matching on mlrep values.
368         ("-ec" will be suppressed if there are duplicate values in an
369          enumeration.)
370         * Constructors ("-ec") or values (no "-ec") e_xxx of type mlrep
371         will be generated for each C enum constant xxx.
372         * Conversion functions m2i and i2m convert between mlrep and
373         MLRep.Signed.int.  (Without "-ec", these functions are identities.)
374         * Conversion functions c and ml convert between mlrep and "tag enum".
375         * Access functions (get/set) fetch and store mlrep values.
376       - By default (unless ml-nlffigen was invoked with "-nocollect"), unnamed
377         enumerations are merged into one single enumeration represented by
378         structure E_'.
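
As a rough picture of the "-ec" variant, consider the C declaration `enum color { RED, GREEN = 5 };`.  The structure below is a hand-written sketch of the shape described in this entry.  It is not actual ml-nlffigen output: plain `int` stands in for MLRep.Signed.int so that the example is self-contained, the behavior of `i2m` on unknown integers is an assumption, and the `c`, `ml`, `get`, and `set` members are omitted because they need structure C.

```sml
(* Hand-written illustration for: enum color { RED, GREEN = 5 };
 * NOT real ml-nlffigen output.  Plain int stands in for
 * MLRep.Signed.int so that the sketch is self-contained. *)
structure E_color = struct
    (* with "-ec": mlrep is a datatype, so clients can pattern-match *)
    datatype mlrep = e_RED | e_GREEN

    (* mlrep -> integer *)
    fun m2i e_RED = 0
      | m2i e_GREEN = 5

    (* integer -> mlrep; the failure behavior here is an assumption *)
    fun i2m 0 = e_RED
      | i2m 5 = e_GREEN
      | i2m _ = raise Fail "E_color.i2m: not a constant of enum color"
end

(* Client code can match on the constructors directly. *)
fun describe E_color.e_RED = "red"
  | describe E_color.e_GREEN = "green"

val s = describe (E_color.i2m 5)
```

Without "-ec", `mlrep` would simply be the integer type and `m2i` and `i2m` would be identity functions, as the entry states.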
379    
380    ----------------------------------------------------------------------
381    Name: Allen Leung
382    Date: 2002/02/25 04:45:00 EST
383    Tag: leunga-20020225-cps-spill
384    
385    This is a new implementation of the CPS spill phase.
386    The new phase is in the new file compiler/CodeGen/cpscompile/spill-new.sml
387    In case of problems, replace it with the old file spill.sml
388    
389    The current compiler runs into some serious performance problems when
390    constructing a large record.  This can happen when we try to compile a
391    structure with many items.  Even a very simple structure like the following
392    makes the compiler slow down.
393    
394        structure Foo = struct
395           val x_1 = 0w1 : Word32.word
396           val x_2 = 0w2 : Word32.word
397           val x_3 = 0w3 : Word32.word
398           ...
399           val x_N = 0wN : Word32.word
400        end
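
A test input of this shape is easy to generate for any N.  The helper below is a hypothetical convenience and not part of the original entry; the name `genFoo` is made up.  It writes the constants with type Word32.word, the word type exported by structure Word32.

```sml
(* Hypothetical helper: produce the source text of the test structure
 * with n bindings x_1 ... x_n. *)
fun genFoo (n : int) : string =
    let
      (* one "val x_i = 0wi : Word32.word" line *)
      fun binding i =
          let val s = Int.toString i
          in "   val x_" ^ s ^ " = 0w" ^ s ^ " : Word32.word\n" end
    in
      "structure Foo = struct\n"
      ^ String.concat (List.tabulate (n, fn i => binding (i + 1)))
      ^ "end\n"
    end

val small = genFoo 2
```

The resulting string can be written to a file with TextIO and then compiled to reproduce measurements like the ones reported in this entry.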
401    
402    The following table shows the compile time, from N=1000 to N=4000,
403    with the old compiler:
404    
405    N      phase                                   time (u = user, s = system, g = gc)
406    1000   CPS 100 spill                           0.04u  0.00s  0.00g
407           MLRISC ra                               0.06u  0.00s  0.05g
408              (spills = 0 reloads = 0)
409           TOTAL                                   0.63u  0.07s  0.21g
410    
411    1100   CPS 100 spill                           8.25u  0.32s  0.64g
412           MLRISC ra                               5.68u  0.59s  3.93g
413              (spills = 0 reloads = 0)
414           TOTAL                                   14.71u  0.99s  4.81g
415    
416    1500   CPS 100 spill                           58.55u  2.34s  1.74g
417           MLRISC ra                               5.54u  0.65s  3.91g
418              (spills = 543 reloads = 1082)
419           TOTAL                                   65.40u  3.13s  6.00g
420    
421    2000   CPS 100 spill                           126.69u  4.84s  3.08g
422           MLRISC ra                               0.80u  0.10s  0.55g
423              (spills = 42 reloads = 84)
424           TOTAL                                   129.42u  5.10s  4.13g
425    
426    3000   CPS 100 spill                           675.59u  19.03s  11.64g
427           MLRISC ra                               2.69u  0.27s  1.38g
428              (spills = 62 reloads = 124)
429           TOTAL                                   682.48u  19.61s  13.99g
430    
431    4000   CPS 100 spill                           2362.82u  56.28s  43.60g
432           MLRISC ra                               4.96u  0.27s  2.72g
433              (spills = 85 reloads = 170)
434           TOTAL                                   2375.26u  57.21s  48.00g
435    
436    As you can see, the old cps spill module suffers from some serious
437    performance problems.  But since I cannot decipher the old code fully,
438    instead of patching the problems up, I'm reimplementing it
439    with a different algorithm.  The new code is more modular,
440    smaller when compiled, and substantially faster
441    (O(n log n) time and O(n) space).  Timing of the new spill module:
442    
443    4000  CPS 100 spill                           0.02u  0.00s  0.00g
444          MLRISC ra                               0.25u  0.02s  0.15g
445             (spills=1 reloads=3)
446          TOTAL                                   7.74u  0.34s  1.62g
447    
448    Implementation details:
449    
450    As far as I can tell, the purpose of the CPS spill module is to make sure the
451    number of live variables at any program point (the bandwidth)
452    does not exceed a certain limit, which is determined by the
453    size of the spill area.
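
To make the notion of bandwidth concrete, here is a minimal sketch that computes it for straight-line code.  The representation is an assumption made for illustration (the real phase works on CPS trees): `instr`, `member`, `union`, `remove`, and `bandwidth` are hypothetical names, and the list of variables live at the end is assumed to contain no duplicates.

```sml
(* Hypothetical straight-line instruction: the variable it defines
 * and the variables it uses. *)
type instr = { def : string, uses : string list }

fun member (x : string, xs) = List.exists (fn y => y = x) xs
fun union (xs, ys) =
    List.foldl (fn (x, acc) => if member (x, acc) then acc else x :: acc) ys xs
fun remove (x : string, xs) = List.filter (fn y => y <> x) xs

(* Walk backwards from the set of variables live at the end, tracking
 * the live set before each instruction, and return the largest size. *)
fun bandwidth (code : instr list, liveOut : string list) : int =
    let
      fun step ({ def, uses } : instr, (live, widest)) =
          let val live' = union (uses, remove (def, live))
          in (live', Int.max (widest, length live')) end
      (* foldr visits the last instruction first *)
      val (_, widest) = List.foldr step (liveOut, length liveOut) code
    in
      widest
    end

val code = [ { def = "a", uses = [] },
             { def = "b", uses = [] },
             { def = "c", uses = ["a", "b"] },
             { def = "d", uses = ["c", "a"] } ]
val width = bandwidth (code, ["d"])
```

In this example at most two variables are live at once (a and b before c is defined, then a and c before d is defined), so spilling would only be needed if the limit were below two.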
454    
455    When the bandwidth is too large, we decrease the register pressure by
456    packing live variables into spill records.  How we achieve this is
457    completely different than what we did in the old code.
458    
459    First, there is something about the MLRiscGen code generator
460    that we should be aware of:
461    
462    o MLRiscGen performs code motion!
463    
464       In particular, it will move floating point computations and
465       address computations involving only the heap pointer to
466       their use sites (if there is only a single use).
467       What this means is that if we have a CPS record construction
468       statement
469    
470           RECORD(k,vl,w,e)
471    
472       we should never count the new record address w as live if w
473       has only one use (which is often the case).
474    
475       We should do something similar to floating point, but the transformation
476       there is much more complex, so I won't deal with that.
477    
478    Secondly, there are now two new cps primops at our disposal:
479    
480     1. rawrecord of record_kind option
481        This pure operator allocates some uninitialized storage from the heap.
482        There are two forms:
483    
484         rawrecord NONE [INT n]  allocates a tagless record of length n
485         rawrecord (SOME rk) [INT n] allocates a tagged record of length n
486                                     and initializes the tag.
487    
488     2. rawupdate of cty
489          rawupdate cty (v,i,x)
490          Assigns to x to the ith component of record v.
491          The storelist is not updated.
492    
493    We use these new primops for both spilling and incremental record construction.
494    
495     1. Spilling.
496    
497        This is implemented with a linear scan algorithm (but generalized
498        to trees).  The algorithm will create a single spill record at the
499        beginning of the cps function and use rawupdate to spill to it,
500        and SELECT or SELp to reload from it.  So both spills and reloads
501        are fine-grain operations.  In contrast, in the old algorithm
502        "spills" have to be bundled together in records.
503    
504        Ideally, we should sink the spill record construction to where
505        it is needed.  We can even split the spill record into multiple ones
506        at the places where they are needed.  But CPS is not a good
507        representation for global code motion, so I'll keep it simple and
508        am not attempting this.
509    
510     2. Incremental record construction (aka record splitting).
511    
512        Long records with many component values which are simultaneously live
513        (recall that single use record addresses are not considered to
514         be live) are constructed with rawrecord and rawupdate.
515        We allocate space on the heap with rawrecord first, then gradually
516        fill it in with rawupdate.  This is the technique suggested to me
517        by Matthias.
518    
519        Some restrictions on when this is applicable:
520        1. It is not a VECTOR record.  The code generator currently does not handle
521           this case.  VECTOR records use double indirection, like arrays.
522        2. All the record component values are defined in the same "basic block"
523           as the record constructor.  This is to prevent speculative
524           record construction.
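
The record-splitting rewrite can be sketched on a toy expression language.  The datatype below is loosely modeled on the description in this entry and is not the compiler's CPS datatype; `exp`, `split`, and the constructor argument layouts are assumptions.  Unlike the real phase, the sketch rewrites every RECORD and checks neither of the two restrictions listed above.

```sml
(* Toy expression language; NOT the compiler's CPS datatype. *)
datatype exp
  = RECORD of string list * string * exp        (* fields, result, rest *)
  | RAWRECORD of int * string * exp             (* length, result, rest *)
  | RAWUPDATE of string * int * string * exp    (* record, index, value, rest *)
  | DONE

(* Rewrite each RECORD into an allocation followed by one store per
 * field, so the fields need not all be live at the same time. *)
fun split (RECORD (fields, w, rest)) =
    let
      fun stores (_, []) = split rest
        | stores (i, v :: vs) = RAWUPDATE (w, i, v, stores (i + 1, vs))
    in
      RAWRECORD (length fields, w, stores (0, fields))
    end
  | split (RAWRECORD (n, w, rest)) = RAWRECORD (n, w, split rest)
  | split (RAWUPDATE (r, i, v, rest)) = RAWUPDATE (r, i, v, split rest)
  | split DONE = DONE

val original = RECORD (["a", "b"], "w", DONE)
val rewritten = split original
```

For the two-field record above, the result allocates a record of length 2 named w and then stores a at index 0 and b at index 1.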
525    
526    ----------------------------------------------------------------------
527    Name: Allen Leung
528    Date: 2002/02/22 01:02:00 EST
529    Tag: leunga-20020222-mlrisc-tools
530    
531    Minor bug fixes in the parser and rewriter
532    
533    ----------------------------------------------------------------------
534    Name: Allen Leung
535    Date: 2002/02/21 20:20:00 EST
536    Tag: leunga-20020221-peephole
537    
