Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /sml/trunk/src/compiler/FLINT/opt/flintopt.txt
ViewVC logotype

Annotation of /sml/trunk/src/compiler/FLINT/opt/flintopt.txt

Parent Directory Parent Directory | Revision Log Revision Log


Revision 651 - (view) (download)

1 : monnier 626 -*- mode: outline; tab-width: 4 -*-
2 :    
3 :     * Sequentially dependent tests
4 :    
5 :     * Ordering issues
6 :     ** Specialize -> Wrap -> Reify
7 :     Neither Wrap nor Specialize can be done after Reify.
8 :     I don't know whether Specialize works after Wrap, but I wouldn't bet on it
9 :     and in any case, I can't think of a reason why you'd want to do that.
10 :     ** Lambda Split
11 :     Should take place right after a `fixfix' so as to have good inlining
12 :     annotations. Also, Split should be done before Reify and Wrap since
13 :     both slightly change the language.
14 :     ** Loopify
15 :     Loopify does not consider mutually recursive functions, so you want to
16 :     have a `fixfix' before to eliminate apparent mutual recursion.
17 :     Also, to inline small loops, you want to run fixfix afterwards to
18 :     analyze the newly introduced preheaders.
19 :    
20 :     * Current ordering
21 :     ** lcontract
22 :     Because the input code has lot and lots of called-once functions and
23 :     fcontract is much slower at inlining them.
24 :     ** fixfix+fcontract
25 :     Do a first pass of inlining and careful contraction.
26 :     The fixfix also reduces mutual recursion, which later helps loopify.
27 :     ** specialize
28 :     Might as well do it now to get rid of some of those pesky TFN/TAPP.
29 :     ** loopify+fixfix+(split+)fcontract
30 :     Introduce loop headers and do the best job you can before `wrap'.
31 :     Split needs to be done before going through `wrap', so the earlier
32 :     we can do the various optimizations, the better.
33 :     Even more true for `reify' since we want to do optimizations on a
34 :     really type-safe language rather than on a pseudo-typed language such
35 :     as post-reify-FLINT.
36 :     ** wrap+fcontract+reify
37 :     Representation analysis, cleanup and "reification" (introduces runtime
38 :     types: turns TFN into FCT, TAPP into APP, ...)
39 :     ** fcontract+fixfix+fcontract+eta
40 :     We finally got rid of all TFN/TAPP so we can at last inline/uncurry
41 :     polymorphic functions. We also introduce eta-wrappers for escaping
42 :     functions, which is used by closure conversion.
43 :    
44 :     * CPS
45 :     After CPS conversion, we need to clean up the code a little.
46 :     ** last_contract
47 :     `contract' would work as well, since it basically does the same, but
48 :     `first_contract' doesn't cut it because we need to be able to drop unused
49 :     arguments. This is because of functions that always raise an exception and
50 :     hence don't use their continuation argument. This optimization cannot be
51 :     done in the A-normal form and doesn't seem particularly important on
52 :     benchmarks, but it is crucial to avoid blowing up the register allocator
53 :     when compiling ml.grm.sml (from about 150MB to 400MB).
54 :     ** minexpand
55 :     The CPS conversion phase does a good bit more work than just convert
56 :     from A-normal to CPS form. Among other things, it also compiles SWITCHes.
57 :     This can introduce new functions, f.ex. when
58 :    
59 :     SWITCH v
60 :     1 => bla
61 :     3 => bli
62 :     _ => foo
63 :    
64 :     is turned into
65 :    
66 :     def(_) = foo
67 :     if v < 1 then def(0) else
68 :     if v > 2 then def(0) else
69 :     case v
70 :     1 => bla
71 :     2 => def(0)
72 :     3 => bli
73 :    
74 :     Ideally, new functions like `def' should go through the usual
75 :     optimizations, but there really isn't much to optimize there. The only
76 :     useful thing we might want to do is inline them. This is particularly true
77 :     if `def' is a trivial function. `minexpand' hence goes through one pass of
78 :     `expand' with unrolling and loop-preheaders turned off and with a max
79 :     size-increase of 0 (we only inline if the body is as big as the function
80 :     call). Again, this is really only useful to avoid blowing up the register
81 :     allocator when compiling ml.grm.sml (from about 150MB down to 80MB).
82 :    
83 :     * Problems
84 :     The presence of types introduces some difficulties.
85 :     ** LET([...], RAISE ..., lexp)
86 :     Fcontract cannot drop the `lexp' although it is dead. The problem is that
87 :     fcontract would need to know the type of `lexp' to update the type
88 :     annotation on RAISE, but fcontract does not have this info.
89 :     Interestingly, even using `recover' would not provide fcontract with the
90 :     necessary type.
91 :     ** Inlining/Uncurrying in the presence of TFN
92 :     Many small inlinable loops are polymorphic (think of map, fold, ...).
93 :     Flintopt currently cannot inline TFNs which is why we need a `fixfix' phase
94 :     after reify. Sadly, inlining TFNs would not completely solve the problem
95 :     because the inlining heuristic relies on conditional inlining:
96 :    
97 :     TFN(map, FIX([(map, [f], body)], map), ...)
98 :    
99 :     Here `map' might be big enough that we don't want to always inline it.
100 :     Instead we only want to inline it when the `f' argument is a known
101 :     function. But that means that when we see TAPP(map, tic), we can't know
102 :     whether to inline `map' or not until we see the corresponding(s) APP(s).
103 :     Better yet, we don't want to inline at the TAPP site but at the APP site.
104 :    
105 :     So one way to solve this would be to extend fcontract so that it expands
106 :     the TAPP in its environment (allowing inlining at the APP site) but not in
107 :     the code.
108 :    
109 :     Another way is to simply always inline at the TAPP site if the body
110 :     of the TFN is something potentially inlinable. It would introduce
111 :     unnecessary code duplication in some cases, but since TFNs are typically
112 :     only applied a small number of times, it shouldn't be too bad.
113 :     The problem is that there are some very big and never-inlined functions
114 :     that are marked as `potentially inlinable'. My guess is that this should
115 :     work just fine and might be "easily" doable inside `specialize'.
116 :     Of course, ordering issues would then come up: for the small loop to be
117 :     marked as `maybe-inline', we might need to run loopify+fixfix before
118 :     specialize,
119 :     ** WCAST
120 :     As noted earlier, we currently work around the above problem by adding
121 :     a `fixfix' phase after reify. Sadly, this does not always work because
122 :     sometimes reify translates a TFN into:
123 :    
124 :     FIX(f2[] = FN(...))
125 :     f1 = f2[]
126 :     f = PRIMOP(wcast, f1)
127 :    
128 :     Here again, we get stuck because the relationship between `f1' and `f' is
129 :     obfuscated by the WCAST. The knuth-bendix benchmark suffers a more then
130 :     20% slowdown because of this problem.
131 :     Solving the above problem would solve this one as well, but eliminating
132 :     WCAST (which has been planned for some time now) would also help.

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0