SCM Repository
Annotation of /sml/trunk/src/compiler/FLINT/opt/flintopt.txt
Parent Directory
|
Revision Log
Revision 626 - (view) (download)
1 : | monnier | 626 | -*- mode: outline; tab-width: 4 -*- |
2 : | |||
3 : | * Sequentially dependent tests | ||
4 : | |||
5 : | * Ordering issues | ||
6 : | ** Specialize -> Wrap -> Reify | ||
7 : | Neither Wrap nor Specialize can be done after Reify. | ||
8 : | I don't know whether Specialize works after Wrap, but I wouldn't bet on it | ||
9 : | and in any case, I can't think of a reason why you'd want to do that. | ||
10 : | ** Lambda Split | ||
11 : | Should take place right after a `fixfix' so as to have good inlining | ||
12 : | annotations. Also, Split should be done before Reify and Wrap since | ||
13 : | both slightly change the language. | ||
14 : | ** Loopify | ||
15 : | Loopify does not consider mutually recursive functions, so you want to | ||
16 : | have a `fixfix' before to eliminate apparent mutual recursion. | ||
17 : | Also, to inline small loops, you want to run fixfix afterwards to | ||
18 : | analyze the newly introduced preheaders. | ||
19 : | |||
20 : | * Current ordering | ||
21 : | ** lcontract | ||
22 : | Because the input code has lot and lots of called-once functions and | ||
23 : | fcontract is much slower at inlining them. | ||
24 : | ** fixfix+fcontract | ||
25 : | Do a first pass of inlining and careful contraction. | ||
26 : | The fixfix also reduces mutual recursion, which later helps loopify. | ||
27 : | ** specialize | ||
28 : | Might as well do it now to get rid of some of those pesky TFN/TAPP. | ||
29 : | ** loopify+fixfix+(split+)fcontract | ||
30 : | Introduce loop headers and do the best job you can before `wrap'. | ||
31 : | Split needs to be done before going through `wrap', so the earlier | ||
32 : | we can do the various optimizations, the better. | ||
33 : | Even more true for `reify' since we want to do optimizations on a | ||
34 : | really type-safe language rather than on a pseudo-typed language such | ||
35 : | as post-reify-FLINT. | ||
36 : | ** wrap+fcontract+reify | ||
37 : | Representation analysis, cleanup and "reification" (introduces runtime | ||
38 : | types: turns TFN into FCT, TAPP into APP, ...) | ||
39 : | ** fcontract+fixfix+fcontract+eta | ||
40 : | We finally got rid of all TFN/TAPP so we can at last inline/uncurry | ||
41 : | polymorphic functions. We also introduce eta-wrappers for escaping | ||
42 : | functions, which is used by closure conversion. | ||
43 : | |||
44 : | * CPS | ||
45 : | After CPS conversion, we need to clean up the code a little. | ||
46 : | ** last_contract | ||
47 : | `contract' would work as well, since it basically does the same, but | ||
48 : | `first_contract' doesn't cut it because we need to be able to drop unused | ||
49 : | arguments. This is because of functions that always raise an exception and | ||
50 : | hence don't use their continuation argument. This optimization cannot be | ||
51 : | done in the A-normal form and doesn't seem particularly important on | ||
52 : | benchmarks, but it is crucial to avoid blowing up the register allocator | ||
53 : | when compiling ml.grm.sml (from about 150MB to 400MB). | ||
54 : | ** minexpand | ||
55 : | The CPS conversion phase does a good bit more work than just convert | ||
56 : | from A-normal to CPS form. Among other things, it also compiles SWITCHes. | ||
57 : | This can introduce new functions, f.ex. when | ||
58 : | |||
59 : | SWITCH v | ||
60 : | 1 => bla | ||
61 : | 3 => bli | ||
62 : | _ => foo | ||
63 : | |||
64 : | is turned into | ||
65 : | |||
66 : | def(_) = foo | ||
67 : | if v < 1 then def(0) else | ||
68 : | if v > 2 then def(0) else | ||
69 : | case v | ||
70 : | 1 => bla | ||
71 : | 2 => def(0) | ||
72 : | 3 => bli | ||
73 : | |||
74 : | Ideally, new functions like `def' should go through the usual | ||
75 : | optimizations, but there really isn't much to optimize there. The only | ||
76 : | useful thing we might want to do is inline them. This is particularly true | ||
77 : | if `def' is a trivial function. `minexpand' hence goes through one pass of | ||
78 : | `expand' with unrolling and loop-preheaders turned off and with a max | ||
79 : | size-increase of 0 (we only inline if the body is as big as the function | ||
80 : | call). Again, this is really only useful to avoid blowing up the register | ||
81 : | allocator when compiling ml.grm.sml (from about 150MB down to 80MB). | ||
82 : | |||
83 : | * Problems | ||
84 : | The presence of types introduces some difficulties. | ||
85 : | ** LET([...], RAISE ..., lexp) | ||
86 : | Fcontract cannot drop the `lexp' although it is dead. The problem is that | ||
87 : | fcontract would need to know the type of `lexp' to update the type | ||
88 : | annotation on RAISE, but fcontract does not have this info. | ||
89 : | Interestingly, even using `recover' would not provide fcontract with the | ||
90 : | necessary type. | ||
91 : | ** Inlining/Uncurrying in the presence of TFN | ||
92 : | Many small inlinable loops are polymorphic (think of map, fold, ...). | ||
93 : | Flintopt currently cannot inline TFNs which is why we need a `fixfix' phase | ||
94 : | after reify. Sadly, inlining TFNs would not completely solve the problem | ||
95 : | because the inlining heuristic relies on conditional inlining: | ||
96 : | |||
97 : | TFN(map, FIX([(map, [f], body)], map), ...) | ||
98 : | |||
99 : | Here `map' might be big enough that we don't want to always inline it. | ||
100 : | Instead we only want to inline it when the `f' argument is a known | ||
101 : | function. But that means that when we see TAPP(map, tic), we can't know | ||
102 : | whether to inline `map' or not until we see the corresponding(s) APP(s). | ||
103 : | Better yet, we don't want to inline at the TAPP site but at the APP site. | ||
104 : | |||
105 : | So one way to solve this would be to extend fcontract so that it expands | ||
106 : | the TAPP in its environment (allowing inlining at the APP site) but not in | ||
107 : | the code. | ||
108 : | |||
109 : | Another way is to simply always inline at the TAPP site if the body | ||
110 : | of the TFN is something potentially inlinable. It would introduce | ||
111 : | unnecessary code duplication in some cases, but since TFNs are typically | ||
112 : | only applied a small number of times, it shouldn't be too bad. | ||
113 : | The problem is that there are some very big and never-inlined functions | ||
114 : | that are marked as `potentially inlinable'. My guess is that this should | ||
115 : | work just fine and might be "easily" doable inside `specialize'. | ||
116 : | Of course, ordering issues would then come up: for the small loop to be | ||
117 : | marked as `maybe-inline', we might need to run loopify+fixfix before | ||
118 : | specialize, | ||
119 : | ** WCAST | ||
120 : | As noted earlier, we currently work around the above problem by adding | ||
121 : | a `fixfix' phase after reify. Sadly, this does not always work because | ||
122 : | sometimes reify translates a TFN into: | ||
123 : | |||
124 : | FIX(f2[] = FN(...)) | ||
125 : | f1 = f2[] | ||
126 : | f = PRIMOP(wcast, f1) | ||
127 : | |||
128 : | Here again, we get stuck because the relationship between `f1' and `f' is | ||
129 : | obfuscated by the WCAST. The knuth-bendix benchmark suffers a more then | ||
130 : | 20% slowdown because of this problem. | ||
131 : | Solving the above problem would solve this one as well, but eliminating | ||
132 : | WCAST (which has been planned for some time now) would also help. |
root@smlnj-gforge.cs.uchicago.edu | ViewVC Help |
Powered by ViewVC 1.0.0 |