[smlnj] Annotation of /sml/trunk/src/cm/Overview
Revision 340

OVERVIEW
--------

The operation of CM can be understood best by looking at its most
central data structure: the dependency graph. You can find the
definitions of its associated types in depend/graph.sml. There is
also a coarse-grained "group graph" data structure. Its definition is
in depend/ggraph.sml.

One can roughly divide CM into a front-end and a back-end. It is the
front-end's responsibility to establish the dependency graph for a
given project. The back-end implements various ways of traversing the
graph, thereby performing the operations that the user expects:
consistency checking, recompilation, linking, stabilization,
generation of listings or other statistics, etc.

The central component of the front-end is the parser. It builds the
dependency graph incrementally with help from the dependency analyzer.

* Analysis CAN be performed incrementally because the sub-graphs that
  correspond to sub-groups or sub-libraries are independent of how they
  are used.

* We DO perform analysis incrementally because the parser occasionally
  wants to know what the exported symbols of sub-groups and
  sub-libraries are. (This is required for the parser's conditional
  compilation facility.) While it would probably be possible to achieve
  this using a more cursory analysis, the gains would definitely not
  justify the extra effort of implementing it.

The dependency analyzer must inspect the ML source code of the
project. Within CM, handling of ML source code is centralized -- all
information pertaining to one ML source file is bundled as an abstract
data type (SmlInfo.info). You can find the definition (and the
implementation) of that type in smlfile/smlinfo.sml. In particular,
one important optimization that saves many repeated invocations of
the compiler's parser is to strip the ML abstract syntax tree of all
information that CM does not need and to store the "compressed"
version in a cache. I call such compressed ML syntax information a
"skeleton". You can find the definition of the skeleton type in
smlfile/skeleton.sml. Associated code is in the same directory.
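
To make the caching idea concrete, here is a minimal sketch in Python
(not CM's actual SML code). It assumes a cache keyed by file path and
invalidated by modification time. The names Skeleton,
parse_to_skeleton, and SkeletonCache are hypothetical, and the regular
expressions are only a crude stand-in for "parse the file, then keep
what dependency analysis needs".

```python
import os
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Skeleton:
    """Hypothetical stand-in for a skeleton: just the names that matter."""
    defined: frozenset   # structures defined at top level
    opened: frozenset    # structures mentioned in `open`


def parse_to_skeleton(source):
    # Assumption: a real implementation would run the compiler's parser
    # and then strip the syntax tree; regexes only approximate that here.
    defined = frozenset(re.findall(r"\bstructure\s+(\w+)", source))
    opened = frozenset(re.findall(r"\bopen\s+(\w+)", source))
    return Skeleton(defined, opened)


class SkeletonCache:
    def __init__(self):
        self._entries = {}   # path -> (mtime_ns, Skeleton)
        self.parses = 0      # counts how often the "parser" really ran

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        hit = self._entries.get(path)
        if hit is not None and hit[0] == mtime:
            return hit[1]    # file unchanged: reuse the skeleton
        with open(path) as f:
            skeleton = parse_to_skeleton(f.read())
        self.parses += 1
        self._entries[path] = (mtime, skeleton)
        return skeleton
```

Asking the cache twice for an unchanged file runs the parser only
once; that is the saving described above.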

The dependency analyzer operates on skeletons. Its implementation can
be found in depend/build.sml.


PRIVILEGES (access control)
---------------------------

The basic mechanisms for access control are implemented: CM can
correctly detect which "privileges" would be required to access groups
and libraries. However, nothing is actually enforced yet. In other
words, everybody is assumed to have every possible privilege. CM
merely reports which privileges "would have been required". For the
time being this is not really critical. Enforcement must be tied into
some form of OS-specific enforcement mechanism (such as Unix file
permissions or something similar), and I haven't really thought out a
way of doing this nicely and cleanly.

The basic idea behind CM's "privileges" is quite easy to understand.
In their description files, groups and libraries can specify a list of
privileges that the user of such a group/library must possess in order
to be able to use it. Privileges at this level are just names
(strings). If one group/library imports from another group/library,
then privileges are inherited. In effect, to be able to use a
program, one must have all privileges for all its libraries/groups,
sub-libraries/groups, sub-sub-libraries/groups, etc.

Of course, this is not yet satisfactory because there should also be
the possibility of setting up a "safety wall": a library LSafe.cm
could "wrap" all the unsafe operations in LUnsafe.cm with enough error
checking that they become safe. Therefore, a user of LSafe.cm should
not also be required to possess the privileges that would be required
if one were to use LUnsafe.cm directly.

To this end, in CM's model of privileges it is possible for a
group/library to "wrap" privileges. If a privilege P is wrapped, then
the user of the library does not need to have privilege P even though
the library is using another library that requires privilege P. In
essence, the library acts as a "proxy" that provides the necessary
privilege P to the sub-library.
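
The inheritance and wrapping rules can be sketched as a small
computation. This is an illustrative Python model, not CM's code: the
Library class and privileges_needed are hypothetical names, and the
exact rule (a wrapped privilege is subtracted from what is inherited
from imports) is my reading of the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    required: frozenset = frozenset()   # privileges this library itself demands
    wrapped: frozenset = frozenset()    # privileges it wraps on behalf of its users
    imports: list = field(default_factory=list)


def privileges_needed(lib):
    """Privileges a user of `lib` must hold (assumes an acyclic import graph)."""
    inherited = set()
    for sub in lib.imports:
        inherited |= privileges_needed(sub)
    # Wrapping removes inherited privileges; the library's own
    # requirements are still passed on to its users.
    return frozenset(lib.required | (inherited - lib.wrapped))
```

With LUnsafe.cm requiring P and LSafe.cm importing it while wrapping
P, a user of LSafe.cm needs no privileges, whereas a library that
imports LUnsafe.cm without wrapping still passes P on to its users.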

Of course, not everybody can be allowed to establish a library with
such a "wrapped" privilege P. The programmer who does that should at
least herself have privilege P (but perhaps better, she should have
"permission to wrap P" -- a stronger requirement).

In CM, wrapping a privilege is done by specifying the name of that
privilege within parentheses. The wrapping becomes active once the
library gets "stabilized" (see below). The (not yet implemented)
enforcement mechanism must ensure that anyone who stabilizes a library
that wraps P has permission to wrap P.
(In CM's source code and comments, "wrapped" privileges are referred
to as "granted" privileges -- which doesn't quite seem to capture the
actual meaning.)


STABILIZATION
-------------

Aside from the issues concerning privileges, stabilization is a way of
putting an entire pre-compiled library -- together with its
pre-computed dependency graph -- into one single container. Once this
is done, CM will never need to have access to the original ML source
code. Before actually consulting the description file for a
group/library, the parser will always check whether there is a
stable container. If so, it will extract the dependency graph from the
container and be done.
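
The "stable container first" rule amounts to a simple decision, shown
here as a Python sketch. Everything in it is hypothetical: the
".stable" suffix is not CM's actual naming scheme, and the three
callbacks stand in for file-system access, container reading, and the
regular parser.

```python
def stable_path_for(description_path):
    # Assumption for illustration only; CM's real container location differs.
    return description_path + ".stable"


def load_graph(description_path, exists, read_stable, parse_description):
    """Return a dependency graph, preferring a stable container if present."""
    stable = stable_path_for(description_path)
    if exists(stable):
        # Stable container found: the description file and the ML
        # sources are not consulted at all.
        return read_stable(stable)
    return parse_description(description_path)
```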

Because of ML's "open" feature, it is sometimes necessary for the
dependency analyzer of a group to consult the contents (i.e., the
definitions) of signatures, structures, or functors that are imported
from sub-groups/libraries. Since the pre-computed dependency graph
does not contain such information, it must then be recovered in a
different way.

Remember, the ML source code shouldn't have to be available at this
point. However, the same information is contained in the static
environment that is stored in every "binfile". (The binfile is the
result of compiling one ML source file. It contains executable code
and a pickled representation of the static environment that is
exported from the compilation unit.) Aside from the dependency graph,
the container for a stabilized group/library also stores all the
associated binfiles.

Loading (stable) binfiles for the purpose of dependency analysis is
sometimes necessary, but since it is expensive we do it as seldom as
we can (i.e., lazily). The implementation of this mechanism (which is
really just a hook into the actual implementation provided by
GenericVC) is in depend/se2dae.sml. (See the comments there.) It is
used in stable/stabilize.sml. (Look for "cvtMemo"!)
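
"Lazily" here means: do the expensive load only on first demand, and
at most once. A generic way to get that behavior is a memoized thunk,
sketched below in Python. LazyEnv is a hypothetical name; whether
cvtMemo is structured this way is an assumption, so consult
depend/se2dae.sml for the real mechanism.

```python
class LazyEnv:
    """Memoized thunk: runs `loader` at most once, on first demand."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._value = None

    def force(self):
        if not self._loaded:
            self._value = self._loader()   # the expensive step, e.g. unpickling
            self._loaded = True
            self._loader = None            # drop the closure once it has run
        return self._value
```

If force() is never called, the loader never runs, so groups that do
not need the contents of an imported structure pay nothing.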

Information pertaining to members of stabilized groups/libraries is
managed by the abstract data type BinInfo.info (see
stable/bininfo.sml). In some sense, BinInfo.info is to stabilized ML
code what SmlInfo.info is to not-yet-stabilized ML code.


DEPENDENCY GRAPH
----------------

The division into non-stabilized and stabilized groups/libraries is
clearly visible in the definition of the types that make up dependency
graphs. There are "BNODE"s that mention BinInfo.info and there are
"SNODE"s that mention SmlInfo.info. (There are also "PNODE"s that
facilitate access to "primitive" internal environments that have to do
with bootstrapping.)

You will notice that one can never go from a BNODE to an SNODE. This
mirrors our intention that a subgroup of a stabilized group must also
be stabilized. From SNODEs, on the other hand, you can either go to
other SNODEs or to BNODEs. All the "localimports" of an SNODE (i.e.,
the imports that come from the same group) are also SNODEs. To go to
a BNODE one must look into the list of "globalimport"s. Global
imports refer to "far" nodes -- nodes that are within other groups.
The edge that goes to such a node can have an export filter attached.
Therefore, a farbnode is a BNODE with an optional filter; a farsbnode
is either a BNODE or an SNODE with an optional filter attached.
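
The shape of these types can be modeled roughly as follows. This is a
Python approximation written from the description above, not a
transcription of depend/graph.sml: field names other than
localimports and globalimports are guesses, strings stand in for
BinInfo.info and SmlInfo.info, PNODEs are omitted, and the
BNODE-to-SNODE restriction (which the ML types guarantee statically)
is checked at run time instead.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple, Union

Filter = Optional[FrozenSet[str]]   # None means "no export filter"


@dataclass(eq=False)
class BNode:
    bininfo: str                                     # stand-in for BinInfo.info
    localimports: List["BNode"] = field(default_factory=list)
    globalimports: List[Tuple[Filter, "BNode"]] = field(default_factory=list)  # farbnodes

    def __post_init__(self):
        targets = self.localimports + [t for _f, t in self.globalimports]
        if not all(isinstance(t, BNode) for t in targets):
            raise TypeError("a BNODE may only import other BNODEs")


@dataclass(eq=False)
class SNode:
    smlinfo: str                                     # stand-in for SmlInfo.info
    localimports: List["SNode"] = field(default_factory=list)
    globalimports: List[Tuple[Filter, Union[BNode, "SNode"]]] = field(
        default_factory=list)                        # farsbnodes


def reachable(node):
    """All nodes reachable from `node` via local and global imports."""
    seen, order, stack = set(), [], [node]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        order.append(n)
        stack.extend(n.localimports)
        stack.extend(target for _filter, target in n.globalimports)
    return order
```

Starting from a BNode, reachable() can only ever yield BNodes, which
is the "a subgroup of a stabilized group must also be stabilized"
property in miniature.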

Imports and exports of a group are represented by "impexp"s. Impexps
are essentially just farsbnodes, but they also contain the dependency
analyzer's "analysis environment", which contains information about
the actual definition (contents) of exported structures/functors. As
mentioned earlier, this is necessary to handle the "open" construct of
ML.

The exports of a group are then simply a mapping from exported symbols
to corresponding impexps. (See depend/ggraph.sml.)


RECOMPILATION AND EXECUTION
---------------------------

There is a generic traversal routine that is used to implement both
recompilation traversals and execution (link-) traversals
(compile/generic.sml). Which kind of traversal is implemented is
determined by the functor argument: the "compilation type".
A signature describing compilation types abstractly is in
compile/compile-type.sml. In essence, it provides compilation
environments and associated operations abstractly.
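
The idea of one traversal parameterized by a "compilation type" can
be sketched like this. Python has no functors, so a function that
takes the compilation type as an argument and returns a traversal
plays that role. The names make_traversal, visit, RecompType, and
ExecType are hypothetical, the graph is a plain dict from node name
to the names it depends on, and it is assumed to be acyclic.

```python
def make_traversal(ctype):
    """Stand-in for the functor: `ctype` supplies the per-node operation."""

    def traverse(node, deps, memo=None):
        memo = {} if memo is None else memo
        if node not in memo:
            # Visit dependencies first and hand their results to the node.
            envs = [traverse(d, deps, memo) for d in deps.get(node, [])]
            memo[node] = ctype.visit(node, envs)
        return memo[node]

    return traverse


class RecompType:
    """Compilation type whose result is a (fake) static environment."""

    def __init__(self):
        self.compiled = []

    def visit(self, node, envs):
        self.compiled.append(node)
        return "static(" + node + ")"


class ExecType:
    """Compilation type whose result is a (fake) dynamic value."""

    def __init__(self):
        self.executed = []

    def visit(self, node, envs):
        self.executed.append(node)
        return len(envs)   # placeholder for "the value produced by running node"
```

The traversal logic (ordering, visiting each node once) is written a
single time; only the per-node operation and the kind of environment
it produces differ between the two instantiations.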

Concrete instantiations of this signature are in compile/recomp.sml
and in compile/exec.sml. As you will see, these are also implemented
as functors parameterized by an abstraction of "persistent state".
Persistent state is used to remember the results of traversals from
invocation to invocation of CM. This avoids needless recompilation in
the case of recomp.sml and facilitates sharing of dynamic values in
the case of exec.sml. (However, the two cases are otherwise quite
dissimilar.)

Persistent state comes in two varieties: "recomp" and "full". Full
state is actually an extension of recomp state and can also be used
where recomp state is expected. The "normal" CM uses full state
because it implements both recompilation and execution. The same
state is passed to both ExecFn and RecompFn, so it will be properly
shared by recompilation and execution traversals. In the case of the
bootstrap compiler, however, we never actually execute the code that
comes out of the compiler. (The code will be executed by the runtime
system when bootstrapping.) Therefore, for the bootstrap compiler we
don't use full state but simply recomp state. (If we cross-compile
for a different architecture we could not possibly execute the code
anyway.)
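
The relationship between the two kinds of state can be pictured as
follows. This is a loose Python analogy, not the actual interface:
RecompState, FullState, recompile, and execute are hypothetical
names, and subclassing stands in for "full state extends recomp state
and can be used wherever recomp state is expected".

```python
class RecompState:
    """Remembers compilation results from one invocation to the next."""

    def __init__(self):
        self.binfiles = {}          # unit name -> compiled result


class FullState(RecompState):
    """Recomp state plus what execution needs to remember."""

    def __init__(self):
        super().__init__()
        self.dynamic_values = {}    # unit name -> value produced by running it


def recompile(state, unit, compile_fn):
    # Works with either kind of state: it only touches `binfiles`.
    if unit not in state.binfiles:
        state.binfiles[unit] = compile_fn(unit)
    return state.binfiles[unit]


def execute(state, unit, compile_fn, run_fn):
    # Needs full state; reuses whatever recompilation already stored.
    if unit not in state.dynamic_values:
        code = recompile(state, unit, compile_fn)
        state.dynamic_values[unit] = run_fn(code)
    return state.dynamic_values[unit]
```

Passing one FullState object to both operations means a unit compiled
during a recompilation traversal is not compiled again when it is
executed. A bootstrap-compiler-like client would create only a
RecompState and never call execute.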
