OVERVIEW
--------

The operation of CM can best be understood by looking at its most
central data structure: the dependency graph. You can find the
definitions of its associated types in depend/graph.sml. There is
also a coarse-grained "group graph" data structure; its definition is
in depend/ggraph.sml.

One can roughly divide CM into a front-end and a back-end. It is the
front-end's responsibility to establish the dependency graph for a
given project. The back-end implements various ways of traversing the
graph, thereby performing the operations that the user expects:
consistency checking, recompilation, linking, stabilization,
generation of listings or other statistics, etc.

The central component of the front-end is the parser. It builds the
dependency graph incrementally with help from the dependency analyzer.

* Analysis CAN be performed incrementally because the sub-graphs that
correspond to sub-groups or sub-libraries are independent of how they
are being used.

* We DO perform analysis incrementally because the parser occasionally
wants to know what the exported symbols of sub-groups and
sub-libraries are. (This is required for the parser's conditional
compilation facility.) While it would probably be possible to achieve
this with a more cursory analysis, the gains would definitely not
outweigh the extra effort of implementing it.

The dependency analyzer must inspect the ML source code of the
project. Within CM, handling of ML source code is centralized -- all
information pertaining to one ML source file is bundled in an abstract
data type (SmlInfo.info). You can find the definition (and the
implementation) of that type in smlfile/smlinfo.sml. In particular,
one important optimization that saves many repeated invocations of
the compiler's parser is to strip the ML abstract syntax tree of all
information that is unnecessary as far as CM is concerned and to store
the "compressed" version in some sort of cache. I call such compressed
ML syntax information a "skeleton". You can find the definition of the
skeleton type in smlfile/skeleton.sml. Associated code is in the same
directory.

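To give a feel for what a skeleton keeps, here is a deliberately
simplified, purely illustrative datatype. It is NOT the actual type
in smlfile/skeleton.sml; the real definition differs in detail, but
it records the same kind of information: which module-level names a
file binds, what it opens, and which free symbols it mentions.

    (* Illustrative sketch only -- not the real Skeleton type. *)
    datatype modExp =
        Path of string list        (* reference to a (possibly dotted) module path *)
      | Decls of decl list         (* a "struct ... end"-like body *)
    and decl =
        Bind of string * modExp    (* structure/functor/signature binding *)
      | Open of modExp             (* "open" at module level *)
      | Ref of string list         (* free symbols mentioned by the source *)
      | Seq of decl list           (* sequence of declarations *)
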
The dependency analyzer operates on skeletons. Its implementation can
be found in depend/build.sml.


PRIVILEGES (access control)
---------------------------

The basic mechanisms for access control are implemented: CM can
correctly detect which "privileges" would be required to access groups
and libraries. However, nothing has been done to actually enforce
anything. In other words, everybody is assumed to have every possible
privilege. CM merely reports which privileges "would have been
required". For the time being this is not really critical.
Enforcement must be tied into some form of OS-specific enforcement
mechanism (such as Unix file permissions or something similar), and I
haven't really thought out a way of doing this nicely and cleanly.

The basic idea behind CM's "privileges" is quite easy to understand.
In their description files, groups and libraries can specify a list of
privileges that the user of such a group/library must possess in order
to be able to use it. Privileges at this level are just names
(strings). If one group/library imports from another group/library,
then privileges are inherited. In effect, to be able to use a
program, one must have all privileges for all its libraries/groups,
sub-libraries/groups, sub-sub-libraries/groups, etc.

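For illustration, a description file that requires a privilege might
look roughly like the following. Everything here is made up -- the
file names, the privilege name "unsafe", and the assumption that the
privilege list precedes the Library keyword:

    (* LUnsafe.cm (hypothetical): clients must hold privilege "unsafe" *)
    unsafe
    Library
        structure UnsafeOps
    is
        unsafe-ops.sml
        $/basis.cm
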
Of course, this is not yet satisfactory because there should also be
the possibility of setting up a "safety wall": a library LSafe.cm
could "wrap" all the unsafe operations in LUnsafe.cm with enough error
checking that they become safe. Therefore, a user of LSafe.cm should
not also be required to possess the privileges that would be required
if one were to use LUnsafe.cm directly.

To this end, CM's model of privileges makes it possible for a
group/library to "wrap" privileges. If a privilege P is wrapped, then
the user of the library does not need to have privilege P even though
the library is using another library that requires privilege P. In
essence, the library acts as a "proxy" that provides the necessary
privilege P to the sub-library.

Of course, not everybody can be allowed to establish a library with
such a "wrapped" privilege P. The programmer who does that should at
least herself have privilege P (but perhaps better, she should have
"permission to wrap P" -- a stronger requirement).

In CM, wrapping a privilege is done by specifying the name of that
privilege in parentheses. The wrapping becomes active once the
library gets "stabilized" (see below). The (not yet implemented)
enforcement mechanism must ensure that anyone who stabilizes a library
that wraps P has permission to wrap P.

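Continuing the made-up example from above, a "safety wall" library
that wraps the privilege "unsafe" might be written roughly as follows
(again, all names and the exact layout are illustrative assumptions):

    (* LSafe.cm (hypothetical): imports LUnsafe.cm but wraps "unsafe",
       so clients of LSafe.cm need not hold that privilege themselves *)
    (unsafe)
    Library
        structure SafeOps
    is
        safe-ops.sml
        LUnsafe.cm

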
STABILIZATION
-------------

Aside from the issues concerning privileges, stabilization is a way of
putting an entire pre-compiled library -- together with its
pre-computed dependency graph -- into one single container. Once this
is done, CM will never need access to the original ML source code.
Before actually consulting the description file for a library, the
parser will always check whether there is a stable container. If so,
it will suck the dependency graph out of the container and be done.

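For example, stabilization can be triggered from the interactive
system. The exact interface may differ between versions; the sketch
below assumes that CM.stabilize takes a flag saying whether
sub-libraries should be stabilized recursively, followed by the name
of the description file:

    (* hypothetical interactive session *)
    CM.stabilize false "foo-lib.cm";
    (* writes the stable container for foo-lib.cm *)
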
Because of ML's "open" feature, it is sometimes necessary for the
dependency analyzer of a group to consult the contents (i.e., the
definitions) of signatures, structures, or functors that are imported
from sub-libraries. Since the pre-computed dependency graph does not
contain such information, it then becomes necessary to recover it in
a different way.

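As a hypothetical illustration of the problem, consider an ML source
file in some client group; all names here are made up:

    (* client.sml *)
    structure Client = struct
        open Lib             (* Lib is imported from a stabilized sub-library *)
        val x = helper 17    (* does "helper" come from Lib or from elsewhere? *)
    end

To decide what "helper" refers to, the dependency analyzer must know
which identifiers Lib actually exports -- that is, it needs the
contents of Lib's static environment.
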
Remember, the ML source code shouldn't have to be available at this
point. However, the same information is contained in the static
environment that is stored in every "binfile". (The binfile is the
result of compiling one ML source file. It contains executable code
and a pickled representation of the static environment that is
exported from the compilation unit.) Aside from the dependency graph,
the container for a stabilized library therefore also stores all the
associated binfiles.

Loading (stable) binfiles for the purpose of dependency analysis is
sometimes necessary, but since it is expensive we do it as seldom as
we can (i.e., lazily). The implementation of this mechanism (which is
really just a hook into the actual implementation provided by
GenericVC) is in depend/se2dae.sml. (See the comments there.) It is
used in stable/stabilize.sml. (Look for "cvtMemo"!)

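The underlying pattern is that of a memoized suspension: the expensive
conversion is wrapped in a thunk that runs at most once and only if
its result is ever demanded. A minimal stand-alone sketch of that
idiom (not the actual code in depend/se2dae.sml) looks like this:

    (* Generic "compute lazily, at most once" helper. *)
    fun memoize (f : unit -> 'a) : unit -> 'a = let
        val cache = ref (NONE : 'a option)
    in
        fn () =>
            case !cache of
                SOME v => v
              | NONE => let val v = f ()
                        in cache := SOME v; v end
    end

    (* e.g. (hypothetical): val getEnv = memoize (fn () => loadBinfileEnv path) *)
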
Information pertaining to members of stabilized libraries is managed
by the abstract datatype BinInfo.info (see stable/bininfo.sml). In
some sense, BinInfo.info is to stabilized ML code what SmlInfo.info is
to not-yet-stabilized ML code.

By the way, only libraries can be stabilized. A stabilized library
will encompass its own sources as well as the sources of its sub-groups
(and their sub-groups, and so on). Sub-libraries of the library, on
the other hand, will be referred to symbolically (they do not get
"sucked in" the way groups do). In effect, sub-groups are a convenient
way of resolving name-spacing issues within a library without
compromising the "one single container" paradigm of stable libraries.


DEPENDENCY GRAPH
----------------

The division into non-stabilized and stabilized libraries is clearly
visible in the definition of the types that make up dependency graphs.
There are "BNODE"s that mention BinInfo.info and there are "SNODE"s
that mention SmlInfo.info. (There are also "PNODE"s that facilitate
access to "primitive" internal environments that have to do with
bootstrapping.)

You will notice that one can never go from a BNODE to an SNODE. This
mirrors our intention that a sub-library of a stabilized library must
also be stabilized. From SNODEs, on the other hand, you can go either
to other SNODEs or to BNODEs. All the "localimports" of an SNODE
(i.e., the imports that come from the same group/library) are also
SNODEs. To go to a BNODE one must look into the list of
"globalimports". Global imports refer to "far" nodes -- nodes that
are within other groups/libraries. The edge that goes to such a node
can have an export filter attached. Therefore, a farbnode is a bnode
with an optional filter, and a farsbnode is either a BNODE or an SNODE
with an optional filter attached.

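In much simplified form (illustrative only -- the real definitions in
depend/graph.sml carry more fields and use the compiler's symbol and
filter types), the shape of these types is roughly:

    (* stand-ins for types defined elsewhere in CM *)
    type symbol = string
    type filter = symbol list option       (* optional export filter on a "far" edge *)
    type bininfo = unit                    (* stand-in for BinInfo.info *)
    type smlinfo = unit                    (* stand-in for SmlInfo.info *)
    type primitive = string                (* stand-in for the primitive-environment key *)

    datatype bnode =                       (* stabilized code *)
        PNODE of primitive
      | BNODE of { bininfo : bininfo,
                   localimports : bnode list,        (* BNODEs only *)
                   globalimports : farbnode list }
    and snode =                            (* not-yet-stabilized code *)
        SNODE of { smlinfo : smlinfo,
                   localimports : snode list,        (* same group/library *)
                   globalimports : farsbnode list }
    and sbnode =
        SB_BNODE of bnode                  (* a "far" edge may reach stable code ... *)
      | SB_SNODE of snode                  (* ... or sources of another group *)
    withtype farbnode = filter * bnode
    and farsbnode = filter * sbnode
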
Imports and exports of a group/library are represented by "impexp"s.
Impexps are essentially just farsbnodes, but they also carry the
dependency analyzer's "analysis environment", which contains
information about the actual definition (contents) of exported
structures/functors. As mentioned earlier, this is necessary to handle
the "open" construct of ML.

The exports of a group/library are then simply represented by a
mapping from exported symbols to corresponding impexps. (See
depend/ggraph.sml.)

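Continuing the sketch from above (again illustrative only; see
depend/ggraph.sml for the real definitions), an impexp pairs a far
node with the analysis environment describing the exported definition,
and a group's export interface is a finite map over symbols:

    type daenv = unit                      (* stand-in for the analysis environment *)
    type impexp = farsbnode * daenv        (* where a symbol comes from + what it contains *)
    type exports = (symbol * impexp) list  (* really a finite map from symbols to impexps *)

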
RECOMPILATION AND EXECUTION
---------------------------

There is a generic traversal routine that is used to implement both
recompilation traversals and execution (link-) traversals
(compile/generic.sml). The decision of which kind of traversal is
implemented comes from the functor argument: the "compilation type".
A signature describing compilation types abstractly is in
compile/compile-type.sml. In essence, it provides compilation
environments and associated operations abstractly.

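The idea can be summarized by a much simplified, purely illustrative
functor skeleton; the real COMPILATION_TYPE signature in
compile/compile-type.sml is considerably richer, and none of the names
below are meant to match it exactly:

    signature COMPILATION_TYPE = sig
        type env                           (* result of processing one node *)
        val empty : env
        val layer : env * env -> env       (* combine results of several imports *)
    end

    functor TraversalFn (CT : COMPILATION_TYPE) = struct
        (* Combine what a node's imports delivered; the real traversal in
         * compile/generic.sml also processes the node itself (compiles or
         * links it) and memoizes the result per node. *)
        fun imports (es : CT.env list) : CT.env = foldl CT.layer CT.empty es
    end
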
Concrete instantiations of this signature are in compile/recomp.sml
and in compile/exec.sml. As you will see, these are also implemented
as functors, parameterized by an abstraction of "persistent state".
Persistent state is used to remember the results of traversals from
one invocation of CM to the next. This avoids needless recompilation
in the case of recomp.sml and facilitates sharing of dynamic values in
the case of exec.sml. (However, the two cases are otherwise quite
dissimilar.)

Persistent state comes in two varieties: "recomp" and "full". Full
state is actually an extension of recomp state and can also be used
where recomp state is expected. The "normal" CM uses full state
because it implements both recompilation and execution. The same
state is passed to both ExecFn and RecompFn, so it will be properly
shared by recompilation and execution traversals. In the case of the
bootstrap compiler, however, we never actually execute the code that
comes out of the compiler. (The code will be executed by the runtime
system when bootstrapping.) Therefore, for the bootstrap compiler we
don't use full state but simply recomp state. (If we cross-compile
for a different architecture, we could not possibly execute the code
anyway.)
