Home My Page Projects Code Snippets Project Openings SML/NJ
Summary Activity Forums Tracker Lists Tasks Docs Surveys News SCM Files

SCM Repository

[smlnj] Annotation of /sml/trunk/src/compiler/Execution/binfile/binfile.sml
ViewVC logotype

Annotation of /sml/trunk/src/compiler/Execution/binfile/binfile.sml

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1045 - (view) (download)

1 : blume 902 (* binfile-new.sml
2 :     *
3 :     * (C) 2001 Lucent Technologies, Bell Labs
4 :     *
5 :     * author: Matthias Blume (blume@research.bell-labs.com
6 :     *)
7 :    
8 :     (*
9 :     * This revised version of structure Binfile is now machine-independent.
10 :     * Moreover, it deals with the file format only and does not know how to
11 :     * create new binfile contents (aka "compile") or how to interpret the
12 :     * pickles. As a result, it does not statically depend on the compiler.
13 :     * (Eventually we might want to support a light-weight binfile loader.)
14 :     *
15 :     * ----------------------------------------------------------------------------
16 :     * BINFILE FORMAT description:
17 :     *
18 :     * Every 4-byte integer field is stored in big-endian format.
19 :     *
20 :     * Start Size Purpose
21 :     * ----BEGIN OF HEADER----
22 :     * 0 16 magic string
23 :     * 16 4 number of import values (importCnt)
24 :     * 20 4 number of exports (exportCnt = currently always 0 or 1)
25 :     * 24 4 size of import tree area in bytes (importSzB)
26 :     * 28 4 size of CM-specific info in bytes (cmInfoSzB)
27 :     * 32 4 size of pickled lambda-expression in bytes (lambdaSzB)
28 :     * 36 4 size of reserved area in bytes (reserved)
29 :     * 40 4 size of padding in bytes (pad)
30 :     * 44 4 size of code area in bytes (codeSzB)
31 :     * 48 4 size of pickled environment in bytes (envSzB)
32 :     * 52 i import trees [This area contains pickled import trees --
33 :     * see below. The total number of leaves in these trees is
34 :     * importCnt and the size is equal to importSzB.]
35 :     * i+52 ex export pids [Each export pid occupies 16 bytes. Thus, the
36 :     * size ex of this area is 16*exportCnt (0 or 16).]
37 :     * ex+i+52 cm CM info [Currently a list of pid-pairs.] (cm = cmInfoSzB)
38 :     * ----END OF HEADER----
39 :     * 0 h HEADER (h = 52+cm+ex+i)
40 :     * h l pickle of exported lambda-expr. (l = lambdaSzB)
41 :     * l+h r reserved area (r = reserved)
42 :     * r+l+h p padding (p = pad)
43 :     * p+r+l+h c code area (c = codeSzB) [Structured into several
44 :     * segments -- see below.]
45 :     * c+p+r+l+h e pickle of static environment (e = envSzB)
46 :     * e+c+p+r+l+h - END OF BINFILE
47 :     *
48 :     * IMPORT TREE FORMAT description:
49 :     *
50 :     * The import tree area contains a list of (pid * tree) pairs.
51 :     * The pids are stored directly as 16-byte strings.
52 :     * Trees are constructed according to the following ML-datatype:
53 :     * datatype tree = NODE of (int * tree) list
54 :     * Leaves in this tree have the form (NODE []).
55 :     * Trees are written recursively -- (NODE l) is represented by n (= the
56 :     * length of l) followed by n (int * node) subcomponents. Each component
57 :     * consists of the integer selector followed by the corresponding tree.
58 :     *
59 :     * Integer values in the import tree area (lengths and selectors) are
60 :     * written in "packed" integer format. In particular, this means that
61 :     * Values in the range 0..127 are represented by only 1 byte.
62 :     * Conceptually, the following pickling routine is used:
63 :     *
64 :     * void recur_write_ul (unsigned long l, FILE *file)
65 :     * {
66 :     * if (l != 0) {
67 :     * recur_write_ul (l >> 7, file);
68 :     * putc ((l & 0x7f) | 0x80, file);
69 :     * }
70 :     * }
71 :     *
72 :     * void write_ul (unsigned long l, FILE *file)
73 :     * {
74 :     * recur_write_ul (l >> 7, file);
75 :     * putc (l & 0x7f, file);
76 :     * }
77 :     *
78 :     * CODE AREA FORMAT description:
79 :     *
80 :     * The code area contains multiple code segements. There will be at least
81 :     * two. The very first segment is the "data" segment -- responsible for
82 :     * creating literal constants on the heap. The idea is that code in the
83 :     * data segment will be executed only once at link-time. Thus, it can
84 : blume 1045 * then be garbage-collected immediatly. (In fact, the data segment does
85 :     * not consist of machine code but of code for an internal bytecode engine.)
86 : blume 902 *
87 :     * In the binfile, each code segment is represented by its size s (in
88 :     * bytes -- written as a 4-byte big-endian integer) followed by s bytes of
89 :     * machine- (or byte-) code. The total length of all code segments
90 :     * (including the bytes spent on representing individual sizes) is codeSzB.
91 :     *
92 :     * LINKING CONVENTIONS:
93 :     *
94 :     * Linking is achieved by executing all code segments in sequential order.
95 :     *
96 : blume 1045 * Conceptually, the first code segment (i.e., the "data" segment) receives
97 :     * unit as its single argument. (In reality this code segment consists of
98 :     * bytecode which does not really receive any arguments.)
99 : blume 902 *
100 :     * The second code segment receives a record as its single argument.
101 :     * This record has (importCnt+1) components. The first importCnt
102 :     * components correspond to the leaves of the import trees. The final
103 :     * component is the result from executing the data segment.
104 :     *
105 :     * All other code segments receive a single argument which is the result
106 :     * of the preceding segment.
107 :     *
108 :     * The result of the last segment represents the exports of the compilation
109 :     * unit. It is to be paired up with the export pid and stored in the
110 :     * dynamic environment. If there is no export pid, then the final result
111 :     * will be thrown away.
112 :     *
113 :     * The import trees are used for constructing the argument record for the
114 :     * second code segment. The pid at the root of each tree is the key for
115 :     * looking up a value in the existing dynamic environment. In general,
116 :     * that value will be a record. The selector fields of the import tree
117 :     * associated with the pid are used to recursively fetch components of that
118 :     * record.
119 :     *)
120 :     structure Binfile :> BINFILE = struct
121 :    
122 :     structure Pid = PersStamps
123 :    
124 :     exception FormatError = CodeObj.FormatError
125 :    
126 :     type pid = Pid.persstamp
127 :    
128 :     type csegments = CodeObj.csegments
129 :    
130 :     type executable = CodeObj.executable
131 :    
132 :     type stats = { env: int, inlinfo: int, data: int, code: int }
133 :    
134 :     type pickle = { pid: pid, pickle: Word8Vector.vector }
135 :    
136 :     datatype bfContents =
137 :     BF of { imports: ImportTree.import list,
138 :     exportPid: pid option,
139 :     cmData: pid list,
140 :     senv: pickle,
141 :     lambda: pickle,
142 :     csegments: csegments,
143 :     executable: executable option ref }
144 :     fun unBF (BF x) = x
145 :    
146 :     val bytesPerPid = Pid.persStampSize
147 :     val magicBytes = 16
148 :    
149 :     val exportPidOf = #exportPid o unBF
150 :     val cmDataOf = #cmData o unBF
151 :     val senvPickleOf = #senv o unBF
152 :     val staticPidOf = #pid o senvPickleOf
153 :     val lambdaPickleOf = #lambda o unBF
154 :     val lambdaPidOf = #pid o lambdaPickleOf
155 :    
156 :     fun error msg =
157 :     (Control_Print.say (concat ["binfile format error: ", msg, "\n"]);
158 :     raise FormatError)
159 :    
160 :     val fromInt = Word32.fromInt
161 :     val fromByte = Word32.fromLargeWord o Word8.toLargeWord
162 :     val toByte = Word8.fromLargeWord o Word32.toLargeWord
163 :     val >> = Word32.>>
164 :     infix >>
165 :    
166 :     fun bytesIn (s, 0) = Byte.stringToBytes ""
167 :     | bytesIn (s, n) = let
168 :     val bv = BinIO.inputN (s, n)
169 :     in
170 :     if n = Word8Vector.length bv then bv
171 :     else error (concat["expected ", Int.toString n,
172 :     " bytes, but found ",
173 :     Int.toString(Word8Vector.length bv)])
174 :     end
175 :    
176 :     fun readInt32 s = LargeWord.toIntX(Pack32Big.subVec(bytesIn(s, 4), 0))
177 :    
178 :     fun readPackedInt32 s = let
179 :     fun loop n =
180 :     case BinIO.input1 s of
181 :     NONE => error "unable to read a packed int32"
182 :     | SOME w8 => let
183 :     val n' =
184 :     n * 0w128
185 :     + Word8.toLargeWord (Word8.andb (w8, 0w127))
186 :     in
187 :     if Word8.andb (w8, 0w128) = 0w0 then n' else loop n'
188 :     end
189 :     in
190 :     LargeWord.toIntX (loop 0w0)
191 :     end
192 :    
193 :     fun readPid s = Pid.fromBytes (bytesIn (s, bytesPerPid))
194 :     fun readPidList (s, n) = List.tabulate (n, fn _ => readPid s)
195 :    
196 :     fun readImportTree s =
197 :     case readPackedInt32 s of
198 :     0 => (ImportTree.ITNODE [], 1)
199 :     | cnt => let
200 :     fun readImportList 0 = ([], 0)
201 :     | readImportList cnt = let
202 :     val selector = readPackedInt32 s
203 :     val (tree, n) = readImportTree s
204 :     val (rest, n') = readImportList (cnt - 1)
205 :     in
206 :     ((selector, tree) :: rest, n + n')
207 :     end
208 :     val (l, n) = readImportList cnt
209 :     in
210 :     (ImportTree.ITNODE l, n)
211 :     end
212 :    
213 :     fun readImports (s, n) =
214 :     if n <= 0 then []
215 :     else let
216 :     val pid = readPid s
217 :     val (tree, n') = readImportTree s
218 :     val rest = readImports (s, n - n')
219 :     in
220 :     (pid, tree) :: rest
221 :     end
222 :    
223 :     fun pickleInt32 i = let
224 :     val w = fromInt i
225 :     fun out w = toByte w
226 :     in
227 :     Word8Vector.fromList [toByte (w >> 0w24), toByte (w >> 0w16),
228 :     toByte (w >> 0w8), toByte w]
229 :     end
230 :     fun writeInt32 s i = BinIO.output (s, pickleInt32 i)
231 :    
232 :     fun picklePackedInt32 i = let
233 :     val n = fromInt i
234 :     val // = LargeWord.div
235 :     val %% = LargeWord.mod
236 :     val !! = LargeWord.orb
237 :     infix // %% !!
238 :     val toW8 = Word8.fromLargeWord
239 :     fun r (0w0, l) = Word8Vector.fromList l
240 :     | r (n, l) = r (n // 0w128, toW8 ((n %% 0w128) !! 0w128) :: l)
241 :     in
242 :     r (n // 0w128, [toW8 (n %% 0w128)])
243 :     end
244 :    
245 :     fun writePid (s, pid) = BinIO.output (s, Pid.toBytes pid)
246 :     fun writePidList (s, l) = app (fn p => writePid (s, p)) l
247 :    
248 :     local
249 :     fun pickleImportSpec ((selector, tree), (n, p)) = let
250 :     val sp = picklePackedInt32 selector
251 :     val (n', p') = pickleImportTree (tree, (n, p))
252 :     in
253 :     (n', sp :: p')
254 :     end
255 :     and pickleImportTree (ImportTree.ITNODE [], (n, p)) =
256 :     (n + 1, picklePackedInt32 0 :: p)
257 :     | pickleImportTree (ImportTree.ITNODE l, (n, p)) = let
258 :     val (n', p') = foldr pickleImportSpec (n, p) l
259 :     in
260 :     (n', picklePackedInt32 (length l) :: p')
261 :     end
262 :    
263 :     fun pickleImport ((pid, tree), (n, p)) = let
264 :     val (n', p') = pickleImportTree (tree, (n, p))
265 :     in
266 :     (n', Pid.toBytes pid :: p')
267 :     end
268 :     in
269 :     fun pickleImports l = let
270 :     val (n, p) = foldr pickleImport (0, []) l
271 :     in
272 :     (n, Word8Vector.concat p)
273 :     end
274 :     end
275 :    
276 :     fun mkMAGIC (arch, version_id) = let
277 :     val vbytes = 8 (* version part *)
278 :     val abytes = magicBytes - vbytes - 1 (* arch part *)
279 :     fun fit (i, s) = let
280 :     val s = StringCvt.padRight #" " i s
281 :     in
282 :     if (size s = i) then s
283 :     else substring (s, 0, i)
284 :     end
285 :     fun version [] = []
286 :     | version [x : int] = [Int.toString x]
287 :     | version (x :: r) = (Int.toString x) :: "." :: (version r)
288 :     val v = fit (vbytes, concat (version version_id))
289 :     val a = fit (abytes, arch)
290 :     in
291 :     Byte.stringToBytes (concat [v, a, "\n"])
292 :     (* assert (Word8Vector.length (MAGIC <arch>) = magicBytes *)
293 :     end
294 :    
295 :     (* calculate size of code objects (including length fields) *)
296 :     fun codeSize (csegs: csegments) =
297 :     List.foldl
298 :     (fn (co, n) => n + CodeObj.size co + 4)
299 :     (CodeObj.size(#c0 csegs) + Word8Vector.length(#data csegs) + 8)
300 :     (#cn csegs)
301 :    
302 :     (* This function must be kept in sync with the "write" function below.
303 :     * It calculates the number of bytes written by a corresponding
304 :     * call to "write". *)
305 :     fun size { contents, nopickle } = let
306 :     val { imports, exportPid, senv, cmData, lambda, csegments, ... } =
307 :     unBF contents
308 :     val (_, picki) = pickleImports imports
309 :     val hasExports = isSome exportPid
310 :     fun pickleSize { pid, pickle } =
311 :     if nopickle then 0 else Word8Vector.length pickle
312 :     in
313 :     magicBytes +
314 :     9 * 4 +
315 :     Word8Vector.length picki +
316 :     (if hasExports then bytesPerPid else 0) +
317 :     bytesPerPid * (length cmData + 2) +
318 :     pickleSize lambda +
319 :     codeSize csegments +
320 :     pickleSize senv
321 :     end
322 :    
323 :     fun create { imports, exportPid, cmData, senv, lambda, csegments } =
324 :     BF { imports = imports,
325 :     exportPid = exportPid,
326 :     cmData = cmData,
327 :     senv = senv,
328 :     lambda = lambda,
329 :     csegments = csegments,
330 :     executable = ref NONE }
331 :    
332 :     (* must be called with second arg >= 0 *)
333 : blume 986 fun readCodeList (strm, nbytes) = let
334 : blume 902 fun readCode 0 = []
335 :     | readCode n = let
336 :     val sz = readInt32 strm
337 :     val n' = n - sz - 4
338 :     in
339 :     if n' < 0 then
340 :     error "code size"
341 : blume 986 else CodeObj.input(strm, sz) :: readCode n'
342 : blume 902 end
343 :     val dataSz = readInt32 strm
344 :     val n' = nbytes - dataSz - 4
345 :     val data = if n' < 0 then error "data size" else bytesIn (strm, dataSz)
346 :     in
347 :     case readCode n' of
348 :     (c0 :: cn) => { data = data, c0 = c0, cn = cn }
349 :     | [] => error "missing code objects"
350 :     end
351 :    
352 : blume 986 fun read { arch, version, stream = s } = let
353 : blume 902 val MAGIC = mkMAGIC (arch, version)
354 :     val magic = bytesIn (s, magicBytes)
355 :     val _ = if magic = MAGIC then () else error "bad magic number"
356 :     val leni = readInt32 s
357 :     val ne = readInt32 s
358 :     val importSzB = readInt32 s
359 :     val cmInfoSzB = readInt32 s
360 :     val nei = cmInfoSzB div bytesPerPid
361 :     val lambdaSz = readInt32 s
362 :     val reserved = readInt32 s
363 :     val pad = readInt32 s
364 :     val cs = readInt32 s
365 :     val es = readInt32 s
366 :     val imports = readImports (s, leni)
367 :     val exportPid =
368 :     (case ne of
369 :     0 => NONE
370 :     | 1 => SOME(readPid s)
371 :     | _ => error "too many export PIDs")
372 :     val envPids = readPidList (s, nei)
373 :     val (staticPid, lambdaPid, cmData) =
374 :     case envPids of
375 :     st :: lm :: cmData => (st, lm, cmData)
376 :     | _ => error "env PID list"
377 :     val plambda = bytesIn (s, lambdaSz)
378 :     (* We could simply skip the reserved area if there is one,
379 :     * but in that case there probably is something else seriously
380 :     * wrong (wrong version, etc.), so we may as well complain... *)
381 :     val _ = if reserved = 0 then () else error "non-zero reserved size"
382 :     (* skip padding *)
383 :     val _ = if pad <> 0 then ignore (bytesIn (s, pad)) else ()
384 :     (* now get the code *)
385 : blume 986 val code = readCodeList (s, cs)
386 : blume 902 val penv = bytesIn (s, es)
387 :     in
388 :     { contents = create { imports = imports,
389 :     exportPid = exportPid,
390 :     cmData = cmData,
391 :     senv = { pid = staticPid, pickle = penv },
392 :     lambda = { pid = lambdaPid, pickle = plambda },
393 :     csegments = code },
394 :     stats = { env = es, inlinfo = lambdaSz, code = cs,
395 :     data = Word8Vector.length (#data code) } }
396 :     end
397 :    
398 :     fun write { arch, version, stream = s, contents, nopickle } = let
399 :     (* Keep this in sync with "size" (see above). *)
400 :     val { imports, exportPid, cmData, senv, lambda, csegments, ... } =
401 :     unBF contents
402 :     val { pickle = senvP, pid = staticPid } = senv
403 :     val { pickle = lambdaP, pid = lambdaPid } = lambda
404 :     val envPids = staticPid :: lambdaPid :: cmData
405 :     val (leni, picki) = pickleImports imports
406 :     val importSzB = Word8Vector.length picki
407 :     val (ne, epl) =
408 :     case exportPid of
409 :     NONE => (0, [])
410 :     | SOME p => (1, [p])
411 :     val nei = length envPids
412 :     val cmInfoSzB = nei * bytesPerPid
413 :     fun pickleSize { pid, pickle } =
414 :     if nopickle then 0 else Word8Vector.length pickle
415 :     val lambdaSz = pickleSize lambda
416 :     val reserved = 0 (* currently no reserved area *)
417 :     val pad = 0 (* currently no padding *)
418 :     val cs = codeSize csegments
419 :     fun codeOut c = (writeInt32 s (CodeObj.size c); CodeObj.output (s, c))
420 :     val es = pickleSize senv
421 :     val writeEnv = if nopickle then fn () => ()
422 :     else fn () => BinIO.output (s, senvP)
423 :     val datasz = Word8Vector.length (#data csegments)
424 :     val MAGIC = mkMAGIC (arch, version)
425 :     in
426 :     BinIO.output (s, MAGIC);
427 :     app (writeInt32 s) [leni, ne, importSzB, cmInfoSzB,
428 :     lambdaSz, reserved, pad, cs, es];
429 :     BinIO.output (s, picki);
430 :     writePidList (s, epl);
431 :     (* arena1 *)
432 :     writePidList (s, envPids);
433 :     (* arena2 -- pickled flint stuff *)
434 :     if lambdaSz = 0 then () else BinIO.output (s, lambdaP);
435 :     (* arena3 is empty *)
436 :     (* arena4 is empty *)
437 :     (* code objects *)
438 :     writeInt32 s datasz;
439 :     BinIO.output(s, #data csegments);
440 :     codeOut (#c0 csegments);
441 :     app codeOut (#cn csegments);
442 :     writeEnv ();
443 :     { env = es, inlinfo = lambdaSz, data = datasz, code = cs }
444 :     end
445 :    
446 :     fun exec (BF { imports, exportPid, executable, csegments, ... }, dynenv) =
447 :     let val executable =
448 :     case !executable of
449 :     SOME e => e
450 :     | NONE => let
451 :     val e = Isolate.isolate (Execute.mkexec csegments)
452 :     in executable := SOME e; e
453 :     end
454 :     in
455 :     Execute.execute { executable = executable,
456 :     imports = imports,
457 :     exportPid = exportPid,
458 :     dynenv = dynenv }
459 :     end
460 :     end

root@smlnj-gforge.cs.uchicago.edu
ViewVC Help
Powered by ViewVC 1.0.0