View of /dev-notes/new-literals.md
Revision 4787 -
Wed Sep 5 14:35:32 2018 UTC (5 months, 2 weeks ago) by jhr
File size: 11088 byte(s)
fix typos in new-literals.md
# New Literal Representation

This document describes a new bytecode for generating heap-allocated literals in the SML/NJ compiler. The main reason for introducing a new bytecode is to prepare for future compiler improvements (64-bit support, `Real32`, and better `Int64` and `IntInf` integration).

## Endianness

Multi-byte quantities are represented in big-endian form (most-significant byte first).

## Header

The first four 32-bit words of the literal representation correspond to the following **C** struct:

````C
struct literal_header {
    uint32_t magic;
    uint32_t maxstk;
    uint32_t wordsz;
    uint32_t maxsaved;
};
````

where the fields are in big-endian form and have the following meaning:

* `magic` contains the version ID (which should be `0x20180508`),
* `maxstk` is the maximum stack depth required,
* `wordsz` is the size of an ML value (32 or 64), and
* `maxsaved` is the number of saved literals (used for sharing).

Note that Version 1 files have the version ID `0x19981022` and contain the first two header fields, but not the `wordsz` or `maxsaved` fields.

## Opcodes

The following is a list of the symbolic opcodes used in the interpreter. The instruction encoding is described below.

* **INT**(*n*)
  literal value in the default (tagged) integer or word type (`Int.int` or `Word.word`). The value *n* should be in the range -2^w-1^ to 2^w-1^-1 when encoded as a *w*-bit 2's complement integer. The width *w* will be 31 or 63 depending on the host architecture.
* **INT32**(*n*)
  32-bit literal value for either the type `Int32.int` or `Word32.word`.
* **INT64**(*n*)
  64-bit literal value for either the type `Int64.int` or `Word64.word`.
* **BIGINT**(*n*)
  arbitrary-precision integer literal (currently not used).
* **IVEC8**(*n*, *b~1~*, ..., *b~n~*)
  packed vector of 8-bit integers for either the type `Int8Vector.vector` or `Word8Vector.vector`.
* **IVEC16**(*n*, *h~1~*, ..., *h~n~*)
  packed vector of 16-bit integers for either the type `Int16Vector.vector` or `Word16Vector.vector` (currently not used).
* **IVEC32**(*n*, *w~1~*, ..., *w~n~*)
  packed vector of 32-bit integers for either the type `Int32Vector.vector` or `Word32Vector.vector` (currently not used).
* **IVEC64**(*n*, *d~1~*, ..., *d~n~*)
  packed vector of 64-bit integers for either the type `Int64Vector.vector` or `Word64Vector.vector` (currently not used).
* **REAL32**(*f*)
  32-bit floating-point literal for the type `Real32.real` (currently not used).
* **REAL64**(*f*)
  64-bit floating-point literal for the type `Real64.real`.
* **RVEC32**(*n*, *f~1~*, ..., *f~n~*)
  packed vector of 32-bit floating-point literals for the type `Real32Vector.vector` (currently not used).
* **RVEC64**(*n*, *F~1~*, ..., *F~n~*)
  packed vector of 64-bit floating-point literals for the type `Real64Vector.vector`.
* **STR8**(*s*)
  string literal (8-bit characters).
* **RECORD**(*n*)
  construct a record from the topmost *n* literal values.
* **VECTOR**(*n*)
  construct a vector from the topmost *n* literal values.
* **CONCAT**(*n*)
  pop *n* records/vectors from the stack and concatenate them into a single record/vector. This operation allows the implementation to avoid excessively large stacks when building very large record/vector literals.
* **SAVE**(*i*)
  save the top of the stack in the *i*^th^ save slot, which allows it to be shared by some subsequent aggregate literal.
* **LOAD**(*i*)
  push the *i*^th^ saved literal onto the stack.
* **RETURN**
  signals the end of the program; the stack depth should be one, and that value is popped and returned as the result.

### Future extensions

There are a number of additional features that we might want to support, which we list here.

* support for 32-bit string literals for the type `WideString.string`
* support for packed records (once the compiler generates such objects)

## Instruction encoding

### Notation

In the encoding below, we use the following conventions:

* *b* represents a signed 8-bit integer.
* *ub* represents an unsigned 8-bit integer.
* *c* represents an 8-bit character.
* *h* represents a signed 16-bit integer.
* *w* represents a signed 32-bit integer.
* *lw* represents a signed 64-bit integer.
* *n* represents a 32-bit integer length (usually unsigned).
* *d* represents a bignum digit whose size will be the default word size.
* *f* represents a 32-bit floating-point literal.
* *F* represents a 64-bit floating-point literal.
* *i* represents a tagged default int or word literal (*i.e.*, `Int.int` or `Word.word`).
* *I* represents an arbitrary-precision integer literal (*i.e.*, `IntInf.int`).

### Encoding

* `00000000` (`0x00`) <br />
  **INT**(0) <br />
  default tagged literal value 0.
* `00000001` (`0x01`) <br />
  **INT**(1) <br />
  default tagged literal value 1.
* `00000010` (`0x02`) <br />
  **INT**(2) <br />
  default tagged literal value 2.
* `00000011` (`0x03`) <br />
  **INT**(3) <br />
  default tagged literal value 3.
* `00000100` (`0x04`) <br />
  **INT**(4) <br />
  default tagged literal value 4.
* `00000101` (`0x05`) <br />
  **INT**(5) <br />
  default tagged literal value 5.
* `00000110` (`0x06`) <br />
  **INT**(6) <br />
  default tagged literal value 6.
* `00000111` (`0x07`) <br />
  **INT**(7) <br />
  default tagged literal value 7.
* `00001000` (`0x08`) <br />
  **INT**(8) <br />
  default tagged literal value 8.
* `00001001` (`0x09`) <br />
  **INT**(9) <br />
  default tagged literal value 9.
* `00001010` (`0x0A`) <br />
  **INT**(10) <br />
  default tagged literal value 10.
* `00001011` (`0x0B`) <br />
  **INT**(-1) <br />
  default tagged literal value -1.
* `00001100` (`0x0C`) <br />
  **INT**(-2) <br />
  default tagged literal value -2.
* `00001101` (`0x0D`) <br />
  **INT**(-3) <br />
  default tagged literal value -3.
* `00001110` (`0x0E`) <br />
  **INT**(-4) <br />
  default tagged literal value -4.
* `00001111` (`0x0F`) <br />
  **INT**(-5) <br />
  default tagged literal value -5.
* `00010000` (`0x10` *b*) <br />
  **INT**(*b*) --- for tagged integer literals in the range -128..127.
* `00010001` (`0x11` *h*) <br />
  **INT**(*h*) --- for tagged integer literals in the range -32768..32767.
* `00010010` (`0x12` *w*) <br />
  **INT**(*w*) --- for tagged integer literals in the range -2147483648..2147483647.
* `00010011` (`0x13` *lw*) <br />
  **INT**(*lw*) --- for all other tagged integer literals (64-bit target only).
* `00010100` (`0x14` *b*) <br />
  **INT32**(*b*) --- for 32-bit integer literals in the range -128..127.
* `00010101` (`0x15` *h*) <br />
  **INT32**(*h*) --- for 32-bit integer literals in the range -32768..32767.
* `00010110` (`0x16` *w*) <br />
  **INT32**(*w*) --- for all other 32-bit integer literals.
* `00010111` (`0x17` *b*) <br />
  **INT64**(*b*) --- for 64-bit integer literals in the range -128..127.
* `00011000` (`0x18` *h*) <br />
  **INT64**(*h*) --- for 64-bit integer literals in the range -32768..32767.
* `00011001` (`0x19` *w*) <br />
  **INT64**(*w*) --- for 64-bit integer literals in the range -2147483648..2147483647.
* `00011010` (`0x1A` *lw*) <br />
  **INT64**(*lw*) --- for all other 64-bit integer literals.
* `00011011` (`0x1B` *b*) <br />
  **BIGINT**(*b*) --- for bigint literals in the range -128..127.
* `00011100` (`0x1C` *h*) <br />
  **BIGINT**(*h*) --- for bigint literals in the range -32768..32767.
* `00011101` (`0x1D` *w*) <br />
  **BIGINT**(*w*) --- for bigint literals in the range -2147483648..2147483647.
* `00011110` (`0x1E` *n* *d~1~* ... *d~|n|~*) <br />
  **BIGINT**(*I*) --- where the absolute value of *n* is the number of digits and the sign of *n* is the sign of *I* (*i.e.*, if *n* is negative, then *I* is negative). The digits follow *n* in least-significant to most-significant order. If *n* is zero, then *I* is zero. The base *b* and size of the digits will depend on the target word size.
* `00011111` (`0x1F` *ub* *i~1~* ... *i~ub~*) <br />
  **IVEC**(*ub*, *i~1~*, ..., *i~ub~*) --- short int vector (up to 255 elements).
* `00100000` (`0x20` *n* *i~1~* ... *i~n~*) <br />
  **IVEC**(*n*, *i~1~*, ..., *i~n~*)
* `00100001` (`0x21` *ub* *b~1~* ... *b~ub~*) <br />
  **IVEC8**(*ub*, *b~1~*, ..., *b~ub~*) --- short bytevectors (up to 255 elements).
* `00100010` (`0x22` *n* *b~1~* ... *b~n~*) <br />
  **IVEC8**(*n*, *b~1~*, ..., *b~n~*)
* `00100011` (`0x23` *ub* *h~1~* ... *h~ub~*) <br />
  **IVEC16**(*ub*, *h~1~*, ..., *h~ub~*) --- short 16-bit integer vectors (up to 255 elements).
* `00100100` (`0x24` *n* *h~1~* ... *h~n~*) <br />
  **IVEC16**(*n*, *h~1~*, ..., *h~n~*)
* `00100101` (`0x25` *ub* *w~1~* ... *w~ub~*) <br />
  **IVEC32**(*ub*, *w~1~*, ..., *w~ub~*) --- short 32-bit integer vectors (up to 255 elements).
* `00100110` (`0x26` *n* *w~1~* ... *w~n~*) <br />
  **IVEC32**(*n*, *w~1~*, ..., *w~n~*)
* `00100111` (`0x27` *ub* *lw~1~* ... *lw~ub~*) <br />
  **IVEC64**(*ub*, *lw~1~*, ..., *lw~ub~*) --- short 64-bit integer vectors (up to 255 elements).
* `00101000` (`0x28` *n* *lw~1~* ... *lw~n~*) <br />
  **IVEC64**(*n*, *lw~1~*, ..., *lw~n~*)
* `00101001` (`0x29` *f*) <br />
  **REAL32**(*f*)
* `00101010` (`0x2A` *F*) <br />
  **REAL64**(*F*)
* `00101011` (`0x2B` *ub* *f~1~* ... *f~ub~*) <br />
  **RVEC32**(*ub*, *f~1~*, ..., *f~ub~*) --- short 32-bit real vectors (up to 255 elements).
* `00101100` (`0x2C` *n* *f~1~* ... *f~n~*) <br />
  **RVEC32**(*n*, *f~1~*, ..., *f~n~*)
* `00101101` (`0x2D` *ub* *F~1~* ... *F~ub~*) <br />
  **RVEC64**(*ub*, *F~1~*, ..., *F~ub~*) --- short 64-bit real vectors (up to 255 elements).
* `00101110` (`0x2E` *n* *F~1~* ... *F~n~*) <br />
  **RVEC64**(*n*, *F~1~*, ..., *F~n~*)
* `00101111` (`0x2F` *ub* *c~1~* ... *c~ub~*) <br />
  **STR8**(*s*) --- where **size**(*s*) = *ub* and *c~1~*, ..., *c~ub~* are the characters of *s*.
* `00110000` (`0x30` *n* *c~1~* ... *c~n~*) <br />
  **STR8**(*s*) --- where **size**(*s*) = *n* and *c~1~*, ..., *c~n~* are the characters of *s*.
* `00110001` (`0x31`) <br />
  *reserved for STR32*
* `00110010` (`0x32`) <br />
  *reserved for STR32*
* `00110011` (`0x33`) <br />
  **RECORD**(1)
* `00110100` (`0x34`) <br />
  **RECORD**(2)
* `00110101` (`0x35`) <br />
  **RECORD**(3)
* `00110110` (`0x36`) <br />
  **RECORD**(4)
* `00110111` (`0x37`) <br />
  **RECORD**(5)
* `00111000` (`0x38`) <br />
  **RECORD**(6)
* `00111001` (`0x39`) <br />
  **RECORD**(7)
* `00111010` (`0x3A` *ub*) <br />
  **RECORD**(*ub*)
* `00111011` (`0x3B` *h*) <br />
  **RECORD**(*h*)
* `00111100` (`0x3C` *ub*) <br />
  **VECTOR**(*ub*)
* `00111101` (`0x3D` *h*) <br />
  **VECTOR**(*h*)
* `00111110` (`0x3E` *h*) <br />
  **CONCAT**(*h*)
* `00111111` (`0x3F` *ub*) <br />
  **SAVE**(*ub*)
* `01000000` (`0x40` *h*) <br />
  **SAVE**(*h*)
* `01000001` (`0x41` *ub*) <br />
  **LOAD**(*ub*)
* `01000010` (`0x42` *h*) <br />
  **LOAD**(*h*)
* `01000011` -- `11111110` (`0x43` -- `0xFE`) <br />
  *unused*
* `11111111` (`0xFF`) <br />
  **RETURN**
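
### Decoding sketch

As a rough illustration of the header layout and the **INT** encodings above, the following **C** sketch reads the big-endian header and decodes the tagged-integer opcodes `0x00` through `0x13`. The names `parse_header`, `decode_int`, `read_be`, `sign_extend`, and `LIT_MAGIC_V2` are hypothetical and are not taken from the SML/NJ runtime. The sketch also skips the 31/63-bit range check on tagged values and does not handle Version 1 files.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIT_MAGIC_V2 0x20180508u   /* version ID from the Header section */

struct literal_header {
    uint32_t magic;
    uint32_t maxstk;
    uint32_t wordsz;
    uint32_t maxsaved;
};

/* Read a big-endian unsigned quantity of nbytes (1..8) bytes. */
static uint64_t read_be(const uint8_t *p, size_t nbytes)
{
    assert(p != NULL && nbytes >= 1 && nbytes <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Sign-extend an nbytes-wide 2's complement value to 64 bits.
 * Assumes the usual 2's complement uint64_t -> int64_t conversion. */
static int64_t sign_extend(uint64_t v, size_t nbytes)
{
    if (nbytes < 8) {
        uint64_t sign = (uint64_t)1 << (8 * nbytes - 1);
        v = (v ^ sign) - sign;
    }
    return (int64_t)v;
}

/* Parse the four header words; returns 0 on success, -1 on error. */
static int parse_header(const uint8_t *buf, size_t len,
                        struct literal_header *hdr)
{
    if (len < 16)
        return -1;
    hdr->magic    = (uint32_t)read_be(buf, 4);
    hdr->maxstk   = (uint32_t)read_be(buf + 4, 4);
    hdr->wordsz   = (uint32_t)read_be(buf + 8, 4);
    hdr->maxsaved = (uint32_t)read_be(buf + 12, 4);
    if (hdr->magic != LIT_MAGIC_V2)
        return -1;
    if (hdr->wordsz != 32 && hdr->wordsz != 64)
        return -1;
    return 0;
}

/* Decode one INT instruction at code[*pc].  On success, stores the
 * value in *out, advances *pc past the instruction, and returns 0.
 * Returns -1 for any other opcode or for a truncated operand. */
static int decode_int(const uint8_t *code, size_t len, size_t *pc,
                      int64_t *out)
{
    if (*pc >= len)
        return -1;
    uint8_t op = code[*pc];
    size_t nbytes;

    if (op <= 0x0A) {               /* INT(0) .. INT(10) */
        *out = op;
        *pc += 1;
        return 0;
    }
    if (op <= 0x0F) {               /* INT(-1) .. INT(-5) */
        *out = 0x0A - (int64_t)op;
        *pc += 1;
        return 0;
    }
    switch (op) {
    case 0x10: nbytes = 1; break;   /* INT(b)  */
    case 0x11: nbytes = 2; break;   /* INT(h)  */
    case 0x12: nbytes = 4; break;   /* INT(w)  */
    case 0x13: nbytes = 8; break;   /* INT(lw) */
    default:   return -1;
    }
    if (len - *pc - 1 < nbytes)
        return -1;
    *out = sign_extend(read_be(code + *pc + 1, nbytes), nbytes);
    *pc += 1 + nbytes;
    return 0;
}
```

A full interpreter would dispatch on every opcode in the table and maintain a value stack of depth at most `maxstk`; this sketch only shows how the single-byte and operand-carrying forms of **INT** relate.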