% LZMA2 format The LZMA2 format supports flushing, parallel encoding or decoding. Chunks of data that cannot be compressed are copied as such. ## Dictionary Size LZMA2 requires information about the size of the dictionary. This is provided by a single byte. Bits | Mask | Description ----:|-----:|:------------------------------------------------ 0-5 | 0x3F | Dictionary Size 6-7 | 0xC0 | Reserved for future use; Must be zero The dictionary size is encoded with a one-bit mantissa and five-bit exponent. The smallest dictionary size is 4 KiB and the biggest is 4 GiB - 1 B. |Raw Value | Mantissa | Exponent | Dictionary size| |---------:|---------:|---------:|---------------:| | 0 | 2 | 11 | 4 KiB | | 1 | 3 | 11 | 6 KiB | | 2 | 2 | 12 | 8 KiB | | 3 | 3 | 12 | 12 KiB | | ... | ... | ... | ... | | 36 | 2 | 29 | 1024 MiB | | 37 | 3 | 29 | 1536 MiB | | 38 | 2 | 30 | 2048 MiB | | 39 | 3 | 30 | 3072 MiB | | 40 | 2 | 31 | 4096 MiB - 1B | For test purposes we add the dictionary size byte as first byte of an LZMA2 stream. ## Chunks An LZMA2 stream is a sequence of chunks. Each chunk is preceded by a control byte and other information. Following the C implementation in the LZMA SDK the control byte can be described as such: Chunk header | Description :------------------- | :-------------------------------------------------- `00000000` | End of LZMA2 stream `00000001 U U` | Uncompressed chunk, reset dictionary `00000010 U U` | Uncompressed chunk, no reset of dictionary `100uuuuu U U C C` | LZMA, no reset `101uuuuu U U C C` | LZMA, reset state `110uuuuu U U C C S` | LZMA, reset state, new properties `111uuuuu U U C C S` | LZMA, reset state, new properties, reset dictionary The symbols used are described by following table. Symbol | Description :----- | :-------------------- u | uncompressed size bit U | uncompressed size byte C | uncompressed size byte S | properties byte A dictionary reset requires always new properties. If this is an uncompressed chunk the properties need to be provided in the next compressed chunk. New properties require a reset of the state. A dictionary reset puts the current position to zero. Uncompressed data is written into the dictionary. The uncompressed size and compressed size are given in big-endian byte order. The values need to be incremented for the actual size. So a chunk with 1 byte uncompressed data will store size 0 in the uncompressed bits and bytes. The properties byte provides the parameters pb, lc, lp using following formula: S = (pb * 5 + lp) * 9 + lc This is same encoding used for LZMA. For LZMA2 following condition has been introduced: lc + lp <= 4. The parameters are defined as follows: Name | Range | Description :---- | :----- | :------------------------------ lc | [0,8] | number of literal context bits lp | [0,4] | number of literal pos bits pb | [0,4] | the number of pos bits