rfc9841v1.txt | rfc9841.txt | |||
---|---|---|---|---|
skipping to change at line 18 ¶ | skipping to change at line 18 ¶ | |||
August 2025 | August 2025 | |||
Shared Brotli Compressed Data Format | Shared Brotli Compressed Data Format | |||
Abstract | Abstract | |||
This specification defines a data format for shared brotli | This specification defines a data format for shared brotli | |||
compression, which adds support for shared dictionaries, large | compression, which adds support for shared dictionaries, large | |||
window, and a container format to brotli (RFC 7932). Shared | window, and a container format to brotli (RFC 7932). Shared | |||
dictionaries and large window support allow significant compression | dictionaries and large window support allow significant compression | |||
gains compared to regular brotli. This document updates RFC 7932. | gains compared to regular brotli. This document specifies an | |||
extension to the method defined in RFC 7932. | ||||
Status of This Memo | Status of This Memo | |||
This document is not an Internet Standards Track specification; it is | This document is not an Internet Standards Track specification; it is | |||
published for informational purposes. | published for informational purposes. | |||
This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
(IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
Internet Engineering Steering Group (IESG). Not all documents | Internet Engineering Steering Group (IESG). Not all documents | |||
skipping to change at line 264 ¶ | skipping to change at line 265 ¶ | |||
original dictionary in the custom dictionary. | original dictionary in the custom dictionary. | |||
If no shared dictionary is set, the decoder behaves the same as in | If no shared dictionary is set, the decoder behaves the same as in | |||
[RFC7932] on a brotli stream. | [RFC7932] on a brotli stream. | |||
If a shared dictionary is set, then it can set LZ77 dictionaries, | If a shared dictionary is set, then it can set LZ77 dictionaries, | |||
override static dictionary words, and/or override transforms. | override static dictionary words, and/or override transforms. | |||
3.1. Custom Static Dictionaries | 3.1. Custom Static Dictionaries | |||
If a custom word list is set, then the following behavior of the RFC | If a custom word list is set, then the following behaviors of the | |||
7932 decoder [RFC7932] is overridden: | decoder defined in [RFC7932] are overridden: | |||
Instead of the Static Dictionary Data from Appendix A of | Instead of the Static Dictionary Data from Appendix A of | |||
[RFC7932], one or more word lists from the custom static | [RFC7932], one or more word lists from the custom static | |||
dictionary data are used. | dictionary data are used. | |||
Instead of NDBITS at the end of Appendix A of [RFC7932], a custom | Instead of NDBITS at the end of Appendix A of [RFC7932], a custom | |||
SIZE_BITS_BY_LENGTH per custom word list is used. | SIZE_BITS_BY_LENGTH per custom word list is used. | |||
The copy length for a static dictionary reference must be between | The copy length for a static dictionary reference must be between | |||
4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of | 4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of | |||
this dictionary is 0. | this dictionary is 0. | |||
If a custom transforms list is set without context dependency, then | If a custom transforms list is set without context dependency, then | |||
the following behavior of the RFC 7932 decoder [RFC7932] is | the following behaviors of the decoder defined in [RFC7932] are | |||
overridden: | overridden: | |||
The "List of Word Transformations" from Appendix B of [RFC7932] is | The "List of Word Transformations" from Appendix B of [RFC7932] is | |||
overridden by one or more lists of custom prefixes, suffixes, and | overridden by one or more lists of custom prefixes, suffixes, and | |||
transform operations. | transform operations. | |||
The transform_id must be smaller than the number of transforms | The transform_id must be smaller than the number of transforms | |||
given in the custom transforms list. | given in the custom transforms list. | |||
If the dictionary is context dependent, it includes a lookup table of | If the dictionary is context dependent, it includes a lookup table of | |||
a 64-word list and transform list combinations. When resolving a | a 64 word list and transform list combinations. When resolving a | |||
static dictionary word, the decoder computes the literal Context ID | static dictionary word, the decoder computes the literal Context ID | |||
as described in Section 7.1 of [RFC7932]. The literal Context ID is | as described in Section 7.1 of [RFC7932]. The literal Context ID is | |||
used as the index in the lookup tables to select the word list and | used as the index in the lookup tables to select the word list and | |||
transforms to use. If the dictionary is not context dependent, this | transforms to use. If the dictionary is not context dependent, this | |||
ID is implicitly 0 instead. | ID is implicitly 0 instead. | |||
If a distance goes beyond the dictionary for the current ID and | If a distance goes beyond the dictionary for the current ID and | |||
multiple word/transform list combinations are defined, then a next | multiple word/transform list combinations are defined, then the next | |||
dictionary is used in the following order: if not context dependent, | dictionary is used in the following order: | |||
the same order as defined in the shared dictionary. If context | ||||
dependent, the index matching the current context is used first, the | * If context dependent: | |||
same order as defined in the shared dictionary excluding the current | ||||
context are used next. | - use the index matching the current context first, and then | |||
- use the same order as defined in the shared dictionary | ||||
(excluding the current context) next. | ||||
* If not context dependent: | ||||
- use the same order as defined in the shared dictionary. | ||||
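
The fallback order described above can be sketched as follows. This is an illustrative reading of the text, not code from the specification: the function name `dictionary_order` and the parameter `context_index` (the dictionary index that the lookup table selects for the current literal Context ID) are hypothetical.

```python
from typing import List, Optional


def dictionary_order(num_dictionaries: int,
                     context_index: Optional[int] = None) -> List[int]:
    """Return the order in which dictionaries are tried.

    context_index is None when the shared dictionary is not context
    dependent (hypothetical convention chosen for this sketch).
    """
    order = list(range(num_dictionaries))  # order as defined in the shared dictionary
    if context_index is None:
        return order
    if not 0 <= context_index < num_dictionaries:
        raise ValueError("context index out of range")
    # The dictionary matching the current context comes first; the rest
    # keep their defined order, with the current one excluded.
    return [context_index] + [i for i in order if i != context_index]
```

For example, with four dictionaries and the current context mapped to dictionary 2, the sketch yields the order 2, 0, 1, 3.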
3.1.1. Transform Operations | 3.1.1. Transform Operations | |||
A shared dictionary may include custom word transformations to | A shared dictionary may include custom word transformations to | |||
replace those specified in Section 8 and Appendix B of [RFC7932]. A | replace those specified in Section 8 and Appendix B of [RFC7932]. A | |||
transform consists of a possible prefix, a transform operation, for | transform consists of a possible prefix, a transform operation, a | |||
some operations a parameter, and a possible suffix. In the shared | parameter (for some operations), and a possible suffix. In the | |||
dictionary format, the transform operation is represented by a | shared dictionary format, the transform operation is represented by a | |||
numerical ID, which is listed in the table below. | numerical ID, which is listed in the table below. | |||
+====+===========================+ | +====+===========================+ | |||
| ID | Operation | | | ID | Operation | | |||
+====+===========================+ | +====+===========================+ | |||
| 0 | Identity | | | 0 | Identity | | |||
+----+---------------------------+ | +----+---------------------------+ | |||
| 1 | OmitLast1 | | | 1 | OmitLast1 | | |||
+----+---------------------------+ | +----+---------------------------+ | |||
| 2 | OmitLast2 | | | 2 | OmitLast2 | | |||
skipping to change at line 464 ¶ | skipping to change at line 472 ¶ | |||
4. Varint Encoding | 4. Varint Encoding | |||
A varint is encoded in base 128 in one or more bytes as follows: | A varint is encoded in base 128 in one or more bytes as follows: | |||
+--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
|1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| | |||
+--------+--------+ +--------+ | +--------+--------+ +--------+ | |||
where the "x" bits of the first byte are the LSBs of the value and | where the "x" bits of the first byte are the LSBs of the value and | |||
the "x" bits of the last byte are the MSBs of the value. The last | the "x" bits of the last byte are the MSBs of the value. The last | |||
byte must have its MSB set to 0, all other bytes to 1 to indicate | byte must have its MSB set to 0 and all other bytes must have their | |||
there is a next byte. | MSBs set to 1 to indicate there is a next byte. | |||
The maximum allowed amount of bits to read is 63 bits; if the 9th | The maximum allowed amount of bits to read is 63 bits; if the 9th | |||
byte is present and has its MSB set, then the stream must be | byte is present and has its MSB set, then the stream must be | |||
considered as invalid. | considered as invalid. | |||
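
A minimal decoding sketch for the varint layout above, assuming the input is available as a Python bytes object. The function name `decode_varint` and its return convention (value plus the position after the varint) are choices made for this example only.

```python
from typing import Tuple


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):  # at most 9 bytes, carrying 9 * 7 = 63 value bits
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        # The first byte holds the least significant 7 bits.
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:  # MSB clear: this was the last byte
            return value, pos + i + 1
    # The 9th byte was present and still had its MSB set.
    raise ValueError("invalid varint: more than 63 bits")
```

Under this sketch, the two bytes 0xAC 0x02 decode to 300: the low seven bits of the first byte give 44, and the second byte contributes 2 << 7 = 256.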
5. Shared Dictionary Stream | 5. Shared Dictionary Stream | |||
The shared dictionary stream encodes a custom dictionary for brotli, | The shared dictionary stream encodes a custom dictionary for brotli, | |||
including custom words and/or custom transformations. A shared | including custom words and/or custom transformations. A shared | |||
dictionary may appear as a standalone or as contents of a resource in | dictionary may appear as a standalone or as contents of a resource in | |||
a framing format container. | a framing format container. | |||
A compliant shared brotli dictionary stream must have the following | A compliant shared brotli dictionary stream must have the following | |||
format: | format: | |||
2 bytes: File signature, in hexadecimal the bytes 91, 0. | 2 bytes: File signature in hexadecimal format (bytes 91 and 0). | |||
varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 | varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 | |||
dictionary or 0 if there is none. The maximum allowed value is | dictionary, or 0 if there is none. The maximum allowed value is | |||
the maximum possible sliding window size of brotli or large window | the maximum possible sliding window size of brotli or large window | |||
brotli. | brotli. | |||
LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary. | LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary. | |||
1 byte: NUM_CUSTOM_WORD_LISTS. May have a value of 0 to 64. | 1 byte: NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64. | |||
NUM_CUSTOM_WORD_LISTS times a word list with the following format | NUM_CUSTOM_WORD_LISTS times a word list with the following format | |||
for each word list: | for each word list: | |||
28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit | 28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit | |||
integers, indexed by word lengths 4 to 31. The value | integers, indexed by word lengths 4 to 31. The value | |||
represents log2(number of words of this length), with the | represents log2(number of words of this length), with the | |||
exception of 0 meaning 0 words of this length. The max allowed | exception of 0 meaning 0 words of this length. The max allowed | |||
length value is 15 bits. OFFSETS_BY_LENGTH is computed from | length value is 15 bits. OFFSETS_BY_LENGTH is computed from | |||
this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + | |||
(SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0). | (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0). | |||
N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31] | N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31] | |||
+ (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) : | + (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) : | |||
0), with all the words of shortest length first, then all words | 0), with all the words of shortest length first, then all words | |||
of the next length, and so on, where there are either 0 or a | of the next length, and so on, where there are either 0 or a | |||
positive power of two number of words for each length. | positive power of two number of words for each length. | |||
1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value of 0 to 64. | 1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to | |||
64. | ||||
NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the | NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the | |||
following format for each transform list: | following format for each transform list: | |||
2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. | 2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. | |||
Must be at least 1 because the list must always end with a | Must be at least 1 because the list must always end with a | |||
zero-length stringlet even if it is empty. | zero-length stringlet even if it is empty. | |||
NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet. | NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet. | |||
NUM_PREFIX_SUFFIX is the number of stringlets parsed and must | NUM_PREFIX_SUFFIX is the number of stringlets parsed and must | |||
skipping to change at line 533 ¶ | skipping to change at line 542 ¶ | |||
for the last (terminating) entry of the transform list. For | for the last (terminating) entry of the transform list. For | |||
other entries, STRING_LENGTH must be in range 1..255. The 0 | other entries, STRING_LENGTH must be in range 1..255. The 0 | |||
entry must be present and must be the last byte of the | entry must be present and must be the last byte of the | |||
PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the | PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the | |||
stream must be rejected as invalid. | stream must be rejected as invalid. | |||
STRING_LENGTH bytes: Contents of the prefix/suffix. | STRING_LENGTH bytes: Contents of the prefix/suffix. | |||
1 byte: NTRANSFORMS. Number of transformation triplets. | 1 byte: NTRANSFORMS. Number of transformation triplets. | |||
NTRANSFORMS times: Data for each transform: | NTRANSFORMS times the data for each transform listed below: | |||
1 byte: Index of prefix in prefix/suffix data; must be less | 1 byte: Index of prefix in prefix/suffix data; must be less | |||
than NUM_PREFIX_SUFFIX. | than NUM_PREFIX_SUFFIX. | |||
1 byte: Index of suffix in prefix/suffix data; must be less | 1 byte: Index of suffix in prefix/suffix data; must be less | |||
than NUM_PREFIX_SUFFIX. | than NUM_PREFIX_SUFFIX. | |||
1 byte: Operation index; must be an index in the table of | 1 byte: Operation index; must be an index in the table of | |||
operations listed in Section 3.1.1. | operations listed in Section 3.1.1. | |||
If and only if at least one transform has operation index | If and only if at least one transform has operation index | |||
ShiftFirst or ShiftAll: | ShiftFirst or ShiftAll, then NTRANSFORMS times the following: | |||
NTRANSFORMS times: | ||||
2 bytes: Parameters for the transform. If the transform | 2 bytes: Parameters for the transform. If the transform does | |||
does not have type ShiftFirst or ShiftAll, the value must | not have type ShiftFirst or ShiftAll, the value must be 0. | |||
be 0. ShiftFirst and ShiftAll interpret these bytes as | ShiftFirst and ShiftAll interpret these bytes as an unsigned | |||
an unsigned 16-bit integer. | 16-bit integer. | |||
If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 | |||
(else implicitly NUM_DICTIONARIES is 1 and points to the brotli | (else implicitly NUM_DICTIONARIES is 1 and points to the brotli | |||
built-in and there is no context map): | built-in and there is no context map): | |||
1 byte: NUM_DICTIONARIES. May have value 1 to 64. Each | 1 byte: NUM_DICTIONARIES. May have a value in range 1 to 64. | |||
dictionary is a combination of a word list and a transform | Each dictionary is a combination of a word list and a transform | |||
list. Each next dictionary is used when the distance goes | list. Each next dictionary is used when the distance goes | |||
beyond the previous. If a CONTEXT_MAP is enabled, then the | beyond the previous. If a CONTEXT_MAP is enabled, then the | |||
dictionary matching the context is moved to the front in the | dictionary matching the context is moved to the front in the | |||
order for this context. | order for this context. | |||
NUM_DICTIONARIES times: The DICTIONARY_MAP: | NUM_DICTIONARIES times the DICTIONARY_MAP, which contains: | |||
1 byte: Index into a custom word list or value | 1 byte: Index into a custom word list or value | |||
NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932] | NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932] | |||
built-in default word list. | built-in default word list. | |||
1 byte: Index into a custom transform list or value | 1 byte: Index into a custom transform list or value | |||
NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli | NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli | |||
[RFC7932] built-in default transform list. | [RFC7932] built-in default transform list. | |||
1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a | 1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a | |||
skipping to change at line 592 ¶ | skipping to change at line 599 ¶ | |||
first dictionary to use for this context. | first dictionary to use for this context. | |||
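
The OFFSETS_BY_LENGTH recurrence and the word data size N given in the word list format of this section can be worked through as in the sketch below. It assumes that the offset for the shortest length (4) starts at 0, which the recurrence implies but the text shown here does not state; the function name `word_list_layout` is hypothetical.

```python
from typing import Dict, Tuple


def word_list_layout(size_bits_by_length: bytes) -> Tuple[Dict[int, int], int]:
    """Return (OFFSETS_BY_LENGTH keyed by word length, N) for one word list."""
    if len(size_bits_by_length) != 28:
        raise ValueError("SIZE_BITS_BY_LENGTH must hold 28 entries")
    offsets = {}
    offset = 0  # assumption: words of length 4 start at offset 0
    for length in range(4, 32):  # the array is indexed by word lengths 4 to 31
        bits = size_bits_by_length[length - 4]
        if bits > 15:
            raise ValueError("SIZE_BITS_BY_LENGTH value exceeds 15 bits")
        offsets[length] = offset
        if bits:  # 0 means there are no words of this length
            offset += length << bits  # length * 2**bits bytes of word data
    return offsets, offset  # the final offset equals N
```

With 2 words of length 4 (value 1) and 4 words of length 31 (value 2), the sketch gives an offset of 8 for every length from 5 to 31 and N = 8 + 124 = 132 bytes.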
6. Large Window Brotli Compressed Data Stream | 6. Large Window Brotli Compressed Data Stream | |||
Large window brotli allows a sliding window beyond the 24-bit maximum | Large window brotli allows a sliding window beyond the 24-bit maximum | |||
of regular brotli [RFC7932]. | of regular brotli [RFC7932]. | |||
The compressed data stream is backwards compatible to brotli | The compressed data stream is backwards compatible to brotli | |||
[RFC7932] and may optionally have the following differences: | [RFC7932] and may optionally have the following differences: | |||
Encoding of WBITS in the stream header: The following new pattern of | In the encoding of WBITS in the stream header, the following new | |||
14 bits is supported: | pattern of 14 bits is supported: | |||
8 bits: Value 00010001 to indicate a large window brotli stream. | 8 bits: Value 00010001 to indicate a large window brotli stream. | |||
6 bits: WBITS. Must have value in range 10 to 62. | 6 bits: WBITS. Must have value in range 10 to 62. | |||
Distance alphabet: If the stream is a large window brotli stream, | Distance alphabet: If the stream is a large window brotli stream, | |||
the maximum number of extra bits is 62 and the theoretical maximum | the maximum number of extra bits is 62 and the theoretical maximum | |||
size of the distance alphabet is (16 + NDIRECT + (124 << | size of the distance alphabet is (16 + NDIRECT + (124 << | |||
NPOSTFIX)). This overrides the value for the distance alphabet | NPOSTFIX)). This overrides the value for the distance alphabet | |||
size given in Section 3.3 of [RFC7932] and affects the number of | size given in Section 3.3 of [RFC7932] and affects the number of | |||
skipping to change at line 638 ¶ | skipping to change at line 645 ¶ | |||
* The stream may have the format of regular brotli [RFC7932] or the | * The stream may have the format of regular brotli [RFC7932] or the | |||
format of large window brotli as described in Section 6. | format of large window brotli as described in Section 6. | |||
8. Shared Brotli Framing Format Stream | 8. Shared Brotli Framing Format Stream | |||
A compliant shared brotli framing format stream has the format | A compliant shared brotli framing format stream has the format | |||
described below. | described below. | |||
8.1. Main Format | 8.1. Main Format | |||
4 bytes: File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, | 4 bytes: File signature in hexadecimal format (bytes 0x91, 0x0a, | |||
0x52. The first byte contains the invalid WBITS combination for | 0x42, and 0x52). The first byte contains the invalid WBITS | |||
brotli [RFC7932] and large window brotli. | combination for brotli [RFC7932] and large window brotli. | |||
1 byte: Container flags that are 8 bits and have the following | 1 byte: Container flags that are 8 bits and have the following | |||
meanings: | meanings: | |||
bit 0 and 1: Version indicator that must be b'00. Otherwise, the | bits 0 and 1: Version indicator that must be b'00. Otherwise, | |||
decoder must reject the data stream as invalid. | the decoder must reject the data stream as invalid. | |||
bit 2: If 0, the file contains no final footer, may not contain | bit 2: If 0, the file contains no final footer, may not contain | |||
any metadata chunks, may not contain a central directory, and | any metadata chunks, may not contain a central directory, and | |||
may encode only a single resource (using one or more data | may encode only a single resource (using one or more data | |||
chunks). If 1, the file may contain one or more resources, | chunks). If 1, the file may contain one or more resources, | |||
metadata, and a central directory, and it must contain a final | metadata, and a central directory, and it must contain a final | |||
footer. | footer. | |||
multiple times: A chunk, each with the format specified in | multiple times: A chunk, each with the format specified in | |||
Section 8.2. | Section 8.2. | |||
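
A sketch of checking the signature and the container flags described above. Two things are assumptions of this example rather than statements from the text shown here: bit 0 is taken to be the least significant bit of the flags byte, and the remaining flag bits are returned untouched because their meaning falls outside this excerpt. The names `parse_container_header` and `has_final_footer` are hypothetical.

```python
SIGNATURE = bytes([0x91, 0x0A, 0x42, 0x52])


def parse_container_header(data: bytes):
    """Return (info, next_pos) for the 5-byte container header."""
    if len(data) < 5:
        raise ValueError("truncated container header")
    if data[:4] != SIGNATURE:
        raise ValueError("bad file signature")
    flags = data[4]
    version = flags & 0b11  # assumption: bit 0 is the least significant bit
    if version != 0:
        raise ValueError("unsupported version indicator")
    info = {
        "version": version,
        # bit 2 set: multiple resources, metadata, and a central directory
        # are allowed, and a final footer is required
        "has_final_footer": bool(flags & 0b100),
        "raw_flags": flags,
    }
    return info, 5
```

The chunks that follow the header would then be read starting at the returned position.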
skipping to change at line 707 ¶ | skipping to change at line 714 ¶ | |||
can be 1 serialized dictionary and 15 prefix dictionaries | can be 1 serialized dictionary and 15 prefix dictionaries | |||
maximum (a serialized dictionary may already contain one of | maximum (a serialized dictionary may already contain one of | |||
those). Circular references are not allowed (any dictionary | those). Circular references are not allowed (any dictionary | |||
reference that directly or indirectly uses this chunk itself as | reference that directly or indirectly uses this chunk itself as | |||
dictionary). | dictionary). | |||
Per dictionary reference: | Per dictionary reference: | |||
1 byte: Flags: | 1 byte: Flags: | |||
bit 0 and 1: Dictionary source: | bits 0 and 1: Dictionary source: | |||
00: Internal dictionary reference to a full resource by | 00: Internal dictionary reference to a full resource by | |||
pointer, which can span one or more chunks. Must | pointer, which can span one or more chunks. Must | |||
point to a full data chunk or a first partial data | point to a full data chunk or a first partial data | |||
chunk. | chunk. | |||
01: Internal dictionary reference to single chunk | 01: Internal dictionary reference to single chunk | |||
contents by pointer. May point to any chunk with | contents by pointer. May point to any chunk with | |||
content (data or metadata). If a partial data | content (data or metadata). If a partial data | |||
chunk, only this part is the dictionary. In this | chunk, only this part is the dictionary. In this | |||
skipping to change at line 731 ¶ | skipping to change at line 738 ¶ | |||
10: Reference to a dictionary by hash code of a | 10: Reference to a dictionary by hash code of a | |||
resource. The dictionary can come from an external | resource. The dictionary can come from an external | |||
source, such as a different container. The user of | source, such as a different container. The user of | |||
the decoder must be able to provide the dictionary | the decoder must be able to provide the dictionary | |||
contents given its hash code (even if it comes from | contents given its hash code (even if it comes from | |||
this container itself) or treat it as an error when | this container itself) or treat it as an error when | |||
the user does not have it available. | the user does not have it available. | |||
11: Invalid bit combination | 11: Invalid bit combination | |||
bit 2 and 3: Dictionary type: | bits 2 and 3: Dictionary type: | |||
00: Prefix dictionary, set in front of the sliding | 00: Prefix dictionary, set in front of the sliding | |||
window | window | |||
01: Serialized dictionary in the shared brotli format as | 01: Serialized dictionary in the shared brotli format as | |||
specified in Section 5. | specified in Section 5. | |||
10: Invalid bit combination | 10: Invalid bit combination | |||
11: Invalid bit combination | 11: Invalid bit combination | |||
bit 4-7: Must be 0 | bits 4-7: Must be 0 | |||
If hash-based: | If hash-based: | |||
1 byte: Type of hash used. Only supported value: 3, | 1 byte: Type of hash used. Only supported value: 3, | |||
indicating 256-bit HighwayHash [HWYHASH]. | indicating 256-bit HighwayHash [HWYHASH]. | |||
32 bytes: 256-bit HighwayHash checksum to refer to | 32 bytes: 256-bit HighwayHash checksum to refer to | |||
dictionary. | dictionary. | |||
If pointer based: Varint-encoded pointer to its chunk in this | If pointer based: Varint-encoded pointer to its chunk in this | |||
container. The chunk must come in the container earlier | container. The chunk must come in the container earlier | |||
than the current chunk. | than the current chunk. | |||
X bytes: Extra header bytes, depending on CHUNK_TYPE. If present, | X bytes: Extra header bytes, depending on CHUNK_TYPE. If present, | |||
they are specified in the subsequent sections. | they are specified in the subsequent sections. | |||
remaining bytes: The chunk contents. The uncompressed data in | remaining bytes: The chunk contents. The uncompressed data in the | |||
the chunk content depends on CHUNK_TYPE and is specified in the | chunk content depends on CHUNK_TYPE and is specified in the | |||
subsequent sections. The compressed data has following format | subsequent sections. The compressed data has following format | |||
depending on CODEC: | depending on CODEC: | |||
* uncompressed: The raw bytes. | * uncompressed: The raw bytes. | |||
* If "keep decoder", the continuation of the compressed stream | * If "keep decoder", the continuation of the compressed stream | |||
that was interrupted at the end of the previous chunk. The | that was interrupted at the end of the previous chunk. The | |||
decoder from the previous chunk must be used and its state | decoder from the previous chunk must be used and its state it | |||
it had at the end of the previous chunk must be kept at the | had at the end of the previous chunk must be kept at the start | |||
start of the decoding of this chunk. | of the decoding of this chunk. | |||
* brotli: The bytes are in brotli format [RFC7932]. | * brotli: The bytes are in brotli format [RFC7932]. | |||
* shared brotli: The bytes are in the shared brotli format | * shared brotli: The bytes are in the shared brotli format | |||
specified in Section 7. | specified in Section 7. | |||
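
The flags byte of a dictionary reference, as laid out above, can be decoded along these lines. The sketch assumes bit 0 is the least significant bit; the function name and the returned labels are hypothetical and chosen only for readability.

```python
SOURCES = {
    0b00: "full resource by pointer",
    0b01: "single chunk by pointer",
    0b10: "hash code",
}
TYPES = {
    0b00: "prefix dictionary",
    0b01: "serialized dictionary",
}


def decode_dictionary_flags(flags: int):
    """Return (source, type) labels, rejecting invalid bit combinations."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one byte")
    if flags >> 4:  # bits 4-7 must be 0
        raise ValueError("reserved bits 4-7 are set")
    source_bits = flags & 0b11        # bits 0 and 1 (assumed LSB first)
    type_bits = (flags >> 2) & 0b11   # bits 2 and 3
    if source_bits not in SOURCES:
        raise ValueError("invalid dictionary source")
    if type_bits not in TYPES:
        raise ValueError("invalid dictionary type")
    return SOURCES[source_bits], TYPES[type_bits]
```

A hash-based reference is then followed by the hash type byte and the 32-byte checksum, and a pointer-based one by a varint-encoded pointer.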
8.3. Metadata Format | 8.3. Metadata Format | |||
All the metadata chunk types use the following format for the | All the metadata chunk types use the following format for the | |||
uncompressed content: | uncompressed content: | |||
Per field: | Per field: | |||
2 bytes: Code to identify this metadata field. This must be two | 2 bytes: Code to identify this metadata field. This must be two | |||
lowercase or two uppercase alpha ASCII characters. If the | lowercase or two uppercase alpha ASCII characters. If the | |||
decoder encounters a lowercase field that it does not recognize | decoder encounters a lowercase field that it does not recognize | |||
skipping to change at line 828 ¶ | skipping to change at line 835 ¶ | |||
This chunk contains metadata that applies to the resource whose | This chunk contains metadata that applies to the resource whose | |||
beginning is encoded in the subsequent data chunk or first partial | beginning is encoded in the subsequent data chunk or first partial | |||
data chunk. | data chunk. | |||
The contents of this chunk follow the format described in | The contents of this chunk follow the format described in | |||
Section 8.3. | Section 8.3. | |||
The following field types are recognized: | The following field types are recognized: | |||
id: Name field. May appear 0 or 1 times. Has the following format: | id (N bytes): Name field. May appear 0 or 1 times. Has the | |||
following format: name in UTF-8 encoding, length determined by the | ||||
N bytes: Name in UTF-8 encoding, length determined by the field | field length. Treated generically but may be used as a filename. | |||
length. Treated generically but may be used as a filename. If | If used as a filename, forward slashes '/' should be used as | |||
used as a filename, forward slashes '/' should be used as | directory separators, relative paths should be used, and filenames | |||
directory separators, relative paths should be used, and | ending in a slash with 0-length content in the matching data chunk | |||
filenames ending in a slash with 0-length content in the | should be treated as an empty directory. | |||
matching data chunk should be treated as an empty directory. | ||||
mt: Modification time. May appear 0 or 1 times. Has the following | ||||
format: | ||||
8 bytes: Microseconds since epoch, as a little-endian, signed | mt (8 bytes): Modification time. May appear 0 or 1 times. Has the | |||
two's complement 64-bit integer. | following format: contains microseconds since epoch, as a little- | |||
endian, signed two's complement 64-bit integer. | ||||
custom user field: Any two uppercase ASCII characters. | custom user field: Any two uppercase ASCII characters. | |||
8.4.3. Data Chunk (Type 2) | 8.4.3. Data Chunk (Type 2) | |||
A data chunk contains the actual data of a resource. | A data chunk contains the actual data of a resource. | |||
This chunk has the following extra header bytes: | This chunk has the following extra header bytes: | |||
1 byte: Flags: | 1 byte: Flags: | |||
bit 0: If true, indicates this is not a resource that should be | bit 0: If true, indicates this is not a resource that should be | |||
output implicitly as part of extracting resources from this | output implicitly as part of extracting resources from this | |||
container. Instead, it may be referred to only explicitly, | container. Instead, it may be referred to only explicitly, | |||
e.g., as a dictionary reference by hash code or offset. This | e.g., as a dictionary reference by hash code or offset. This | |||
flag should be set for data used as dictionary to improve | flag should be set for data used as dictionary to improve | |||
compression of actual resources. | compression of actual resources. | |||
bit 1: If true, hash code is given | bit 1: If true, hash code is given. | |||
bits 2-7: Must be zero. | bits 2-7: Must be zero. | |||
If hash code is given: | If hash code is given: | |||
1 byte: Type of hash used. Only supported value: 3, indicating | 1 byte: Type of hash used. Only supported value: 3, indicating | |||
256-bit HighwayHash [HWYHASH]. | 256-bit HighwayHash [HWYHASH]. | |||
32 bytes: 256-bit HighwayHash checksum of the uncompressed data. | 32 bytes: 256-bit HighwayHash checksum of the uncompressed data. | |||
skipping to change at line 1003 ¶ | skipping to change at line 1007 ¶ | |||
8.4.10. Central Directory Chunk (Type 9) | 8.4.10. Central Directory Chunk (Type 9) | |||
The central directory chunk along with the repeat metadata chunks | The central directory chunk along with the repeat metadata chunks | |||
allow quickly finding and listing compressed resources in the | allow quickly finding and listing compressed resources in the | |||
container file. | container file. | |||
The central directory chunk is always uncompressed and does not have | The central directory chunk is always uncompressed and does not have | |||
the codec byte. It instead has the following format: | the codec byte. It instead has the following format: | |||
varint: Pointer into the file where the repeat metadata chunks are | varint: Pointer into the file where the repeat metadata chunks are | |||
located or 0 if they are not present per chunk listed: | located or 0 if they are not present. | |||
per chunk listed: | ||||
varint: Pointer into the file where this chunk begins. | varint: Pointer into the file where this chunk begins. | |||
varint: Number of header bytes N used below. | varint: Number of header bytes N used below. | |||
N bytes: Copy of all the header bytes of the pointed at chunk, | N bytes: Copy of all the header bytes of the pointed at chunk, | |||
including total size, chunk type byte, codec, uncompressed | including total size, chunk type byte, codec, uncompressed | |||
size, dictionary references, and X extra header bytes. The | size, dictionary references, and X extra header bytes. The | |||
content is not repeated here. | content is not repeated here. | |||
The last listed chunk is reached when the end of the contents of the | The last listed chunk is reached when the end of the contents of the | |||
central directory are reached. If the end does not match the last | central directory are reached. If the end does not match the last | |||
byte of the central directory, the decoder must reject the data | byte of the central directory, the decoder must reject the data | |||
stream as invalid. | stream as invalid. | |||
If present, the central directory must list all data and metadata | If present, the central directory must list all data and metadata | |||
chunks of all types. | chunks of all types. | |||
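
As an illustration of the central directory layout described above, the following sketch walks the chunk body and rejects input whose last entry does not end exactly at the end of the body. It assumes a LEB128-style varint (7 value bits per byte, least-significant group first, high bit set when more bytes follow, at most 9 bytes); that encoding is an assumption here and should be checked against the varint definition elsewhere in the specification. `read_varint` and `parse_central_directory` are hypothetical names, and `body` is taken to be the chunk contents after the chunk's own header.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint at pos; return (value, next position).

    Assumed encoding: base 128, little-endian groups, high bit = continue.
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
        if shift >= 63:  # assumed limit: a 9th byte may not continue
            raise ValueError("varint too long")


def parse_central_directory(body: bytes) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Return (repeat metadata pointer, [(chunk pointer, header bytes), ...])."""
    repeat_metadata_ptr, pos = read_varint(body, 0)
    entries = []
    # Entries continue until the end of the body is reached.
    while pos < len(body):
        chunk_ptr, pos = read_varint(body, pos)
        n, pos = read_varint(body, pos)
        if pos + n > len(body):
            # The entry would run past the end of the central directory.
            raise ValueError("central directory ends inside an entry")
        entries.append((chunk_ptr, bytes(body[pos:pos + n])))
        pos += n
    return repeat_metadata_ptr, entries
```

The copied header bytes are returned unparsed; a reader could feed them to the same header parser it uses for the chunk itself, which is what makes listing resources possible without seeking to each chunk.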
8.4.11. Final Footer Chunk (Type 10) | 8.4.11. Final Footer Chunk (Type 10) | |||
The final footer chunk closes the file and is only present if in the | The final footer chunk closes the file and is only present if bit 2 | |||
initial container header flags bit 2 was set. | of the initial container flags was set. | |||
This chunk has the following content, which is always uncompressed: | This chunk has the following content, which is always uncompressed: | |||
reversed varint: Size of this entire framing format file, including | reversed varint: Size of this entire framing format file, including | |||
these bytes themselves, or 0 if this size is not given. | these bytes themselves, or 0 if this size is not given. | |||
reversed varint: Pointer to the start of the central directory, or 0 | reversed varint: Pointer to the start of the central directory, or 0 | |||
if there is none. | if there is none. | |||
A reversed varint has the same format as a varint but its bytes are | A reversed varint has the same format as a varint but its bytes are | |||
skipping to change at line 1092 ¶ | skipping to change at line 1098 ¶ | |||
The dictionary must be treated with the same security precautions as | The dictionary must be treated with the same security precautions as | |||
the content because a change to the dictionary can result in a change | the content because a change to the dictionary can result in a change | |||
to the decompressed content. | to the decompressed content. | |||
The CRIME attack [CRIME] shows that it's a bad idea to compress data | The CRIME attack [CRIME] shows that it's a bad idea to compress data | |||
from mixed (e.g., public and private) sources -- the data sources | from mixed (e.g., public and private) sources -- the data sources | |||
include not only the compressed data but also the dictionaries. For | include not only the compressed data but also the dictionaries. For | |||
example, if you compress secret cookies using a public-data-only | example, if you compress secret cookies using a public-data-only | |||
dictionary, you still leak information about the cookies. | dictionary, you still leak information about the cookies. | |||
Not only can the dictionary reveal information about the compressed | The dictionary can reveal information about the compressed data and | |||
data, but vice versa; data compressed with the dictionary can reveal | vice versa. That is, data compressed with the dictionary can reveal | |||
the contents of the dictionary when an adversary can control parts of | contents of the dictionary when an adversary can control parts of the | |||
data to compress and see the compressed size. On the other hand, if | data to compress and see the compressed size. On the other hand, if | |||
the adversary can control the dictionary, the adversary can learn | the adversary can control the dictionary, the adversary can learn | |||
information about the compressed data. | information about the compressed data. | |||
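
The size side channel described above can be demonstrated with a small sketch. It uses zlib's preset-dictionary support from the Python standard library as a stand-in for shared brotli, so it illustrates the principle rather than this format. The dictionary contents and the guesses are made-up values.

```python
import zlib


def compressed_size(data: bytes, dictionary: bytes) -> int:
    """Size of data after DEFLATE compression with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(data) + comp.flush())


# Hypothetical secret dictionary that the adversary cannot read directly.
secret_dictionary = b"session-token=hunter2-xyzzy-42;"

# The adversary controls the data to compress and observes only the size.
right_guess = b"session-token=hunter2-xyzzy-42;"
wrong_guess = b"session-token=QWERTYU-ABCDE-99;"

size_right = compressed_size(right_guess, secret_dictionary)
size_wrong = compressed_size(wrong_guess, secret_dictionary)

# A guess that matches the dictionary is encoded as one long back-reference,
# so it compresses smaller than a guess of the same length that does not.
print(size_right, size_wrong)
```

Because both guesses have the same uncompressed length, the difference in compressed size comes only from how much of each guess matches the dictionary, which is the signal an adversary would exploit.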
The most robust defense against CRIME is not to compress private | The most robust defense against CRIME is not to compress private | |||
data, e.g., sensitive headers like cookies or any content with | data, e.g., sensitive headers like cookies or any content with | |||
personally identifiable information (PII). The challenge has been to | personally identifiable information (PII). The challenge has been to | |||
identify secrets within a vast amount of data to be compressed. | identify secrets within a vast amount of data to be compressed. | |||
Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is | Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is | |||
to extend existing web template systems (e.g., Soy [SOY]) to allow | to extend existing web template systems (e.g., Soy [SOY]) to allow | |||
skipping to change at line 1173 ¶ | skipping to change at line 1179 ¶ | |||
[CRIME] CVE Program, "CVE-2012-4929", | [CRIME] CVE Program, "CVE-2012-4929", | |||
<https://www.cve.org/CVERecord?id=CVE-2012-4929>. | <https://www.cve.org/CVERecord?id=CVE-2012-4929>. | |||
[LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for | [LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for | |||
Sequential Data Compression", IEEE Transactions on | Sequential Data Compression", IEEE Transactions on | |||
Information Theory, vol. 23, no. 3, pp. 337-343, | Information Theory, vol. 23, no. 3, pp. 337-343, | |||
DOI 10.1109/TIT.1977.1055714, May 1977, | DOI 10.1109/TIT.1977.1055714, May 1977, | |||
<https://doi.org/10.1109/TIT.1977.1055714>. | <https://doi.org/10.1109/TIT.1977.1055714>. | |||
[SOY] Google Developers, "Closure Tools", | [SOY] Google Developers, "Closure Tools", | |||
<https://developers.google.com/closure/templates/>. | <https://developers.google.com/closure>. | |||
Acknowledgments | Acknowledgments | |||
The authors would like to thank Robert Obryk for suggesting | The authors would like to thank Robert Obryk for suggesting | |||
improvements to the format and the text of the specification. | improvements to the format and the text of the specification. | |||
Authors' Addresses | Authors' Addresses | |||
Jyrki Alakuijala | Jyrki Alakuijala | |||
Google, Inc. | Google, Inc. | |||
End of changes. 34 change blocks. | ||||
75 lines changed or deleted | 81 lines changed or added | |||