rfc9841v1.txt   rfc9841.txt 
skipping to change at line 18 skipping to change at line 18
August 2025 August 2025
Shared Brotli Compressed Data Format Shared Brotli Compressed Data Format
Abstract Abstract
This specification defines a data format for shared brotli This specification defines a data format for shared brotli
compression, which adds support for shared dictionaries, large compression, which adds support for shared dictionaries, large
window, and a container format to brotli (RFC 7932). Shared window, and a container format to brotli (RFC 7932). Shared
dictionaries and large window support allow significant compression dictionaries and large window support allow significant compression
gains compared to regular brotli. This document updates RFC 7932. gains compared to regular brotli. This document specifies an
extension to the method defined in RFC 7932.
Status of This Memo Status of This Memo
This document is not an Internet Standards Track specification; it is This document is not an Internet Standards Track specification; it is
published for informational purposes. published for informational purposes.
This document is a product of the Internet Engineering Task Force This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents Internet Engineering Steering Group (IESG). Not all documents
skipping to change at line 264 skipping to change at line 265
original dictionary in the custom dictionary. original dictionary in the custom dictionary.
If no shared dictionary is set, the decoder behaves the same as in If no shared dictionary is set, the decoder behaves the same as in
[RFC7932] on a brotli stream. [RFC7932] on a brotli stream.
If a shared dictionary is set, then it can set LZ77 dictionaries, If a shared dictionary is set, then it can set LZ77 dictionaries,
override static dictionary words, and/or override transforms. override static dictionary words, and/or override transforms.
3.1. Custom Static Dictionaries 3.1. Custom Static Dictionaries
If a custom word list is set, then the following behavior of the RFC If a custom word list is set, then the following behaviors of the
7932 decoder [RFC7932] is overridden: decoder defined in [RFC7932] are overridden:
Instead of the Static Dictionary Data from Appendix A of Instead of the Static Dictionary Data from Appendix A of
[RFC7932], one or more word lists from the custom static [RFC7932], one or more word lists from the custom static
dictionary data are used. dictionary data are used.
Instead of NDBITS at the end of Appendix A of [RFC7932], a custom Instead of NDBITS at the end of Appendix A of [RFC7932], a custom
SIZE_BITS_BY_LENGTH per custom word list is used. SIZE_BITS_BY_LENGTH per custom word list is used.
The copy length for a static dictionary reference must be between The copy length for a static dictionary reference must be between
4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of 4 and 31 and may not be a value for which SIZE_BITS_BY_LENGTH of
this dictionary is 0. this dictionary is 0.
If a custom transforms list is set without context dependency, then If a custom transforms list is set without context dependency, then
the following behavior of the RFC 7932 decoder [RFC7932] is the following behaviors of the decoder defined in [RFC7932] are
overridden: overridden:
The "List of Word Transformations" from Appendix B of [RFC7932] is The "List of Word Transformations" from Appendix B of [RFC7932] is
overridden by one or more lists of custom prefixes, suffixes, and overridden by one or more lists of custom prefixes, suffixes, and
transform operations. transform operations.
The transform_id must be smaller than the number of transforms The transform_id must be smaller than the number of transforms
given in the custom transforms list. given in the custom transforms list.
If the dictionary is context dependent, it includes a lookup table of If the dictionary is context dependent, it includes a lookup table of
a 64-word list and transform list combinations. When resolving a a 64 word list and transform list combinations. When resolving a
static dictionary word, the decoder computes the literal Context ID static dictionary word, the decoder computes the literal Context ID
as described in Section 7.1 of [RFC7932]. The literal Context ID is as described in Section 7.1 of [RFC7932]. The literal Context ID is
used as the index in the lookup tables to select the word list and used as the index in the lookup tables to select the word list and
transforms to use. If the dictionary is not context dependent, this transforms to use. If the dictionary is not context dependent, this
ID is implicitly 0 instead. ID is implicitly 0 instead.
If a distance goes beyond the dictionary for the current ID and If a distance goes beyond the dictionary for the current ID and
multiple word/transform list combinations are defined, then a next multiple word/transform list combinations are defined, then the next
dictionary is used in the following order: if not context dependent, dictionary is used in the following order:
the same order as defined in the shared dictionary. If context
dependent, the index matching the current context is used first, the * If context dependent:
same order as defined in the shared dictionary excluding the current
context are used next. - use the index matching the current context first, and then
- use the same order as defined in the shared dictionary
(excluding the current context) next.
* If not context dependent:
- use the same order as defined in the shared dictionary.
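The fallback order described above can be sketched as follows. This is only an illustration under stated assumptions: the function name dictionary_order and the parameter context_index (the dictionary index that the lookup table selects for the current literal Context ID) are hypothetical names and do not appear in the specification.

```python
def dictionary_order(num_dictionaries, context_index=None):
    """Return the order in which dictionaries are tried.

    context_index is None when the shared dictionary is not context
    dependent (hypothetical convention chosen for this sketch).
    """
    declared = list(range(num_dictionaries))  # order as defined in the shared dictionary
    if context_index is None:
        return declared
    if not 0 <= context_index < num_dictionaries:
        raise ValueError("context index out of range")
    # The dictionary matching the current context goes first; the rest keep
    # their declared order, with the current one excluded.
    return [context_index] + [i for i in declared if i != context_index]
```

For example, with four dictionaries and the current context mapped to dictionary 2, the sketch yields the order 2, 0, 1, 3.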
3.1.1. Transform Operations 3.1.1. Transform Operations
A shared dictionary may include custom word transformations to A shared dictionary may include custom word transformations to
replace those specified in Section 8 and Appendix B of [RFC7932]. A replace those specified in Section 8 and Appendix B of [RFC7932]. A
transform consists of a possible prefix, a transform operation, for transform consists of a possible prefix, a transform operation, a
some operations a parameter, and a possible suffix. In the shared parameter (for some operations), and a possible suffix. In the
dictionary format, the transform operation is represented by a shared dictionary format, the transform operation is represented by a
numerical ID, which is listed in the table below. numerical ID, which is listed in the table below.
+====+===========================+ +====+===========================+
| ID | Operation | | ID | Operation |
+====+===========================+ +====+===========================+
| 0 | Identity | | 0 | Identity |
+----+---------------------------+ +----+---------------------------+
| 1 | OmitLast1 | | 1 | OmitLast1 |
+----+---------------------------+ +----+---------------------------+
| 2 | OmitLast2 | | 2 | OmitLast2 |
skipping to change at line 464 skipping to change at line 472
4. Varint Encoding 4. Varint Encoding
A varint is encoded in base 128 in one or more bytes as follows: A varint is encoded in base 128 in one or more bytes as follows:
+--------+--------+ +--------+ +--------+--------+ +--------+
|1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx|
+--------+--------+ +--------+ +--------+--------+ +--------+
where the "x" bits of the first byte are the LSBs of the value and where the "x" bits of the first byte are the LSBs of the value and
the "x" bits of the last byte are the MSBs of the value. The last the "x" bits of the last byte are the MSBs of the value. The last
byte must have its MSB set to 0, all other bytes to 1 to indicate byte must have its MSB set to 0 and all other bytes must have their
there is a next byte. MSBs set to 1 to indicate there is a next byte.
The maximum allowed amount of bits to read is 63 bits; if the 9th The maximum allowed amount of bits to read is 63 bits; if the 9th
byte is present and has its MSB set, then the stream must be byte is present and has its MSB set, then the stream must be
considered as invalid. considered as invalid.
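A minimal sketch of the varint rules above, in Python. The function names decode_varint and encode_varint are hypothetical and chosen only for this illustration; the 9-byte limit follows from 9 * 7 = 63 bits.

```python
def decode_varint(data, pos=0):
    """Decode one varint starting at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):  # at most 9 bytes, carrying 9 * 7 = 63 bits
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)  # the first byte holds the LSBs
        if byte & 0x80 == 0:               # a clear MSB marks the last byte
            return value, pos + i + 1
    raise ValueError("invalid varint: 9th byte has its MSB set")


def encode_varint(value):
    """Encode a non-negative integer of at most 63 bits as a varint."""
    if not 0 <= value < (1 << 63):
        raise ValueError("value does not fit in 63 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # MSB set: another byte follows
        value >>= 7
    out.append(value)                      # last byte, MSB clear
    return bytes(out)
```

Under these rules the value 300 is encoded as the two bytes 0xAC 0x02.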
5. Shared Dictionary Stream 5. Shared Dictionary Stream
The shared dictionary stream encodes a custom dictionary for brotli, The shared dictionary stream encodes a custom dictionary for brotli,
including custom words and/or custom transformations. A shared including custom words and/or custom transformations. A shared
dictionary may appear as a standalone or as contents of a resource in dictionary may appear as a standalone or as contents of a resource in
a framing format container. a framing format container.
A compliant shared brotli dictionary stream must have the following A compliant shared brotli dictionary stream must have the following
format: format:
2 bytes: File signature, in hexadecimal the bytes 91, 0. 2 bytes: File signature in hexadecimal format (bytes 91 and 0).
varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 varint: LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77
dictionary or 0 if there is none. The maximum allowed value is dictionary, or 0 if there is none. The maximum allowed value is
the maximum possible sliding window size of brotli or large window the maximum possible sliding window size of brotli or large window
brotli. brotli.
LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary. LZ77_DICTIONARY_LENGTH bytes: Contents of the LZ77 dictionary.
1 byte: NUM_CUSTOM_WORD_LISTS. May have a value of 0 to 64. 1 byte: NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64.
NUM_CUSTOM_WORD_LISTS times a word list with the following format NUM_CUSTOM_WORD_LISTS times a word list with the following format
for each word list: for each word list:
28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit 28 bytes: SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit
integers, indexed by word lengths 4 to 31. The value integers, indexed by word lengths 4 to 31. The value
represents log2(number of words of this length), with the represents log2(number of words of this length), with the
exception of 0 meaning 0 words of this length. The max allowed exception of 0 meaning 0 words of this length. The max allowed
length value is 15 bits. OFFSETS_BY_LENGTH is computed from length value is 15 bits. OFFSETS_BY_LENGTH is computed from
this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] +
(SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0). (SIZE_BITS_BY_LENGTH[i] ? (i << SIZE_BITS_BY_LENGTH[i]) : 0).
N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31] N bytes: Words dictionary data, where N is OFFSETS_BY_LENGTH[31]
+ (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) : + (SIZE_BITS_BY_LENGTH[31] ? (31 << SIZE_BITS_BY_LENGTH[31]) :
0), with all the words of shortest length first, then all words 0), with all the words of shortest length first, then all words
of the next length, and so on, where there are either 0 or a of the next length, and so on, where there are either 0 or a
positive power of two number of words for each length. positive power of two number of words for each length.
1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value of 0 to 64. 1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to
64.
NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the
following format for each transform list: following format for each transform list:
2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. 2 bytes: PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data.
Must be at least 1 because the list must always end with a Must be at least 1 because the list must always end with a
zero-length stringlet even if it is empty. zero-length stringlet even if it is empty.
NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet. NUM_PREFIX_SUFFIX times: Prefix/suffix stringlet.
NUM_PREFIX_SUFFIX is the number of stringlets parsed and must NUM_PREFIX_SUFFIX is the number of stringlets parsed and must
skipping to change at line 533 skipping to change at line 542
for the last (terminating) entry of the transform list. For for the last (terminating) entry of the transform list. For
other entries, STRING_LENGTH must be in range 1..255. The 0 other entries, STRING_LENGTH must be in range 1..255. The 0
entry must be present and must be the last byte of the entry must be present and must be the last byte of the
PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data, else the
stream must be rejected as invalid. stream must be rejected as invalid.
STRING_LENGTH bytes: Contents of the prefix/suffix. STRING_LENGTH bytes: Contents of the prefix/suffix.
1 byte: NTRANSFORMS. Number of transformation triplets. 1 byte: NTRANSFORMS. Number of transformation triplets.
NTRANSFORMS times: Data for each transform: NTRANSFORMS times the data for each transform listed below:
1 byte: Index of prefix in prefix/suffix data; must be less 1 byte: Index of prefix in prefix/suffix data; must be less
than NUM_PREFIX_SUFFIX. than NUM_PREFIX_SUFFIX.
1 byte: Index of suffix in prefix/suffix data; must be less 1 byte: Index of suffix in prefix/suffix data; must be less
than NUM_PREFIX_SUFFIX. than NUM_PREFIX_SUFFIX.
1 byte: Operation index; must be an index in the table of 1 byte: Operation index; must be an index in the table of
operations listed in Section 3.1.1. operations listed in Section 3.1.1.
If and only if at least one transform has operation index If and only if at least one transform has operation index
ShiftFirst or ShiftAll: ShiftFirst or ShiftAll, then NTRANSFORMS times the following:
NTRANSFORMS times:
2 bytes: Parameters for the transform. If the transform 2 bytes: Parameters for the transform. If the transform does
does not have type ShiftFirst or ShiftAll, the value must not have type ShiftFirst or ShiftAll, the value must be 0.
be 0. ShiftFirst and ShiftAll interpret these bytes as ShiftFirst and ShiftAll interpret these bytes as an unsigned
an unsigned 16-bit integer. 16-bit integer.
If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0 If NUM_CUSTOM_WORD_LISTS > 0 or NUM_CUSTOM_TRANSFORM_LISTS > 0
(else implicitly NUM_DICTIONARIES is 1 and points to the brotli (else implicitly NUM_DICTIONARIES is 1 and points to the brotli
built-in and there is no context map): built-in and there is no context map):
1 byte: NUM_DICTIONARIES. May have value 1 to 64. Each 1 byte: NUM_DICTIONARIES. May have a value in range 1 to 64.
dictionary is a combination of a word list and a transform Each dictionary is a combination of a word list and a transform
list. Each next dictionary is used when the distance goes list. Each next dictionary is used when the distance goes
beyond the previous. If a CONTEXT_MAP is enabled, then the beyond the previous. If a CONTEXT_MAP is enabled, then the
dictionary matching the context is moved to the front in the dictionary matching the context is moved to the front in the
order for this context. order for this context.
NUM_DICTIONARIES times: The DICTIONARY_MAP: NUM_DICTIONARIES times the DICTIONARY_MAP, which contains:
1 byte: Index into a custom word list or value 1 byte: Index into a custom word list or value
NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932] NUM_CUSTOM_WORD_LISTS to indicate using the brotli [RFC7932]
built-in default word list. built-in default word list.
1 byte: Index into a custom transform list or value 1 byte: Index into a custom transform list or value
NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli
[RFC7932] built-in default transform list. [RFC7932] built-in default transform list.
1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a 1 byte: CONTEXT_ENABLED. If 0, there is no context map. If 1, a
skipping to change at line 592 skipping to change at line 599
first dictionary to use for this context. first dictionary to use for this context.
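The OFFSETS_BY_LENGTH recurrence and the word data size N from the word list format above can be sketched as follows. This is a sketch under assumptions: the function name word_list_layout is hypothetical, the 28 SIZE_BITS_BY_LENGTH values are passed in the order of word lengths 4 to 31, and the offset for length 4 is assumed to be 0 (the format text gives only the recurrence, not the starting value).

```python
MIN_WORD_LENGTH = 4
MAX_WORD_LENGTH = 31


def word_list_layout(size_bits_by_length):
    """Return (offsets, total_bytes) for one custom word list.

    size_bits_by_length: 28 integers for word lengths 4..31.
    offsets maps each word length to the byte offset of its first word.
    """
    if len(size_bits_by_length) != MAX_WORD_LENGTH - MIN_WORD_LENGTH + 1:
        raise ValueError("expected 28 SIZE_BITS_BY_LENGTH entries")
    if any(bits > 15 for bits in size_bits_by_length):
        raise ValueError("SIZE_BITS_BY_LENGTH value exceeds 15 bits")

    def block_size(length):
        bits = size_bits_by_length[length - MIN_WORD_LENGTH]
        # 0 means "no words of this length"; otherwise 2**bits words.
        return (length << bits) if bits else 0

    offsets = {MIN_WORD_LENGTH: 0}  # assumption: the data starts with length 4
    for length in range(MIN_WORD_LENGTH, MAX_WORD_LENGTH):
        offsets[length + 1] = offsets[length] + block_size(length)
    total_bytes = offsets[MAX_WORD_LENGTH] + block_size(MAX_WORD_LENGTH)
    return offsets, total_bytes
```

For instance, two words of length 4 (value 1) and four words of length 5 (value 2) occupy 8 + 20 = 28 bytes, so words of length 6 would start at offset 28.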
6. Large Window Brotli Compressed Data Stream 6. Large Window Brotli Compressed Data Stream
Large window brotli allows a sliding window beyond the 24-bit maximum Large window brotli allows a sliding window beyond the 24-bit maximum
of regular brotli [RFC7932]. of regular brotli [RFC7932].
The compressed data stream is backwards compatible to brotli The compressed data stream is backwards compatible to brotli
[RFC7932] and may optionally have the following differences: [RFC7932] and may optionally have the following differences:
Encoding of WBITS in the stream header: The following new pattern of In the encoding of WBITS in the stream header, the following new
14 bits is supported: pattern of 14 bits is supported:
8 bits: Value 00010001 to indicate a large window brotli stream. 8 bits: Value 00010001 to indicate a large window brotli stream.
6 bits: WBITS. Must have value in range 10 to 62. 6 bits: WBITS. Must have value in range 10 to 62.
Distance alphabet: If the stream is a large window brotli stream, Distance alphabet: If the stream is a large window brotli stream,
the maximum number of extra bits is 62 and the theoretical maximum the maximum number of extra bits is 62 and the theoretical maximum
size of the distance alphabet is (16 + NDIRECT + (124 << size of the distance alphabet is (16 + NDIRECT + (124 <<
NPOSTFIX)). This overrides the value for the distance alphabet NPOSTFIX)). This overrides the value for the distance alphabet
size given in Section 3.3 of [RFC7932] and affects the number of size given in Section 3.3 of [RFC7932] and affects the number of
skipping to change at line 638 skipping to change at line 645
* The stream may have the format of regular brotli [RFC7932] or the * The stream may have the format of regular brotli [RFC7932] or the
format of large window brotli as described in Section 6. format of large window brotli as described in Section 6.
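The large window header pattern and the distance alphabet size given in Section 6 can be illustrated with a short sketch. It rests on an assumption taken from the bit-packing convention of [RFC7932]: bits are consumed starting from the least significant bit of each byte, so the 8-bit pattern 00010001 is the byte value 0x11 and WBITS occupies the low 6 bits of the following byte. The function names are hypothetical.

```python
LARGE_WINDOW_MARKER = 0b00010001  # 8-bit pattern announcing large window brotli


def parse_large_window_header(data):
    """Return WBITS if data starts with the large window pattern, else None.

    Assumes LSB-first bit packing as in RFC 7932 (an assumption of this
    sketch, not a statement taken from the text above).
    """
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    if data[0] != LARGE_WINDOW_MARKER:
        return None  # regular brotli WBITS encoding, not handled here
    wbits = data[1] & 0x3F  # next 6 bits; the upper 2 bits belong to later data
    if not 10 <= wbits <= 62:
        raise ValueError("large window WBITS must be in range 10 to 62")
    return wbits


def large_window_distance_alphabet_size(ndirect, npostfix):
    """Theoretical maximum size of the distance alphabet for large window."""
    return 16 + ndirect + (124 << npostfix)
```

With NDIRECT = 0 and NPOSTFIX = 0, the theoretical maximum alphabet size is 16 + 0 + 124 = 140.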
8. Shared Brotli Framing Format Stream 8. Shared Brotli Framing Format Stream
A compliant shared brotli framing format stream has the format A compliant shared brotli framing format stream has the format
described below. described below.
8.1. Main Format 8.1. Main Format
4 bytes: File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, 4 bytes: File signature in hexadecimal format (bytes 0x91, 0x0a,
0x52. The first byte contains the invalid WBITS combination for 0x42, and 0x52). The first byte contains the invalid WBITS
brotli [RFC7932] and large window brotli. combination for brotli [RFC7932] and large window brotli.
1 byte: Container flags that are 8 bits and have the following 1 byte: Container flags that are 8 bits and have the following
meanings: meanings:
bit 0 and 1: Version indicator that must be b'00. Otherwise, the bits 0 and 1: Version indicator that must be b'00. Otherwise,
decoder must reject the data stream as invalid. the decoder must reject the data stream as invalid.
bit 2: If 0, the file contains no final footer, may not contain bit 2: If 0, the file contains no final footer, may not contain
any metadata chunks, may not contain a central directory, and any metadata chunks, may not contain a central directory, and
may encode only a single resource (using one or more data may encode only a single resource (using one or more data
chunks). If 1, the file may contain one or more resources, chunks). If 1, the file may contain one or more resources,
metadata, and a central directory, and it must contain a final metadata, and a central directory, and it must contain a final
footer. footer.
multiple times: A chunk, each with the format specified in multiple times: A chunk, each with the format specified in
Section 8.2. Section 8.2.
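A minimal sketch of checking the signature and container flags described above. It assumes that bit 0 denotes the least significant bit of the flags byte; the function name parse_container_header and the returned dictionary keys are hypothetical. Bits 3 to 7 are not covered by the text shown here, so the sketch leaves them unchecked.

```python
CONTAINER_SIGNATURE = bytes([0x91, 0x0A, 0x42, 0x52])


def parse_container_header(data):
    """Validate the 4-byte signature and the container flags byte."""
    if len(data) < 5:
        raise ValueError("truncated container header")
    if data[:4] != CONTAINER_SIGNATURE:
        raise ValueError("not a shared brotli framing format stream")
    flags = data[4]
    version = flags & 0b11            # bits 0 and 1 (assumed to be the low bits)
    if version != 0:
        raise ValueError("unsupported version indicator")
    has_final_footer = bool(flags & 0b100)  # bit 2
    return {
        "has_final_footer": has_final_footer,
        # With bit 2 clear, only a single resource may be encoded.
        "single_resource_only": not has_final_footer,
        "chunks_offset": 5,           # chunks follow the 5 header bytes
    }
```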
skipping to change at line 707 skipping to change at line 714
can be 1 serialized dictionary and 15 prefix dictionaries can be 1 serialized dictionary and 15 prefix dictionaries
maximum (a serialized dictionary may already contain one of maximum (a serialized dictionary may already contain one of
those). Circular references are not allowed (any dictionary those). Circular references are not allowed (any dictionary
reference that directly or indirectly uses this chunk itself as reference that directly or indirectly uses this chunk itself as
dictionary). dictionary).
Per dictionary reference: Per dictionary reference:
1 byte: Flags: 1 byte: Flags:
bit 0 and 1: Dictionary source: bits 0 and 1: Dictionary source:
00: Internal dictionary reference to a full resource by 00: Internal dictionary reference to a full resource by
pointer, which can span one or more chunks. Must pointer, which can span one or more chunks. Must
point to a full data chunk or a first partial data point to a full data chunk or a first partial data
chunk. chunk.
01: Internal dictionary reference to single chunk 01: Internal dictionary reference to single chunk
contents by pointer. May point to any chunk with contents by pointer. May point to any chunk with
content (data or metadata). If a partial data content (data or metadata). If a partial data
chunk, only this part is the dictionary. In this chunk, only this part is the dictionary. In this
skipping to change at line 731 skipping to change at line 738
10: Reference to a dictionary by hash code of a 10: Reference to a dictionary by hash code of a
resource. The dictionary can come from an external resource. The dictionary can come from an external
source, such as a different container. The user of source, such as a different container. The user of
the decoder must be able to provide the dictionary the decoder must be able to provide the dictionary
contents given its hash code (even if it comes from contents given its hash code (even if it comes from
this container itself) or treat it as an error when this container itself) or treat it as an error when
the user does not have it available. the user does not have it available.
11: Invalid bit combination 11: Invalid bit combination
bit 2 and 3: Dictionary type: bits 2 and 3: Dictionary type:
00: Prefix dictionary, set in front of the sliding 00: Prefix dictionary, set in front of the sliding
window window
01: Serialized dictionary in the shared brotli format as 01: Serialized dictionary in the shared brotli format as
specified in Section 5. specified in Section 5.
10: Invalid bit combination 10: Invalid bit combination
11: Invalid bit combination 11: Invalid bit combination
bit 4-7: Must be 0 bits 4-7: Must be 0
If hash-based: If hash-based:
1 byte: Type of hash used. Only supported value: 3, 1 byte: Type of hash used. Only supported value: 3,
indicating 256-bit HighwayHash [HWYHASH]. indicating 256-bit HighwayHash [HWYHASH].
32 bytes: 256-bit HighwayHash checksum to refer to 32 bytes: 256-bit HighwayHash checksum to refer to
dictionary. dictionary.
If pointer based: Varint-encoded pointer to its chunk in this If pointer based: Varint-encoded pointer to its chunk in this
container. The chunk must come in the container earlier container. The chunk must come in the container earlier
than the current chunk. than the current chunk.
X bytes: Extra header bytes, depending on CHUNK_TYPE. If present, X bytes: Extra header bytes, depending on CHUNK_TYPE. If present,
they are specified in the subsequent sections. they are specified in the subsequent sections.
remaining bytes: The chunk contents. The uncompressed data in remaining bytes: The chunk contents. The uncompressed data in the
the chunk content depends on CHUNK_TYPE and is specified in the chunk content depends on CHUNK_TYPE and is specified in the
subsequent sections. The compressed data has the following format subsequent sections. The compressed data has the following format
depending on CODEC: depending on CODEC:
* uncompressed: The raw bytes. * uncompressed: The raw bytes.
* If "keep decoder", the continuation of the compressed stream * If "keep decoder", the continuation of the compressed stream
that was interrupted at the end of the previous chunk. The that was interrupted at the end of the previous chunk. The
decoder from the previous chunk must be used and its state decoder from the previous chunk must be used and its state it
it had at the end of the previous chunk must be kept at the had at the end of the previous chunk must be kept at the start
start of the decoding of this chunk. of the decoding of this chunk.
* brotli: The bytes are in brotli format [RFC7932]. * brotli: The bytes are in brotli format [RFC7932].
* shared brotli: The bytes are in the shared brotli format * shared brotli: The bytes are in the shared brotli format
specified in Section 7. specified in Section 7.
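The flags byte of a dictionary reference, as listed above, could be decoded along these lines. Two assumptions are made and are not confirmed by the text: bit 0 is the least significant bit of the byte, and each two-bit code is read as a binary number (so "10" is the value 2). The function name and the label strings are hypothetical.

```python
# Two-bit codes read as binary numbers (assumption of this sketch).
DICTIONARY_SOURCES = {
    0b00: "resource pointer",  # full resource, may span several chunks
    0b01: "chunk pointer",     # contents of a single chunk
    0b10: "hash",              # dictionary supplied by hash code
}
DICTIONARY_TYPES = {
    0b00: "prefix",            # set in front of the sliding window
    0b01: "serialized",        # shared brotli dictionary stream (Section 5)
}


def parse_dictionary_reference_flags(flags):
    """Return (source, type) labels or raise ValueError for invalid bits."""
    source = flags & 0b11             # bits 0 and 1
    dictionary_type = (flags >> 2) & 0b11  # bits 2 and 3
    if flags >> 4:                    # bits 4-7 must be 0
        raise ValueError("reserved flag bits must be 0")
    if source not in DICTIONARY_SOURCES:
        raise ValueError("invalid dictionary source")
    if dictionary_type not in DICTIONARY_TYPES:
        raise ValueError("invalid dictionary type")
    return DICTIONARY_SOURCES[source], DICTIONARY_TYPES[dictionary_type]
```

A "hash" source is followed by the hash type byte and the 32-byte checksum; the two pointer sources are followed by a varint-encoded pointer.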
8.3. Metadata Format 8.3. Metadata Format
All the metadata chunk types use the following format for the All the metadata chunk types use the following format for the
uncompressed content: uncompressed content:
Per field: Per field:
2 bytes: Code to identify this metadata field. This must be two 2 bytes: Code to identify this metadata field. This must be two
lowercase or two uppercase alpha ASCII characters. If the lowercase or two uppercase alpha ASCII characters. If the
decoder encounters a lowercase field that it does not recognize decoder encounters a lowercase field that it does not recognize
skipping to change at line 828 skipping to change at line 835
This chunk contains metadata that applies to the resource whose This chunk contains metadata that applies to the resource whose
beginning is encoded in the subsequent data chunk or first partial beginning is encoded in the subsequent data chunk or first partial
data chunk. data chunk.
The contents of this chunk follow the format described in The contents of this chunk follow the format described in
Section 8.3. Section 8.3.
The following field types are recognized: The following field types are recognized:
id: Name field. May appear 0 or 1 times. Has the following format: id (N bytes): Name field. May appear 0 or 1 times. Has the
following format: name in UTF-8 encoding, length determined by the
N bytes: Name in UTF-8 encoding, length determined by the field field length. Treated generically but may be used as a filename.
length. Treated generically but may be used as a filename. If If used as a filename, forward slashes '/' should be used as
used as a filename, forward slashes '/' should be used as directory separators, relative paths should be used, and filenames
directory separators, relative paths should be used, and ending in a slash with 0-length content in the matching data chunk
filenames ending in a slash with 0-length content in the should be treated as an empty directory.
matching data chunk should be treated as an empty directory.
mt: Modification time. May appear 0 or 1 times. Has the following
format:
8 bytes: Microseconds since epoch, as a little-endian, signed mt (8 bytes): Modification time. May appear 0 or 1 times. Has the
two's complement 64-bit integer. following format: contains microseconds since epoch, as a little-
endian, signed two's complement 64-bit integer.
custom user field: Any two uppercase ASCII characters. custom user field: Any two uppercase ASCII characters.
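The 8-byte value of the mt field can be converted with Python's standard struct module, as in the sketch below. The text only says "epoch"; the sketch assumes the Unix epoch (1970-01-01 00:00:00 UTC). The function names decode_mt and encode_mt are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: "epoch" means the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_mt(payload):
    """Convert the 8-byte mt payload to a timezone-aware datetime."""
    if len(payload) != 8:
        raise ValueError("mt field must be exactly 8 bytes")
    (micros,) = struct.unpack("<q", payload)  # little-endian signed 64-bit
    # Values outside datetime's supported range raise OverflowError.
    return EPOCH + timedelta(microseconds=micros)


def encode_mt(moment):
    """Convert a timezone-aware datetime to the 8-byte mt payload."""
    micros = (moment - EPOCH) // timedelta(microseconds=1)
    return struct.pack("<q", micros)
```

Because the integer is signed, timestamps before the epoch are representable as negative values.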
8.4.3. Data Chunk (Type 2) 8.4.3. Data Chunk (Type 2)
A data chunk contains the actual data of a resource. A data chunk contains the actual data of a resource.
This chunk has the following extra header bytes: This chunk has the following extra header bytes:
1 byte: Flags: 1 byte: Flags:
bit 0: If true, indicates this is not a resource that should be bit 0: If true, indicates this is not a resource that should be
output implicitly as part of extracting resources from this output implicitly as part of extracting resources from this
container. Instead, it may be referred to only explicitly, container. Instead, it may be referred to only explicitly,
e.g., as a dictionary reference by hash code or offset. This e.g., as a dictionary reference by hash code or offset. This
flag should be set for data used as dictionary to improve flag should be set for data used as dictionary to improve
compression of actual resources. compression of actual resources.
bit 1: If true, hash code is given bit 1: If true, hash code is given.
bits 2-7: Must be zero. bits 2-7: Must be zero.
If hash code is given: If hash code is given:
1 byte: Type of hash used. Only supported value: 3, indicating 1 byte: Type of hash used. Only supported value: 3, indicating
256-bit HighwayHash [HWYHASH]. 256-bit HighwayHash [HWYHASH].
32 bytes: 256-bit HighwayHash checksum of the uncompressed data. 32 bytes: 256-bit HighwayHash checksum of the uncompressed data.
skipping to change at line 1003 skipping to change at line 1007
8.4.10. Central Directory Chunk (Type 9) 8.4.10. Central Directory Chunk (Type 9)
The central directory chunk along with the repeat metadata chunks The central directory chunk along with the repeat metadata chunks
allow quickly finding and listing compressed resources in the allow quickly finding and listing compressed resources in the
container file. container file.
The central directory chunk is always uncompressed and does not have The central directory chunk is always uncompressed and does not have
the codec byte. It instead has the following format: the codec byte. It instead has the following format:
varint: Pointer into the file where the repeat metadata chunks are varint: Pointer into the file where the repeat metadata chunks are
located or 0 if they are not present per chunk listed: located or 0 if they are not present.
per chunk listed:
varint: Pointer into the file where this chunk begins. varint: Pointer into the file where this chunk begins.
varint: Number of header bytes N used below. varint: Number of header bytes N used below.
N bytes: Copy of all the header bytes of the pointed at chunk, N bytes: Copy of all the header bytes of the pointed at chunk,
including total size, chunk type byte, codec, uncompressed including total size, chunk type byte, codec, uncompressed
size, dictionary references, and X extra header bytes. The size, dictionary references, and X extra header bytes. The
content is not repeated here. content is not repeated here.
The last listed chunk is reached when the end of the contents of the The last listed chunk is reached when the end of the contents of the
central directory are reached. If the end does not match the last central directory are reached. If the end does not match the last
byte of the central directory, the decoder must reject the data byte of the central directory, the decoder must reject the data
stream as invalid. stream as invalid.
If present, the central directory must list all data and metadata If present, the central directory must list all data and metadata
chunks of all types. chunks of all types.
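The central directory layout above could be walked as in the following sketch. It assumes the caller passes exactly the contents of the central directory (without the chunk's own header); the names read_varint and parse_central_directory are hypothetical. The varint helper follows the rules of Section 4.

```python
def read_varint(data, pos):
    """Decode a Section 4 varint at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, pos + i + 1
    raise ValueError("invalid varint")


def parse_central_directory(content):
    """Return (repeat_metadata_pointer, [(chunk_pointer, header_bytes), ...])."""
    # Pointer to the repeat metadata chunks, or 0 if they are not present.
    repeat_metadata_pointer, pos = read_varint(content, 0)
    entries = []
    while pos < len(content):  # entries run until the end of the contents
        chunk_pointer, pos = read_varint(content, pos)
        header_size, pos = read_varint(content, pos)
        end = pos + header_size
        if end > len(content):
            # The last entry does not end on the last byte: reject the stream.
            raise ValueError("central directory entry overruns the contents")
        entries.append((chunk_pointer, bytes(content[pos:end])))
        pos = end
    return repeat_metadata_pointer, entries
```

Each returned header copy can then be parsed like a regular chunk header to list resources without reading the chunk contents.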
8.4.11. Final Footer Chunk (Type 10) 8.4.11. Final Footer Chunk (Type 10)
The final footer chunk closes the file and is only present if in the The final footer chunk closes the file and is only present if bit 2
initial container header flags bit 2 was set. of the initial container flags was set.
This chunk has the following content, which is always uncompressed: This chunk has the following content, which is always uncompressed:
reversed varint: Size of this entire framing format file, including reversed varint: Size of this entire framing format file, including
these bytes themselves, or 0 if this size is not given. these bytes themselves, or 0 if this size is not given.
reversed varint: Pointer to the start of the central directory, or 0 reversed varint: Pointer to the start of the central directory, or 0
if there is none. if there is none.
A reversed varint has the same format as a varint but its bytes are A reversed varint has the same format as a varint but its bytes are
skipping to change at line 1092 skipping to change at line 1098
The dictionary must be treated with the same security precautions as The dictionary must be treated with the same security precautions as
the content because a change to the dictionary can result in a change the content because a change to the dictionary can result in a change
to the decompressed content. to the decompressed content.
The CRIME attack [CRIME] shows that it's a bad idea to compress data The CRIME attack [CRIME] shows that it's a bad idea to compress data
from mixed (e.g., public and private) sources -- the data sources from mixed (e.g., public and private) sources -- the data sources
include not only the compressed data but also the dictionaries. For include not only the compressed data but also the dictionaries. For
example, if you compress secret cookies using a public-data-only example, if you compress secret cookies using a public-data-only
dictionary, you still leak information about the cookies. dictionary, you still leak information about the cookies.
Not only can the dictionary reveal information about the compressed The dictionary can reveal information about the compressed data and
data, but vice versa; data compressed with the dictionary can reveal vice versa. That is, data compressed with the dictionary can reveal
the contents of the dictionary when an adversary can control parts of contents of the dictionary when an adversary can control parts of the
data to compress and see the compressed size. On the other hand, if data to compress and see the compressed size. On the other hand, if
the adversary can control the dictionary, the adversary can learn the adversary can control the dictionary, the adversary can learn
information about the compressed data. information about the compressed data.
The most robust defense against CRIME is not to compress private The most robust defense against CRIME is not to compress private
data, e.g., sensitive headers like cookies or any content with data, e.g., sensitive headers like cookies or any content with
personally identifiable information (PII). The challenge has been to personally identifiable information (PII). The challenge has been to
identify secrets within a vast amount of data to be compressed. identify secrets within a vast amount of data to be compressed.
Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is Cloudflare uses a regular expression [CLOUDFLARE]. Another idea is
to extend existing web template systems (e.g., Soy [SOY]) to allow to extend existing web template systems (e.g., Soy [SOY]) to allow
skipping to change at line 1173 skipping to change at line 1179
[CRIME] CVE Program, "CVE-2012-4929", [CRIME] CVE Program, "CVE-2012-4929",
<https://www.cve.org/CVERecord?id=CVE-2012-4929>. <https://www.cve.org/CVERecord?id=CVE-2012-4929>.
[LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for [LZ77] Ziv, J. and A. Lempel, "A Universal Algorithm for
Sequential Data Compression", IEEE Transactions on Sequential Data Compression", IEEE Transactions on
Information Theory, vol. 23, no. 3, pp. 337-343, Information Theory, vol. 23, no. 3, pp. 337-343,
DOI 10.1109/TIT.1977.1055714, May 1977, DOI 10.1109/TIT.1977.1055714, May 1977,
<https://doi.org/10.1109/TIT.1977.1055714>. <https://doi.org/10.1109/TIT.1977.1055714>.
[SOY] Google Developers, "Closure Tools", [SOY] Google Developers, "Closure Tools",
<https://developers.google.com/closure/templates/>. <https://developers.google.com/closure>.
Acknowledgments Acknowledgments
The authors would like to thank Robert Obryk for suggesting The authors would like to thank Robert Obryk for suggesting
improvements to the format and the text of the specification. improvements to the format and the text of the specification.
Authors' Addresses Authors' Addresses
Jyrki Alakuijala Jyrki Alakuijala
Google, Inc. Google, Inc.
 End of changes. 34 change blocks. 
75 lines changed or deleted 81 lines changed or added

This html diff was produced by rfcdiff 1.48.