<?xml version='1.0' encoding='utf-8'?>

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" number="9841" consensus="true" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3">

  <front>
    <title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title>
    <seriesInfo name="RFC" value="9841"/>
    <author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala">
      <organization abbrev="Google, Inc">Google, Inc.</organization>
      <address>
        <email>jyrki@google.com</email>
      </address>
    </author>
    <author initials="T." surname="Duong" fullname="Thai Duong">
      <organization abbrev="Google, Inc">Google, Inc.</organization>
      <address>
        <email>thaidn@google.com</email>
      </address>
    </author>
    <author initials="E." surname="Kliuchnikov" fullname="Evgenii Kliuchnikov">
      <organization abbrev="Google, Inc">Google, Inc.</organization>
      <address>
        <email>eustas@google.com</email>
      </address>
    </author>
    <author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka">
      <organization abbrev="Google, Inc">Google, Inc.</organization>
      <address>
        <email>szabadka@google.com</email>
      </address>
    </author>
    <author initials="L." surname="Vandevenne" fullname="Lode Vandevenne" role="editor">
      <organization abbrev="Google, Inc">Google, Inc.</organization>
      <address>
        <email>lode@google.com</email>
      </address>
    </author>
    <date year="2025" month="August"/>

    <area>WIT</area>

<keyword>dictionary compression lz77</keyword>

    <abstract>
      <t>
   This specification defines a data format for shared brotli
   compression, which adds support for shared dictionaries, large window,
   and a container format to brotli (RFC 7932). Shared dictionaries and
   large window support allow significant compression gains compared to
   regular brotli. This document specifies an extension to the method defined in RFC 7932.</t>
    </abstract>
  </front>
  <middle>
    <section anchor="sect-1" numbered="true" toc="default">
      <name>Introduction</name>
      <section anchor="sect-1.1" numbered="true" toc="default">
        <name>Purpose</name>
        <t>
   The purpose of this specification is to extend the brotli compressed
   data format (<xref <xref target="RFC7932" format="default"/>) format="default"/> with new abilities that allow further
   compression gains:</t> gains.</t>
        <ul spacing="normal">
              <li>
                <t>Shared dictionaries allow a static shared context between
        encoder and decoder for significant compression gains.</t>
              </li>
              <li>
                <t>Large window brotli allows much larger back reference distances
        to give compression gains for files over 16 MiB.</t>
              </li>
              <li>
                <t>The framing format is a container format that allows storage of
        multiple resources and that references dictionaries.</t>
              </li>
            </ul>
        <t>
   This document is the authoritative specification of shared brotli
   data formats and the backwards compatible changes to brotli. This
   document also defines the following:</t>
        <ul>
          <li>
                <t>The data format of serialized shared dictionaries</t>
              </li>
              <li>
                <t>The data format of the framing format</t>
              </li>
              <li>
                <t>The encoding of window bits and distances for large window
        brotli in the brotli data format</t>
              </li>
              <li>
                <t>The encoding of shared dictionary references in the brotli data
        format</t>
              </li>
        </ul>
      </section>
      <section anchor="sect-1.2" numbered="true" toc="default">
        <name>Intended Audience</name>
        <t>
   This specification is intended for use by software implementers to
   compress data into and/or decompress data from the shared brotli
   dictionary format.</t>
        <t>
   The text of the specification assumes a basic background in
   programming at the level of bits and other primitive data
   representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/>
   is helpful, but not required.</t>
      </section>
      <section anchor="sect-1.3" numbered="true" toc="default">
        <name>Scope</name>
        <t>
   This specification defines a data format for shared brotli
   compression, which adds support for dictionaries and extended
   features to brotli <xref target="RFC7932" format="default"/>.</t>
      </section>
      <section anchor="sect-1.4" numbered="true" toc="default">
        <name>Compliance</name>
        <t>
   Unless otherwise indicated below, a compliant decompressor must be
   able to accept and decompress any data set that conforms to all the
   specifications presented here. Additionally, a compliant compressor must produce
   data sets that conform to all the specifications presented here.</t>
      </section>
      <section anchor="sect-1.5" numbered="true" toc="default">
        <name>Definitions of Terms and Conventions Used</name>
<dl>
<dt>Byte:</dt><dd>8 bits stored or transmitted as a unit (same as an octet).  For
   this specification, a byte is exactly 8 bits, even on machines that
   store a character on a number of bits different from eight.  See
   below for the numbering of bits within a byte.</dd>

<dt>String:</dt><dd>A sequence of arbitrary bytes.</dd>
</dl>
        <t>
   Bytes stored within a computer do not have a "bit order" since they are
   always treated as a unit. However, a byte considered as an integer between
   0 and 255 does have a most significant bit (MSB) and least significant bit
   (LSB), and since we write numbers with the most significant digit on the
   left, bytes with the MSB are also written on the left. In the diagrams
   below, the bits of a byte are written so that bit 0 is the LSB, i.e., the
   bits are numbered as follows:</t>
        <artwork name="" type="" align="left" alt=""><![CDATA[
   +--------+
   |76543210|
   +--------+
]]></artwork>
        <t>
   Within a computer, a number may occupy multiple bytes. All multi-byte
   numbers in the format described here are unsigned and stored with the
   least significant byte first (at the lower memory address). For
	example, the decimal 16-bit number 520 is stored as:</t>

        <artwork name="" type="" align="left" alt=""><![CDATA[
   0        1
   +--------+--------+
   |00001000|00000010|
   +--------+--------+
   ^        ^
   |        |
   |        + more significant byte = 2 x 256
   + less significant byte = 8
]]></artwork>
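<t>As a brief illustration (not part of the format definition), the little-endian layout above can be reproduced in Python:</t>

```python
def store_le16(value):
    # Store a 16-bit unsigned number least significant byte first.
    return bytes([value & 0xFF, (value >> 8) & 0xFF])

# The decimal number 520 is stored as the bytes 0x08, 0x02.
```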
        <section anchor="sect-1.5.1" numbered="true" toc="default">
          <name>Packing into Bytes</name>
          <t>
   This document does not address the issue of the order in which bits
   of a byte are transmitted on a bit-sequential medium, since the final
   data format described here is byte- rather than bit-oriented.
   However, the compressed block format is described below as a sequence
   of data elements of various bit lengths, not a sequence of bytes. Therefore,
   we must specify how to pack these data elements into bytes to
   form the final compressed byte sequence:</t>
              <ul spacing="normal">
                <li>
                  <t>Data elements are packed into bytes in order of
        increasing bit number within the byte, i.e., starting
        with the LSB of the byte.</t>
                </li>
                <li>
                  <t>Data elements other than prefix codes are packed
        starting with the LSB of the data
        element. These are referred to here as integer values
        and are considered unsigned.</t>
                </li>
                <li>
                  <t>Prefix codes are packed starting with the
        MSB of the code.</t>
                </li>
              </ul>
<t>
   In other words, if one were to print out the compressed data as a
   sequence of bytes starting with the first byte at the <strong>right</strong> margin
   and proceeding to the <strong>left</strong>, with the MSB of each
   byte on the left as usual, one would be able to parse the result from
   right to left with fixed-width elements in the correct MSB-to-LSB
   order and prefix codes in bit-reversed order (i.e., with the first
   bit of the code in the relative LSB position).</t>
          <t>
   As an example, consider packing the following data elements into a
   sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2,
   3-bit prefix code b'110, 2-bit prefix code b'10, and 12-bit integer value
   3628.</t>
          <artwork name="" type="" align="left" alt=""><![CDATA[
     byte 2   byte 1   byte 0
   +--------+--------+--------+
   |11100010|11000101|10010110|
   +--------+--------+--------+
    ^            ^ ^   ^   ^
    |            | |   |   |
    |            | |   |   +------ integer value 6
    |            | |   +---------- integer value 2
    |            | +-------------- prefix code 110
    |            +---------------- prefix code 10
    +----------------------------- integer value 3628
]]></artwork>
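<t>The packing rules above can be sketched in Python. The helper below is illustrative only; its name and interface are not part of this specification:</t>

```python
def pack_elements(elements):
    """Pack (value, bit_length, is_prefix_code) tuples into bytes.

    Integer values are packed starting with their LSB; prefix codes
    are packed starting with their MSB. Bits fill each byte starting
    at bit 0 (the LSB of the byte).
    """
    bits = []
    for value, nbits, is_prefix in elements:
        order = range(nbits - 1, -1, -1) if is_prefix else range(nbits)
        bits.extend((value >> i) & 1 for i in order)
    out = bytearray((len(bits) + 7) // 8)
    for pos, bit in enumerate(bits):
        out[pos // 8] |= bit << (pos % 8)
    return bytes(out)

packed = pack_elements([
    (6, 3, False),      # 3-bit integer value 6
    (2, 4, False),      # 4-bit integer value 2
    (0b110, 3, True),   # 3-bit prefix code
    (0b10, 2, True),    # 2-bit prefix code
    (3628, 12, False),  # 12-bit integer value 3628
])
# packed reproduces bytes 0, 1, and 2 of the diagram above
```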
        </section>
      </section>
    </section>
    <section anchor="sect-2" numbered="true" toc="default">
      <name>Shared Brotli Overview</name>
      <t>
   Shared brotli extends brotli <xref target="RFC7932" format="default"/> with support for shared
   dictionaries, a larger LZ77 window, and a framing format.</t>
    </section>
    <section anchor="sect-3" numbered="true" toc="default">
      <name>Shared Dictionaries</name>
      <t>
   A shared dictionary is a piece of data shared by a compressor and
   decompressor. The compressor can take advantage of the dictionary
   context to encode the input in a more compact manner. The compressor
   and the decompressor must use exactly the same dictionary. A shared
   dictionary is especially useful to compress short input sequences.</t>
        <t>A shared brotli dictionary can use two methods of sharing context:</t>
              <dl><dt>LZ77 dictionary:</dt><dd>The encoder and decoder could refer
        to a given sequence of bytes. Multiple LZ77 dictionaries
        can be set.</dd>
              <dt>Custom static dictionary:</dt><dd>A word list with transforms. The
        encoder and decoder will replace the static dictionary data
        with the data in the shared dictionary. The original static
        dictionary is described in <xref section="8" target="RFC7932" format="default"/>. The original
        data from Appendices <xref section="A" target="RFC7932" sectionFormat="bare"/> and <xref section="B" target="RFC7932" sectionFormat="bare"/> of <xref target="RFC7932"/> will be
        replaced. In addition, it is possible to dynamically switch
        this dictionary based on the data compression context and/or
        include a reference to the original dictionary in the custom
        dictionary.</dd></dl>
      <t>
   If no shared dictionary is set, the decoder behaves the same as in
   <xref target="RFC7932" format="default"/> on a brotli stream.</t>
   <t>
   If a shared dictionary is set, then it can set any of the following: LZ77 dictionaries,
   override static dictionary words, and/or override transforms.</t>
      <section anchor="sect-3.1" numbered="true" toc="default">
        <name>Custom Static Dictionaries</name>
        <t>

   If a custom word list is set, then the following behaviors of the
   decoder defined in <xref target="RFC7932" format="default"/> are overridden:</t>

<t indent="3">
      Instead of the Static Dictionary Data from
      <xref section="A" target="RFC7932" format="default"/>, one or more word lists from the custom static
      dictionary data are used.</t>
<t indent="3">
      Instead of NDBITS at the end of <xref section="A" target="RFC7932" format="default"/>, a custom
      SIZE_BITS_BY_LENGTH per custom word list is used.
        </t>
          <t indent="3">
      The copy length for a static dictionary reference must be
      between 4 and 31 and may not be a value for which
      SIZE_BITS_BY_LENGTH of this dictionary is 0.</t>

        <t>
   If a custom transforms list is set without context dependency, then
   the following behaviors of the decoder defined in <xref target="RFC7932" format="default"/> are
   overridden:</t>
          <t indent="3">
      The "List of Word Transformations" from <xref section="B" target="RFC7932" format="default"/> is
      overridden by one or more lists of custom prefixes, suffixes, and
      transform operations.</t>
          <t indent="3">
      The transform_id must be smaller than the number of transforms
      given in the custom transforms list.</t>

        <t>
   If the dictionary is context dependent, it includes a lookup table of
   64 word list and transform list combinations. When resolving a static
   dictionary word, the decoder computes the literal Context ID as described in
   <xref section="7.1" sectionFormat="of" target="RFC7932"/>. The literal Context ID is used as the index in
   the lookup tables to select the word list and transforms to use. If
   the dictionary is not context dependent, this ID is implicitly 0
   instead.</t>
        <t>

   If a distance goes beyond the dictionary for the current ID and
   multiple word/transform list combinations are defined, then the
   next dictionary is used in the following order:</t>
<ul><li><t>If context dependent:</t>
<ul>
<li>use the index matching the current context first, and then</li>
<li>use the same order as defined in the shared dictionary (excluding the current context) next.</li></ul></li>
<li><t>If not context dependent:</t>
<ul>
<li>use the same order as defined in the shared dictionary.</li>
</ul></li>
	</ul>
        <section anchor="sect-3.1.1" numbered="true" toc="default">
          <name>Transform Operations</name>
          <t>
   A shared dictionary may include custom word transformations to
   replace those specified in <xref section="8" target="RFC7932" format="default"/> and
   <xref section="B" target="RFC7932" format="default"/>. A
   transform consists of a possible prefix, a transform operation, a
   parameter (for some operations), and a possible suffix. In the shared
   dictionary format, the transform operation is represented by a
   numerical ID, which is listed in the table below.</t>

<table anchor="operation-ids">
  <name>Transform Operation IDs</name>
  <thead>
    <tr>
      <th>ID</th>
      <th>Operation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>     <td>Identity</td>
    </tr><tr>
      <td>1</td>     <td>OmitLast1</td>
    </tr><tr>
      <td>2</td>     <td>OmitLast2</td>
    </tr><tr>
      <td>3</td>     <td>OmitLast3</td>
    </tr><tr>
      <td>4</td>     <td>OmitLast4</td>
    </tr><tr>
      <td>5</td>     <td>OmitLast5</td>
    </tr><tr>
      <td>6</td>     <td>OmitLast6</td>
    </tr><tr>
      <td>7</td>     <td>OmitLast7</td>
    </tr><tr>
      <td>8</td>     <td>OmitLast8</td>
    </tr><tr>
      <td>9</td>     <td>OmitLast9</td>
    </tr><tr>
     <td>10</td>     <td>FermentFirst</td>
    </tr><tr>
     <td>11</td>     <td>FermentAll</td>
    </tr><tr>
     <td>12</td>     <td>OmitFirst1</td>
    </tr>
    <tr>
     <td>13</td>     <td>OmitFirst2</td>
    </tr><tr>
     <td>14</td>     <td>OmitFirst3</td>
    </tr><tr>
     <td>15</td>     <td>OmitFirst4</td>
    </tr><tr>
     <td>16</td>     <td>OmitFirst5</td>
    </tr><tr>
     <td>17</td>     <td>OmitFirst6</td>
    </tr><tr>
     <td>18</td>     <td>OmitFirst7</td>
    </tr><tr>
     <td>19</td>     <td>OmitFirst8</td>
    </tr><tr>
     <td>20</td>     <td>OmitFirst9</td>
    </tr><tr>
     <td>21</td>     <td>ShiftFirst (by PARAMETER)</td>
    </tr><tr>
     <td>22</td>     <td>ShiftAll (by PARAMETER)</td>
    </tr>
  </tbody>
</table>


          <t>
   Operations 0 to 20 are specified in <xref section="8" target="RFC7932" format="default"/>.
   ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t>
          <t>
   A SCALAR is a 7-, 11-, 16-, or 21-bit unsigned integer encoded with 1,
   2, 3, or 4 bytes, respectively, with the following bit contents:</t>


<artwork name="" type="" align="left" alt=""><![CDATA[
   7-bit SCALAR:
   +--------+
   |0sssssss|
   +--------+

   11-bit SCALAR:
   +--------+--------+
   |110sssss|XXssssss|
   +--------+--------+

   16-bit SCALAR:
   +--------+--------+--------+
   |1110ssss|XXssssss|XXssssss|
   +--------+--------+--------+

   21-bit SCALAR:
   +--------+--------+--------+--------+
   |11110sss|XXssssss|XXssssss|XXssssss|
   +--------+--------+--------+--------+
]]></artwork>
          <t>
   Given the input bytes matching the SCALAR encoding pattern, the SCALAR
   value is obtained by concatenation of the "s" bits, with the
   MSBs coming from the earliest byte. The "X" bits could
   have arbitrary values.</t>
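<t>A possible decoding routine for a SCALAR, shown here as a non-normative Python sketch:</t>

```python
def decode_scalar(data):
    # Returns (value, nbytes) for a SCALAR at the start of `data`.
    # Patterns: (first-byte mask, first-byte pattern, total bytes).
    b0 = data[0]
    for mask, pattern, nbytes in ((0x80, 0x00, 1), (0xE0, 0xC0, 2),
                                  (0xF0, 0xE0, 3), (0xF8, 0xF0, 4)):
        if b0 & mask == pattern:
            value = b0 & ~mask & 0xFF      # "s" bits of the first byte
            for b in data[1:nbytes]:       # "X" bits are ignored
                value = (value << 6) | (b & 0x3F)
            return value, nbytes
    raise ValueError("not a SCALAR")
```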
          <t>
   An ADDEND is defined as the result of limited sign extension of
   a 16-bit unsigned PARAMETER:</t>
<t indent="3">
      At first, the PARAMETER is zero-extended to 32 bits. After this,
      if the resulting value is greater than or equal to 0x8000,
      then 0xFF0000 is added.</t>
          <t>
   ShiftAll starts at the beginning of the word and repetitively applies
   the following transformation until the whole word is transformed:</t>
<t indent="3">
      If the next untransformed byte matches the first byte of the 7-,
      11-, 16-, or 21-bit SCALAR pattern, then:</t>
<t indent="6">
         If the untransformed part of the word is not long enough to
         match the whole SCALAR pattern, then the whole word is
         marked as transformed.</t>
<t indent="6">
Otherwise, let SHIFTED be the sum of the ADDEND and the
         encoded SCALAR. The lowest bits from SHIFTED
         are written back into the corresponding "s" bits. The "0",
         "1", and "X" bits remain unchanged. Next, 1, 2, 3, or
         4 untransformed bytes are marked as transformed according to
         the SCALAR pattern length.</t>
<t indent="3">
      Otherwise, the next untransformed byte is marked as transformed.</t>
          <t>
   ShiftFirst applies the same transformation as ShiftAll, but does not
   iterate.</t>
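<t>The ADDEND computation and the ShiftAll loop described above can be summarized in the following non-normative Python sketch (ShiftFirst would perform only a single step of the loop):</t>

```python
def addend(parameter):
    # Limited sign extension of the 16-bit PARAMETER: zero-extend to
    # 32 bits, then add 0xFF0000 if the value is >= 0x8000.
    value = parameter
    if value >= 0x8000:
        value += 0xFF0000
    return value

# (first-byte mask, first-byte pattern, total bytes, total "s" bits)
SCALAR_PATTERNS = (
    (0x80, 0x00, 1, 7),
    (0xE0, 0xC0, 2, 11),
    (0xF0, 0xE0, 3, 16),
    (0xF8, 0xF0, 4, 21),
)

def shift_all(word, parameter):
    word = bytearray(word)
    pos = 0
    while pos < len(word):
        for mask, pattern, nbytes, sbits in SCALAR_PATTERNS:
            if word[pos] & mask == pattern:
                if pos + nbytes > len(word):
                    return bytes(word)  # whole word marked transformed
                # Decode the SCALAR ("s" bits, MSBs from the earliest byte).
                scalar = word[pos] & ~mask & 0xFF
                for k in range(1, nbytes):
                    scalar = (scalar << 6) | (word[pos + k] & 0x3F)
                # Write the lowest bits of SHIFTED back into the "s" bits;
                # the "0", "1", and "X" bits remain unchanged.
                shifted = (addend(parameter) + scalar) & ((1 << sbits) - 1)
                for k in range(nbytes - 1, 0, -1):
                    word[pos + k] = (word[pos + k] & 0xC0) | (shifted & 0x3F)
                    shifted >>= 6
                word[pos] = (word[pos] & mask) | shifted
                pos += nbytes
                break
        else:
            pos += 1  # no SCALAR pattern: mark the byte as transformed
    return bytes(word)
```

For ASCII input, a PARAMETER of 1 shifts each byte up by one, and a PARAMETER of 0xFFFF (interpreted through the ADDEND as -1) shifts each byte down by one.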
        </section>
      </section>
      <section anchor="sect-3.2" numbered="true" toc="default">
        <name>LZ77 Dictionaries</name>
        <t>
   If an LZ77 dictionary is set, then the decoder treats it as a
   regular LZ77 copy but behaves as if the bytes of this dictionary are
   accessible as the uncompressed bytes outside of the regular LZ77
   window for backwards references.</t>
        <t>
   Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary.
   Then word_id, described in <xref section="8" target="RFC7932" format="default"/>, is redefined as:</t>
        <artwork name="" type="" align="left" alt=""><![CDATA[
word_id = distance - (max allowed distance + 1 +
LZ77_DICTIONARY_LENGTH)
]]></artwork>
        <t>
   For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the
   <xref target="RFC7932" format="default"/> definition.</t>
        <t>
   Let dictionary_address be:</t>

  <t indent="3"> LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t>
        <t>
   Then distance values of &lt;length, distance&gt; pairs <xref target="RFC7932" format="default"/> in range
   (max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed
   distance) are interpreted as references starting in the LZ77
   dictionary at the byte at dictionary_address. If length is longer
   than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the
   reference continues to copy (length - LZ77_DICTIONARY_LENGTH +
   dictionary_address) bytes from the regular LZ77 window starting at
   the beginning.</t>
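<t>The address computation above can be illustrated with a small Python sketch; names are illustrative and validation of the distance range is omitted:</t>

```python
def resolve_dictionary_copy(distance, length, max_distance, dictionary, window):
    # Valid for distances in the range
    # (max_distance + 1)..(max_distance + len(dictionary)).
    dictionary_address = len(dictionary) + max_distance - distance
    out = dictionary[dictionary_address:dictionary_address + length]
    remaining = length - (len(dictionary) - dictionary_address)
    if remaining > 0:
        # The copy continues from the beginning of the regular LZ77 window.
        out += window[:remaining]
    return out
```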
      </section>
    </section>
    <section anchor="sect-4" numbered="true" toc="default">

      <name>Varint Encoding</name>
<t>A varint is encoded in base 128 in one or more bytes as follows:</t>

      <artwork name="" type="" align="left" alt=""><![CDATA[
   +--------+--------+             +--------+
   |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx|
   +--------+--------+             +--------+
]]></artwork>
      <t>
   where the "x" bits of the first byte are the LSBs
   of the value and the "x" bits of the last byte are the
   MSBs of the value.

   The last byte must have its MSB set to
   0 and all other bytes must
   have their MSBs set to 1 to indicate there is a next byte.</t>
      <t>
   The maximum allowed number of bits to read is 63; if the 9th
   byte is present and has its MSB set, then the stream must be
   considered as invalid.</t>
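<t>A decoder for this varint could look as follows (a non-normative sketch):</t>

```python
def read_varint(data, pos=0):
    value = 0
    for i in range(9):  # at most 9 bytes: 9 * 7 = 63 value bits
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, pos + i + 1
    # The 9th byte had its MSB set: the stream is invalid.
    raise ValueError("invalid varint")
```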
    </section>
    <section anchor="sect-5" numbered="true" toc="default">
      <name>Shared Dictionary Stream</name>
      <t>
   The shared dictionary stream encodes a custom dictionary for brotli,
   including custom words and/or custom transformations. A shared
   dictionary may appear standalone or as the contents of a resource in a
   framing format container.</t>
      <t>
   A compliant shared brotli dictionary stream must have the following
   format:</t>
      <dl newline="false" spacing="normal" indent="3">
        <dt>2 bytes:</dt>
        <dd>File signature in hexadecimal format (bytes 91 and 0).</dd>
            <dt>varint:</dt> <dd>LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77
	dictionary, or 0 if there is none.
              The maximum allowed value is the maximum possible sliding
              window size of brotli or of large window brotli.
	    </dd>

        <dt>
      LZ77_DICTIONARY_LENGTH bytes:</dt><dd>Contents of the LZ77 dictionary.</dd>
      <dt>1 byte:</dt><dd><t>NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64.</t></dd>
<dt>NUM_CUSTOM_WORD_LISTS times a word list with the following format for each word list:</dt>
<dd>
<t><br/></t>
              <dl>
                <dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit
	integers, indexed by word lengths 4 to 31. The value
                   represents log2(number of words of this length),
                   with the exception of 0 meaning 0 words of this
                   length. The max allowed length value is 15 bits.
                   OFFSETS_BY_LENGTH is computed from this as
		   OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] +
                   (SIZE_BITS_BY_LENGTH[i] ? (i &lt;&lt; SIZE_BITS_BY_LENGTH[i]) : 0).</dd>

                <dt>N bytes:</dt><dd>Words dictionary data, where N is
	OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ?
                  (31 &lt;&lt; SIZE_BITS_BY_LENGTH[31]) : 0), with all the words of the shortest length first, then all words of the next length, and so on, where there are either 0 or a positive power of two number of words for each length.</dd>
              </dl></dd>

        <dt>
      1 byte:</dt><dd>NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to 64.</dd>

          <dt>
            NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the
	following format for each transform list:
              </dt>
<dd>
<t><br/></t>
<dl>
                <dt>2 bytes:</dt><dd>PREFIX_SUFFIX_LENGTH. The length of prefix/suffix
	data. Must be at least 1 because the list must
                  always end with a zero-length stringlet even
                  if it is empty.
	</dd>
                <dt>NUM_PREFIX_SUFFIX times:</dt><dd><t>Prefix/suffix stringlet.
            NUM_PREFIX_SUFFIX is the number of stringlets parsed and
            must be in range 1..256.
                  </t><dl>
                    <dt>1 byte:</dt><dd>STRING_LENGTH. The length of the entry contents.
	0 for the last (terminating) entry of the
                    transform list. For other entries, STRING_LENGTH
                    must be in range 1..255. The 0 entry must be
                    present and must be the last byte of the
                    PREFIX_SUFFIX_LENGTH bytes of prefix/suffix
                    data, else the stream must be rejected as
                    invalid.</dd>
	<dt>STRING_LENGTH bytes:</dt><dd>Contents of the prefix/suffix.</dd>
      </dl></dd>

	<dt>1 byte:</dt><dd>NTRANSFORMS. Number of transformation triplets.</dd>
                <dt>NTRANSFORMS times the data for each transform listed below:</dt><dd>
<t><br/></t>
                <dl>
                  <dt>
	1 byte:</dt><dd>Index of prefix in prefix/suffix data;
                    must be less than NUM_PREFIX_SUFFIX.
                  </dd>
                    <dt>1 byte:</dt><dd>Index of suffix in prefix/suffix data;
                    must be less than NUM_PREFIX_SUFFIX.</dd>
                    <dt>1 byte:</dt><dd>Operation index; must be an index in the table of
	operations listed in
                    <xref target="sect-3.1.1"/>.</dd></dl></dd></dl>
                      <dl><dt>
	If and only if at least one transform has operation index
         ShiftFirst or ShiftAll, then NTRANSFORMS times the following:</dt><dd>
<t><br/></t>
<dl>

                      <dt>
	2 bytes:</dt><dd>Parameters for the transform. If the transform
                        does not have type ShiftFirst or ShiftAll, the
                        value must be 0. ShiftFirst and ShiftAll
                        interpret these bytes as an unsigned 16-bit
                        integer.
                      </dd></dl>
                      </dd></dl></dd></dl>
<dl>
            <dt>If NUM_CUSTOM_WORD_LISTS &gt; 0 or NUM_CUSTOM_TRANSFORM_LISTS &gt; 0
	(else implicitly NUM_DICTIONARIES is 1 and points to the
         brotli built-in and there is no context map):</dt>
<dd>
<t><br/></t>
              <dl>
                <dt>1 byte:</dt>
        <dd>NUM_DICTIONARIES. May have a value in range 1 to 64. Each
	dictionary is a combination of a word list and a
                 transform list. Each next dictionary is used when the
                 distance goes beyond the previous. If a CONTEXT_MAP is
                 enabled, then the dictionary matching the context is
                 moved to the front in the order for this context.
	</dd>
                <dt>NUM_DICTIONARIES times the DICTIONARY_MAP, which contains:</dt><dd>
<t><br/></t>
                  <dl><dt>
	1 byte:</dt><dd>Index into a custom word list or value
                    NUM_CUSTOM_WORD_LISTS to indicate using the brotli
                    <xref target="RFC7932" format="default"/> built-in default word list.
                  </dd>
                    <dt>1 byte:</dt><dd>Index into a custom transform list or value
	NUM_CUSTOM_TRANSFORM_LISTS to indicate using the
                    brotli <xref target="RFC7932" format="default"/> built-in default transform list.
	</dd>
                  </dl>
                </dd>

		<dt>1 byte:</dt><dd>CONTEXT_ENABLED. If 0, there is no context map. If 1, a
	context map used to select the dictionary is encoded as
        below.</dd>
	      </dl>
		<dl>
                <dt>If CONTEXT_ENABLED is 1, there is a context map for the 64 brotli
                  <xref target="RFC7932" format="default"/> literals contexts:
                  </dt>
<dd><t><br/></t>
                  <dl>
                    <dt>64 bytes:</dt><dd>CONTEXT_MAP. Index into the DICTIONARY_MAP for
	the first dictionary to use for this context.
	</dd></dl></dd>
                  </dl>
                </dd>
              </dl>
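<t>The OFFSETS_BY_LENGTH computation and the resulting size N of the words dictionary data can be sketched as follows (non-normative; the array is indexed by word length, as above):</t>

```python
def offsets_by_length(size_bits_by_length):
    # size_bits_by_length is indexed by word length 4..31; entries
    # 0..3 are unused. offsets[32] equals N, the total number of
    # bytes of words dictionary data.
    offsets = [0] * 33
    for i in range(4, 32):
        size_bits = size_bits_by_length[i]
        offsets[i + 1] = offsets[i] + ((i << size_bits) if size_bits else 0)
    return offsets
```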

    </section>
    <section anchor="sect-6" numbered="true" toc="default">
      <name>Large Window Brotli Compressed Data Stream</name>
      <t>
   Large window brotli allows a sliding window beyond the 24-bit maximum
   of regular brotli <xref target="RFC7932" format="default"/>.</t>
      <t>
   The compressed data stream is backwards compatible to brotli
      <xref target="RFC7932" format="default"/> and may optionally have the following differences:</t>
<dl>
<dt>In the encoding of WBITS in the stream header, the following new pattern of 14 bits is supported:</dt><dd><t><br/></t>
<dl newline="false" spacing="normal">
                <dt>8 bits:</dt><dd>Value 00010001 to indicate a large window
	brotli stream.</dd>

	<dt>6 bits:</dt><dd>WBITS. Must have value in range 10 to 62.</dd>
	      </dl></dd>
            <dt>Distance alphabet:</dt><dd>If the stream is a large window brotli
	stream, the maximum number of extra bits is 62 and the
         theoretical maximum size of the distance alphabet is
         (16 + NDIRECT + (124 &lt;&lt; NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the number of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 &lt;&lt; 63) - 4) when its extra bits have their maximum value. It depends on NPOSTFIX and NDIRECT when this can occur.</dd>
          </dl>
      <t>
   A decoder that does not support 64-bit integers may reject a stream
   if WBITS is higher than 30 or a distance symbol from the distance
   alphabet is able to encode a distance larger than 2147483644.</t>
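As a worked illustration of the alphabet-size formula above, the following Python sketch computes the theoretical maximum distance alphabet size of a large window stream (the function name is illustrative, not part of the format):

```python
def distance_alphabet_size(npostfix: int, ndirect: int) -> int:
    """Theoretical maximum distance alphabet size for a large window
    brotli stream: 16 + NDIRECT + (124 << NPOSTFIX).  The 124 follows
    from the maximum of 62 extra bits allowed in large window brotli."""
    if not (0 <= npostfix <= 3 and 0 <= ndirect <= 120):
        raise ValueError("invalid distance parameters")
    return 16 + ndirect + (124 << npostfix)

# With NPOSTFIX = 0 and NDIRECT = 0, the large window alphabet can hold
# up to 140 symbols; the regular brotli formula of RFC 7932
# (16 + NDIRECT + (48 << NPOSTFIX)) gives 64 for the same parameters.
```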
    </section>
    <section anchor="sect-7" numbered="true" toc="default">
      <name>Shared Brotli Compressed Data Stream</name>
      <t>
   The format of a shared brotli compressed data stream without a framing
   format is backwards compatible with brotli <xref target="RFC7932" format="default"/>, format="default"/> with the
   following optional differences:</t>
<ul><li>LZ77 dictionaries as described above are supported.</li>
<li> Custom static dictionaries replacing or extending the static
     dictionary of brotli <xref target="RFC7932" format="default"/> with different words or
          transforms are supported.</li>
<li>The stream may have the format of regular brotli <xref target="RFC7932"/>
	or the format of large window brotli as described in
         <xref target="sect-6" format="default"/>.</li>
      </ul>
    </section>
    <section anchor="sect-8" numbered="true" toc="default">
      <name>Shared Brotli Framing Format Stream</name>
      <t>
   A compliant shared brotli framing format stream has the format
   described below.</t>
      <section anchor="sect-8.1" numbered="true" toc="default">
        <name>Main Format</name>

            <dl>
              <dt>4 bytes:</dt><dd>File signature in hexadecimal format (bytes 0x91,
              0x0a, 0x42, and 0x52). The first byte contains the invalid WBITS
              combination for brotli <xref target="RFC7932" format="default"/> and large window brotli.
	</dd>
        <dt>1 byte:</dt><dd><t>Container flags that are 8 bits and have the following meanings:</t>

<dl><dt>bits 0 and 1:</dt><dd>Version indicator that must be b'00. Otherwise, the
           decoder must reject the data stream as invalid.
                </dd>
                  <dt>bit 2:</dt><dd>If 0, the file contains no final footer, may not contain
	any metadata chunks, may not contain a central directory,
           and may encode only a single resource (using one or more
           data chunks). If 1, the file may contain one or more
           resources, metadata, and a central directory, and it must contain a
           final footer.
	</dd>
                </dl>
              </dd>
              <dt>multiple times:</dt><dd>A chunk, each with the format specified in <xref target="sect-8.2"/>.</dd>
            </dl>
      </section>
      <section anchor="sect-8.2" numbered="true" toc="default">
        <name>Chunk Format</name>
            <dl>
              <dt>varint:</dt><dd>Length of this chunk excluding this varint
              but including all next header bytes and data. If the value is 0,
              then the chunk type byte is not present and the chunk type is
              assumed to be 0.</dd>
<dt>1 byte:</dt><dd><t>CHUNK_TYPE</t>
<dl indent="5" spacing="compact">
<dt>0:</dt><dd>padding chunk</dd>
<dt>1:</dt><dd>metadata chunk</dd>
<dt>2:</dt><dd>data chunk</dd>
<dt>3:</dt><dd>first partial data chunk</dd>
<dt>4:</dt><dd>middle partial data chunk</dd>
<dt>5:</dt><dd>last partial data chunk</dd>
<dt>6:</dt><dd>footer metadata chunk</dd>
<dt>7:</dt><dd>global metadata chunk</dd>
<dt>8:</dt><dd>repeat metadata chunk</dd>
<dt>9:</dt><dd>central directory chunk</dd>
<dt>10:</dt><dd>final footer</dd>
</dl></dd>

<dt>If CHUNK_TYPE is not padding chunk, central directory, or final
footer:</dt>
<dd>
<t><br/></t>
<dl><dt>1 byte:</dt><dd><t>CODEC:</t>
<dl spacing="compact">
<dt>0:</dt><dd>uncompressed</dd>
<dt>1:</dt><dd>keep decoder</dd>
<dt>2:</dt><dd>brotli</dd>
<dt>3:</dt><dd>shared brotli</dd>
</dl>
</dd>
</dl>
</dd>
<dt>If CODEC is not "uncompressed":</dt>
<dd>
<t><br/></t>
<dl><dt>varint:</dt><dd>Uncompressed size in bytes of the data contained
within the compressed stream.
</dd></dl></dd>
<dt>If CODEC is "shared brotli":</dt>
<dd><t><br/></t>
<dl><dt>
	1 byte:</dt><dd><t>Number of dictionary references. Multiple dictionary
                 references are possible with the following
                 restrictions: there can be maximum 1 serialized
                 dictionary and maximum 15 prefix dictionaries (a
                 serialized dictionary may already contain one of
                 those). Circular references are not allowed (any
                 dictionary reference that directly or indirectly
                 uses this chunk itself as dictionary).</t></dd>
<dt>Per dictionary reference:</dt>
<dd><t><br/></t>
<dl><dt>1 byte:</dt><dd><t>Flags:</t>
<dl><dt>bits 0 and 1:</dt><dd><t>Dictionary source:</t>

	<dl indent="5"><dt>00:</dt><dd>Internal dictionary reference to a full resource
                      by pointer, which can span one or more chunks.
                      Must point to a full data chunk or a first
                      partial data chunk.</dd>

                      <dt>01:</dt><dd>Internal dictionary reference to single chunk
            contents by pointer. May point to any chunk with
                      content (data or metadata). If a partial data
                      chunk, only this part is the dictionary. In this
                      case, the dictionary type is not allowed to be a
                      serialized dictionary.
	</dd>
                      <dt>10:</dt><dd>Reference to a dictionary by hash code of a
	resource. The dictionary can come from an
                      external source, such as a different container.
                      The user of the decoder must be able to provide
                      the dictionary contents given its hash code (even
                      if it comes from this container itself) or treat
                      it as an error when the user does not have it
                      available.</dd>

	<dt>11:</dt><dd>Invalid bit combination</dd>
                    </dl>
                  </dd>
                    <dt>bits 2 and 3:</dt><dd><t>Dictionary type:</t>
<dl indent="5">
         <dt>00:</dt><dd><t>Prefix dictionary, set in front of the sliding
                      window</t></dd>

                        <dt>01:</dt><dd>Serialized dictionary in the shared brotli
                      format as specified in <xref target="sect-5"/>.</dd>
	<dt>10:</dt><dd>Invalid bit combination</dd>
	<dt>11:</dt><dd>Invalid bit combination</dd></dl></dd>
	<dt>bits 4-7:</dt><dd>Must be 0</dd></dl></dd>
                          <dt>If hash-based:</dt><dd><t><br/></t>
	<dl><dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3,
                       indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>.
</dd>
                    <dt>32 bytes:</dt><dd><t>256-bit HighwayHash checksum to refer to
                         dictionary.</t></dd></dl></dd>
                        <dt>If pointer based:</dt><dd>Varint-encoded pointer to its
               chunk in this container. The chunk must come earlier
               than the current chunk.</dd></dl></dd></dl></dd>
                          <dt>X bytes:</dt><dd>Extra header bytes, depending on CHUNK_TYPE. If present,
	they are specified in the subsequent sections.</dd>
                        <dt>remaining bytes:</dt><dd><t>The chunk contents. The uncompressed data
                          in the chunk content depends on CHUNK_TYPE
                          and is specified in the subsequent sections.
                          The compressed data has the following
                          format depending on CODEC:</t>
<ul><li>uncompressed: The raw bytes.</li>
<li>If "keep decoder", the continuation of the compressed
	stream that was interrupted at the end of the previous
               chunk. The decoder from the previous chunk must be used
               and the state it had at the end of the previous chunk
               must be kept at the start of the decoding of this chunk.
	</li>
                  <li>brotli: The bytes are in brotli format
                    <xref target="RFC7932" format="default"/>.
                  </li>
                  <li>shared brotli: The bytes are in the
	shared brotli format specified in
                              <xref target="sect-7"/>.</li></ul></dd>
                </dl>
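The chunk header layout above can be summarized by a minimal parsing sketch in Python. It assumes a base-128, least-significant-group-first varint with a 0x80 continuation bit (an assumption about the varint encoding used by this format) and extracts only the chunk type and length:

```python
def parse_varint(data: bytes, pos: int):
    # Assumed encoding: base-128 varint, least significant group first,
    # high bit of each byte set while more bytes follow.
    shift = result = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

CHUNK_TYPES = {
    0: "padding", 1: "metadata", 2: "data",
    3: "first partial data", 4: "middle partial data",
    5: "last partial data", 6: "footer metadata",
    7: "global metadata", 8: "repeat metadata",
    9: "central directory", 10: "final footer",
}

def read_chunk_header(data: bytes, pos: int):
    # The length varint excludes itself but covers the rest of the chunk.
    length, pos = parse_varint(data, pos)
    if length == 0:
        # The type byte is absent; the type is implicitly 0 (padding).
        return {"type": CHUNK_TYPES[0], "length": 0}, pos
    chunk_type = data[pos]
    header = {"type": CHUNK_TYPES.get(chunk_type, "unknown"), "length": length}
    return header, pos + length  # offset of the next chunk
```

A real decoder would go on to read the CODEC byte and any dictionary references; this sketch only walks the chunk boundaries.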
	  </section>

      <section anchor="sect-8.3" numbered="true" toc="default">
        <name>Metadata Format</name>
        <t>All the metadata chunk types use the following format for the
   uncompressed content:</t>

            <dl newline="true">
              <dt>Per field:</dt>
              <dd>
		  <dl><dt>2 bytes:</dt>
		  <dd><t>Code to identify this metadata field. This must be
                  two lowercase or two uppercase alpha ASCII
                  characters. If the decoder encounters a lowercase
                  field that it does not recognize for the current
                  chunk type, non-ASCII characters, or non-alpha
                  characters, the decoder must reject the data stream
                  as invalid. Uppercase codes may be used for custom
                  user metadata and can be ignored by a compliant
                  decoder.</t></dd>
                 <dt>varint:</dt>
		 <dd><t>Length of the content of this field in bytes,
	excluding the code bytes and this varint.</t></dd>
		  <dt>N bytes:</dt>
		  <dd>The contents of this field.</dd>
                  </dl>
              </dd>
            </dl>
        <t>
   The last field is reached when the chunk content end is reached. If
   the length of the last field does not end at the same byte as the end
   of the uncompressed content of the chunk, the decoder must reject the
   data stream as invalid.</t>
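A minimal sketch of this field loop in Python (the helper names are illustrative, and the varint is assumed to be the same base-128, 0x80-continuation encoding used for chunk lengths):

```python
def parse_varint(data: bytes, pos: int):
    # Assumed base-128 little-endian varint with a 0x80 continuation bit.
    shift = result = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def parse_metadata_fields(content: bytes):
    """Parse the uncompressed content of a metadata chunk into
    (code, value) pairs, rejecting malformed field codes or lengths."""
    fields, pos = [], 0
    while pos < len(content):
        code = content[pos:pos + 2].decode("ascii")
        pos += 2
        # Codes must be two lowercase or two uppercase alpha characters.
        if not (code.isalpha() and (code.islower() or code.isupper())):
            raise ValueError("invalid field code: %r" % code)
        length, pos = parse_varint(content, pos)
        if pos + length > len(content):
            raise ValueError("field runs past end of chunk content")
        fields.append((code, content[pos:pos + length]))
        pos += length
    return fields
```

Whether a lowercase code is recognized for the current chunk type is a separate check left to the caller in this sketch.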
      </section>

      <section anchor="sect-8.4" numbered="true" toc="default">
        <name>Chunk Specifications</name>
        <section anchor="sect-8.4.1" numbered="true" toc="default">
          <name>Padding Chunk (Type 0)</name>
          <t>
   All bytes in this chunk must be zero except for the initial varint
   that specifies the remaining chunk length.</t>
          <t>
   Since the varint itself takes up bytes as well, when the goal is to
   introduce a number of padding bytes, the dependence of the length of
   the varint on the value it encodes must be taken into account.</t>
          <t>
   A single byte varint with a value of 0 is a padding chunk of length 1.
   For more padding, use higher varint values. Do not use multiple
   shorter padding chunks since this is slower to decode.</t>
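Because the varint's own length depends on the value it encodes, producing exactly N bytes of padding takes a small search. A sketch, assuming a base-128 varint with a 0x80 continuation bit:

```python
def encode_varint(value: int) -> bytes:
    # Assumed base-128 varint, least significant group first,
    # 0x80 continuation bit on all but the last byte.
    out = bytearray()
    while True:
        b = value & 0x7F
        value >>= 7
        if value:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def padding_chunk(total_len: int) -> bytes:
    """Build a padding chunk occupying exactly total_len bytes.

    We need a varint value v with len(varint(v)) + v == total_len.
    A value of 0 means the type byte is absent and the type defaults
    to 0 (padding); a nonzero body starts with the zero type byte,
    which satisfies the all-bytes-zero rule.
    """
    for v in range(max(total_len - 10, 0), total_len):
        enc = encode_varint(v)
        if len(enc) + v == total_len:
            return enc + b"\x00" * v
    raise ValueError("cannot build a padding chunk of this size")
```

For example, a single zero byte is a 1-byte padding chunk, and a 200-byte padding chunk uses a 2-byte varint encoding 198 followed by 198 zero bytes.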
        </section>
        <section anchor="sect-8.4.2" numbered="true" toc="default">
          <name>Metadata Chunk (Type 1)</name>
          <t>
   This chunk contains metadata that applies to the resource whose
   beginning is encoded in the subsequent data chunk or first partial
   data chunk.</t>
          <t>
   The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t>

   <t>The following field types are recognized:</t>

<dl><dt>id (N bytes):</dt><dd>Name field. May appear 0 or 1 times. Has the following
          format: name in UTF-8 encoding, length
                  determined by the field length. Treated generically but may
                  be used as a filename. If used as a filename, forward slashes
                  '/' should be used as directory separators, relative paths
                  should be used, and filenames ending in a slash with 0-length
                  content in the matching data chunk should be treated as an
                  empty directory.</dd>
		   <dt>mt (8 bytes):</dt><dd>Modification time. May appear 0 or 1 times. Has the following
                  format: contains microseconds since epoch, as a little-endian,
                  signed two's complement 64-bit integer.</dd>
<dt>custom user field:</dt><dd>Any two uppercase ASCII characters.</dd>
          </dl>
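For example, an "mt" field could be built as follows (encode_mt_field is a hypothetical helper; the one-byte length works because the value 8 always fits in a single varint byte):

```python
import datetime
import struct

def encode_mt_field(when: datetime.datetime) -> bytes:
    """Encode the 'mt' metadata field: the field code, a one-byte
    varint length of 8, then microseconds since the epoch as a
    little-endian signed two's complement 64-bit integer."""
    micros = int(when.timestamp() * 1_000_000)
    return b"mt" + bytes([8]) + struct.pack("<q", micros)

# The epoch itself encodes as the field code, the length byte,
# and eight zero bytes.
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
```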

        </section>
        <section anchor="sect-8.4.3" numbered="true" toc="default">
          <name>Data Chunk (Type 2)</name>
          <t>
   A data chunk contains the actual data of a resource.</t>

          <t>This chunk has the following extra header bytes:</t>
	  <dl>

	<dt>1 byte:</dt><dd><t>Flags:</t>
              <dl>
                  <dt>bit 0:</dt><dd>If true, indicates this is not a resource that should be
               output implicitly as part of extracting resources from
               this container. Instead, it may be referred to only
               explicitly, e.g., as a dictionary reference by hash code
               or offset. This flag should be set for data used as a
               dictionary to improve compression of actual resources.</dd>
	<dt>bit 1:</dt><dd>If true, hash code is given.</dd>
	<dt>bits 2-7:</dt><dd>Must be zero.</dd></dl></dd>

                    <dt>If hash code is given:</dt><dd><t><br/></t>
                    <dl>
	<dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3,
                indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>.
	</dd>
                <dt>32 bytes:</dt><dd>256-bit HighwayHash checksum of the uncompressed
                  data.</dd>
              </dl>
		  </dd>
		  </dl>
          <t>
   The uncompressed content bytes of this chunk are the actual data of
   the resource.</t>
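A sketch of decoding the flags byte above (bit 0 is assumed to be the least significant bit, and the result field names are our own):

```python
def parse_data_chunk_flags(flags: int) -> dict:
    """Decode the data chunk flags byte into named booleans."""
    if flags & ~0b11:
        raise ValueError("bits 2-7 must be zero")
    return {
        # bit 0: resource is only referenced explicitly (e.g., as a
        # dictionary), not output implicitly when extracting.
        "explicit_only": bool(flags & 0b01),
        # bit 1: a hash type byte and 32-byte checksum follow.
        "has_hash": bool(flags & 0b10),
    }
```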
        </section>
        <section anchor="sect-8.4.4" numbered="true" toc="default">
          <name>First Partial Data Chunk (Type 3)</name>
          <t>
   This chunk contains partial data of a resource. This is the first
   chunk in a series containing the entire data of the resource.</t>
          <t>
   The format of this chunk is the same as the format of a data chunk
   (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
          <t>
   The second bit of flags must be set to 0 and no hash code given.</t>
          <t>
   The uncompressed data size is only of this part of the resource, not
   of the full resource.</t>
        </section>
        <section anchor="sect-8.4.5" numbered="true" toc="default">
          <name>Middle Partial Data Chunk (Type 4)</name>
          <t>
   This chunk contains partial data of a resource and is neither the
   first nor the last part of the full resource.</t>
          <t>
   The format of this chunk is the same as the format of a data chunk
   (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
          <t>
   The first and second bits of flags must be set to 0.</t>
          <t>
   The uncompressed data size is only of this part of the resource, not
   of the full resource.</t>
        </section>
        <section anchor="sect-8.4.6" numbered="true" toc="default">
          <name>Last Partial Data Chunk (Type 5)</name>
          <t>
   This chunk contains the final piece of partial data of a resource.</t>
          <t>
   The format of this chunk is the same as the format of a data chunk
   (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
          <t>
   The first bit of the flags must be set to 0.</t>
          <t>
   If a hash code is given, the hash code of the full resource
   (concatenated from all previous chunks and this chunk) is given in
   this chunk.</t>
          <t>
   The uncompressed data size is only of this part of the resource, not
   of the full resource.</t>
          <t>
   The type of this chunk indicates that there are no further chunks
   encoding this resource, so the full resource is now known.</t>
        </section>
        <section anchor="sect-8.4.7" numbered="true" toc="default">
          <name>Footer Metadata Chunk (Type 6)</name>
          <t>
   This metadata applies to the resource whose encoding ended in the
   preceding data chunk or last partial data chunk.</t>
          <t>
   The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t>
   <t>
   There are no lowercase field types defined for footer metadata.
   Uppercase field types can be used as custom user data.</t>
        </section>
        <section anchor="sect-8.4.8" numbered="true" toc="default">
          <name>Global Metadata Chunk (Type 7)</name>
          <t>
   This metadata applies to the whole container instead of a single
   resource.</t>
          <t>
   The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t>
          <t>
   There are no lowercase field types defined for global metadata.
   Uppercase field types can be used as custom user data.</t>
        </section>
        <section anchor="sect-8.4.9" numbered="true" toc="default">
          <name>Repeat Metadata Chunk (Type 8)</name>
          <t>
   These chunks optionally repeat metadata that is interleaved between
   data chunks. To use these chunks, it is necessary to also read
   additional information, such as pointers to the original chunks, from
   the central directory.</t>
          <t>
   The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t>

            <t>This chunk has an extra header byte:</t>
<dl><dt>1 byte:</dt><dd>Chunk type of repeated chunk (metadata chunk
              or footer metadata chunk).
	</dd></dl>

            <t>This set of chunks must satisfy the following restrictions:</t>
<ul><li>
	It is optional whether or not repeat metadata chunks are
         present.</li>
                <li>If they are present, then they must be present for all
	metadata chunks and footer metadata chunks.
	</li>
                <li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li>

                <li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the
         central directory.
	</li>
                <li>Compression of these chunks is allowed; however, it is not allowed
	to use any internal dictionary except an earlier repeat
         metadata chunk of this series, and it is not allowed for a
         metadata chunk to keep the decoder state if the previous chunk
         is not a repeat metadata chunk. That is, the series of
         metadata chunks must be decompressible without using other
         chunks of the framing format file.
	</li>
              </ul>
          <t>
   The fields contained in this metadata chunk must satisfy the following
   restrictions:</t>
<ul>
                <li>If a field is present, it must
	exactly match the corresponding field of the copied chunk.</li>

                <li>It is allowed to leave out a field that is present
	in the copied chunk.
	</li>
<li>If a field is present, then it must be present in <strong>all</strong> other
	repeat metadata chunks when the copied chunk contains this
         field. In other words, if you know you can get the name field
         from a repeat chunk, you know that you will be able to get all
         names of all resources from all repeat chunks.
	</li>
              </ul>
        </section>
        <section anchor="sect-8.4.10" numbered="true" toc="default">
          <name>Central Directory Chunk (Type 9)</name>
          <t>
   The central directory chunk along with the repeat metadata chunks
   allows quickly finding and listing compressed resources in the container
   file.</t>
          <t>
   The central directory chunk is always uncompressed and does not have
   the codec byte. It instead has the following format:</t>

<dl>
                <dt>varint:</dt><dd><t>Pointer into the file where the repeat metadata chunks are located or 0 if they are not present.</t></dd>
<dt>per chunk listed:</dt><dd><t><br/></t>
<dl><dt>varint:</dt><dd>Pointer into the file where this chunk begins.</dd>

	<dt>varint:</dt><dd>Number of header bytes N used below.</dd>
                  <dt>N bytes:</dt><dd>Copy of all the header bytes of the pointed-at chunk,
	including total size, chunk type byte, codec,
                 uncompressed size, dictionary references, and X extra
                 header bytes. The content is not repeated here.
	</dd>
                  </dl>
                </dd>
              </dl>
          <t>
   The last listed chunk is reached when the end of the contents of the
   central directory are reached. If the end does not match the last
   byte of the central directory, the decoder must reject the data
   stream as invalid.</t>
          <t>
   If present, the central directory must list all data and metadata
   chunks of all types.</t>
        </section>
        <section anchor="sect-8.4.11" numbered="true" toc="default">
          <name>Final Footer Chunk (Type 10)</name>
          <t>
   The final footer chunk closes the file and is only present if bit 2 of
   the initial container header flags was set.</t>

            <t>This chunk has the following content, which is always uncompressed:</t>
<dl>
              <dt>reversed varint:</dt><dd><t>Size of this entire framing format file,
                       including these bytes themselves, or 0 if this
                       size is not given.</t></dd>
                <dt>reversed varint:</dt><dd>Pointer to the start of the central
                       directory, or 0 if there is none.
	</dd>
              </dl>
          <t>
   A reversed varint has the same format as a varint, varint but has its bytes
   are in reversed order order, and it is designed to be parsed from the end of the file
   towards the beginning.</t>
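A sketch of parsing a reversed varint by scanning backwards from a known end offset, assuming the same base-128, 0x80-continuation encoding as ordinary varints. A reader would seek to the end of the file and read the final footer's two fields back to front, last field (the central directory pointer) first:

```python
def parse_reversed_varint(data: bytes, end: int):
    """Parse a varint whose bytes are stored in reverse order,
    scanning from offset end-1 toward the start of the buffer.
    Returns (value, start_offset_of_the_reversed_varint)."""
    pos = end - 1
    shift = result = 0
    while True:
        b = data[pos]
        pos -= 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            # pos + 1 is the offset where the reversed varint begins.
            return result, pos + 1
        shift += 7
```

For instance, the ordinary varint for 300 is the bytes 0xAC 0x02, so its reversed form is stored as 0x02 0xAC and decodes to 300 when read from the end.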
        </section>
        <section anchor="sect-8.4.12" numbered="true" toc="default">
          <name>Chunk Ordering</name>
          <t>
   The chunk ordering must follow the rules described below. If the
   decoder sees otherwise, it must reject the data stream as invalid.</t>
<t indent="3">
      Padding chunks may be inserted anywhere, even between chunks for
      which the rules below say no other chunk types may come in
      between.</t>
<t indent="3">
      Metadata chunks must come immediately before the data chunks of
      the resource they apply to.</t>

<t indent="3">
      Footer metadata chunks must come immediately after the data
      chunks of the resource they apply to.</t>
<t indent="3">
      There may be only 0 or 1 metadata chunks per resource.</t>
<t indent="3">
      There may be only 0 or 1 footer metadata chunks per resource.</t>
<t indent="3">
      A resource must consist of either 1 data chunk or 1 first
      partial data chunk, 0 or more middle partial data
      chunks, and 1 last partial data chunk, in that order.</t>
<t indent="3">
      Repeat metadata chunks must follow the rules of <xref target="sect-8.4.9"/>.</t>
<t indent="3">
      There may be only 0 or 1 central directory chunks.</t>
<t indent="3">
      If bit 2 of the container flags is set, there may be only a
      single resource, no metadata chunks of any type, no central
      directory, and no final footer.</t>
<t indent="3">
      If bit 2 of the container flags is not set, there must be exactly
      1 final footer chunk, and it must be the last chunk in the file.</t>

        </section>
      </section>
    </section>
    <section anchor="sect-9" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
   The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared
   brotli as well.</t>
      <t>
   In addition, the same considerations apply to the decoding of new
   file format streams for shared brotli, including shared dictionaries,
   the framing format, and the shared brotli format.</t>
      <t>
   The dictionary must be treated with the same security precautions as
   the content because a change to the dictionary can result in a
   change to the decompressed content.</t>
      <t>
   The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data
   from mixed (e.g., public and private) sources -- the data sources
   include not only the compressed data but also the dictionaries. For
   example, if you compress secret cookies using a public-data-only
   dictionary, you still leak information about the cookies.</t>
      <t>
   The dictionary can reveal information about the compressed
   data and vice versa. That is, data compressed with the dictionary can reveal
   the contents of the dictionary when an adversary can control parts of the data to compress and see the compressed size. On the other hand, if
   the adversary can control the dictionary, the adversary can learn
   information about the compressed data.</t>
      <t>
   The most robust defense against CRIME is not to compress private
   data, e.g., sensitive headers like cookies or any content with
   personally identifiable information (PII). The
   challenge has been to identify secrets within a vast amount of data to be
   compressed.
   Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>.
   Another idea is to extend existing web template systems (e.g., Soy
   <xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be
   compressed.</t>
      <t>
   A less robust idea, but easier to implement, is to randomize the
   compression algorithm, i.e., adding randomly generated padding,
   varying the compression ratio, etc. The tricky part is to find the
   right balance between cost and security (i.e., on one hand, we don't
   want to add too much padding because it adds a cost to data, but on the
   other hand, we don't want to add too little because the adversary can
   detect a small amount of padding with traffic analysis).</t>
      <t>
   Additionally, another defense is to not use dictionaries for
   cross-domain requests and to only use shared brotli for the response when the
   origin is the same as where the content is hosted (using CORS).  This
   prevents an adversary from using a private dictionary with user
   secrets to compress content hosted on the adversary's origin.  It
   also helps prevent CRIME attacks that try to benefit from a public
   dictionary by preventing data compression with dictionaries for
   requests that do not originate from the host itself.</t>
      <t>
   The content of the dictionary itself should not be affected by
   external users; allowing adversaries to control the dictionary allows
   a form of chosen plaintext attack. Instead, only base the dictionary
   on content you control or generic large-scale content such as a
   spoken language and update the dictionary with large time intervals
   (days, not seconds) to prevent fast probing.</t>
      <t>
   The use of HighwayHash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not
   guarantee against collisions in an adversarial environment and is
   intended to be used for identifying the dictionary within a trusted,
   known set of dictionaries. In an adversarial environment, users of
   shared brotli should use another mechanism to validate a negotiated
   dictionary such as using a cryptographically proven secure hash.</t>
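      <t>
   For example, a negotiated dictionary could be validated against a
   cryptographically secure hash along the following lines. This sketch
   is illustrative only; the choice of SHA-256 and the function names
   are assumptions, not requirements of this specification:</t>
      <sourcecode type="python"><![CDATA[
```python
import hashlib
import hmac


def validate_dictionary(dictionary: bytes,
                        expected_sha256_hex: str) -> bool:
    # Recompute a collision-resistant digest of the received dictionary
    # and compare it with the expected value in constant time.
    actual = hashlib.sha256(dictionary).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex)
```
]]></sourcecode>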
    </section>
    <section anchor="sect-10" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
   This document has no IANA actions.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/>

        <reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257">
          <front>
            <title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title>
            <author fullname="Jyrki Alakuijala"/>
            <author fullname="Bill Cox"/>
            <author fullname="Jan Wassenberg"/>
            <date month="February" year="2017"/>
          </front>
          <seriesInfo name="DOI" value="10.48550/arXiv.1612.06257"/>
        </reference>
      </references>

      <references>
        <name>Informative References</name>
        <reference anchor="LZ77">
          <front>
            <title>A Universal Algorithm for Sequential Data Compression</title>
            <author initials="J." surname="Ziv" fullname="J. Ziv"/>
            <author initials="A." surname="Lempel" fullname="A. Lempel"/>
            <date month="May" year="1977"/>
	  </front>
          <seriesInfo name="DOI" value="10.1109/TIT.1977.1055714"/>
          <refcontent>IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343</refcontent>
        </reference>

        <reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/">
          <front>
            <title>A Solution to Compression Oracles on the Web</title>
            <author fullname="Blake Loring"/>
            <date day="27" month="March" year="2018"/>
          </front>
          <refcontent>The Cloudflare Blog</refcontent>
        </reference>

        <reference anchor="SOY" target="https://developers.google.com/closure">
          <front>
            <title>Closure Tools</title>
            <author>
              <organization>Google Developers</organization>
	    </author>
            <date/>
          </front>
        </reference>

        <reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929">
          <front>
            <title>CVE-2012-4929</title>
            <author>
              <organization>CVE Program</organization>
	</author>
            <date/>
          </front>
        </reference>
      </references>
    </references>
    <section numbered="false" anchor="acknowledgments" toc="default">
      <name>Acknowledgments</name>
      <t>
   The authors would like to thank <contact fullname="Robert Obryk"/> for suggesting
   improvements to the format and the text of the specification.</t>
    </section>

  </back> </rfc>