aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorFelix Handte <w@felixhandte.com>2020-12-10 01:20:51 -0500
committerGitHub <noreply@github.com>2020-12-10 01:20:51 -0500
commitfe48742c69e71843254b0f05557b16046850cd0e (patch)
treedcbf248c55ab6932a03f413f15e1c309a169a32c
parentf861e8c07befb495f96ea4f3f121f417308eb914 (diff)
parent2d46d764cf4b3a915f45ffeb2c9a85f7ee213fd8 (diff)
downloadzstd-fe48742c69e71843254b0f05557b16046850cd0e.tar.gz
Merge pull request #2422 from felixhandte/doc-update-repcodes
Update Zstd Compression Format to Clarify Repcode Behavior
-rw-r--r--doc/zstd_compression_format.md70
1 files changed, 37 insertions, 33 deletions
diff --git a/doc/zstd_compression_format.md b/doc/zstd_compression_format.md
index c69e63dc..0af6bf91 100644
--- a/doc/zstd_compression_format.md
+++ b/doc/zstd_compression_format.md
@@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version
-0.3.6 (25/05/20)
+0.3.7 (2020-12-09)
Introduction
@@ -918,38 +918,41 @@ Note that blocks which are not `Compressed_Block` are skipped, they do not contr
###### Offset updates rules
-The newest offset takes the lead in offset history,
-shifting others back by one rank,
-up to the previous rank of the new offset _if it was present in history_.
-
-__Examples__ :
-
-In the common case, when new offset is not part of history :
-`Repeated_Offset3` = `Repeated_Offset2`
-`Repeated_Offset2` = `Repeated_Offset1`
-`Repeated_Offset1` = `NewOffset`
-
-When the new offset _is_ part of history, there may be specific adjustments.
-
-When `NewOffset` == `Repeated_Offset1`, offset history remains actually unmodified.
-
-When `NewOffset` == `Repeated_Offset2`,
-`Repeated_Offset1` and `Repeated_Offset2` ranks are swapped.
-`Repeated_Offset3` is unmodified.
-
-When `NewOffset` == `Repeated_Offset3`,
-there is actually no difference with the common case :
-all offsets are shifted by one rank,
-`NewOffset` (== `Repeated_Offset3`) becomes the new `Repeated_Offset1`.
-
-Also worth mentioning, the specific corner case when `offset_value` == 3,
-and the literal length of the current sequence is zero.
-In which case , `NewOffset` = `Repeated_Offset1` - 1_byte.
-Here also, from an offset history update perspective, it's just a common case :
-`Repeated_Offset3` = `Repeated_Offset2`
-`Repeated_Offset2` = `Repeated_Offset1`
-`Repeated_Offset1` = `NewOffset` ( == `Repeated_Offset1` - 1_byte )
-
+During the execution of the sequences of a `Compressed_Block`, the
+`Repeated_Offsets`' values are kept up to date, so that they always represent
+the three most-recently used offsets. In order to achieve that, they are
+updated after executing each sequence in the following way:
+
+When the sequence's `offset_value` does not refer to one of the
+`Repeated_Offsets`--when it has value greater than 3, or when it has value 3
+and the sequence's `literals_length` is zero--the `Repeated_Offsets`' values
+are shifted back one, and `Repeated_Offset1` takes on the value of the
+just-used offset.
+
+Otherwise, when the sequence's `offset_value` refers to one of the
+`Repeated_Offsets`--when it has value 1 or 2, or when it has value 3 and the
+sequence's `literals_length` is non-zero--the `Repeated_Offsets` are re-ordered
+so that `Repeated_Offset1` takes on the value of the used Repeated_Offset, and
+the existing values are pushed back from the first `Repeated_Offset` through to
+the `Repeated_Offset` selected by the `offset_value`. This effectively performs
+a single-stepped wrapping rotation of the values of these offsets, so that
+their order again reflects the recency of their use.
+
+The following table shows the values of the `Repeated_Offsets` as a series of
+sequences are applied to them:
+
+| `offset_value` | `literals_length` | `Repeated_Offset1` | `Repeated_Offset2` | `Repeated_Offset3` | Comment |
+|:--------------:|:-----------------:|:------------------:|:------------------:|:------------------:|:-----------------------:|
+| | | 1 | 4 | 8 | starting values |
+| 1114 | 11 | 1111 | 1 | 4 | non-repeat |
+| 1 | 22 | 1111 | 1 | 4 | repeat 1; no change |
+| 2225 | 22 | 2222 | 1111 | 1 | non-repeat |
+| 1114 | 111 | 1111 | 2222 | 1111 | non-repeat |
+| 3336 | 33 | 3333 | 1111 | 2222 | non-repeat |
+| 2 | 22 | 1111 | 3333 | 2222 | repeat 2; swap 1 & 2 |
+| 3 | 33 | 2222 | 1111 | 3333 | repeat 3; rotate 3 to 1 |
+| 3 | 0 | 2221 | 2222 | 1111 | insert resolved offset |
+| 1 | 0 | 2222 | 2221 | 3333 | repeat 2 |
Skippable Frames
@@ -1666,6 +1669,7 @@ or at least provide a meaningful error code explaining for which reason it canno
Version changes
---------------
+- 0.3.7 : clarifications for Repeat_Offsets
- 0.3.6 : clarifications for Dictionary_ID
- 0.3.5 : clarifications for Block_Maximum_Size
- 0.3.4 : clarifications for FSE decoding table