credit to oss-fuzz
This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In which case, there was one place where the buffer size wasn't checked.
* It couldn't detect that the `fastCoverParams` can't be non-null, since it was just an assertion.
* It thought we were accesing `wksp->dtable` beyond the bounds because we were using it to set the `workSpace` value. Instead, compute the workspace size used in a different way.
discovered by oss-fuzz
It's a bug in the test itself :
ZSTD_compressBound() as an upper bound of the compress size
only works for data compressed "normally".
But in situations where many flushes are forcefully introduced,
this creates many more blocks,
each of which has a potential to increase the size by 3 bytes.
In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound().
This situation is similar when using the CompressSequences() API
with Explicit Block Delimiters.
In which case, each explicit block acts like a deliberate flush.
When employed by a fuzzer, it's possible to generate scenarios like the one described above,
with tons of incompressible blocks of small sizes,
thus going beyond ZSTD_compressBound().
fix : when using Explicit Block Delimiters, use a larger bound, to account for this scenario.
credit to oss-fuzz
In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In which case, since litlength=0, if the repcode expected 1+ literals in front, its signification changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.
In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.
But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.
So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.
But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.
So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.
To be preferred from HUF_TABLELOG_DEFAULT which represents the same value but by accident.
Name selected to keep the same convention as existing width definitions,
MLFSELog, LLFSELog and OffFSELog.
In rare cases, the default huffman depth selector is a bit too harsh,
requiring brutal adaptations to the tree,
resulting is some loss of compression ratio.
This new heuristic avoids the worse cases, favoring compression ratio.
As an example, compression of a specific distribution of 771 literals
is now improved to 441 bytes, from 601 bytes before.
* Async IO decompression:
- Added --[no-]asyncio flag for CLI decompression.
- Replaced dstBuffer in decompression with a pool of write jobs.
- Added an ability to execute write jobs in a separate thread.
- Added an ability to wait (join) on all jobs in a thread pool (queued and running).
Append -z cet-report=error to LDFLAGS if -fcf-protection is enabled by
default in compiler to catch the missing Intel CET marker:
compiling multi-threaded dynamic library 1.5.1
/usr/local/bin/ld: obj/conf_f408b4c825de923ffc88f7f21b6884b1/dynamic/huf_decompress_amd64.o: error: missing IBT and SHSTK properties
collect2: error: ld returned 1 exit status
...
LINK obj/conf_dbc0b41e36c44111bb0bb918e093d7c1/zstd
/usr/local/bin/ld: obj/conf_dbc0b41e36c44111bb0bb918e093d7c1/huf_decompress_amd64.o: error: missing IBT and SHSTK properties
collect2: error: ld returned 1 exit status
Intel Control-flow Enforcement Technology (CET):
https://en.wikipedia.org/wiki/Control-flow_integrity#Intel_Control-flow_Enforcement_Technology
requires that on Linux, all linker input files are marked as CET enabled
in .note.gnu.property section. For high-level language source codes,
.note.gnu.property section is added by compiler with the -fcf-protection
option. For assembly sources, include <cet.h> to add .note.gnu.property
section.
oss-fuzz uncovered a scenario where we're evaluating the cost of litLength = 131072,
which can't be represented in the zstd format, so we accessed 1 beyond LL_bits.
Fix the issue by making it cost 1 bit more than litLength = 131071.
There are still follow ups:
1. This happened because literals_cost[0] = 0, so the optimal parser chose 36 literals
over a match. Should we bound literals_cost[literal] > 0, unless the block truly only
has one literal value?
2. When no matches are found, the cost model isn't updated. In this case no matches were
found for an entire block. So the literals cost model wasn't updated at all. That made
the optimal parser think literals_cost[0] = 0, where it is actually quite high, since
the block was entirely random noise.
Credit to OSS-Fuzz.
This commit makes several changes:
1. It adds modules for the dictionary builder and errors headers.
2. It captures all of the macros that are used to configure these headers.
When the headers are imported as modules and one of these macros is defined
the compiler issues a warning that it needs to be defined on the CLI.
3. It promotes the modulemap file into the root of the lib directory.
Experimentation shows that clang's `-fimplicit-module-maps` will find the
modulemap when placed here, but not when it's put in a subdirectory.
When re-using a compression state, across multiple successive compressions,
the state should minimize the amount of allocation and initialization required.
This mostly matters in situations where initialization is an overwhelming task
compared to compression itself.
This can happen when the amount to compress is small,
while the compression state was given the impression that it would be much larger,
aka, streaming mode without providing a srcSize hint.
This lean-initialization optimization was broken in 980f3bbf83 .
This commit fixes it, making this scenario once again on par with v1.4.9.
Note that this does not completely fix#2966,
since another heavy initialization, specific to row mode,
is also happening (and was not present in v1.4.9).
This will be fixed in a separate commit.
Apparently, even when the assembly file is empty (because
`ZSTD_ENABLE_ASM_X86_64_BMI2` is false), it still is marked as possibly
needing an executable stack and so the whole library is marked as such. This
commit applies a simple patch for this problem by moving the noexecstack
indication outside the macro guard.
This commit builds on #2857.
This commit addresses #2963.
directly at ZSTD_storeSeq() interface.
In the process, remove ZSTD_REP_MOVE.
This makes it possible, in future commits,
to update and effectively simplify the naming scheme
to properly label the updated processing pipeline :
offset | repcode => offBase => offCode + offBits
the new contracts seems to make more sense :
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.
Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`.
This makes it possible to abstract away this representation by using the macros to extract these values.
First user : ZSTD_updateRep() .
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.
Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.
While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".
One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.
This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.
At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
`CFLAGS=-O0 make`
will now use `-O0` instead of enforcing `-O3`
which used to be the behavior before introduction of `libzstd.mk`.
This should result in faster tests,
since a few tests depend on this capability for faster roundtrips.
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
the variable has only very limited usage,
being only used once at the beginning of the block for prefetching only,
hence the error had no impact on compression ratio.
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.
Should not have negative performance implications.
mostly for maintenance convenience.
Performance wise, there is very little change,
slightly faster for slog 3 & 4,
neutral or very slightly negative for slot 5 & 6.
I couldn't find a good way to spread `ip0` and `ip1` apart when we accelerate
due to incompressible inputs. (The methods I tried slowed things down quite a
bit.)
Since we aren't splaying ip0 and ip1 apart (which would be like `0_1_2_3_`, as
opposed to the `01__23__` we were actually doing), it's a big ambitious to
increment `step` by 2. Instead, let's increment it by 1, which has the benefit
sliiightly improving compression. Speed remains pretty much unchanged.
The position updates are rewritten from `ip[N] = ip[N-1] + step` to be
`ip[N] = ip[N-2] + step`. This lets us only deal with the asymmetric spacing
of gaps at setup and then we only have to keep a single `step` variable.
This seems to work quite well on GCC and Clang!