We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
Sets these parameters in ZSTD_compress2() then resets them to their
orignal values after the compression call.
An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
for the next sync point.
This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.
Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
Simplifies the code and removes blocking from zstdmt.
At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.
Fixes#2327.
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.
Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.
Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.
Adds a simple test to ensure that we aren't downsizing too far.
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).
Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.
Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.
Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.
This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)
This caused us to double-search the first position and fail to search the
last position in the chain, slowing down search and making it less effective.
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.
Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression:
cdict->dictContentSize * 6U
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type U64 (64 bits, unsigned).
new version easier to vectorize
leads to smaller code and faster execution
notably at the last recombination stage
(basically, fixed cost per block).
Assembly inspected with godbolt
On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
The bound check condition should always be met because we selected `set_basic` as
our encoding type. But that code is very far away, so assert it is true so if it is
ever false we can catch it, and add a bounds check.
Fixes#2213.
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.
Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.
Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.
Fixes#2174.
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.
Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```
TODO: Add an explicit test that loads a small dictionary in MT mode
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.
This attempts to guarantee that the estimates returned to users are always
correct.
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.
This commit fixes this.
Fixes:
Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:
Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:
O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes#2096
* adding long support for patch-from
* adding refPrefix to dictionary_decompress
* adding refPrefix to dictionary_loader
* conversion nit
* triggering log mode on chainLog < fileLog and removing old threshold
* adding refPrefix to dictionary_round_trip
* adding docs
* adding enableldm + forceWindow test for dict
* separate patch-from logic into FIO_adjustParamsForPatchFromMode
* moving memLimit adjustment to outside ifdefs (need for decomp)
* removing refPrefix gate on dictionary_round_trip
* rebase on top of dev refPrefix change
* making sure refPrefx + ldm is < 1% of srcSize
* combining notes for patch-from
* moving memlimit logic inside fileio.c
* adding display for optimal parser and long mode trigger
* conversion nit
* fuzzer found heap-overflow fix
* another conversion nit
* moving FIO_adjustMemLimitForPatchFromMode outside ifndef
* making params immutable
* moving memLimit update before createDictBuffer call
* making maxSrcSize unsigned long long
* making dictSize and maxSrcSize params unsigned long long
* error on files larger than 4gb
* extend refPrefix test to include round trip
* conversion to size_t
* making sure ldm is at least 10x better
* removing break
* including zstd_compress_internal and removing redundant macros
* exposing ZSTD_cycleLog()
* using cycleLog instead of chainLog
* add some more docs about user optimizations
* formatting
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.
This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.
* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).
Credit to OSS-Fuzz.
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.
Catches the bug fixed in PR #1939
* Adding fail logging for superblock flow
* Dividing by targetCBlockSize instead of blockSize
* Adding new const and using more acurate formula for nbBlocks
* Only do dstCapacity check if using superblock
* Remvoing disabling logic
* Updating test to make it catch more extreme case of previou bug
* Also updating comment
* Only taking compressEnd shortcut on non-superblock
Fixes new fuzz issue
Credit to OSS-Fuzz
* Initializing unsigned value
* Initialilzing to 1 instead of 0 because its more conservative
* Unconditionoally setting to check first and then checking zero
* Moving bool to before block for c90
* Move check set before block
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.
* Only setting loaded dict huf table to valid on non-zero
* Adding hasNoZeroWeights test to fse tables
* Forbiding nbBits != 0 when weight == 0
* Reverting the last commit
* Setting table log to 0 when weight == 0
* Small (invalid) zero weight dict test
* Small (valid) zero weight dict test
* Initializing repeatMode vars to check before zero check
* Removing FSE changes to seperate pr
* Reverting accidentally changed file
* Negating bool, using unsigned, optimization nit
This parameter is unused in single-threaded compression. We should make it
behave like the other multithread-only parameters, for which we only accept
zero when we are not built with multithreading.
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
This changes the compressor, which silently skips dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
Compression ratio of fast strategies (levels 1 & 2)
was seriously reduced, due to accidental disabling of Literals compression.
Credit to @QrczakMK, which perfectly described the issue, and implementation details,
making the fix straightforward.
Example : initCStream with level 1 on synthetic sample P50 :
Before : 5,273,976 bytes
After : 3,154,678 bytes
ZSTD_compress (for comparison) : 3,154,550
Fix#1787.
To follow : refactor the test which was supposed to catch this issue (and failed)