Linearly back off the frequency of overflow correction based on the
number of times the `ZSTD_window_t` has been overflow corrected. This
will still allow the fuzzer to quickly find overflow correction bugs,
while also keeping good speed for larger inputs.
Additionally, the `nbOverflowCorrections` variable can be useful for
debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if
overflow correction has happened yet.
I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6
seconds). I've also verified that fuzzers and `fuzzer` and `zstreamtest`
still catch the row-hash overflow correction bug.
The FAQ covers the questions asked in Issue #2566. It first covers why
you would want to use a dictionary, then what a dictionary is, and
finally it tells you how to train a dictionary, and clarifies some of
the parameters.
There is definitely more that could be said about some of the advanced
trainers, but this should be a good start.
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.
A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.
Question: Should this be the default behavior? It isn't in this PR.
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
functions, instead of creating those parameters on the stack.
I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absoultely gigantic).
Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't use those functions in stack-constrained currently.
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s
I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indicies. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.
This simplifes the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
previous lower limit was 1 MB.
Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.
Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`. Either because the `stopMask == 0` or
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.
```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
21.187881087 seconds time elapsed
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
1.149707921 seconds time elapsed
```
Any stable API entry point introduced after v1.0
should be documented with its minimum version number.
Since PR fixes this requirement
updating mostly new entry points since v1.4.0
and newly introduced ones for future v1.5.0.
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
got the correction wrong in this case, and our chain tables and binary
trees would be corrupted. Now, we work as long as `maxDist` is a power
of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
run overflow correction as frequently as allowed without impacting
compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
`zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
speed penalty at most, which seems reasonable.
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
* Perform 64-byte alignment of wksp tables and aligneds internally
* Clean up cwskp_finalize() function to only do two allocs
* Refactor aligned/buffer reservation code, remove ASAN req for alignment reservations
* Change from allocating 128 bytes always to allocating only buffer space as needed for tables/aligned
* Back out aligned/table reservation order restriction
* Add stricter bounds for new/resized wksps, fix comment in zstd_cwksp.h
* Do not emit last partitions of blocks as RLE/uncompressed
* Fix repcode updates within block splitter
* Add a entropytables confirm function, redo ZSTD_confirmRepcodesAndEntropyTables() for better function signature
* Add a repcode updater to block splitter, no longer need to force emit compressed blocks
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
Following #2545,
I noticed that one field in `seq_t` is optional,
and only used in combination with prefetching.
(This may have contributed to static analyzer failure to detect correct initialization).
I then wondered if it would be possible to rewrite the code
so that this optional part is handled directly by the prefetching code
rather than delegated as an option into `ZSTD_decodeSequence()`.
This resulted into this refactoring exercise
where the prefetching responsibility is better isolated into its own function
and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations.
Incidently, due to better code locality,
it reduces the need to send information around,
leading to simplified interface, and smaller state structures.
* Move `counting` to a struct in `FSE_decompress_wksp_body()`
* Fix error code in `FSE_decompress_wksp_body()`
* Rename a variable in `HUF_ReadDTableX2_Workspace`
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.
This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.
* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`
It also adds a test case that ensures that each of these functions
ignore the advanced parameters.
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.
This compression level is unused by the library, but is set so that the
tracing framework can consume it.
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.
This keeps the simple case simple.
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unkown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
Fixes#2442.
1. When creating a dictionary keep the same behavior as before.
Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
the dictionary size. When streaming this will select the
largest parameters and not adjust them down. But, the CDict
will use the correctly sized parameters, which seems like the
right tradeoff.
4. When not attaching a dictionary (either forced not to, or
using a prefix dictionary) we select parameters based on the
dictionary size + source size, and assume the source size is
small, which is the same behavior as before. But, now we don't
adjust the window log (and hash and chain log) down when the
source size is unknown.
When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.
TODO: Add a streaming + dictionary regression test case.
This ensures the symbols aren't redefined, which would result in a compiler
error.
I was getting redefined symbols for _LARGEFILE64_SOURCE when building for
32-bit x86 Linux on an older CentOS release in a CI environment. With this
change, I'm able to compile the single file library in this environment.
Closes#2443.
Comment and refactor `HUF_buildCTable()` and the helper functions
it calls as I read and understand the code. Hopefully this refactor
makes the code a bit more clear.
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
Sets these parameters in ZSTD_compress2() then resets them to their
orignal values after the compression call.
An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
Makefile now automatically detects modifications of compilation flags,
and produce object files in directories dedicated to this compilation flags.
This makes it possible, for example, to compile libzstd with different DEBUGLEVEL.
Object files sharing the same configration will be generated into their dedicated directories.
Also : new compilation variables
- DEBUGLEVEL : select the debug level (assert & traces) inserted during compilation (default == 0 == release)
- HASH : select a hash function to differentiate configuration (default == md5sum)
- BUILD_DIR : skip the hash stage, store object files into manually specified directory
%.o object files generated for dynamic library
must be different from those generated for static library.
Due to this difference, %.o were so far only generated for the static library.
The dynamic library was rebuilt from %.c source.
This meant that, for every minor change, the entire dynamic library had to be rebuilt.
This is fixed in this PR :
only the modified %.c source get rebuilt.
There are compilation environments in aarch64 where NEON isn't
available. While these environments could define ZSTD_NO_INTRINSICS,
it's more fail-safe to use the more specific symbol to know if NEON
extensions are available.
__ARM_NEON is the proper symbol, defined in ARM C Language Extensions
Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some
sources suggest __ARM_NEON__, but that's the obsolete spelling from
prior versions of the standard.
Signed-off-by: Warner Losh <imp@bsdimp.com>
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
for the next sync point.
This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.
Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
Simplifies the code and removes blocking from zstdmt.
At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.
Fixes#2327.
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.
Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.
Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.
Adds a simple test to ensure that we aren't downsizing too far.
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).
Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.
Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.
Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.
This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)