4226 Commits

Author SHA1 Message Date
Lvv.me
81b96205af
Using module.modulemap replace symbol link for public header 2021-11-15 13:23:50 +08:00
Lvv.me
27455638aa
Support Swift Package Manager 2021-11-14 17:29:33 +08:00
ko-zu
c67e07f34e Remove executable flag from GNU_STACK section
Putting stack marking into every assembly file is required to indicate
that the stack does not need to be executable.
Executable flag on stack conflicts with some security measures, Systemd
MemoryDenyWriteExecute=yes for example.
2021-11-13 22:58:33 +09:00
Dimitris Apostolou
ebbd675998
Fix typos 2021-11-13 10:04:04 +02:00
Yann Collet
9ba07907c8
Merge pull request #2836 from animalize/copy16
ZSTD_copy16() uses ZSTD_memcpy()
2021-11-11 07:53:08 -08:00
W. Felix Handte
48572f52b1 Rewrite Fix to Still Auto-Vectorize 2021-11-09 12:17:03 -05:00
W. Felix Handte
61765cacd0 Avoid Reducing Indices to Reserved Values
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.

So this commit changes index reduction to avoid mapping indices to collide with
`ZSTD_DUBT_UNSORTED_MARK`.
2021-11-08 20:03:52 -05:00
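The remapping rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `sketch_reduce_index` and `SKETCH_UNSORTED_MARK` are hypothetical names, and treating 0 as "no entry" is an assumption of the sketch.

```c
#include <stdint.h>

/* Stand-in for the reserved marker named in the commit message
 * (the message states that it equals 1). */
#define SKETCH_UNSORTED_MARK 1u

/* Hypothetical helper: rebase one table index by reducerValue without ever
 * producing the reserved marker. Indices that would land on 0 or on the
 * marker are squashed to 0, which this sketch treats as "no entry". */
static uint32_t sketch_reduce_index(uint32_t index, uint32_t reducerValue)
{
    uint32_t reduced;
    if (index <= reducerValue) return 0;            /* too old to keep */
    reduced = index - reducerValue;
    if (reduced <= SKETCH_UNSORTED_MARK) return 0;  /* would collide with the marker */
    return reduced;
}
```

With this rule, an index equal to `reducerValue + 1` is dropped rather than remapped to the marker value, which is the collision the commit message describes.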
Nick Terrell
d46995efeb Backport zstd patch from LKML
Credit to Nathan Chancellor for the bug fix and Nick Desaulniers for the
bug report.

Link: https://github.com/ClangBuiltLinux/linux/issues/1486
Link: https://lore.kernel.org/all/20211021202353.2356400-1-nathan@kernel.org/
2021-11-05 14:09:49 -07:00
senhuang42
384744888e Void out unused functions 2021-11-04 14:32:07 +03:00
Ma Lin
b10357ce65 ZSTD_copy16() uses SSE2 instructions
This accelerates the decompression speed of MSVC build.
2021-11-04 11:37:10 +08:00
binhdvo
b399b47467
Move mingw tests from appveyor to github actions (#2838) 2021-11-02 13:17:55 -04:00
binhdvo
04734ee84a
Fix oss fuzz test error (#2837) 2021-10-29 10:29:50 -04:00
Yann Collet
aba88fa996
Merge pull request #2829 from facebook/ZSTD_DECODER_INTERNAL_BUFFER
minor : change build macro to ZSTD_DECODER_INTERNAL_BUFFER
2021-10-26 10:48:16 -07:00
Yann Collet
2b2a5c449a fix minor cast warning 2021-10-26 08:38:17 -07:00
Yann Collet
518f06b281 added minimum for decoder buffer
also : introduced macro BOUNDED()
2021-10-26 08:21:31 -07:00
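A clamping macro of the kind this commit mentions might look like the sketch below. The argument order, the `SKETCH_` helper names, and the buffer limits are assumptions made for illustration; the commit message does not spell out the definition.

```c
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Clamp val into [min, max]. Arguments are evaluated more than once,
 * so they should be free of side effects. */
#define SKETCH_BOUNDED(min, val, max) (SKETCH_MAX(min, SKETCH_MIN(val, max)))

/* Hypothetical limits, chosen only for this example. */
#define SKETCH_BUFFER_MIN 1024u
#define SKETCH_BUFFER_MAX 65536u

/* Enforce both a minimum and a maximum on a requested buffer size. */
static unsigned sketch_pick_buffer_size(unsigned requested)
{
    return SKETCH_BOUNDED(SKETCH_BUFFER_MIN, requested, SKETCH_BUFFER_MAX);
}
```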
Yann Collet
12e177cba8
Merge pull request #2830 from facebook/clevels
separate compression level tables into their own file
2021-10-25 13:35:54 -07:00
Nick Terrell
ad739e5959
Merge pull request #2828 from terrelln/lazy-compile
[lazy] Speed up compilation times
2021-10-25 10:22:23 -07:00
Yann Collet
082d6c6775 separate compression level tables into their own files
that's clearer than finding the tables somewhere in the middle of `compress.c`.

Also, down the line, it may allow zstd to feature adjusted tables depending on the target CPU.
2021-10-25 08:49:54 -07:00
Yann Collet
02be2a830f build macro ZSTD_DECODER_INTERNAL_BUFFER
just to make the topic more accessible for potential users.
2021-10-25 08:09:04 -07:00
binhdvo
6a7ede3dfc
Reduce size of dctx by reutilizing dst buffer (#2751)
* Reduce size of dctx by reutilizing dst buffer

Co-authored-by: Binh Vo <binhvo@fb.com>
2021-10-25 10:38:01 -04:00
Yann Collet
0a794f5afe
Merge pull request #2822 from marxin/fix-zstd-thread-pool-documentation
Support thread pool section in HTML documentation.
2021-10-22 16:46:08 -07:00
Nick Terrell
13cad3abb1 [lazy] Speed up compilation times
Speed up compilation times by moving each specialized search function
into its own function. This is faster because compilers can handle many
smaller functions much faster than one gigantic function. The previous
approach generated one giant function with `switch` statements and
inlining to select the implementation.

| Compiler | Flags                               | Dev Time (s) | PR Time (s) | Delta |
|----------|-------------------------------------|--------------|-------------|-------|
| gcc      | -O3                                 |         16.5 |         5.6 |  -66% |
| gcc      | -O3 -g -fsanitize=address,undefined |        158.9 |        38.2 |  -75% |
| clang    | -O3                                 |         36.5 |         5.5 |  -85% |
| clang    | -O3 -g -fsanitize=address,undefined |         27.8 |        17.5 |  -37% |

This also reduces the binary size because the search functions are no
longer inlined into the main body.

| Compiler | Dev libzstd.a Size (B) | PR libzstd.a Size (B) | Delta |
|----------|------------------------|-----------------------|-------|
| gcc      |                1563868 |               1308844 |  -16% |
| clang    |                1924372 |               1376020 |  -28% |

Finally, the performance is not impacted significantly by this change,
in fact we generally see a small speed boost.

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|-------|
| gcc      |     5 |            110.6 |           110.0 | -0.5% |
| gcc      |     7 |             70.4 |            72.2 | +2.5% |
| gcc      |     9 |             53.2 |            53.5 | +0.5% |
| gcc      |    13 |             12.7 |            12.9 | +1.5% |
| clang    |     5 |            113.9 |           110.4 | -3.0% |
| clang    |     7 |             67.7 |            70.6 | +4.2% |
| clang    |     9 |             51.9 |            52.2 | +0.5% |
| clang    |    13 |             12.4 |            13.3 | +7.2% |

The compression strategy is unmodified in this PR, so the compressed size
should be exactly the same. I may have a follow up PR to slightly improve
the compression ratio, if it doesn't cost too much speed.
2021-10-22 13:45:26 -07:00
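The restructuring described in the commit above (many small specialized functions instead of one large function that switches between inlined variants) can be sketched as below. All names are hypothetical, and `sketch_count_runs` is a toy stand-in for a real search routine.

```c
#include <stddef.h>

/* Generic "template": counts positions where minLen identical bytes start.
 * It stands in for a real search routine and is purely illustrative. */
static inline size_t sketch_count_runs(const unsigned char* src, size_t size, unsigned minLen)
{
    size_t found = 0;
    size_t i;
    for (i = 0; i + minLen <= size; i++) {
        unsigned k = 1;
        while (k < minLen && src[i + k] == src[i]) k++;
        if (k == minLen) found++;
    }
    return found;
}

/* Each expansion is a small standalone function specialized for one constant. */
#define SKETCH_GEN_SEARCH(len)                                               \
    static size_t sketch_search_##len(const unsigned char* src, size_t size) \
    {                                                                        \
        return sketch_count_runs(src, size, len);                            \
    }

SKETCH_GEN_SEARCH(4)
SKETCH_GEN_SEARCH(5)
SKETCH_GEN_SEARCH(6)

typedef size_t (*sketch_search_fn)(const unsigned char*, size_t);

/* Selection happens once, outside the hot loop, by picking a function pointer. */
static sketch_search_fn sketch_select_search(unsigned minLen)
{
    switch (minLen) {
        case 4:  return sketch_search_4;
        case 5:  return sketch_search_5;
        default: return sketch_search_6;
    }
}
```

The compiler optimizes each generated function separately, which is the property the commit credits for the shorter build times.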
Nick Terrell
abd717a5fa [asm] Switch to C style comments
Switch to C style comments for increased portability, and consistency.
2021-10-20 11:37:05 -07:00
Yann Collet
9d62957b31
Merge pull request #2800 from animalize/fix_c89
Fix a C89 error in msvc
2021-10-18 14:32:04 -07:00
Martin Liska
1c2b02eee9 Support thread pool section in HTML documentation. 2021-10-15 18:35:39 +02:00
Felix Handte
23c1a2d260
Merge pull request #2774 from felixhandte/zstd-dfast-pipelined-single
Pipelined Implementation of ZSTD_dfast
2021-10-13 16:38:43 -04:00
W. Felix Handte
0bfc935add Convert Outer Control Structure to Loop 2021-10-12 13:34:17 -04:00
Nick Terrell
b77d95b053
Merge pull request #2820 from terrelln/nb-compares
[binary-tree] Fix underflow of nbCompares
2021-10-11 09:59:57 -07:00
Nick Terrell
26486db9ab
Merge pull request #2819 from terrelln/ldm-hash-rate-log
[ldm] Fix ZSTD_c_ldmHashRateLog bounds check
2021-10-08 14:58:29 -07:00
Nick Terrell
802745e88a
Merge pull request #2818 from terrelln/indentation-fix
[nit] Fix buggy indentation
2021-10-08 14:57:52 -07:00
Nick Terrell
c6c482fe07 [binary-tree] Fix underflow of nbCompares
Fix underflow of `nbCompares` by switching to an `int` and comparing
`nbCompares > 0`. This is a minimal fix, because I don't want to change
the logic. These loops seem to be doing `nbCompares + 1` comparisons.

The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 13:22:55 -07:00
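The underflow and the fix pattern named in the commit above (a signed counter tested with `> 0`) can be illustrated in isolation. The function names are hypothetical and the loop body is a placeholder for the real comparison work.

```c
#include <limits.h>

/* The failure mode: decrementing an unsigned zero wraps around to UINT_MAX,
 * so a counter that is decremented past zero looks like a huge budget. */
static unsigned sketch_decrement_unsigned(unsigned n)
{
    return n - 1u;
}

/* The fix pattern: keep the budget in an int and guard the loop with `> 0`.
 * A zero or negative budget then simply means "no more comparisons". */
static int sketch_count_visits(int nbCompares, int candidates)
{
    int visited = 0;
    for (; nbCompares > 0 && visited < candidates; nbCompares--) {
        visited++;  /* placeholder for one comparison */
    }
    return visited;
}
```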
Nick Terrell
31316cf158 [multiple-ddicts] Fix NULL checks
The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:24:58 -07:00
Nick Terrell
1bbb372e3e [ldm] Fix ZSTD_c_ldmHashRateLog bounds check
There is no minimum value check, so the parameter could be negative.
Switch to the standard pattern of using `BOUNDCHECK()`.

The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:17:40 -07:00
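A parameter check that enforces both ends of the range, as this commit describes, could be sketched like this. The helper names and the numeric limits are hypothetical; this is not the definition of `BOUNDCHECK()`.

```c
/* Hypothetical limits for illustration only. */
#define SKETCH_HASH_RATE_LOG_MIN 0
#define SKETCH_HASH_RATE_LOG_MAX 25

/* Returns nonzero when value lies in [lo, hi]. */
static int sketch_in_bounds(int value, int lo, int hi)
{
    return value >= lo && value <= hi;
}

/* Stores the parameter only if it passes both the minimum and the maximum
 * check. Returns 0 on success, -1 when the value is out of range. */
static int sketch_set_hash_rate_log(int* dst, int value)
{
    if (!sketch_in_bounds(value, SKETCH_HASH_RATE_LOG_MIN, SKETCH_HASH_RATE_LOG_MAX))
        return -1;
    *dst = value;
    return 0;
}
```

A check that only tests the upper limit would accept `-1` here, which is the gap the commit closes.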
Nick Terrell
399644b1f1 [nit] Fix buggy indentation
The bug was reported by Dan Carpenter and found by Smatch static
checker.

https://lore.kernel.org/all/20211008063704.GA5370@kili/
2021-10-08 11:13:11 -07:00
W. Felix Handte
79ca830766 Style: Add Comments to Variables and Move a Couple into the Loop 2021-10-05 16:18:09 -04:00
W. Felix Handte
62536ef7da Simplify DMS Implementation by Removing noDict Support 2021-10-05 14:54:37 -04:00
W. Felix Handte
051b473e7e Fall Back in _extDict to New _noDict Rather than Old Merged Impl 2021-10-05 14:54:37 -04:00
W. Felix Handte
fcab4841aa Nit: Rename Function 2021-10-05 14:54:37 -04:00
W. Felix Handte
47fd762ecc Nit: Unnest Blocks that Don't Declare Anything 2021-10-05 14:54:37 -04:00
W. Felix Handte
2cdfad538c Search One Last Position 2021-10-05 14:54:37 -04:00
W. Felix Handte
6ae44c0db8 Advance Long Index Lookup (+0.5% Speed)
This lookup can be advanced to before the short match check because either way
we will use it (in the next loop iter or in `_search_next_long`).
2021-10-05 14:54:37 -04:00
W. Felix Handte
2ddef7c872 Write Back Advanced Hash in Long Matches as Well (+Ratio)
Since we're now hashing the position ahead even if we find a long match and
don't search that next position, we can write it back into the hashtable even
in long matches. This seems to cost us no speed, and improves compression
ratio slightly!
2021-10-05 14:54:37 -04:00
W. Felix Handte
39f2491bfc Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed)
This costs a little ratio, unfortunately.
2021-10-05 14:54:37 -04:00
W. Felix Handte
db4e1b5479 Hash Long One Position Ahead (+2.5% Speed)
Aside from maybe a latency win in the loop, this means that when we find a
short match, we've already done the hash we need to check the next long match.
2021-10-05 14:54:37 -04:00
W. Felix Handte
a1ac7205d0 Pull Match Found Stuff Out of the Loop 2021-10-05 14:54:37 -04:00
W. Felix Handte
072ffaad67 Extract Working Variables 2021-10-05 14:54:37 -04:00
W. Felix Handte
1bdf041071 Track Step Rather than Recalculating (+0.5% Speed) 2021-10-05 14:54:37 -04:00
W. Felix Handte
258c0623e1 Extract Single-Segment Variant of ZSTD_dfast 2021-10-05 14:54:37 -04:00
stanjo74
52598d54e9
Limit train samples (#2809)
* Limit training samples size to 2GB

* simplified DISPLAYLEVEL() macro to use global variable instead of local.

* refactored training samples loading

* fixed compiler warning

* addressed comments from the pull request

* addressed @terrelln comments

* missed some fixes

* fixed type mismatch

* Fixed bug passing estimated number of samples instead of the loaded number of samples.
Changed unit conversion not to use bit-shifts.

* fixed a declaration after code

* fixed type conversion compile errors

* fixed more type casting

* fixed more type mismatching

* changed sizes type to size_t

* move type casting

* more type cast fixes
2021-10-04 17:47:52 -07:00
Yann Collet
7868f38019
Merge pull request #2747 from Helflym/dev
Add AIX support in Makefile
2021-10-01 08:13:39 -07:00
Nick Terrell
3a4d421c0f
Merge pull request #2802 from solbjorn/fix-kernel-wundef
[contrib][linux] Fix -Wundef inside Linux kernel tree
2021-09-29 09:48:17 -07:00
Ma Lin
894f05e88d Fix ZSTD_countTrailingZeros() bug
`>> 3` is wrong.
2021-09-29 07:20:09 +08:00
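The commit message is terse, so the following is only one plausible reading: shifting a bit count right by 3 turns it into a byte count, which is the wrong unit for a function that reports trailing zero bits. The sketch below contrasts the two quantities with portable, hypothetical helpers; both assume a nonzero input.

```c
#include <stdint.h>

/* Number of zero bits below the lowest set bit. Requires v != 0. */
static unsigned sketch_trailing_zero_bits(uint32_t v)
{
    unsigned n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Number of whole zero bytes below the lowest set bit: the bit count
 * divided by 8. This is a different quantity from the one above. */
static unsigned sketch_trailing_zero_bytes(uint32_t v)
{
    return sketch_trailing_zero_bits(v) >> 3;
}
```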
Sen Huang
4b7f45cb04 Pull hot loop into its own function 2021-09-28 08:19:44 -07:00
Sen Huang
ccdcbf4621 Try beginning and end of match 2021-09-28 08:19:44 -07:00
Sen Huang
b8fd6bf30c Skip most long matches in lazy hash table update 2021-09-28 08:19:39 -07:00
Nick Terrell
9ef055d706
Merge pull request #2808 from terrelln/huf-oss-fuzz-fix
[huf] Fix OSS-Fuzz assert
2021-09-27 15:00:52 -07:00
Felix Handte
8b7a19fcd4
Merge pull request #2805 from nolange/smaller_code_with_disabled_features
Smaller code with disabled features
2021-09-27 17:43:21 -04:00
Nick Terrell
a07ddb47f7 [huf] Fix OSS-Fuzz assert
PR #2784 introduced a bug in the decompressor that caused some valid
inputs to fail to decompress. The bitstream isn't reloaded after the 4X*
loop if the number of elements remaining is small enough, causing us to
read more bits than are available in the bitcontainer.

This was caught by the MSAN fuzzer in OSS-Fuzz because the assembly
implementation isn't used in the MSAN build.

Credit to OSS-Fuzz.
2021-09-27 13:56:07 -07:00
Ma Lin
ae986fcdb8 Use __assume(0) for unreachable code path in msvc
msvc will optimize away the condition check.
2021-09-27 19:23:57 +08:00
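An unreachable-code hint along the lines of this commit might be wrapped as below. `__assume(0)` is the MSVC intrinsic named in the commit and `__builtin_unreachable()` is the GCC/Clang counterpart; the macro name and the example function are hypothetical.

```c
#if defined(_MSC_VER)
#  define SKETCH_UNREACHABLE() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNREACHABLE() __builtin_unreachable()
#else
#  define SKETCH_UNREACHABLE() ((void)0)
#endif

/* Maps a 2-bit width code to a byte count. The mask guarantees that only
 * cases 0..3 occur, so the default branch can be marked unreachable and the
 * compiler is free to drop the range check. */
static unsigned sketch_bytes_for_width(unsigned widthCode)
{
    switch (widthCode & 3u) {
        case 0: return 1;
        case 1: return 2;
        case 2: return 4;
        case 3: return 8;
        default: SKETCH_UNREACHABLE();
    }
    return 0;  /* never reached; satisfies compilers without the hint */
}
```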
Yann Collet
2ed14c2476 minor : fix comment
provide correct reasons to include zstd_internal.h
2021-09-26 08:44:18 -07:00
Norbert Lange
6763f40331 zstd_decompress: use a helper function for context create
Multiple ZSTD_createDCtx* functions call other (public)
ZSTD_createDCtx* functions, this makes it harder for humans
and compilers to throw out code that is not used.

This farms out the logic into a static function, if a program
only uses a single ZSTD_createDCtx variant, all others can be easily
dropped and the remaining implementation can be specialized.
2021-09-26 14:41:37 +02:00
Norbert Lange
0d45540695 decompress: conditionally remove bmi2 from context
Use a helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
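The accessor approach described in the commit above can be sketched as follows. The struct, the field names, and the `SKETCH_DYNAMIC_BMI2` switch are hypothetical stand-ins, not the library's definitions.

```c
/* Hypothetical build switch; 0 means the feature is compiled out. */
#ifndef SKETCH_DYNAMIC_BMI2
#  define SKETCH_DYNAMIC_BMI2 0
#endif

typedef struct {
#if SKETCH_DYNAMIC_BMI2
    int bmi2;   /* only present when the feature is enabled */
#endif
    int other;  /* placeholder for the remaining context state */
} sketch_ctx;

/* All readers go through this helper. With the feature disabled it returns
 * the constant 0, so branches that depend on it can be folded away. */
static int sketch_ctx_bmi2(const sketch_ctx* ctx)
{
#if SKETCH_DYNAMIC_BMI2
    return ctx->bmi2;
#else
    (void)ctx;
    return 0;
#endif
}
```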
Norbert Lange
02296cac82 decompress: conditionally remove legacy members from context
Remove the then unneeded variables from the struct,
and all accesses to them.
2021-09-26 12:12:17 +02:00
Alexander Lobakin
71526e6f29 [contrib][linux] Fix -Wundef inside Linux kernel tree
Commit d7ef97a013
("[build] Fix oss-fuzz build with the dataflow sanitizer") broke
build inside Linux-kernel after 'import', as it no longer can
conditionally remove ZSTD_MEMORY_SANITIZER definition from
the #if DEF_A || DEF_B block. This emits -Wundef warning which
can be treated as error.
Split this preprocessor condition into two separate conditions
to fix this.

Fixes: d7ef97a013 ("[build] Fix oss-fuzz build with the dataflow sanitizer")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
2021-09-25 13:35:25 +02:00
Ma Lin
e5ba858270 Don't initialize the first parameter of _BitScanForward* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64
2021-09-25 16:36:53 +08:00
Ma Lin
95f492ea17 Don't initialize the first parameter of _BitScanReverse* functions
Like the document example, no need to initialize `r` to 0.
https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64
2021-09-25 16:36:53 +08:00
Ma Lin
cc22042da0 Fix a C89 error in msvc
Variables (r) must be declared at the beginning of a code block.
This causes msvc2012 to fail to compile 64-bit build.
2021-09-25 16:32:06 +08:00
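The C89 rule behind this fix is that declarations must come before any statement in a block. A minimal illustration with a hypothetical helper (not the code touched by the commit):

```c
/* Position of the highest set bit, or 0 for v == 0. */
static unsigned sketch_highbit(unsigned v)
{
    unsigned r;               /* C89: declared at the top of the block */
    if (v == 0) return 0;     /* statements may only follow the declarations */
    r = 0;                    /* declaring `unsigned r = 0;` here instead
                                 would be rejected by a strict C89 compiler */
    while ((v >>= 1) != 0) r++;
    return r;
}
```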
Nick Terrell
14772d97be
Merge pull request #2796 from terrelln/linux-fixes
[lib] Make lib compatible with `-Wfall-through` excepting legacy
2021-09-23 16:11:53 -07:00
Nick Terrell
01976ce4cd
Merge pull request #2799 from terrelln/oss-fuzz-build
[build] Fix oss-fuzz build with the dataflow sanitizer
2021-09-23 15:55:10 -07:00
Nick Terrell
1903d6a5a8
Merge pull request #2798 from abxhr/typo-fix
Fix typo
2021-09-23 13:11:45 -07:00
Nick Terrell
d7ef97a013 [build] Fix oss-fuzz build with the dataflow sanitizer
The dataflow sanitizer requires all code to be instrumented. We can't
instrument the ASM function, so we have to disable it.
2021-09-23 11:48:39 -07:00
Abshar Mohammed Aslam
54a888b57b
Fix typo 2021-09-23 21:54:38 +04:00
Nick Terrell
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
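A fall-through macro of the kind this commit describes can be sketched as below: an attribute where the compiler supports it, otherwise nothing but a comment. The macro name and the example function are hypothetical. Note that current GCC and Clang documentation spell the warning `-Wimplicit-fallthrough`.

```c
#if defined(__has_attribute)
#  if __has_attribute(fallthrough)
#    define SKETCH_FALLTHROUGH __attribute__((fallthrough))
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH /* fall-through */
#endif

/* Reads up to 3 little-endian bytes, deliberately falling through cases. */
static unsigned sketch_read_le(const unsigned char* p, unsigned nbBytes)
{
    unsigned v = 0;
    switch (nbBytes) {
        case 3: v |= (unsigned)p[2] << 16;
                SKETCH_FALLTHROUGH;
        case 2: v |= (unsigned)p[1] << 8;
                SKETCH_FALLTHROUGH;
        case 1: v |= p[0];
                break;
        default: break;
    }
    return v;
}
```

Writing the macro followed by `;` keeps the call site valid in both configurations: it is either an attribute on a null statement or a plain empty statement.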
Yann Collet
fa2a4d77c7 constify MatchState* parameter when possible
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.

This is supposed to be helpful for both maintenance and the compiler.
2021-09-23 08:27:44 -07:00
senhuang42
1d8143c84f Move block splitter from stack to CCtx 2021-09-23 00:02:31 -04:00
sen
044c8b4722
Merge pull request #2779 from senhuang42/fse_fix
Fix NCountWriteBound
2021-09-22 13:51:21 -04:00
sen
1e99d36361
Merge pull request #2788 from senhuang42/param_switch
Use new paramSwitch enum for row matchfinder and block splitter
2021-09-22 13:27:55 -04:00
Nick Terrell
9450876a9d [huf] Fix compilation when DYNAMIC_BMI2=0 && BMI2 is supported
* Fix compilation issues pointed out in PR #2790.
* Add test cases to GitHub actions that test all combinations of
  `DYNAMIC_BMI2` and BMI2 support.
2021-09-21 16:49:13 -07:00
senhuang42
06f42c3bfd Use new paramSwitch enum for LDM 2021-09-21 14:22:09 -04:00
senhuang42
b5c35d7ea3 Use new paramSwitch enum for LCM, row matchfinder, and block splitter 2021-09-21 14:22:02 -04:00
Nick Terrell
a5f2c45528 Huffman ASM 2021-09-20 14:46:43 -07:00
Nick Terrell
d7542aacd9 [fuzzer] Add huf_decompress fuzzer
Add a fuzzer for Huffman decompression. Fix several bugs in Huffman
decompression, mostly related to `op == NULL` and pointer underflow.
2021-09-17 15:00:49 -07:00
Nick Terrell
8bf699aa59 [build] Add support for ASM files in Make + CMake
* Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`.
  Most relevantly, the way we find library files.
* Use `lib/libzstd.mk` in the other Makefiles instead of repeating the
  same code.
* Add a test `tests/test-variants.sh` that checks that the builds of
  `make -C programs allVariants` are correct, and run it in Actions.
* Adds support for ASM files in the CMake build.

The Meson build is not updated because it lists every file in zstd,
and supports ASM off the bat, so the Huffman ASM commit will just add
the ASM file to the list.

The Visual Studio build is not updated because I'm not adding ASM
support to Visual Studio yet.
2021-09-17 14:13:53 -07:00
sen
9d2a45a705
Merge pull request #2778 from senhuang42/opt_inlining_revert
Revert opt outlining change
2021-09-15 14:22:10 -04:00
Sen Huang
a7aa2c5df6 Fix NCountWriteBound 2021-09-15 09:51:42 -07:00
Sen Huang
bd84e4a9d3 Revert opt outlining change 2021-09-15 09:08:41 -07:00
Nick Terrell
2fabd370bb
Merge pull request #2777 from terrelln/oss-fuzz-fix
[rsyncable] Fix test failures
2021-09-14 13:20:22 -07:00
Nick Terrell
9d9e2ed00b [rsyncable] Fix test failures
Test failures showed up on the daily cron job. They didn't show up
in CI because the condition is somewhat rare, and didn't trigger
during the CI tests.

This PR fixes up the logic in `findSynchronizationPoint()` to correctly
handle the edge case. It also un-comments an assert that helps catch the
issue, and verify that rsyncable mode is calculating the correct hash.

After the fix, the test that failed passes:

```
./zstreamtest --newapi -t1 --no-big-tests -s9680
```
2021-09-14 12:28:53 -07:00
Yann Collet
2e6f5bc0d8
Merge pull request #2771 from facebook/opt_investigation
Improve optimal parser performance on small data
2021-09-14 10:36:34 -07:00
Nick Terrell
d22bbed5db
Merge pull request #2776 from terrelln/oss-fuzz-fix
[rsyncable] Ensure ZSTD_compressBound() is respected
2021-09-14 09:37:43 -07:00
Yann Collet
fd94b9d1c9 Merge branch 'dev' into opt_investigation 2021-09-14 01:15:51 -07:00
Nick Terrell
a418b4e478 [rsyncable] Ensure ZSTD_compressBound() is respected
In degenerate cases `--rsyncable` could create very small blocks (1
byte). This causes the compressed output to be larger than
`ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode
never outputs blocks smaller than 128 KB.

The minimum job size is 512 KB, so we shouldn't lose many
synchronization points from skipping any that cause blocks smaller than
128 KB. And even if we do, that is fine, because we'll find the next
one.

This fixes the `raw_dictionary_round_trip` oss-fuzz assert.

Credit to OSS-Fuzz
2021-09-13 17:14:07 -07:00
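The minimum-block rule from the commit above can be sketched as a simple filter on candidate synchronization points. The 128 KB figure comes from the commit message; the function name and its interface are hypothetical.

```c
#include <stddef.h>

/* Minimum block size stated in the commit message: 128 KB. */
#define SKETCH_MIN_BLOCK_BYTES ((size_t)128 * 1024)

/* Returns 1 if a synchronization point found at `pos` may end the block that
 * started at `blockStart`, 0 if it must be skipped because the resulting
 * block would be too small. */
static int sketch_accept_sync_point(size_t blockStart, size_t pos)
{
    if (pos < blockStart) return 0;
    return (pos - blockStart) >= SKETCH_MIN_BLOCK_BYTES;
}
```

Skipped points cost little, as the commit notes: the scan simply continues until the next candidate that is far enough from the block start.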
Sen Huang
1daf3c8dbc Use 32 buckets for log2 bucketing in huffman sort 2021-09-13 12:29:16 -04:00
Yann Collet
f58e63bee7 Merge branch 'dev' into opt_investigation 2021-09-12 01:42:49 -07:00
Felix Handte
d68aa19a2f
Merge pull request #2749 from felixhandte/zstd-fast-pipelined
Pipelined Implementation of ZSTD_fast (~+5% Speed)
2021-09-09 17:05:30 -04:00
Yann Collet
b7f46ebc23 use ZSTD_memcpy() for better portability
notably within kernel space
2021-09-08 14:45:53 -07:00
Yann Collet
7fce9a41b5 change update rate to 12/11/11/11
better for large files, and sources with relatively "stable" entropy,
like silesia.tar.
slightly worse for files with rapidly changing entropy,
like Calgary.tar.

Updated small files tests in fuzzer
2021-09-08 14:05:57 -07:00
Yann Collet
ef78611c26 change update rate to 11/10/10/10
better for larger blocks,
very small inefficiency on small block.
2021-09-08 08:58:28 -07:00
Yann Collet
42a3ed752a removed frequency booster for stat initialization of btultra2
used to be necessary to counter-balance the fixed-weight frequency update
which has been recently changed for an adaptive rate (targeting stable starting frequency stats).
2021-09-08 07:56:43 -07:00
Yann Collet
08ceda3dfc new statistics update policy
small general compression ratio improvement for btopt+ strategies.
2021-09-04 00:52:44 -07:00
Yann Collet
23a9368c45 new starting offcode table for zstd_opt 2021-09-03 17:41:42 -07:00
Yann Collet
27a8bbe265 new initializer for ll price 2021-09-03 16:07:31 -07:00
Yann Collet
f0fc8cb3e1 Disable console notification by default within the library
As a library, the default shouldn't be to write anything on console.
`cover` and `fastcover` have a `g_displayLevel` variable to control this behavior.
It's now set to 0 (no display) by default.
Setting notification to a higher level should be an explicit operation by a console application.
2021-09-03 13:44:07 -07:00
Yann Collet
eab692211e removed pretty-print of sizes in benchmark
This is less appropriate for this mode :
benchmark is about accuracy,
it's important to read the exact values.
2021-09-03 12:51:02 -07:00
sen
71076b7a01
Merge pull request #2763 from senhuang42/opt_compiletime
Improve compile speed and binary size in `opt`
2021-09-02 11:59:02 -04:00
Yann Collet
a8cf85ad0a
Merge pull request #2762 from facebook/level13
minor rebalancing of level 13
2021-09-01 20:32:53 -07:00
Sen Huang
d88c1d95ce Remove inlining for opt 2021-09-01 16:58:57 -04:00
Yann Collet
70d89e5a12 minor rebalancing of level 13
This new setup is slightly better on `silesia.tar` :
Ratio : 3.649 -> 3.655
Speed : 11.9 MB/s -> 12.2 MB/s
At the cost of more memory : 24 MB -> 32 MB
The new memory budget is a reasonable interpolation between neighboring levels 12 and 14:
level 12 : 24 MB
level 13 : 32 MB (increased from 24 MB)
level 14 : 48 MB
Window size remains unaffected (4 MB)
2021-09-01 13:05:10 -07:00
senhuang42
414e24becf Add 8 bytes to FSE workspace 2021-09-01 15:56:33 -04:00
W. Felix Handte
d6fd7761c9 Fix VS Build: Explicitly Cast to Narrow Ints 2021-09-01 14:15:04 -04:00
W. Felix Handte
15e67bfa7e Deduplicate Implementations
This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new
`ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is
functionally a no-op.
2021-09-01 14:15:04 -04:00
W. Felix Handte
64054dec44 Tweak Step 2021-09-01 14:15:04 -04:00
W. Felix Handte
24fcccd05c Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed)
Unrolling the loop to handle 2 positions in each iteration allows us to reduce
the frequency of some operations that don't need to happen at every position.
One such operation is the step calculation, which is a very rough heuristic
anyways. It's fine if we do this a position later. The other operation is the
repcode check. But since the repcode check already tries expanding back one
position, we're really not missing much of importance by only trying it every
other position.

This commit also slightly reorders some operations.
2021-09-01 14:15:04 -04:00
W. Felix Handte
57a100f6dc Add ip1 + 128 Prefetch; Tiny Cleanup 2021-09-01 14:15:04 -04:00
W. Felix Handte
991d660ea9 Nit: Only Store 2 Hash Variables 2021-09-01 14:15:04 -04:00
W. Felix Handte
8706bc115a Nit: Dedup idx0 and idx1 2021-09-01 14:15:04 -04:00
W. Felix Handte
7c24c3e6ce Give Up on Searching End of Block
Amusingly, it seems to be a non-trivial performance hit to add in final
searches or even hash table insertions during cleanup. So let's not. It seems
to not make any meaningful difference in compression ratio.
2021-09-01 14:15:03 -04:00
W. Felix Handte
35932ab2f1 Prefetch Input in Incompressible Sections (+0.25% Speed) 2021-09-01 14:15:03 -04:00
W. Felix Handte
b092dd75b7 Shrink Pipeline from 4 Positions to 3 2021-09-01 14:15:03 -04:00
W. Felix Handte
387840af79 Re-Order Operations for Slightly Better Performance 2021-09-01 14:15:03 -04:00
W. Felix Handte
bc768bccc0 Track Step Size Statefully, Rather than Recalculating Every Time 2021-09-01 14:15:03 -04:00
W. Felix Handte
80bc12b33a Initial Pipelined Implementation for ZSTD_fast 2021-09-01 14:15:03 -04:00
Yann Collet
74b4171fb8 fix alignment condition in FSE_buildCTable
2-byte alignment is enough for 16-bit fields
2021-08-29 19:05:04 -07:00
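The alignment argument in this commit is that a table of 16-bit fields only needs its base address to be a multiple of 2. A generic check might look like the sketch below; `sketch_is_aligned` is a hypothetical helper and requires `align` to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when p is a multiple of align (align must be a power of 2). */
static int sketch_is_aligned(const void* p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```

A workspace that is 2-byte aligned but not 4-byte aligned passes `sketch_is_aligned(p, 2)` and fails `sketch_is_aligned(p, 4)`, so requiring 4 would reject buffers that are perfectly usable for 16-bit entries.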
Yann Collet
18a20b3ad7
Merge pull request #2752 from facebook/hashLog3max
make ZSTD_HASHLOG3_MAX private
2021-08-20 12:51:17 -07:00
Yann Collet
2de42174bb make ZSTD_HASHLOG3_MAX private
This is an implementation detail,
it doesn't belong to public space (zstd.h).
2021-08-20 09:52:42 -07:00
sen
ae998544de
Merge pull request #2750 from senhuang42/sb_compress
Improve branch misses on FSE symbol spreading
2021-08-20 12:47:24 -04:00
senhuang42
da095ed899 Improve branch misses on FSE symbol spreading 2021-08-18 10:22:22 -07:00
Clément Chigot
399849e236 Makefile: add AIX support
For lib, AIX linker doesn't allow --soname.
2021-08-13 10:25:14 +02:00
Sen Huang
539b3aab9b Optimize 32-bit VecMask_next() 2021-08-04 17:14:58 -04:00
senhuang42
e411040ea1 Add 64 row entry support for lazy 2021-08-04 16:19:12 -04:00
senhuang42
31820e032c Rebalance clevels for lazy 2021-08-04 16:18:52 -04:00
senhuang42
aa1957477b Improve Huffman sorting algorithm 2021-08-04 12:43:34 -04:00
Nick Terrell
6ee70bae46
Merge pull request #2733 from terrelln/huf-cspeed
[HUF] Improve Huffman encoding speed
2021-08-03 12:59:54 -04:00
Nick Terrell
d8a0797268 [fuzz] Add Huffman round trip fuzzer
* Add a Huffman round trip fuzzer
* Fix two minor bugs in Huffman that aren't exposed in zstd
  - Incorrect weight comparison (weights are allowed to be equal to
    table log).
  - HUF_compress1X_usingCTable_internal() can return compressed
    size >= source size, so the assert that `cSize <= 65535` isn't
    correct, and it needs to be checked instead.
2021-08-03 08:10:06 -07:00
sen
5c46f62006
Merge pull request #2677 from senhuang42/ci_overhaul_2
[CI][2/2] Migrate CI tests which (currently) fail
2021-08-02 09:55:49 -04:00
Sen Huang
5ec7897a26 Fix static analyzer warnings 2021-07-29 09:11:12 -07:00
Nick Terrell
46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
W. Felix Handte
da58821ff2 Fix DDSS Load
This PR fixes an incorrect comparison in figuring out `minChain` in
`ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison
had been masked by the fact that `idx` was always 1, until @terrelln changed
that in #2726.

Credit-to: OSS-Fuzz
2021-07-27 11:49:44 -04:00
Nick Terrell
ba044bd6f1 [bug-fix] Fix a determinism bug with the DUBT
The DUBT can be non-deterministic if an index is equal to
`ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the
indices at 2.

This bug was found by the OSS-Fuzz determinism fuzzer. With this change
the fuzzer test passes. And I've confirmed that this is the root cause,
not just hiding the problem.

Aside: This took me a long time to figure out, because I thought I had
tried this first thing. But, apparently I messed it up, because when I
was going through it again with @felixhandte, I was pointing out that it
wasn't the case, but it turns out it was.

Credit to: OSS-Fuzz
2021-07-15 13:02:49 -07:00
makise-homura
3cd085cec3 Clarify no-tree-vectorize usage for ICC and LCC 2021-07-14 20:00:44 +03:00
makise-homura
a5f518ae27 Change zstdcli's main() declaration due to -Wmain on some compilers 2021-07-14 19:55:47 +03:00
makise-homura
d4ad02c721 Add support for MCST LCC compiler 2021-07-10 03:57:06 +03:00
binhdvo
b3e372c171
Merge pull request #2717 from binhdvo/bootcamp
Proactively skip huffman compression based on sampling where non-comp…
2021-07-01 10:39:58 -04:00
Binh Vo
dc5b693f1e Proactively skip huffman compression based on sampling where non-compressibility is suspected 2021-06-30 11:02:47 -04:00
Nick Terrell
609be382ac
Merge pull request #2719 from danlark1/danlark_iwyu
Include what you use in zstd_ldm_geartab
2021-06-29 16:53:10 -07:00
Nick Terrell
094b26081f
Merge pull request #2689 from danlark1/dev
Optimize zstd decompression by another x%
2021-06-29 11:34:36 -07:00
Danila Kutenin
e855b78be6 Include what you use in zstd_ldm_geartab 2021-06-29 17:57:53 +01:00
Danila Kutenin
2c2c9e7dfd Add possible improvements for gcc-11 2021-06-29 09:06:47 +01:00
sen
45d707e908
Merge pull request #2715 from senhuang42/sequence_api_3
[RFC] Add internal API for converting ZSTD_Sequence into seqStore
2021-06-24 13:02:11 -04:00
senhuang42
76466dfadf Add simple API for converting ZSTD_Sequence into seqStore 2021-06-23 12:10:48 -04:00
Érico Nogueira
4d09952701 [lib] Fix libzstd.pc for lib-mt builds
Add the libzstd.pc target to the lib target in lib/Makefile, which makes
it inherit LDFLAGS_DYNLIB from the lib-mt target. This allows us to add
a Libs.private field to libzstd.pc which gets conditionally populated
with '-pthread'.

The 1.5.0 release notes mention that the static library isn't
multi-threaded by default, due to concern for people building static
binaries with libzstd:

   Now the dynamic library supports multi-threaded compression by
   default.  Note that this property is not extended to the static
   library because doing so would have impacted the build script of
   existing client applications (requiring them to add -pthread to their
   recipe), thus potentially breaking their build.

To get closer to being able to enable multi-threading for all library
builds by default, this commit makes it so that any libzstd consumer
using pkg-config gets the correct flags.

We also fix the indentation of the rule for libzstd.pc and move it
outside the if/endif block for install rules (which uses a list of OSs
where the rules were validated), so the rule is available for all users
of the 'lib*' targets.
2021-06-19 17:42:24 -03:00
Usuario
8bdce1ff97 lib/Makefile: Fix small typo in ZSTD_FORCE_DECOMPRESS_* build macros 2021-06-18 10:07:39 -04:00
binhdvo
0152435ab0
Merge pull request #2708 from binhdvo/skippable
Add API for fetching skippable frame content
2021-06-14 19:00:31 -04:00
Binh Vo
9d9f7680f8 Add API for fetching skippable frame content 2021-06-14 16:01:28 -04:00
Nick Terrell
05b6773fbc [fix] Add missing bounds checks during compression
* The block splitter missed a bounds check, so when the buffer is too small it
  passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which
  can then write the compressed data past the end of the buffer. This is a new
  regression in v1.5.0 when the block splitter is enabled. It is either enabled
  explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()`
  or `ZSTD_compressStream*()`.
* `HUF_writeCTable_wksp()` omits a bounds check when calling
  `HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass
  an erroneously large size to `HUF_compressWeights()`, which can then write
  past the end of the buffer. This bug has been present for ages. However, I
  believe that zstd cannot trigger the bug, because it never calls
  `HUF_compress*()` with `dstCapacity == 0` because of [this check][1].

Credit to: OSS-Fuzz

[1]: 89127e5ee2/lib/compress/zstd_compress_literals.c (L100)
2021-06-14 11:35:33 -07:00
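The class of bug described in 05b6773fbc (a size computed by subtraction that wraps around when the destination is too small) can be illustrated with a minimal sketch. `write_block` and `HEADER_SIZE` are hypothetical names invented for this example and are not zstd functions; the point is only that the capacity check must come before the subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEADER_SIZE 3u  /* hypothetical fixed header size, not a zstd constant */

/* Writes a 3-byte little-endian length header followed by the payload.
 * Returns the number of bytes written, or 0 if dst is too small. */
static size_t write_block(void* dst, size_t dstCapacity,
                          const void* payload, size_t payloadSize)
{
    unsigned char* const op = (unsigned char*)dst;
    assert(dst != NULL);
    assert(payload != NULL || payloadSize == 0);

    /* Check BEFORE subtracting: with size_t arithmetic,
     * dstCapacity - HEADER_SIZE wraps to a huge value when
     * dstCapacity < HEADER_SIZE (for example dstCapacity == 0). */
    if (dstCapacity < HEADER_SIZE) return 0;
    if (payloadSize > dstCapacity - HEADER_SIZE) return 0;
    if (payloadSize > 0xFFFFFFu) return 0;  /* must fit in the 3-byte header */

    op[0] = (unsigned char)(payloadSize & 0xFF);
    op[1] = (unsigned char)((payloadSize >> 8) & 0xFF);
    op[2] = (unsigned char)((payloadSize >> 16) & 0xFF);
    if (payloadSize > 0) memcpy(op + HEADER_SIZE, payload, payloadSize);
    return HEADER_SIZE + payloadSize;
}
```

Called with a capacity of 0, 2, or 5 and a 3-byte payload, the sketch refuses to write; it succeeds only once the capacity reaches 6.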
sen
d5f3568c4b
Merge pull request #2697 from senhuang42/entropy_repeat_fix
[bug] Fix entropy repeat mode bug
2021-06-10 16:39:17 +03:00
aqrit
dd4f6aa9e6
Flatten ZSTD_row_getMatchMask (#2681)
* Flatten ZSTD_row_getMatchMask

* Remove the SIMD abstraction layer.
* Add big endian support.
* Align `hashTags` within `tagRow` to a 16-byte boundary. 
* Switch SSE2 to use aligned reads.
* Optimize scalar path using SWAR.
* Optimize neon path for `n == 32`
* Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577)

* replace memcpy with MEM_readST

* silence alignment warnings

* fix neon casts

* Update zstd_lazy.c

* unify simd preprocessor detection (#3)

* remove duplicate asserts

* tweak rotates

* improve endian detection

* add cast

there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension
but must be uint16_t to get gcc to generate a rotate instruction..

* more casts

* fix casts

better work-around for the (bogus) warning: unary minus on unsigned
2021-06-09 08:50:25 +03:00
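The rotate and "unary minus on unsigned" remarks in dd4f6aa9e6 can be illustrated with a small sketch of a 16-bit rotate. `rotate_right_u16` is an illustrative name, and the code is an assumption about the general technique, not a copy of the zstd source. Writing the complementary shift as `0U - count` instead of `-count` is one common way to avoid that compiler warning.

```c
#include <assert.h>
#include <stdint.h>

/* Rotates a 16-bit mask right by count bits (0 <= count < 16). */
static uint16_t rotate_right_u16(uint16_t value, uint32_t count)
{
    assert(count < 16);
    count &= 0x0F;
    /* (0U - count) & 0x0F equals (16 - count) % 16 without negating an
     * unsigned value, so count == 0 yields a left shift of 0. */
    return (uint16_t)((value >> count) |
                      (uint16_t)(value << ((0U - count) & 0x0F)));
}
```

For instance, rotating `0x1234` right by 4 moves the low nibble to the top and gives `0x4123`.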
Danila Kutenin
08a3ddbd28 Add comment for gcc-11 2021-06-08 20:54:21 +01:00
Danila Kutenin
6534c0000f Be C89 compliant and fix alignment for gcc11 2021-06-08 20:45:57 +01:00
Felix Handte
8a3bdfaa7b
Merge pull request #2654 from wolfpld/dev
Initialize "potentially uninitialized" pointers.
2021-06-07 13:04:19 -04:00
Sen Huang
923e5ad3f5 Fix entropy repeat mode bug 2021-06-07 00:32:03 -07:00
Danila Kutenin
444f4db955 Move declaration of 1 to an inlined cast 2021-05-29 20:55:37 +01:00
Danila Kutenin
a80d268700 Optimize ZSTD_decodeSequence by another x% 2021-05-29 18:21:10 +01:00
senhuang42
939276cd0c Add ldm and block splitter auto-enable to old api 2021-05-24 13:09:32 -04:00
Nick Terrell
746f7976ab [trace] Refine the ZSTD_HAVE_WEAK_SYMBOLS detection
* Only enable for ELF on x86-64 or i386.
* Also explicitly disable for AIX.

Fixes #2658.
2021-05-18 20:22:36 -07:00
Yann Collet
02ece5d59f
Merge pull request #2653 from TrianglesPCT/dev
Enable SSE2 compression path to work on MSVC
2021-05-17 11:20:50 -07:00
Dan Nelson
54f78e3df8 ZSTD_VecMask_next: fix incorrect variable name in fallback code path 2021-05-15 10:20:37 -05:00
TrianglesPCT
bee0ef5647
Update zstd_lazy.c
It put the changes back when I tried to make a separate pull request; I don't understand GitHub's interface at all.
2021-05-14 19:23:13 -06:00
TrianglesPCT
d688ab1e0c
Add files via upload
AVX2
2021-05-14 19:18:12 -06:00
TrianglesPCT
bb1cdd8c63
Update zstd_lazy.c
add space
2021-05-14 19:11:28 -06:00
TrianglesPCT
a62856bf65
Update zstd_lazy.c
Remove the AVX2 part
2021-05-14 19:10:24 -06:00
TrianglesPCT
8f7ea1afeb
Update zstd_lazy.c
Switch to other comment style
2021-05-14 19:02:34 -06:00
TrianglesPCT
0e071214b5
Update zstd_lazy.c
switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads
2021-05-14 17:03:30 -06:00
TrianglesPCT
69ac124b12
Update zstd_lazy.c 2021-05-14 16:53:19 -06:00
TrianglesPCT
0b9f4bb0ff
Update zstd_lazy.c
use 8bit
2021-05-14 16:47:24 -06:00
Bartosz Taudul
7012c6e7a4
Initialize "potentially uninitialized" pointers. 2021-05-15 00:40:49 +02:00
TrianglesPCT
77d54eb3b3
Add files via upload 2021-05-14 16:40:32 -06:00
TrianglesPCT
52f44bb365
Add files via upload
msvc
2021-05-14 16:33:07 -06:00
TrianglesPCT
25bda9053a
Add files via upload
msvc support
avx2 path
2021-05-14 16:32:04 -06:00
Stephen Kitt
e81d567547
Distinguish static symbols, allow hiding them
Even with -fvisibility=hidden added to CFLAGS, any symbol which is
given a default visibility attribute ends up exported in the dynamic
library. This happens through zstd_internal.h which defines
..._STATIC_LINKING_ONLY before including various header files, and is
included for example in lib/common/pool.c.

To avoid this, this patch distinguishes static and non-static APIs, by
using ZSTDLIB_API only for the latter, and introducing
ZSTDLIB_STATIC_API for the former. For now, both are exported, but
static APIs can be hidden by overriding the definition of
ZSTDLIB_STATIC_API. lib/Makefile is modified to allow this using

	make CPPFLAGS_DYNLIB=-DZSTDLIB_STATIC_API=ZSTDLIB_HIDDEN

In addition, API declarations are dropped from zstd_compress.c (they
aren't needed there).

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-05-14 19:41:59 +02:00
Nick Terrell
03c4111299 [lib] Fix dictionary invalidation logic
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.

The bug can cause non-determinism because prefix mode and ext-dict mode
match finders can return different results. It can also hurt speed
because ext-dict match finders are slower.

The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
   never updated.
4. We will call the ext-dict match finder instead of the prefix one.
2021-05-13 17:05:59 -07:00
Nick Terrell
10b35b312b [lib] Fix off-by-one error in repcode checks
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it causes non-determinism.
2021-05-13 17:05:59 -07:00
Nick Terrell
91c9a247b6 [lib] Fix determinism bug in the optimal parser
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.

40def70387/lib/compress/zstd_opt.c (L476)

This optimization is based off the length of the longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.

The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
2021-05-13 17:05:59 -07:00
Yann Collet
8fae35591e Merge branch 'dev' of github.com:facebook/zstd into dev 2021-05-12 13:12:30 -07:00
Yann Collet
cb0cad9b79 reduce Max nb Workers to 64 in 32-bit mode
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).

This should fix test instability issues
using a lot of threads in 32-bit environments.
2021-05-12 13:10:25 -07:00
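The pointer-width-dependent cap described in cb0cad9b79 can be sketched as a compile-time expression. `NB_WORKERS_MAX` and `clamp_nb_workers` are hypothetical names for this illustration; only the 64 and 256 limits come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap mirroring the commit message: 64 workers when pointers
 * are 32-bit, 256 otherwise. */
#define NB_WORKERS_MAX ((sizeof(void*) == 4) ? 64u : 256u)

/* Clamps a requested worker count to the platform-dependent maximum. */
static unsigned clamp_nb_workers(unsigned requested)
{
    unsigned const cap = NB_WORKERS_MAX;
    assert(cap == 64u || cap == 256u);
    return requested > cap ? cap : requested;
}
```

Because `sizeof(void*)` is a constant expression, the compiler can fold the cap to a single value per target, so the check costs nothing at run time.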
sen
c730b8c5a3
Remove const data members in threadpooltest payload (#2639) (#2640) 2021-05-12 16:09:48 -04:00
sen
9c23ea9e2b
Bump version to 1.5.0, rebuild documentation (#2634) 2021-05-11 16:32:09 -04:00
Bernhard M. Wiedemann
28d0120b5a Avoid SIGBUS on armv6
When running armv6 userspace on armv8 hardware with a 64 bit Linux kernel,
the mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.

Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.

Fixes #2632
2021-05-11 17:51:03 +02:00
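The SIGBUS in 28d0120b5a comes from unaligned memory access. A portable way to read a multi-byte value from an address of unknown alignment is to go through `memcpy`, as in the sketch below. `read32_unaligned` is an illustrative name, and whether this matches the access mode the commit switches to is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in native byte order from a possibly unaligned
 * address. memcpy makes no alignment assumption, so the compiler must
 * emit code that is safe on strict-alignment targets. */
static uint32_t read32_unaligned(const void* p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

On targets that tolerate unaligned loads, compilers typically reduce this to a single load instruction, which is why the commit notes that any slowdown depends on compiler and environment.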
Yann Collet
9fb5a0407c
Merge pull request #2630 from facebook/gcc9
improved gcc-9 and gcc-10 decoding speed
2021-05-10 10:54:16 -07:00
Yann Collet
334ac69db7
Merge pull request #2628 from skitt/libzstd-nomt-flags
Apply flags to libzstd-nomt in libzstd style
2021-05-08 00:21:59 -07:00
Yann Collet
439e58d060 improved gcc-9 and gcc-10 decoding speed
the new alignment setting is better for gcc-9 and gcc-10
by about +5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional to gcc-9+.
2021-05-08 00:01:01 -07:00
Yann Collet
5b6d38a99e
Merge pull request #2547 from facebook/d_prefetch_refactor
Refactor prefetching for the decoding loop
2021-05-07 16:28:00 -07:00
Yann Collet
6755baf940 update decoder hot loop alignment
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
2021-05-07 15:18:16 -07:00
Yann Collet
4d9caa4928 Merge branch 'd_prefetch_refactor' of github.com:facebook/zstd into d_prefetch_refactor 2021-05-07 11:30:44 -07:00
Yann Collet
1db5947591 improve decompression speed of long variant by ~+5%
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.

This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
2021-05-07 11:26:14 -07:00
sen
13449d7ce1
Add PHONY targets to makefiles (#2629) 2021-05-07 14:03:19 -04:00
Nick Terrell
66772efe73
Merge pull request #2627 from terrelln/timeout-fix
[lib] Fix fuzzer timeouts by backing off overflow correction
2021-05-07 10:55:26 -07:00
sen
9e94b7cac5
Assert no division by 0, correct superblocks 0 sequences case (#2592) 2021-05-07 13:26:56 -04:00
Yann Collet
a4d55c8748 Merge branch 'dev' into d_prefetch_refactor 2021-05-07 09:32:53 -07:00
sen
91465e23b2
[1.5.0] Enable multithreading in lib build by default (#2584)
* Update lib Makefile to have new targets

* Update lib/README.md for mt
2021-05-07 11:13:30 -04:00
Stephen Kitt
b2582de3c9
Apply flags to libzstd-nomt in libzstd style
... for consistency (this doesn't actually change the build flags used
in practice, currently).

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-05-07 13:25:27 +02:00
Nick Terrell
c2555f8c6f [lib] Fix fuzzer timeouts by backing off overflow correction
Linearly back off the frequency of overflow correction based on the
number of times the `ZSTD_window_t` has been overflow corrected. This
will still allow the fuzzer to quickly find overflow correction bugs,
while also keeping good speed for larger inputs.

Additionally, the `nbOverflowCorrections` variable can be useful for
debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if
overflow correction has happened yet.

I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6
seconds). I've also verified that the fuzzers, `fuzzer`, and `zstreamtest`
still catch the row-hash overflow correction bug.
2021-05-06 22:03:41 -07:00
Yann Collet
ee425faaa7 Merge branch 'dev' into d_prefetch_refactor 2021-05-06 19:49:26 -07:00
Nick Terrell
b052b583e5 [lib] Fix UBSAN warning in ZSTD_decompressSequences() 2021-05-06 15:31:30 -07:00
sen
698f261b35
[1.5.0] Deprecate some functions (#2582)
* Add deprecated macro to zstd.h, mark certain functions as deprecated

* Remove ZSTD_compress.c dependencies on deprecated functions
2021-05-06 17:59:32 -04:00
Nick Terrell
2b82948e58
Merge pull request #2622 from terrelln/zdict-api
[zdict] Add a FAQ to the top of zdict.h
2021-05-06 12:42:56 -07:00
Nick Terrell
1874f0844d [zdict] Add a FAQ to the top of zdict.h
The FAQ covers the questions asked in Issue #2566. It first covers why
you would want to use a dictionary, then what a dictionary is, and
finally it tells you how to train a dictionary, and clarifies some of
the parameters.

There is definitely more that could be said about some of the advanced
trainers, but this should be a good start.
2021-05-06 12:48:19 -07:00
Nick Terrell
207e33bb61
Merge pull request #2616 from terrelln/deterministic-dict
[lib] Add ZSTD_c_deterministicRefPrefix
2021-05-06 11:09:22 -07:00
Nick Terrell
d2925de98a
Merge pull request #2615 from terrelln/stack-space
[lib] Move some ZSTD_CCtx_params off the stack
2021-05-05 19:43:39 -07:00
Nick Terrell
172b4b6ac4 [lib] Add ZSTD_c_deterministicRefPrefix
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.

A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.

Question: Should this be the default behavior? It isn't in this PR.
2021-05-05 18:49:56 -07:00
Nick Terrell
eb7e74ccb7 [tests] Set DEBUGLEVEL=2 by default
This allows us to quickly check for compile errors in debug log
messages, which are compiled out when `DEBUGLEVEL < 2`.
2021-05-05 13:29:06 -07:00
Nick Terrell
c2183d7cdf [lib] Move some ZSTD_CCtx_params off the stack
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
  functions, instead of creating those parameters on the stack.

I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absolutely gigantic).

Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't currently use those functions in stack-constrained environments.
2021-05-05 13:25:16 -07:00
Yann Collet
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
Yann Collet
8cde167a27 Merge branch 'dev' into d_prefetch_refactor 2021-05-05 09:13:38 -07:00
Yann Collet
455fd1a067 updated documentation regarding minimum job size 2021-05-05 09:03:11 -07:00
Yann Collet
c077f257b4
Merge pull request #2611 from facebook/smallerJobs
allow jobSize to be as low as 512 KB
2021-05-05 00:03:29 -07:00
Nick Terrell
8389a5122b
Merge pull request #2602 from terrelln/ldm-opt
[LDM] Speed optimization on repetitive data
2021-05-04 23:13:09 -07:00
Nick Terrell
d40f55cd95
Merge pull request #2610 from senhuang42/lazy_underflow_fix
Fix bad integer wraparound in repcode index for fast, dfast, lazy
2021-05-04 23:10:23 -07:00
Nick Terrell
0b88c2582c [test] Add large dict/data --patch-from test
Dictionary size must be > `ZSTD_CHUNKSIZE_MAX`.
2021-05-04 17:31:32 -07:00
Sen Huang
e6c8a5dd40 Fix incorrect usages of repIndex across all strategies 2021-05-04 19:50:55 -04:00
Nick Terrell
94db4398a0 [lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.

This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
2021-05-04 16:45:25 -07:00
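The "only load the suffix" step in 94db4398a0 amounts to clamping a buffer to its last N bytes. A minimal sketch follows; `dict_span` and `clamp_dict_to_suffix` are hypothetical names, and `maxLoadable` stands in for whatever limit applies (the commit uses `ZSTD_CURRENT_MAX - 1`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the part of a dictionary that will be loaded. */
typedef struct {
    const unsigned char* start;
    size_t size;
} dict_span;

/* Keeps at most maxLoadable bytes, taken from the END of the dictionary. */
static dict_span clamp_dict_to_suffix(const unsigned char* dict,
                                      size_t dictSize, size_t maxLoadable)
{
    dict_span s;
    assert(dict != NULL || dictSize == 0);
    if (dictSize > maxLoadable) {
        dict += dictSize - maxLoadable;  /* skip the oldest bytes */
        dictSize = maxLoadable;
    }
    s.start = dict;
    s.size = dictSize;
    return s;
}
```

Keeping the tail means the retained bytes are the ones closest to the data that follows, so they are reachable with the smallest offsets.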
Yann Collet
1026b9fa10 fix rsyncable mode 2021-05-04 15:59:27 -07:00
Nick Terrell
8a8899fc08
Merge pull request #2612 from terrelln/minor-fix
[easy] Rewrite rowHashLog computation
2021-05-04 15:02:00 -07:00
Yann Collet
40cabd0efd
Merge pull request #2608 from facebook/docMinVer
Documented minimum version numbers
2021-05-04 12:10:52 -07:00
Nick Terrell
1ffa80a09e [easy] Rewrite rowHashLog computation
`ZSTD_highbit32(1u << x) == x` when it isn't undefined behavior.
2021-05-04 11:43:20 -07:00
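The identity behind 1ffa80a09e can be checked with a portable sketch. `highbit32` here is a simple loop written for illustration, not the zstd implementation; it returns the position of the highest set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the highest set bit of v (v must be non-zero). */
static unsigned highbit32(uint32_t v)
{
    unsigned r = 0;
    assert(v != 0);
    while (v >>= 1) r++;
    return r;
}
```

For any `x` in the range 0 to 31, `highbit32((uint32_t)1 << x)` gives back `x`, so the shift followed by the bit scan is redundant. For `x >= 32` the shift itself is undefined behavior, which is the caveat the commit message mentions.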
Nick Terrell
a8ecf4ff88
Merge pull request #2597 from terrelln/public-headers
[1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root
2021-05-04 11:28:41 -07:00
Yann Collet
8f86c29c06 allow jobSize to be as low as 512 KB
previous lower limit was 1 MB.

Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.

Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
2021-05-04 11:02:55 -07:00
Nick Terrell
32823bc150 [LDM] Speed optimization on repetitive data
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`, either because `stopMask == 0` or by
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.

```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
      21.187881087 seconds time elapsed

head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
       1.149707921 seconds time elapsed

```
2021-05-04 10:57:42 -07:00
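The degenerate case described in 32823bc150 can be seen in a toy sketch of the `(hash & stopMask) == 0` test. `count_split_points` is a hypothetical helper, and the hash values are supplied directly instead of being computed by a rolling hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts positions whose hash satisfies the condition (hash & stopMask) == 0. */
static size_t count_split_points(const uint64_t* hashes, size_t n,
                                 uint64_t stopMask)
{
    size_t i, count = 0;
    assert(hashes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if ((hashes[i] & stopMask) == 0) count++;
    }
    return count;
}
```

With `stopMask == 0` every position qualifies. Repetitive input whose single repeated hash happens to satisfy the mask has the same effect, so work that is normally done at a fraction of positions is done at all of them.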
W. Felix Handte
ee122baacf Detect Presence of md5 on Darwin
This fixes #2568.
2021-05-04 12:33:19 -04:00
Yann Collet
8aafbd3604 Documented minimum version numbers
Any stable API entry point introduced after v1.0
should be documented with its minimum version number.

This PR fulfills this requirement,
updating mostly new entry points since v1.4.0
and newly introduced ones for future v1.5.0.
2021-05-04 09:05:22 -07:00
Nick Terrell
34aff7ea06 Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
  got the correction wrong in this case, and our chain tables and binary
  trees would be corrupted. Now, we work as long as `maxDist` is a power
  of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
  run overflow correction as frequently as allowed without impacting
  compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
  `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
  speed penalty at most, which seems reasonable.
2021-05-03 15:21:47 -07:00
sen
cc31bb8b66
Merge pull request #2598 from senhuang42/reduce_index_rowhash_fix
Fix chaintable check to include rowhash in ZSTD_reduceIndex()
2021-05-03 17:34:39 -04:00
sen
4c5cc345fb
Merge pull request #2581 from senhuang42/lcm_stable
[1.5.0] Promote ZSTD_c_literalCompressionMode to stable params
2021-05-03 11:59:19 -04:00
sen
cdc979ddb3
Merge pull request #2580 from senhuang42/defaultclevel_to_stable
[1.5.0] Promote ZSTD_defaultCLevel() into stable API
2021-05-03 11:59:05 -04:00
senhuang42
61fe571af6 Fix chaintable check to include rowhash in ZSTD_reduceIndex() 2021-04-30 19:52:04 -04:00
Nick Terrell
09149beaf8 [1.5.0] Move zstd_errors.h and zdict.h to lib/ root
`zstd_errors.h` and `zdict.h` are public headers, so they deserve to be
in the root `lib/` directory with `zstd.h`, not mixed in with our private
headers.
2021-04-30 15:13:54 -07:00
Nick Terrell
6cee3c2c4f [trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
2021-04-26 16:05:39 -07:00
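The NULL-check approach from 6cee3c2c4f can be sketched as below, assuming a GCC or Clang toolchain targeting ELF. `my_trace_hook` and `trace_event` are hypothetical names, not the zstd trace API. The hook is declared weak and never defined here, so its address is NULL unless some other object file provides it.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && defined(__ELF__)
#  define HAVE_WEAK_SYMBOLS 1
/* Declaration only: no default no-op implementation is provided. */
extern void my_trace_hook(int event) __attribute__((weak));
#else
#  define HAVE_WEAK_SYMBOLS 0
#endif

/* Calls the hook if one was linked in. Returns 1 if it was called, else 0. */
static int trace_event(int event)
{
    assert(event >= 0);
#if HAVE_WEAK_SYMBOLS
    if (my_trace_hook != NULL) {
        my_trace_hook(event);
        return 1;
    }
#endif
    (void)event;
    return 0;
}
```

Since no default definition exists, there is nothing for link order to pick by mistake: either a real implementation is present and gets called, or the pointer is NULL and the call is skipped.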
sen
3e2fbfd056
Merge pull request #2579 from senhuang42/getcdictID_to_stable
[1.5.0] Promote ZSTD_getDictID_fromCDict() into stable API
2021-04-26 09:55:43 -04:00
Sen Huang
3c595a4a79 Add literalCompressionMode to stable cParams 2021-04-26 09:55:06 -04:00
felixhandte
efa6dfa729 Apply DDS adjustments to avoid assert failures 2021-04-23 16:41:00 -04:00
senhuang42
3b98987496 Remove building of ZBUFF/deprecated folder by default 2021-04-19 17:12:00 -04:00
Sen Huang
c5869677d9 Moved ZSTD_defaultCLevel() into stable API 2021-04-16 10:15:40 -07:00
Sen Huang
9c1ca3c00b Moved ZSTD_getDictID_fromCDict() into stable API 2021-04-16 10:14:29 -07:00
sen
12c045f74d
Merge pull request #2574 from senhuang42/repcode_mismatch_detector_fix
Correct the block splitter mismatched repcodes detection.
2021-04-12 23:27:43 -04:00
Sen Huang
8844f93957 Adjust nb elements to prefetch in ZSTD_row_fillHashCache() 2021-04-12 14:24:58 -04:00
Sen Huang
550f76f131 Correct the detection of mismatched repcodes 2021-04-09 09:08:51 -07:00
Sen Huang
4d63d6e8aa Update results.csv, add Row hash to regression test 2021-04-07 10:31:41 -07:00
Nick Terrell
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
sen
f71aabb5b5
Move clevel override to after initLocalDict() (#2571) 2021-04-06 21:05:37 -04:00
sen
f1e8b565c2
Maintain two repcode histories for block splitting, replace invalid repcodes (#2569) 2021-04-06 17:25:55 -04:00