Commit Graph

639 Commits

Author SHA1 Message Date
Yann Collet
439e58d060 improved gcc-9 and gcc-10 decoding speed
the new alignment setting is better for gcc-9 and gcc-10
by about ~+5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional to gcc-9+.
2021-05-08 00:01:01 -07:00
Yann Collet
6755baf940 update decoder hot loop alignment
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
2021-05-07 15:18:16 -07:00
Yann Collet
1db5947591 improve decompression speed of long variant by ~+5%
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.

This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
2021-05-07 11:26:14 -07:00
Yann Collet
ee425faaa7 Merge branch 'dev' into d_prefetch_refactor 2021-05-06 19:49:26 -07:00
Nick Terrell
b052b583e5 [lib] Fix UBSAN warning in ZSTD_decompressSequences() 2021-05-06 15:31:30 -07:00
Yann Collet
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
Yann Collet
8cde167a27 Merge branch 'dev' into d_prefetch_refactor 2021-05-05 09:13:38 -07:00
Nick Terrell
6cee3c2c4f [trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
2021-04-26 16:05:39 -07:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
Nick Terrell
f8ac0ea7ef
Merge pull request #2539 from terrelln/linux-kernel-fixes
Fixes for the next linux kernel patch version
2021-03-24 10:34:29 -07:00
Yann Collet
f5434663ea Refactor prefetching for the decoding loop
Following #2545,
I noticed that one field in `seq_t` is optional,
and only used in combination with prefetching.
(This may have contributed to static analyzer failure to detect correct initialization).

I then wondered if it would be possible to rewrite the code
so that this optional part is handled directly by the prefetching code
rather than delegated as an option into `ZSTD_decodeSequence()`.

This resulted into this refactoring exercise
where the prefetching responsibility is better isolated into its own function
and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations.
Incidently, due to better code locality,
it reduces the need to send information around,
leading to simplified interface, and smaller state structures.
2021-03-19 15:48:17 -07:00
Nick Terrell
756bd59322 [huf][fse] Clean up workspaces
* Move `counting` to a struct in `FSE_decompress_wksp_body()`
* Fix error code in `FSE_decompress_wksp_body()`
* Rename a variable in `HUF_ReadDTableX2_Workspace`
2021-03-17 16:50:37 -07:00
Nick Terrell
cd1551d261 [lib][tracing] Add ZSTD_NO_TRACE macro
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
0f18059a4e [huf] Reduce stack usage of HUF_readDTableX2 by ~460 bytes
* Use `HUF_readStats_wksp()`
* Use workspace in `HUF_fillDTableX2*()`
* Clean up workspace usage to use a workspace struct
2021-03-05 12:39:46 -08:00
Nick Terrell
e59c9459a5 [trace] Keep track of a uint64_t tracing context
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.

This keeps the simple case simple.
2021-02-09 11:37:05 -08:00
Nick Terrell
54a4998a80 Add basic tracing functionality 2021-02-05 16:28:52 -08:00
Nick Terrell
f9b1e711ba [zstd] Fix NULL pointer addition in ZSTD_checkContinuity()
Don't start a new section when `dstSize == 0` to avoid NULL
pointer addition.
2021-02-05 12:18:06 -08:00
Yann Collet
b9748757b0 fixed minor cast warning 2021-02-05 09:55:54 -08:00
senhuang42
9ae0dd9336 Fix Visual and staticanalyze warnings 2021-01-07 17:58:37 -05:00
senhuang42
c2c9b8a7ec Address comments, clean up interface/internals 2021-01-07 12:29:12 -05:00
senhuang42
22b7bff2bc Add unit test, improve documentation 2021-01-07 12:29:12 -05:00
senhuang42
ea52fc3606 Use XXHash for hash function, create a sensible public interface 2021-01-07 12:29:12 -05:00
senhuang42
7c1a79f232 Add debuglog statements 2021-01-07 12:29:11 -05:00
senhuang42
d1a6a9d285 Reference requested dict ID at decompression time 2021-01-07 12:29:11 -05:00
senhuang42
5a6d3eef2b Allocate memory for DDict hash set when parameter is set 2021-01-07 12:29:11 -05:00
senhuang42
fd5b608f1c Add parameter to control multiple DDicts 2021-01-07 12:29:11 -05:00
senhuang42
f933668d3f Implement hashset for dictIDs 2021-01-07 12:29:11 -05:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00
Yann Collet
95e74616d5 fix multiple minor conversion warnings
unrelated to #2386, just cleaning up while I'm updating this file ...
2020-11-06 09:57:05 -08:00
Yann Collet
2769e4d459 fix incorrect assert
fix #2386, reported by @Neumann-A
2020-11-06 09:44:04 -08:00
Nick Terrell
e3e0775cc8 [API] Add ZSTD_c_stable{In,Out}Buffer parameters
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
2e7d174130 Reset all decompression parameters in ZSTD_DCtx_reset()
* Reset all decompression parameters in `ZSTD_DCtx_reset()` when
  resetting parameters.
* Add a test case.
2020-10-01 14:19:21 -07:00
Nick Terrell
9261476b7d [lib] Wrap customMem xor checks in parens for readability
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
2020-09-23 23:26:07 -07:00
Nick Terrell
dec7fb03ec [lib] Silence -Wunused-const-variable warnings 2020-09-23 12:59:57 -07:00
Nick Terrell
79ded1b4a9 [lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.

Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
2020-09-09 14:35:39 -07:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Nick Terrell
a686d306d2 Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free} 2020-08-26 12:25:08 -07:00
Nick Terrell
80f577baa2 Move standard includes to zstd_deps.h 2020-08-26 12:25:08 -07:00
Yann Collet
f82d9865b9
Merge pull request #2278 from senhuang42/ignore_checksum_advanced_param
New advanced decompression param to ignore checksums
2020-08-25 12:08:53 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
52f33a1da5 Fix compiler warnings 2020-08-24 16:09:45 -07:00
Nick Terrell
6f301a7903
Merge pull request #2272 from terrelln/dstSize_tooSmall
[fix] Always return dstSize_tooSmall when it is the case
2020-08-24 15:01:17 -07:00
Nick Terrell
6d2f750b37 Document the BMI2 default() functions 2020-08-24 14:44:33 -07:00
senhuang42
a030560d62 Add new DCtx param: validateChecksum and update unit tests 2020-08-24 17:28:00 -04:00
Nick Terrell
1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
senhuang42
44c54a3e31 Addressing comments: more comments, cleanup, remove extra function, checksum logic 2020-08-24 16:14:19 -04:00
senhuang42
ffaa0df76d Document change in CLI for --no-check during decompression in --help menu 2020-08-24 09:49:12 -04:00
senhuang42
e3f5f9658a Added CLI tests for --no-check, fixed ignore checksum logic 2020-08-22 16:05:40 -04:00
senhuang42
20eb095882 Added unit test to fuzzer.c, changed definition param name 2020-08-22 13:26:33 -04:00
senhuang42
47685ac856 Move enum into zstd.h, and fix pesky switch() logic 2020-08-21 18:18:53 -04:00
senhuang42
08d3567ba8 Add function prototype 2020-08-21 16:51:43 -04:00
senhuang42
6a8dbdcd1f Modify decompression loop to gnore checksums if flag is enabled 2020-08-21 16:46:46 -04:00
senhuang42
2f39124342 Rename to ZSTD_d_forceIgnoreChecksum, add to DCtx, add function to set the advanced param 2020-08-21 16:23:39 -04:00
Nick Terrell
8f8bd2d1ac [regression] Update results.csv 2020-08-20 12:41:35 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell
612e947c5e wire up bmi2 support 2020-08-17 16:35:28 -07:00
Nick Terrell
ba1fd17a9f speed up literal header decoding 2020-08-17 12:17:53 -07:00
Nick Terrell
6004c1117f speed up small blocks 2020-08-16 23:03:38 -07:00
Carl Woffenden
4c81fae146 Fix clang -Wcomma warning 2020-08-13 16:11:22 +02:00
Nick Terrell
cce0edfdbe Fix unused variable warnings in fuzzing build mode without asserts
Fix unused vairable warnings when `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined but asserts are disabled.

Fixes #2210.
2020-06-22 12:56:57 -07:00
Nick Terrell
f800e72a3c [lib] Fix assertion when dictionary is prefix 2020-05-12 14:33:59 -07:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell
5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
77a2945c43 Add some comments 2020-04-27 20:04:04 -07:00
Nick Terrell
f33de06c3e [lib] Fix single-pass mode for empty frames 2020-04-27 20:04:01 -07:00
Nick Terrell
a4ff217baf [lib] Add ZSTD_d_stableOutBuffer 2020-04-27 18:09:44 -07:00
Carl Woffenden
a93fadfcd9 Further replication removed
`CHECK_F` is now in `error_private.h`. Minor tidy.
2020-04-07 11:25:16 +02:00
Carl Woffenden
7af7735fa3 Merge remote-tracking branch 'upstream/dev' into single-file-lib 2020-04-07 11:13:02 +02:00
Carl Woffenden
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Bimba Shrestha
0154866749 moving consts to zstd_internal and reusing them 2020-04-03 14:26:15 -07:00
Bimba Shrestha
0a172c5e43 converting to if 2020-04-03 14:21:24 -07:00
Bimba Shrestha
3a4c8cc9b3 adding dctx to function name 2020-04-03 14:14:46 -07:00
Bimba Shrestha
ae47d50355 only computing sizes once 2020-04-03 14:12:23 -07:00
Bimba Shrestha
a4cbe79ccb Using in and out size together 2020-04-03 14:09:21 -07:00
Bimba Shrestha
936aa63ff1 adding oversized check on decompression 2020-04-03 13:25:32 -07:00
Bimba Shrestha
05574ec141 adding oversizeDuration to dctx and macros 2020-04-03 13:08:29 -07:00
Carl Woffenden
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
Carl Woffenden
7202184ee0
Fixes decompressor when using -Wshorten-64-to-32 (#2062)
Spotted on iOS when building with `-Wshorten-64-to-32` (since `__builtin_expect` returns a `long`).
2020-04-03 02:55:29 -07:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
8d0ee37ac0 Align decompress sequences loop to 32+16 bytes
The alignment is added before the loop, so this shouldn't hurt
performance in any case. The only way it hurts is if there is already
performance instability, and we force it to be stable but in the bad
case.

This consistently gets us into the good case with gcc-{7,8,9} on an
Intel i9-9900K and clang-9. gcc-5 is 5% worse than its best case but has
stable performance. We get consistently good behavior on my Macbook Pro
compiled with both clang and gcc-8. It ends up in the 50% from DSB and
50% from MITE case, but the performance is the same as the 85% DSB case,
so thats fine.
2020-03-23 19:40:31 -07:00
Nick Terrell
7627759b4e
Merge pull request #1972 from terrelln/check-cont
Move ZSTD_checkContinuity() to zstd_decompress_block.c
2020-01-23 22:02:50 -08:00
Nick Terrell
fa6a772f38 Initialize dctx->bType to silence valgrind false positive 2020-01-23 17:54:48 -08:00
Nick Terrell
cb2abc3dbe Fix performance regression on aarch64 with clang 2020-01-23 17:31:14 -08:00
Nick Terrell
6e3cd5b024 Move ZSTD_checkContinuity() to zstd_decompress_block.c 2020-01-23 12:27:39 -08:00
Nick Terrell
036b30b555
Fix super block compression and stream raw blocks in decompression (#1947)
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.

This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.

* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).

Credit to OSS-Fuzz.
2020-01-10 18:02:11 -08:00
Igor Sugak
03ffda7b88 fix UBSAN's invalid-null-argument error in zstd_decompress.c (#1939) 2020-01-08 16:17:42 -08:00
Nick Terrell
718f00ff6f
Optimize decompression speed for gcc and clang (#1892)
* Optimize `ZSTD_decodeSequence()`
* Optimize Huffman decoding
* Optimize `ZSTD_decompressSequences()`
* Delete `ZSTD_decodeSequenceLong()`
2019-11-25 18:26:19 -08:00
Nick Terrell
659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Nick Terrell
e0d6daabac Fix Appveyor failure 2019-11-19 11:12:26 -08:00
Nick Terrell
6a7f65117e
Merge pull request #1866 from legrosbuffle/dev
Optimized loop bounds to allow the compiler to unroll the loop.
2019-11-18 16:16:30 -08:00
Clement Courbet
b3c9fc27b4 Optimized loop bounds to allow the compiler to unroll the loop.
This has no measurable impact on large files but improves small file
decompression by ~1-2% for 10kB, benchmarked with:

head -c 10000 silesia.tar > /tmp/test
make CC=/usr/local/bin/clang-9 BUILD_STATIC=1 && ./lzbench -ezstd -t1,5 /tmp/test
2019-11-15 08:27:05 +01:00
Sen Huang
c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang
4b141b63e0 Revert "Move decompress symbols into zstd_internal.h, remove dependency"
This reverts commit a152b4c67a5266f611db4a2eac4a79003852a795.
2019-11-08 13:57:26 -05:00
Sen Huang
84404cff6e Move decompress symbols into zstd_internal.h, remove dependency 2019-11-08 13:57:26 -05:00
Nick Terrell
60205fec02 Fix 2 bugs in dictionary loading
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
  This changes the compressor, which silently skips dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
2019-11-01 16:52:07 -07:00
Nick Terrell
9c1860861e Fix assert in ZSTD_safecopy
In the case that `op >= oend_w` it is possible that `diff < 8` because
the two buffers could be adjacent.

Credit to OSS-Fuzz, which found the bug. It isn't reproducible because
it depends on the memory layout.
2019-10-28 17:51:17 -07:00