townforge/zstd - zstd - Townforge git

Author	SHA1	Message	Date
Yann Collet	1bf3d8a475	Merge pull request #2896 from facebook/m68k Zstandard compiles and run on m68k cpus	2021-12-02 14:25:45 -08:00
Nick Terrell	e5bfaeede7	Improve zstd_opt build speed and size Use the same trick as we did for zstd_lazy in PR #2828: * Create one search function specialization for each (dictMode, mls). * Select the search function pointer at the top of the match finder. Additionally, we no longer inline `ZSTD_compressBlock_opt_generic` into every function, since `dictMode` is no longer used as a template. Create two specializations, for opt levels 0 and 2, and call one of the two specializations. Lastly, remove the hack that disabled inlining for zstd_opt for the Linux Kernel, as we've gotten most of the benefit already. Compilation time sees a ~4x reduction: \| Compiler \| Flags \| Dev Time (s) \| PR Time (s) \| Delta \| \|----------\|----------------------------------\|--------------\|-------------\|-------\| \| gcc \| -O3 \| 10.1 \| 2.3 \| -77% \| \| gcc \| -O3 -fsanitize=address,undefined \| 61.1 \| 10.2 \| -83% \| \| clang \| -O3 \| 9.0 \| 2.1 \| -76% \| \| clang \| -O3 -fsanitize=address,undefined \| 33.5 \| 5.1 \| -84% \| Build size is reduced by 150KB - 200KB: \| Compiler \| Dev libzstd.a Size (B) \| PR libzstd.a Size (B) \| Delta \| \|----------\|------------------------\|-----------------------\|-------\| \| gcc \| 1327476 \| 1177108 \| -11% \| \| clang \| 1378324 \| 1167780 \| -15% \| There is a <2% speed loss in all cases: \| Compiler \| Level \| Dev Speed (MB/s) \| PR Speed (MB/s) \| Delta \| \|----------\|-------\|------------------\|-----------------\|--------\| \| gcc \| 16 \| 4.78 \| 4.72 \| -1.25% \| \| gcc \| 17 \| 3.49 \| 3.46 \| -0.85% \| \| gcc \| 18 \| 2.92 \| 2.86 \| -2.04% \| \| gcc \| 19 \| 2.61 \| 2.61 \| 0.00% \| \| clang \| 16 \| 4.69 \| 4.80 \| 2.34% \| \| clang \| 17 \| 3.53 \| 3.49 \| -1.13% \| \| clang \| 18 \| 2.86 \| 2.85 \| -0.34% \| \| clang \| 19 \| 2.61 \| 2.61 \| 0.00% \| Fixes Issue #2862.	2021-12-02 14:19:41 -08:00
Nick Terrell	01ecd6ffc0	Merge pull request #2892 from terrelln/issue-2785 [CircleCI] Fix short-tests-0	2021-12-02 16:20:56 -05:00
Yann Collet	30b9db8ae4	changed macro name to ZSTD_ALIGNOF for better consistency	2021-12-02 12:57:42 -08:00
Nick Terrell	21e28f5c24	Merge pull request #2891 from supperPants/dev Fix typos	2021-12-02 13:53:33 -05:00
Yann Collet	39dced092e	fix align conditions for huf_compress	2021-12-01 23:02:00 -08:00
Nick Terrell	91f5891dd0	[CircleCI] Fix short-tests-0 short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead. Also fix all the test failures that were exposed. `make all` is failing on CircleCI because it is missing Docker. Move that test to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.	2021-12-01 17:43:46 -08:00
Yann Collet	e89e847820	added alignment test and fix an incorrect alignment check in cwksp which was failing on m68k	2021-12-01 17:16:36 -08:00
Yann Collet	3f64b31585	Merge branch 'dev' into tomerge2051	2021-12-01 15:29:49 -08:00
Yann Collet	8031dc7a48	Merge pull request #2885 from yoniko/limit-level-32bit-systems Limit `ZSTD_maxCLevel` to 21 for 32-bit binaries.	2021-12-01 14:19:16 -08:00
supperPants	d4713de5a3	Fix typos.	2021-12-01 22:36:21 +08:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
Yonatan Komornik	ef2cba609d	`ZSTD_maxCLevel` now limited to 21 for 32-bit binaries. CI tests for constrained memory runs with max level on 32-bit binaries.	2021-11-30 10:31:52 -08:00
Felix Handte	c2c6a4ab40	Merge pull request #2869 from felixhandte/oss-fuzz-fix-41005 Determinism: Avoid Mapping Window into Reserved Indices during Reduction	2021-11-18 10:11:48 -05:00
W. Felix Handte	66079085f0	Determinism: Avoid Mapping Window into Reserved Indices during Reduction PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It succeeded in addressing that source of non-determinism, but introduced a new one: it was possible, when index reduction occurred, to map indices in the window to the reserved value, which would cause them to be zeroed, potentially altering parsing of the input. This PR addresses this issue. It makes sure that the bottom of the window is always `>= ZSTD_WINDOW_START_INDEX`. I'm not sure if this makes #2850 redundant. I think it's probably still valuable to have that protection as well. Credit to OSS-Fuzz for discovering this issue.	2021-11-17 18:09:18 -05:00
Yann Collet	a37a8df532	Merge pull request #2856 from rex4539/typos Fix typos	2021-11-17 13:04:30 -08:00
Nick Terrell	b7d899d99d	Merge pull request #2864 from terrelln/linux-opt [linux-kernel] Don't inline function in zstd_opt.c	2021-11-16 14:13:39 -08:00
Nick Terrell	19eb459da3	[linux-kernel] Don't inline function in zstd_opt.c The optimal parser is unlikely to be used in the linux kernel in practice. There is no reason these functions should be force inlined, since we aren't gaining anything, and are losing build size. \| Compiler \| Before (Bytes) \| After (Bytes) \| Delta (Bytes) \| \|----------\|----------------\|---------------\|---------------\| \| gcc-11 \| 1142090 \| 952754 \| -189336 \| \| clang-12 \| 1228402 \| 976290 \| -252112 \| This is a temporary solution pending the resolution of PR #2862 in the `dev` branch.	2021-11-15 20:37:30 -08:00
Nick Terrell	802ea885ef	Reduce function size in fast & dfast Take the same approach as in PR #2828 [0] to remove functions that force inline many function bodies and `switch`. Instead, create one function per "template" combination, and then switch between these functions. This allows the compiler to break the large function into many small functions, which generally helps codegen. Also, in the `extDict` modes when there is no ext-dict, call the top level function instead of the force inlined one, to save on code size. I'm specifically doing this because gcc on the parisc architecture doesn't handle the large function body well, and ends up using a lot of excess stack space. Outlining these functions fixes it.	2021-11-15 19:05:48 -08:00
Dimitris Apostolou	ebbd675998	Fix typos	2021-11-13 10:04:04 +02:00
W. Felix Handte	48572f52b1	Rewrite Fix to Still Auto-Vectorize	2021-11-09 12:17:03 -05:00
W. Felix Handte	61765cacd0	Avoid Reducing Indices to Reserved Values Previously, if an index was equal to `reducerValue + 1`, it would get remapped during index reduction to 1 i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the parsing of the input slightly, by causing tree nodes to be nullified when they otherwise wouldn't be. This hardly matters from a correctness or efficiency perspective, but it does impact determinism. So this commit changes index reduction to avoid mapping indices to collide with `ZSTD_DUBT_UNSORTED_MARK`.	2021-11-08 20:03:52 -05:00
senhuang42	384744888e	Void out unused functions	2021-11-04 14:32:07 +03:00
binhdvo	b399b47467	Move mingw tests from appveyor to github actions (#2838 )	2021-11-02 13:17:55 -04:00
Yann Collet	aba88fa996	Merge pull request #2829 from facebook/ZSTD_DECODER_INTERNAL_BUFFER minor : change build macro to ZSTD_DECODER_INTERNAL_BUFFER	2021-10-26 10:48:16 -07:00
Yann Collet	2b2a5c449a	fix minor cast warning	2021-10-26 08:38:17 -07:00
Yann Collet	518f06b281	added minimum for decoder buffer also : introduced macro BOUNDED()	2021-10-26 08:21:31 -07:00
Yann Collet	12e177cba8	Merge pull request #2830 from facebook/clevels separate compression level tables into their own file	2021-10-25 13:35:54 -07:00
Yann Collet	082d6c6775	separate compression level tables into their own files that's clearer than finding the tables somewhere in the middle of `compress.c`. Also, down the line, it may potentially allows zstd to feature adjusted tables depending on target cpu.	2021-10-25 08:49:54 -07:00
Nick Terrell	13cad3abb1	[lazy] Speed up compilation times Speed up compilation times by moving each specialized search function into its own function. This is faster because compilers can handle many smaller functions much faster than one gigantic function. The previous approach generated one giant function with `switch` statements and inlining to select the implementation. \| Compiler \| Flags \| Dev Time (s) \| PR Time (s) \| Delta \| \|----------\|-------------------------------------\|--------------\|-------------\|-------\| \| gcc \| -O3 \| 16.5 \| 5.6 \| -66% \| \| gcc \| -O3 -g -fsanitize=address,undefined \| 158.9 \| 38.2 \| -75% \| \| clang \| -O3 \| 36.5 \| 5.5 \| -85% \| \| clang \| -O3 -g -fsanitize=address,undefined \| 27.8 \| 17.5 \| -37% \| This also reduces the binary size because the search functions are no longer inlined into the main body. \| Compiler \| Dev libzstd.a Size (B) \| PR libzstd.a Size (B) \| Delta \| \|----------\|------------------------\|-----------------------\|-------\| \| gcc \| 1563868 \| 1308844 \| -16% \| \| clang \| 1924372 \| 1376020 \| -28% \| Finally, the performance is not impacted significantly by this change, in fact we generally see a small speed boost. \| Compiler \| Level \| Dev Speed (MB/s) \| PR Speed (MB/s) \| Delta \| \|----------\|-------\|------------------\|-----------------\|-------\| \| gcc \| 5 \| 110.6 \| 110.0 \| -0.5% \| \| gcc \| 7 \| 70.4 \| 72.2 \| +2.5% \| \| gcc \| 9 \| 53.2 \| 53.5 \| +0.5% \| \| gcc \| 13 \| 12.7 \| 12.9 \| +1.5% \| \| clang \| 5 \| 113.9 \| 110.4 \| -3.0% \| \| clang \| 7 \| 67.7 \| 70.6 \| +4.2% \| \| clang \| 9 \| 51.9 \| 52.2 \| +0.5% \| \| clang \| 13 \| 12.4 \| 13.3 \| +7.2% \| The compression strategy is unmodified in this PR, so the compressed size should be exactly the same. I may have a follow up PR to slightly improve the compression ratio, if it doesn't cost too much speed.	2021-10-22 13:45:26 -07:00
Yann Collet	9d62957b31	Merge pull request #2800 from animalize/fix_c89 Fix a C89 error in msvc	2021-10-18 14:32:04 -07:00
Felix Handte	23c1a2d260	Merge pull request #2774 from felixhandte/zstd-dfast-pipelined-single Pipelined Implementation of ZSTD_dfast	2021-10-13 16:38:43 -04:00
W. Felix Handte	0bfc935add	Convert Outer Control Structure to Loop	2021-10-12 13:34:17 -04:00
Nick Terrell	b77d95b053	Merge pull request #2820 from terrelln/nb-compares [binary-tree] Fix underflow of nbCompares	2021-10-11 09:59:57 -07:00
Nick Terrell	26486db9ab	Merge pull request #2819 from terrelln/ldm-hash-rate-log [ldm] Fix ZSTD_c_ldmHashRateLog bounds check	2021-10-08 14:58:29 -07:00
Nick Terrell	c6c482fe07	[binary-tree] Fix underflow of nbCompares Fix underflow of `nbCompares` by switching to an `int` and comparing `nbCompares > 0`. This is a minimal fix, because I don't want to change the logic. These loops seem to be doing `nbCompares + 1` comparisons. The bug was reported by Dan Carpenter and found by Smatch static checker. https://lore.kernel.org/all/20211008063704.GA5370@kili/	2021-10-08 13:22:55 -07:00
Nick Terrell	1bbb372e3e	[ldm] Fix ZSTD_c_ldmHashRateLog bounds check There is no minimum value check, so the parameter could be negative. Switch to the standard pattern of using `BOUNDCHECK()`. The bug was reported by Dan Carpenter and found by Smatch static checker. https://lore.kernel.org/all/20211008063704.GA5370@kili/	2021-10-08 11:17:40 -07:00
Nick Terrell	399644b1f1	[nit] Fix buggy indentation The bug was reported by Dan Carpenter and found by Smatch static checker. https://lore.kernel.org/all/20211008063704.GA5370@kili/	2021-10-08 11:13:11 -07:00
W. Felix Handte	79ca830766	Style: Add Comments to Variables and Move a Couple into the Loop	2021-10-05 16:18:09 -04:00
W. Felix Handte	62536ef7da	Simplify DMS Implementation by Removing noDict Support	2021-10-05 14:54:37 -04:00
W. Felix Handte	051b473e7e	Fall Back in _extDict to New _noDict Rather than Old Merged Impl	2021-10-05 14:54:37 -04:00
W. Felix Handte	fcab4841aa	Nit: Rename Function	2021-10-05 14:54:37 -04:00
W. Felix Handte	47fd762ecc	Nit: Unnest Blocks that Don't Declare Anything	2021-10-05 14:54:37 -04:00
W. Felix Handte	2cdfad538c	Search One Last Position	2021-10-05 14:54:37 -04:00
W. Felix Handte	6ae44c0db8	Advance Long Index Lookup (+0.5% Speed) This lookup can be advanced to before the short match check because either way we will use it (in the next loop iter or in `_search_next_long`).	2021-10-05 14:54:37 -04:00
W. Felix Handte	2ddef7c872	Write Back Advanced Hash in Long Matches as Well (+Ratio) Since we're now hashing the position ahead even if we find a long match and don't search that next position, we can write it back into the hashtable even in long matches. This seems to cost us no speed, and improves compression ratio slightly!	2021-10-05 14:54:37 -04:00
W. Felix Handte	39f2491bfc	Use Look-Ahead Hash for Next Long Check after Short Match (+0.5% Speed) This costs a little ratio, unfortunately.	2021-10-05 14:54:37 -04:00
W. Felix Handte	db4e1b5479	Hash Long One Position Ahead (+2.5% Speed) Aside from maybe a latency win in the loop, this means that when we find a short match, we've already done the hash we need to check the next long match.	2021-10-05 14:54:37 -04:00
W. Felix Handte	a1ac7205d0	Pull Match Found Stuff Out of the Loop	2021-10-05 14:54:37 -04:00
W. Felix Handte	072ffaad67	Extract Working Variables	2021-10-05 14:54:37 -04:00
W. Felix Handte	1bdf041071	Track Step Rather than Recalculating (+0.5% Speed)	2021-10-05 14:54:37 -04:00
W. Felix Handte	258c0623e1	Extract Single-Segment Variant of ZSTD_dfast	2021-10-05 14:54:37 -04:00
Sen Huang	4b7f45cb04	Pull hot loop into its own function	2021-09-28 08:19:44 -07:00
Sen Huang	ccdcbf4621	Try beginning and end of match	2021-09-28 08:19:44 -07:00
Sen Huang	b8fd6bf30c	Skip most long matches in lazy hash table update	2021-09-28 08:19:39 -07:00
Ma Lin	ae986fcdb8	Use __assume(0) for unreachable code path in msvc msvc will optimize away the condition check.	2021-09-27 19:23:57 +08:00
Ma Lin	e5ba858270	Don't initialize the first parameter of _BitScanForward* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64	2021-09-25 16:36:53 +08:00
Ma Lin	95f492ea17	Don't initialize the first parameter of _BitScanReverse* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64	2021-09-25 16:36:53 +08:00
Ma Lin	cc22042da0	Fix a C89 error in msvc Variables (r) must be declared at the beginning of a code block. This causes msvc2012 to fail to compile 64-bit build.	2021-09-25 16:32:06 +08:00
Nick Terrell	14772d97be	Merge pull request #2796 from terrelln/linux-fixes [lib] Make lib compatible with `-Wfall-through` excepting legacy	2021-09-23 16:11:53 -07:00
Nick Terrell	189e87bcbe	[lib] Make lib compatible with `-Wfall-through` excepting legacy Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported compilers this uses an attribute, otherwise it becomes a comment. This is necessary to be compatible with clang's `-Wfall-through`, and gcc's `-Wfall-through=2` which don't support comments. Without this the linux build emits a bunch of warnings. Also add a test to CI to ensure that we don't regress.	2021-09-23 10:51:18 -07:00
Yann Collet	fa2a4d77c7	constify MatchState* parameter when possible turns out, it's possible to constify MatchState* parameter in some parts of the binary tree algorithm, making it a pure read-only parameter, as opposed to a mutable state. This is supposed to be helpful for both maintenance and the compiler.	2021-09-23 08:27:44 -07:00
senhuang42	1d8143c84f	Move block splitter from stack to CCtx	2021-09-23 00:02:31 -04:00
sen	044c8b4722	Merge pull request #2779 from senhuang42/fse_fix Fix NCountWriteBound	2021-09-22 13:51:21 -04:00
sen	1e99d36361	Merge pull request #2788 from senhuang42/param_switch Use new paramSwitch enum for row matchfinder and block splitter	2021-09-22 13:27:55 -04:00
senhuang42	06f42c3bfd	Use new paramSwitch enum for LDM	2021-09-21 14:22:09 -04:00
senhuang42	b5c35d7ea3	Use new paramSwitch enum for LCM, row matchfinder, and block splitter	2021-09-21 14:22:02 -04:00
Nick Terrell	a5f2c45528	Huffman ASM	2021-09-20 14:46:43 -07:00
Nick Terrell	8bf699aa59	[build] Add support for ASM files in Make + CMake * Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`. Most relevantly, the way we find library files. * Use `lib/libzstd.mk` in the other Makefiles instead of repeating the same code. * Add a test `tests/test-variants.sh` that checks that the builds of `make -C programs allVariants` are correct, and run it in Actions. * Adds support for ASM files in the CMake build. The Meson build is not updated because it lists every file in zstd, and supports ASM off the bat, so the Huffman ASM commit will just add the ASM file to the list. The Visual Studios build is not updated because I'm not adding ASM support to Visual Studios yet.	2021-09-17 14:13:53 -07:00
sen	9d2a45a705	Merge pull request #2778 from senhuang42/opt_inlining_revert Revert opt outlining change	2021-09-15 14:22:10 -04:00
Sen Huang	a7aa2c5df6	Fix NCountWriteBound	2021-09-15 09:51:42 -07:00
Sen Huang	bd84e4a9d3	Revert opt outlining change	2021-09-15 09:08:41 -07:00
Nick Terrell	2fabd370bb	Merge pull request #2777 from terrelln/oss-fuzz-fix [rsyncable] Fix test failures	2021-09-14 13:20:22 -07:00
Nick Terrell	9d9e2ed00b	[rsyncable] Fix test failures Test failures showed up on the daily cron job. They didn't show up in CI because the condition is somewhat rare, and didn't trigger during the CI tests. This PR fixes up the logic in `findSynchronizationPoint()` to correctly handle the edge case. It also un-comments an assert that helps catch the issue, and verify that rsyncable mode is calculating the correct hash. After the fix, the test that failed passes: ``` ./zstreamtest --newapi -t1 --no-big-tests -s9680 ```	2021-09-14 12:28:53 -07:00
Yann Collet	2e6f5bc0d8	Merge pull request #2771 from facebook/opt_investigation Improve optimal parser performance on small data	2021-09-14 10:36:34 -07:00
Nick Terrell	d22bbed5db	Merge pull request #2776 from terrelln/oss-fuzz-fix [rsyncable] Ensure ZSTD_compressBound() is respected	2021-09-14 09:37:43 -07:00
Yann Collet	fd94b9d1c9	Merge branch 'dev' into opt_investigation	2021-09-14 01:15:51 -07:00
Nick Terrell	a418b4e478	[rsyncable] Ensure ZSTD_compressBound() is respected In degenerate cases `--rsyncable` could create very small blocks (1 byte). This causes the compressed output to be larger than `ZSTD_compressBound()`. Fix the issue by ensuring that rsyncable mode never outputs blocks smaller than 128 KB. The minimum job size is 512 KB, so we shouldn't lose many synchronization points from skipping any that cause blocks smaller than 128 KB. And even if we do, that is fine, because we'll find the next one. This fixes the `raw_dictionary_round_trip` oss-fuzz assert. Credit to OSS-Fuzz	2021-09-13 17:14:07 -07:00
Sen Huang	1daf3c8dbc	Use 32 buckets for log2 bucketing in huffman sort	2021-09-13 12:29:16 -04:00
Yann Collet	f58e63bee7	Merge branch 'dev' into opt_investigation	2021-09-12 01:42:49 -07:00
Felix Handte	d68aa19a2f	Merge pull request #2749 from felixhandte/zstd-fast-pipelined Pipelined Implementation of ZSTD_fast (~+5% Speed)	2021-09-09 17:05:30 -04:00
Yann Collet	b7f46ebc23	use ZSTD_memcpy() for better portability notably within kernel space	2021-09-08 14:45:53 -07:00
Yann Collet	7fce9a41b5	change update rate to 12/11/11/11 better for large files, and sources with relatively "stable" entropy, like silesia.tar. slightly worse for files with rapidly changing entropy, like Calgary.tar/. Updated small files tests in fuzzer	2021-09-08 14:05:57 -07:00
Yann Collet	ef78611c26	change update rate to 11/10/10/10 better for larger blocks, very small inefficiency on small block.	2021-09-08 08:58:28 -07:00
Yann Collet	42a3ed752a	removed frequency booster for stat initialization of btultra2 used to be necessary to counter-balance the fixed-weight frequency update which has been recently changed for an adaptive rate (targeting stable starting frequency stats).	2021-09-08 07:56:43 -07:00
Yann Collet	08ceda3dfc	new statistics update policy small general compression ratio improvement for btopt+ strategies/	2021-09-04 00:52:44 -07:00
Yann Collet	23a9368c45	new starting offcode table for zstd_opt	2021-09-03 17:41:42 -07:00
Yann Collet	27a8bbe265	new initializer for ll price	2021-09-03 16:07:31 -07:00
Yann Collet	f0fc8cb3e1	Disable console notification by default within the library As a library, the default shouldn't be to write anything on console. `cover` and `fastcover` have a `g_displayLevel` variable to control this behavior. It's now set to 0 (no display) by default. Setting notification to a higher level should be an explicit operation by a console application.	2021-09-03 13:44:07 -07:00
Yann Collet	eab692211e	removed pretty-print of sizes in benchmark This is less appropriate for this mode : benchmark is about accuracy, it's important to read the exact values.	2021-09-03 12:51:02 -07:00
sen	71076b7a01	Merge pull request #2763 from senhuang42/opt_compiletime Improve compile speed and binary size in `opt`	2021-09-02 11:59:02 -04:00
Yann Collet	a8cf85ad0a	Merge pull request #2762 from facebook/level13 minor rebalancing of level 13	2021-09-01 20:32:53 -07:00
Sen Huang	d88c1d95ce	Remove inlining for opt	2021-09-01 16:58:57 -04:00
Yann Collet	70d89e5a12	minor rebalancing of level 13 This new setup is slighly better on `silesia.tar` : Ratio : 3.649 -> 3.655 Speed : 11.9 MB/s -> 12.2 MB/s At the cost of more memory : 24 MB -> 32 MB The new memory budget is a reasonable interpolation between neighboring levels 12 and 14: level 12 : 24 MB level 13 : 32 MB (increased from 24 MB) level 14 : 48 MB Window size remains unaffected (4 MB)	2021-09-01 13:05:10 -07:00
senhuang42	414e24becf	Add 8 bytes to FSE workspace	2021-09-01 15:56:33 -04:00
W. Felix Handte	d6fd7761c9	Fix VS Build: Explicitly Cast to Narrow Ints	2021-09-01 14:15:04 -04:00
W. Felix Handte	15e67bfa7e	Deduplicate Implementations This removes the old `ZSTD_compressBlock_fast_generic()` and renames the new `ZSTD_compressBlock_fast_generic_pipelined()` to replace it. This is functionally a no-op.	2021-09-01 14:15:04 -04:00
W. Felix Handte	64054dec44	Tweak Step	2021-09-01 14:15:04 -04:00
W. Felix Handte	24fcccd05c	Unroll Loop Core; Reduce Frequency of Repcode Check & Step Calc (+>1% Speed) Unrolling the loop to handle 2 positions in each iteration allows us to reduce the frequency of some operations that don't need to happen at every position. One such operation is the step calculation, which is a very rough heuristic anyways. It's fine if we do this a position later. The other operation is the repcode check. But since the repcode check already tries expanding back one position, we're really not missing much of importance by only trying it every other position. This commit also slightly reorders some operations.	2021-09-01 14:15:04 -04:00
W. Felix Handte	57a100f6dc	Add `ip1 + 128` Prefetch; Tiny Cleanup	2021-09-01 14:15:04 -04:00
W. Felix Handte	991d660ea9	Nit: Only Store 2 Hash Variables	2021-09-01 14:15:04 -04:00
W. Felix Handte	8706bc115a	Nit: Dedup idx0 and idx1	2021-09-01 14:15:04 -04:00
W. Felix Handte	7c24c3e6ce	Give Up on Searching End of Block Amusingly, it seems to be a non-trivial performance hit to add in final searches or even hash table insertions during cleanup. So let's not. It seems to not make any meaningful difference in compression ratio.	2021-09-01 14:15:03 -04:00
W. Felix Handte	35932ab2f1	Prefetch Input in Incompressible Sections (+0.25% Speed)	2021-09-01 14:15:03 -04:00
W. Felix Handte	b092dd75b7	Shrink Pipeline from 4 Positions to 3	2021-09-01 14:15:03 -04:00
W. Felix Handte	387840af79	Re-Order Operations for Slightly Better Performance	2021-09-01 14:15:03 -04:00
W. Felix Handte	bc768bccc0	Track Step Size Statefully, Rather than Recalculating Every Time	2021-09-01 14:15:03 -04:00
W. Felix Handte	80bc12b33a	Initial Pipelined Implementation for ZSTD_fast	2021-09-01 14:15:03 -04:00
Yann Collet	74b4171fb8	fix alignment condition in FSE_buildCTable 2-bytes alignment is enough for 16-bit fields	2021-08-29 19:05:04 -07:00
Yann Collet	18a20b3ad7	Merge pull request #2752 from facebook/hashLog3max make ZSTD_HASHLOG3_MAX private	2021-08-20 12:51:17 -07:00
Yann Collet	2de42174bb	make ZSTD_HASHLOG3_MAX private This is an implementation detail, it doesn't belong to public space (zstd.h).	2021-08-20 09:52:42 -07:00
sen	ae998544de	Merge pull request #2750 from senhuang42/sb_compress Improve branch misses on FSE symbol spreading	2021-08-20 12:47:24 -04:00
senhuang42	da095ed899	Improve branch misses on FSE symbol spreading	2021-08-18 10:22:22 -07:00
Sen Huang	539b3aab9b	Optimize 32-bit VecMask_next()	2021-08-04 17:14:58 -04:00
senhuang42	e411040ea1	Add 64 row entry support for lazy	2021-08-04 16:19:12 -04:00
senhuang42	31820e032c	Rebalance clevels for lazy	2021-08-04 16:18:52 -04:00
senhuang42	aa1957477b	Improve Huffman sorting algorithm	2021-08-04 12:43:34 -04:00
Nick Terrell	6ee70bae46	Merge pull request #2733 from terrelln/huf-cspeed [HUF] Improve Huffman encoding speed	2021-08-03 12:59:54 -04:00
Nick Terrell	d8a0797268	[fuzz] Add Huffman round trip fuzzer * Add a Huffman round trip fuzzer * Fix two minor bugs in Huffman that aren't exposed in zstd - Incorrect weight comparison (weights are allowed to be equal to table log). - HUF_compress1X_usingCTable_internal() can return compressed size >= source size, so the assert that `cSize <= 65535` isn't correct, and it needs to be checked instead.	2021-08-03 08:10:06 -07:00
sen	5c46f62006	Merge pull request #2677 from senhuang42/ci_overhaul_2 [CI][2/2] Migrate CI tests which (currently) fail	2021-08-02 09:55:49 -04:00
Sen Huang	5ec7897a26	Fix static analyzer warnings	2021-07-29 09:11:12 -07:00
Nick Terrell	46f2710562	[HUF] Improve Huffman encoding speed Improve Huffman encoding speed by 20% for gcc and 10% for clang. \| Compiler \| Benchmark \| Config \| Dataset \| Ratio \| Speed MB/s (dev) \| Speed MB/s (huf-cspeed) \| Speed MB/s (huf-cspeed - dev) \| \|----------\|-------------------\|---------\|-------------\|-------\|------------------\|-------------------------\|-------------------------------\| \| gcc \| compress \| level_1 \| enwik7 \| 2.43 \| 253.70 \| 258.72 \| 2.0% \| \| gcc \| compress \| level_1 \| silesia \| 2.88 \| 341.90 \| 348.15 \| 1.8% \| \| gcc \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 761.83 \| 912.76 \| 19.8% \| \| gcc \| compress_literals \| level_1 \| silesia \| 1.28 \| 754.83 \| 902.37 \| 19.5% \| \| gcc \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 502.81 \| 552.79 \| 9.9% \| \| gcc \| compress_literals \| level_7 \| silesia \| 1.11 \| 675.97 \| 776.44 \| 14.9% \| \| clang \| compress \| level_1 \| enwik7 \| 2.43 \| 277.54 \| 280.98 \| 1.2% \| \| clang \| compress \| level_1 \| silesia \| 2.88 \| 369.98 \| 375.46 \| 1.5% \| \| clang \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 828.83 \| 918.41 \| 10.8% \| \| clang \| compress_literals \| level_1 \| silesia \| 1.28 \| 815.81 \| 905.41 \| 11.0% \| \| clang \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 533.13 \| 553.30 \| 3.8% \| \| clang \| compress_literals \| level_7 \| silesia \| 1.11 \| 714.52 \| 775.38 \| 8.5% \|	2021-07-27 15:10:35 -07:00
W. Felix Handte	da58821ff2	Fix DDSS Load This PR fixes an incorrect comparison in figuring out `minChain` in `ZSTD_dedicatedDictSearch_lazy_loadDictionary()`. This incorrect comparison had been masked by the fact that `idx` was always 1, until @terrelln changed that in #2726. Credit-to: OSS-Fuzz	2021-07-27 11:49:44 -04:00
Nick Terrell	ba044bd6f1	[bug-fix] Fix a determinism bug with the DUBT The DUBT can be non-deterministic if an index is equal to `ZSTD_DUBT_UNSORTED_MARK`. Ensure that never happens by starting the indices at 2. This bug was found by the OSS-Fuzz determinism fuzzer. With this change the fuzzer test passes. And I've confirmed that this is the root cause, not just hiding the problem. Aside: This took me a long time to figure out, because I thought I had tried this first thing. But, apparantly I messed it up, because when I was going through it again with @felixhandte, I was pointing out that it wasn't the case, but it turns out it was. Credit to: OSS-Fuzz	2021-07-15 13:02:49 -07:00
binhdvo	b3e372c171	Merge pull request #2717 from binhdvo/bootcamp Proactively skip huffman compression based on sampling where non-comp…	2021-07-01 10:39:58 -04:00
Binh Vo	dc5b693f1e	Proactively skip huffman compression based on sampling where non-compressibility is suspected	2021-06-30 11:02:47 -04:00
Nick Terrell	609be382ac	Merge pull request #2719 from danlark1/danlark_iwyu Include what you use in zstd_ldm_geartab	2021-06-29 16:53:10 -07:00
Danila Kutenin	e855b78be6	Include what you use in zstd_ldm_geartab	2021-06-29 17:57:53 +01:00
sen	45d707e908	Merge pull request #2715 from senhuang42/sequence_api_3 [RFC] Add internal API for converting ZSTD_Sequence into seqStore	2021-06-24 13:02:11 -04:00
senhuang42	76466dfadf	Add simple API for converting ZSTD_Sequence into seqStore	2021-06-23 12:10:48 -04:00
Nick Terrell	05b6773fbc	[fix] Add missing bounds checks during compression * The block splitter missed a bounds check, so when the buffer is too small it passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which can then write the compressed data past the end of the buffer. This is a new regression in v1.5.0 when the block splitter is enabled. It is either enabled explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()` or `ZSTD_compressStream()`. `HUF_writeCTable_wksp()` omits a bounds check when calling `HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass an erroneously large size to `HUF_compressWeights()`, which can then write past the end of the buffer. This bug has been present for ages. However, I believe that zstd cannot trigger the bug, because it never calls `HUF_compress*()` with `dstCapacity == 0` because of [this check][1]. Credit to: Oss-Fuzz [1]: `89127e5ee2/lib/compress/zstd_compress_literals.c (L100)`	2021-06-14 11:35:33 -07:00
sen	d5f3568c4b	Merge pull request #2697 from senhuang42/entropy_repeat_fix [bug] Fix entropy repeat mode bug	2021-06-10 16:39:17 +03:00
aqrit	dd4f6aa9e6	Flatten ZSTD_row_getMatchMask (#2681 ) * Flatten ZSTD_row_getMatchMask * Remove the SIMD abstraction layer. * Add big endian support. * Align `hashTags` within `tagRow` to a 16-byte boundary. * Switch SSE2 to use aligned reads. * Optimize scalar path using SWAR. * Optimize neon path for `n == 32` * Work around minor clang issue for NEON (https://bugs.llvm.org/show_bug.cgi?id=49577) * replace memcpy with MEM_readST * silence alignment warnings * fix neon casts * Update zstd_lazy.c * unify simd preprocessor detection (#3) * remove duplicate asserts * tweak rotates * improve endian detection * add cast there is a fun little catch-22 with gcc: result from pmovmskb has to be cast to uint32_t to avoid a zero-extension but must be uint16_t to get gcc to generate a rotate instruction.. * more casts * fix casts better work-around for the (bogus) warning: unary minus on unsigned	2021-06-09 08:50:25 +03:00
Felix Handte	8a3bdfaa7b	Merge pull request #2654 from wolfpld/dev Initialize "potentially uninitialized" pointers.	2021-06-07 13:04:19 -04:00
Sen Huang	923e5ad3f5	Fix entropy repeat mode bug	2021-06-07 00:32:03 -07:00
senhuang42	939276cd0c	Add ldm and block splitter auto-enable to old api	2021-05-24 13:09:32 -04:00
Yann Collet	02ece5d59f	Merge pull request #2653 from TrianglesPCT/dev Enable SSE2 compression path to work on MSVC	2021-05-17 11:20:50 -07:00
Dan Nelson	54f78e3df8	ZSTD_VecMask_next: fix incorrect variable name in fallback code path	2021-05-15 10:20:37 -05:00
TrianglesPCT	bee0ef5647	Update zstd_lazy.c It put the changes back when I tried to make a separate pull request, i don't understand githubs interface at all.	2021-05-14 19:23:13 -06:00
TrianglesPCT	d688ab1e0c	Add files via upload AVX2	2021-05-14 19:18:12 -06:00
TrianglesPCT	bb1cdd8c63	Update zstd_lazy.c add space	2021-05-14 19:11:28 -06:00
TrianglesPCT	a62856bf65	Update zstd_lazy.c Remove the AVX2 part	2021-05-14 19:10:24 -06:00
TrianglesPCT	8f7ea1afeb	Update zstd_lazy.c Switch to other comment style	2021-05-14 19:02:34 -06:00
TrianglesPCT	0e071214b5	Update zstd_lazy.c switch to unaligned load as I don't know if buffer will always be aligned to 32 bytes, and compilers aside from MSVC might actually use aligned loads	2021-05-14 17:03:30 -06:00
TrianglesPCT	69ac124b12	Update zstd_lazy.c	2021-05-14 16:53:19 -06:00
TrianglesPCT	0b9f4bb0ff	Update zstd_lazy.c use 8bit	2021-05-14 16:47:24 -06:00
Bartosz Taudul	7012c6e7a4	Initialize "potentially uninitialized" pointers.	2021-05-15 00:40:49 +02:00
TrianglesPCT	77d54eb3b3	Add files via upload	2021-05-14 16:40:32 -06:00
TrianglesPCT	52f44bb365	Add files via upload msvc	2021-05-14 16:33:07 -06:00
TrianglesPCT	25bda9053a	Add files via upload msvc suport avx2 path	2021-05-14 16:32:04 -06:00

1 2 3 4 5 ...

2136 Commits