Commit Graph

2136 Commits

Author SHA1 Message Date
senhuang42
e04da68157 Enable usage of ZSTD_sequenceRange for single-block compression 2020-11-16 10:49:16 -05:00
senhuang42
337fac216d Add logic to handle ZSTD_sequenceRange 2020-11-16 10:49:16 -05:00
senhuang42
85822ddd53 Add last literals handling like getSequences() 2020-11-16 10:49:16 -05:00
senhuang42
2cff8df1a2 Pull block compression out of main compressSequences() function 2020-11-16 10:49:16 -05:00
senhuang42
cfced9344a Implement ZSTD_updateSequenceRange 2020-11-16 10:49:16 -05:00
senhuang42
b116e1f211 Modify SequenceRange to have posInSequence 2020-11-16 10:49:16 -05:00
senhuang42
d99b675112 Add function definition for sequenceRange updater 2020-11-16 10:49:16 -05:00
senhuang42
74e95c05cc Add ZSTD_SequenceRange to count ranges in array of ZSTD_Sequence 2020-11-16 10:49:16 -05:00
senhuang42
89f3848310 Add support for repcodes 2020-11-16 10:49:16 -05:00
senhuang42
3e930fd044 Code cleanup, add debuglog statments 2020-11-16 10:49:16 -05:00
senhuang42
086513b5b9 Implement first pass at compressSequences() 2020-11-16 10:49:16 -05:00
senhuang42
a9327b1e9b Add initial function prototype for ZSTD_compressSequences_ext (to be renamed later) 2020-11-16 10:33:35 -05:00
animalize
52f8c07a3f Clamp compression level in ZSTD_getCParams_internal() function 2020-11-14 13:26:08 +08:00
senhuang42
9d936d61d2 Reduce number of memcpy() calls 2020-11-13 19:43:30 -05:00
senhuang42
be4ac6c5bc Use existing repcode update function to implement updates 2020-11-12 16:51:12 -05:00
senhuang42
674c9b9235 Add in proper block repcode histories 2020-11-12 15:34:37 -05:00
senhuang42
06c7f14066 Let block reps persist 2020-11-12 12:24:44 -05:00
senhuang42
396275068c Fix incorrect repcode setting 2020-11-12 11:57:01 -05:00
senhuang42
1a8af0de73 Improve unit test 2020-11-12 11:09:09 -05:00
senhuang42
4d4fd2c55f Overhaul repcode handling logic 2020-11-12 10:59:35 -05:00
sen
f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42
7d1dea070c Update unit tests 2020-11-06 11:10:37 -05:00
senhuang42
779df995c6 Implement mergeGeneratedSequences() 2020-11-06 10:55:46 -05:00
senhuang42
51abd58208 Rename getSequences() to generateSequences() 2020-11-06 10:53:22 -05:00
Luke Pitt
eac309c71b Add ZSTD_getDictID_fromCDict function to experimental section 2020-11-04 11:37:37 +00:00
senhuang42
f782cac3d4 Change block delimiter removing to linear time approach 2020-11-02 17:06:20 -05:00
senhuang42
3434049c1f Use ZSTD_memmove() instead of memmove() 2020-11-02 11:43:19 -05:00
senhuang42
d4d0346b40 Update name of enum, clarify documentation 2020-11-02 11:38:17 -05:00
senhuang42
e6178f837f Revert unnecessary seqCollector adjustment 2020-11-02 10:59:20 -05:00
senhuang42
e8501e00b8 Fix incorrect index increment in merge algorithm 2020-11-02 10:58:41 -05:00
senhuang42
a36fdada57 Add algorithm to remove all delimiters 2020-11-02 10:46:52 -05:00
senhuang42
435a3a0428 Update seqCollector definition 2020-11-02 10:19:26 -05:00
senhuang42
3327932609 Update ZSTD_getSequences function signature 2020-11-02 10:17:59 -05:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
d4e021fe35 [lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
Nick Terrell
24f72789e2 [lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set
Compress directly from the `ZSTD_inBuffer`. We still allocate the input
buffer. A following commit will remove that allocation.
2020-10-30 10:55:34 -07:00
Nick Terrell
6bd6b6f7d3 [cwksp] Return NULL when 0 bytes are requested
This ensures that the buffer is never used.
2020-10-30 10:55:34 -07:00
Nick Terrell
fcf81cee5e [lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set
We compress directly to the `ZSTD_outBuffer` so we don't need to
allocate it.
2020-10-30 10:55:34 -07:00
Nick Terrell
6d5dc93d4e [lib] Compress directly into output when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer always compress directly into the
`ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.
2020-10-30 10:55:34 -07:00
Nick Terrell
987cb4ca6a [lib] Take the shortcut when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
2020-10-30 10:55:34 -07:00
Nick Terrell
809b2f2071 [lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2()
Sets these parameters in ZSTD_compress2() then resets them to their
orignal values after the compression call.

An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
2020-10-30 10:55:34 -07:00
Nick Terrell
c74be3f6de [lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8 [API] Add ZSTD_c_stable{In,Out}Buffer parameters
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
e2581d9572 [lib] Set appliedParams in zstdmt mode
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
2020-10-30 10:54:38 -07:00
senhuang42
536e89c723 Sequence extractor should update CBlockState 2020-10-30 12:13:19 -04:00
senhuang42
32cac2627a Emit last literals of 0 size as well, to indicate block boundary 2020-10-29 16:41:17 -04:00
senhuang42
69bd5f0654 Correct literalsRead calculation to include longLength 2020-10-29 14:49:37 -04:00
senhuang42
59624f3163 Remove implicit typecast to appease appVeyor windows build 2020-10-28 16:25:09 -04:00
senhuang42
3ed5d053d8 Clarify comments in zstd.h some more 2020-10-28 09:53:09 -04:00
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
3163909d14 Remove unused variable position 2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03 Add support for representing last literals in the extracted seqs 2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd Improve documentation of seqStore_t 2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886 Improve documentation regarding various operations in copyBlockSequences 2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03 Modify ZSTD_copyBlockSequences to agree with new API 2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe Add a function for LDM enable check 2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1 Move ldm enable to compressStream2() 2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72 Include LDM tables size for CCtx size estimation where relevant 2020-10-20 09:21:30 -04:00
senhuang42
b1c7fc5768 Add compatibility for multithreading 2020-10-19 12:07:06 -04:00
senhuang42
590f7f55f0 Add ldm enable condition in ZSTD_resetCCtx_internal 2020-10-19 10:26:17 -04:00
senhuang42
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
Yann Collet
a0ec50c2dc
Merge pull request #2355 from senhuang42/change_ldm_mt_config
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4 Change cycleLog adjustment to +3 from +4 2020-10-15 09:56:05 -04:00
senhuang42
ee84817fe7 Reset posInSequence when using ZSTD_referenceExternalSequences() 2020-10-14 22:06:08 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0 Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config 2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84 [minor] Improve docs and add an assert in response to review 2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a Use cycleLog instead of chainLog to determine LDM jobLog 2020-10-12 16:09:59 -04:00
Nick Terrell
441ce4178f [zstdmt] Clarify a comment 2020-10-12 12:58:13 -07:00
Nick Terrell
efff5d8b2d [zstdmt] Fix determinism issue with rsyncable mode
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
   for the next sync point.

This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.

Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
2020-10-12 12:55:17 -07:00
Nick Terrell
ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
1784c4b4ab [zstdmt] Remove single-pass shortcut
Simplifies the code and removes blocking from zstdmt.

At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.

Fixes #2327.
2020-10-12 12:53:26 -07:00
Nick Terrell
b55ae009ac [zstdmt] Remove singleBlockingThread mode
This is already handled by zstd, so this logic is never used.
2020-10-12 12:53:26 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell
fadaab8c7c [minor improvement] Pass 0 as the content size in the DDS
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
2020-10-12 12:47:21 -07:00
Nick Terrell
48ef15fb47 [minor improvement] Pass dictSize when selecting parameters
When selecting parameters in streaming compression with a dictionary use
the dictionary size to select the parameters.
2020-10-12 12:47:19 -07:00
Nick Terrell
012818df99 [refactor] Remove ZSTD_resetCStream_internal()
This function is only called in one place. It isn't a logical separation
of duties, and it was only obsfucating the code now, so inline it.
2020-10-12 12:46:10 -07:00
Nick Terrell
7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
senhuang42
d6911b86be Require LDM matches to be strictly greater in length 2020-10-09 12:56:18 -04:00
Yann Collet
12541931fa
Merge pull request #2328 from marxin/zstd-pool-api
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
Yann Collet
6fdb0cb8d9
Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority
For ZSTD_compressStream2(), let cdict take compression level priority
2020-10-09 00:48:30 -07:00
senhuang42
b9c8033cde Define kNullRawSeqStore for every file 2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8 Replace offCode of largest match if ldm's offCode is superior 2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
0731b94e7c Use kNullRawSeqStore constant in zstdmt_compress.c 2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2 Remove bubbling down matches with longer offCode and same matchLen 2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9 Enable inclusion of mid-flight LDMs in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942 Correct incorrect offcode calculation 2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202 Add explicit conversion of size_t to U32 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866 Prefix new static ldm helpers with ZSTD_opt 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb Improve documentation of relevant structs 2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7 Correct matchLength calculation and remove unnecessary functions 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df Refactor existing functions to use posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87 Adjustments to ldm_calculateMatchRange() to calculate bounds correctly 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299 Remove rawSeqStore.base and add rawSeqStore.posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84 Prevent duplicate LDMs from being inserted 2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec Add extra bounds check to prevent heap access after free ASAN error 2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5 Address mixed variables C90 warning 2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18 ldm_getNextMatch fixed return values 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808 Fixed sifting algorithm 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04 Add a function ldm_voidSequences() 2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e Fix function argument to getNextMatch() 2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38 Adjustments to no longer segfault on nci 2020-10-07 13:56:25 -04:00
senhuang42
f57c7e6bbf Add base adjustment correction 2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f Add initial getNextMatch() in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3 Added more debugging 2020-10-07 13:56:25 -04:00
senhuang42
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
senhuang42
42395a70c2 Add debug statements, flesh out functions 2020-10-07 13:56:25 -04:00
senhuang42
dd3dd199bb Get zstd to build with new functions and callsites, fix arguments 2020-10-07 13:56:25 -04:00
senhuang42
766c4a8c28 Implement part of ldm_maybeAddLdm() 2020-10-07 13:56:25 -04:00
senhuang42
84777059d2 Implement ldm_getNextMatch() 2020-10-07 13:56:24 -04:00
senhuang42
28c74bf591 Implement basic splitSequence and skipSequence functions 2020-10-07 13:56:24 -04:00
senhuang42
634ab7830d Flesh out required args for ldm_handleLdm() 2020-10-07 13:56:24 -04:00
senhuang42
db70761032 Add callsites to appropriate locations in ..opt_generic() 2020-10-07 13:56:24 -04:00
senhuang42
aea61e3c91 Add ldm helper function declarations into opt parser 2020-10-07 13:56:24 -04:00
senhuang42
35d9f488f5 Modify codepath to use opt parser exclusively if the compression level is high enough 2020-10-07 13:56:24 -04:00
senhuang42
e1ae398ad5 Add rawSeqStore to match state 2020-10-07 13:56:24 -04:00
Martin Liska
b684900a4a Allow external creation of POOLs that can be shared. 2020-10-07 12:44:33 +02:00
Nick Terrell
27c969ed07 Add comments to ZSTD_getLowest{Match,Prefix}Index()
Clarify how we handle dictionaries in each case.
2020-10-01 13:21:46 -07:00
Yann Collet
cc88eb7594
Merge pull request #2317 from animalize/msvc_inline
Let MSVC force inline ZSTD_hashPtr() function
2020-09-30 08:27:53 -07:00
Nick Terrell
f1cbeec039 [superblock] Reduce stack usage by correctly sizing header buffers 2020-09-24 19:42:04 -07:00
Nick Terrell
6a1e526ea7 [lib] Add ZSTD_COMPRESS_HEAPMODE tuning parameter 2020-09-24 19:42:04 -07:00
Nick Terrell
b841387218 [freestanding] Improve macro resolution to handle #if X 2020-09-24 19:42:04 -07:00
Nick Terrell
caecd8c211 Allow user to override ASAN/MSAN detection
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
2020-09-24 19:42:04 -07:00
Nick Terrell
88fac5d514 Remove call to memset
The previous commit fixes the test so it errors on calls to mem*()
functions from <string.h>.
2020-09-24 19:42:04 -07:00
Nick Terrell
9261476b7d [lib] Wrap customMem xor checks in parens for readability
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
2020-09-23 23:26:07 -07:00
animalize
2e5d73dd72 Use MEM_STATIC FORCE_INLINE_ATTR instead of FORCE_INLINE_TEMPLATE
It adds `__attribute__((unused))` for __GNUC__, to eliminate `-Werror=unused-function` error.
2020-09-21 13:26:38 +08:00
animalize
0a69a6b1ca Let MSVC force inline ZSTD_hashPtr() function
ZSTD_hashPtr() function was not expanded by MSVC, led to low performance compared to GCC.
2020-09-21 10:38:55 +08:00
Felix Handte
200c960f1d
Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation
Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes
2020-09-18 14:02:14 -04:00
W. Felix Handte
8930c6e551 Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset()
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
2020-09-17 12:15:33 -04:00
W. Felix Handte
e8a44326fa Avoid Redundancy in ZSTD_initCDict_internal() Args; Don't Take CParams + CCtxParams 2020-09-17 12:08:36 -04:00
W. Felix Handte
eee51a664a Fall Back if Derived CParams are Incompatible with DDSS; Refactor CDict Creation
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
2020-09-15 18:01:08 -04:00
W. Felix Handte
bc6521a6f6 Make ZSTD_createCDict_advanced2() cctxParams Arg Const 2020-09-15 14:06:10 -04:00
W. Felix Handte
26a96a5b35 Do More Complete CParams Deduction in Non-DDSS Path of ZSTD_createCDict_advanced2
Call ZSTD_getCParamsFromCCtxParams() instead of ZSTD_getCParams_internal().
2020-09-15 13:57:43 -04:00
W. Felix Handte
a2af804129 Pull CParam Override Logic into Helper 2020-09-15 13:38:05 -04:00
Yann Collet
c91a0855f8 check endDirective in ZSTD_compressStream2()
fix #2297
also :
- `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode
- add relevant tests
2020-09-14 10:56:08 -07:00
senhuang42
17b56f934e Coding style cleanup 2020-09-11 11:42:12 -04:00
senhuang42
801513b5e7 Modify params rather than cctx->requestedParams 2020-09-11 11:41:10 -04:00
W. Felix Handte
c5fab8848a Document searchFuncs Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
85a95840e4 Further Consolidate Dict Mode Checks 2020-09-10 22:10:02 -04:00
W. Felix Handte
0faefbf1b3 Make DDSS Selection Override ForceCopy Directive 2020-09-10 22:10:02 -04:00
W. Felix Handte
efa33861f2 Attempt to Fix MSVC Warnings 2020-09-10 22:10:02 -04:00
W. Felix Handte
ed43832770 Simplify Match Limit Checks
Seems like a ~1.25% speedup.
2020-09-10 22:10:02 -04:00
W. Felix Handte
06d240b8a7 Use All Available Space in the Hash Table to Extent Chain Table Reach
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
2020-09-10 22:10:02 -04:00
W. Felix Handte
b2b0641ea0 Rewrite Table Fill to Retain Cache Entries Beyond Chain Window 2020-09-10 22:10:02 -04:00
W. Felix Handte
916238d9dc Avoid Malloc in Table Fill; Pack Tmp Structure into Hash Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
f42c5bddd9 Truncate Chain at Last Possible Attempt
Make the chain table denser?
2020-09-10 22:10:02 -04:00
W. Felix Handte
20a020edbc Prefetch Chain Table Matches 2020-09-10 22:10:02 -04:00
W. Felix Handte
9b9feb84f2 Lay Out Chain Table Chains Contiguously
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.

This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
2020-09-10 22:10:02 -04:00
W. Felix Handte
66509c7bf4 Only Insert Positions Inside the Chain Window 2020-09-10 22:10:02 -04:00
W. Felix Handte
13c5ec3e41 Only Allow Dedicated Dict Search for Dicts Loaded in 1 Chunk
The load algorithm requires we do it all in one go.
2020-09-10 22:10:02 -04:00
W. Felix Handte
07793547e6 Fix Bug: Only Use DDSS Insertion on CDict MatchStates
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
2020-09-10 18:51:52 -04:00
W. Felix Handte
d214d8c859 Shorten Dict Mode Conditionals in Order to Improve Readability 2020-09-10 18:51:52 -04:00
W. Felix Handte
f49c1563ff Force-Inline ZSTD_insertAndFindFirstIndex_internal()
Without this, gcc was declining to inline the function in `ZSTD_noDict` mode,
resulting in a ~10% slowdown.
2020-09-10 18:51:52 -04:00
W. Felix Handte
cab86b074f Clean Up Search Function Selection 2020-09-10 18:51:52 -04:00
W. Felix Handte
2ffbde0d95 Fix -Wshorten-64-to-32 Error 2020-09-10 18:51:52 -04:00
W. Felix Handte
7b5d2f72ea Adjust Working Context Table Sizes Back Down 2020-09-10 18:51:52 -04:00
W. Felix Handte
d332f57897 Permit Matching Against Lowest Valid Position
This comparison was previously faulty: the lowest valid position is itself
valid, and we should therefore be allowed to match against it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
a3659fe1ef Make ZSTD_dedicatedDictSearch_getCParams Wrap ZSTD_getCParams
Fixes up bounds-checking, and lets us clean up what is at the moment an
unnecessary duplication of the default cparams tables.
2020-09-10 18:51:52 -04:00
W. Felix Handte
7b9a755ac9 Remove Chain Limit on Hash Cache Entries; Slightly Improve Compression
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)
2020-09-10 18:51:52 -04:00
W. Felix Handte
e8b4011b52 Split Lookups in Hash Cache and Chain Table into Two Loops
Sliiiight speedup.
2020-09-10 18:51:52 -04:00
W. Felix Handte
9e83c782f8 Simplify DDS Hash Table Construction
No need to walk the chainTable; we can just keep shifting the entries in the
hashTable.
2020-09-10 18:51:52 -04:00
W. Felix Handte
5390fee4f7 Rename and Move DD_BLOG Constant to ZSTD_LAZY_DDSS_BUCKET_LOG 2020-09-10 18:51:52 -04:00
W. Felix Handte
5e91ae27eb Prefetch First Batch of Match Positions; +11% Speed in Level 5 w/ 1 Dict 2020-09-10 18:51:52 -04:00
W. Felix Handte
df386b3d8d Fix Off-By-One Error in Counting DDS Search Attempts
This caused us to double-search the first position and fail to search the
last position in the chain, slowing down search and making it less effective.
2020-09-10 18:51:52 -04:00
W. Felix Handte
914bfe7ee4 Init CCtx's Local Dict with CCtxParams 2020-09-10 18:51:52 -04:00
W. Felix Handte
db2aa25252 Decision for Whether to Attach Should be Based on CDict Config, not CCtx 2020-09-10 18:51:52 -04:00
W. Felix Handte
a494111385 Move Prefetch Before Insertion; Speed Up ~6% 2020-09-10 18:51:52 -04:00
W. Felix Handte
eede46a47e Misc Refactor of DDS Search Code 2020-09-10 18:51:52 -04:00
W. Felix Handte
f1b428fdac Rename enableDedicatedDictSearch to dedicatedDictSearch in MatchState
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
41012193ad Always Init CDict's enableDedicatedDictSearch Field 2020-09-10 18:51:52 -04:00
W. Felix Handte
34b545acb0 Add a ZSTD_dedicatedDictSearch ZSTD_dictMode_e to Allow Const Propagation
Speed +1.5%.
2020-09-10 18:51:52 -04:00
W. Felix Handte
beefdb0d3d Fix ZSTD_c_forceAttachDict Bounds 2020-09-10 18:51:52 -04:00
W. Felix Handte
def62e2d3e Fix Compilation Warnings 2020-09-10 18:51:52 -04:00
Bimba Shrestha
9c628238d3 creating ZSTD_createCDict_advanced_internal 2020-09-10 18:51:52 -04:00
Bimba Shrestha
0a9787c3e1 changing to int for consistency 2020-09-10 18:51:52 -04:00
Bimba Shrestha
e29bc3a009 using dict mls instead of src mls 2020-09-10 18:51:52 -04:00
Bimba Shrestha
145c2d12f9 add hashtable head prefetching 2020-09-10 18:51:52 -04:00
Bimba Shrestha
5d5507788d change method name for consistency 2020-09-10 18:51:52 -04:00
Bimba Shrestha
b30f71becf pass correct cparams 2020-09-10 18:51:52 -04:00
Bimba Shrestha
71fda0362f making cctxParams a pointer 2020-09-10 18:51:52 -04:00
Bimba Shrestha
628559d0e4 loading dict using new algorithm 2020-09-10 18:51:52 -04:00
Bimba Shrestha
22705f0c93 adding dedicatedDictSearch algorithm 2020-09-10 18:51:52 -04:00
Bimba Shrestha
31e581bf65 adding enableDedicatedDictSearch to matchState_t 2020-09-10 18:51:52 -04:00
Bimba Shrestha
50550a14ad adding dedicated dict load method to lazy 2020-09-10 18:51:52 -04:00
Bimba Shrestha
75b6360036 adding ZSTD_createCDict_advanced2 to zstd.h 2020-09-10 18:51:52 -04:00
Bimba Shrestha
b7dddbe89b always attach dict when using dedicatedDictSearch 2020-09-10 18:51:52 -04:00
Bimba Shrestha
e36a373df4 adding dedicatedDictSearch cParams helper methods 2020-09-10 18:51:52 -04:00
Bimba Shrestha
f10d4e313c adding ZSTD_dedicatedDictSearch_defaultCParameters variable 2020-09-10 18:51:52 -04:00
Bimba Shrestha
c497cb6716 Add ZSTD_c_enableDedicatedDictSearch Param 2020-09-10 18:51:52 -04:00
senhuang42
64bd68e44b Adjust ZSTD_createCDict_byReference() function, and check for cdict when using compressStream2 2020-09-10 13:42:26 -04:00
Nick Terrell
79ded1b4a9 [lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.

Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
2020-09-09 14:35:39 -07:00
Nick Terrell
ac3a136b0a [lib] Replace 64-bit divisions with ZSTD_div64() 2020-09-09 14:35:39 -07:00
Nick Terrell
a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
Nick Terrell
046aca190f Fix ZSTD_initCStream_advanced() with no dictionary and static allocation 2020-09-09 14:35:39 -07:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Nick Terrell
5e4efd22d4
Merge pull request #2291 from i-do-cpp/fix-compression-level-default
Fix setParameter not falling back to default compression level
2020-09-08 16:42:34 -07:00
Nick Terrell
6da8acd231
Merge pull request #2293 from allanjude/coverity
Resolve Coverity 1432392 Unintentional integer overflow
2020-09-03 13:58:45 -07:00
Allan Jude
8665793164 Resolve Coverity 1432392 Unintentional integer overflow
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression:
cdict->dictContentSize * 6U
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type U64 (64 bits, unsigned).
2020-09-03 19:31:50 +00:00
i-do-cpp
aec8b27fff
Update zstd_compress.c 2020-08-31 09:34:08 +02:00
i-do-cpp
d514281e73
Fix setParameter not falling back to default compression level on 0 value
See documentation for `ZSTD_c_compressionLevel`: `Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT`
2020-08-31 09:25:43 +02:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Nick Terrell
a686d306d2 Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free} 2020-08-26 12:25:08 -07:00
Nick Terrell
80f577baa2 Move standard includes to zstd_deps.h 2020-08-26 12:25:08 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
8def0e5fd3 Fix up code after reading through 2020-08-24 12:24:45 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell
ba1fd17a9f speed up literal header decoding 2020-08-17 12:17:53 -07:00
Nick Terrell
6004c1117f speed up small blocks 2020-08-16 23:03:38 -07:00
Yann Collet
38e38546a4
Merge pull request #2258 from Niadb/dev
Added STATIC_BMI2 for compile time detection of BMI2 on MSVC, when enabled various intrinsics are used
2020-08-04 09:43:59 -07:00
Niadb
216a63dcf7
Add files via upload 2020-07-28 02:52:52 -06:00
Yann Collet
8b9cdd2597 fixed overlapping count & workspace special case 2020-07-26 22:40:21 -07:00
Yann Collet
051232223f optimized histogram
new version easier to vectorize
leads to smaller code and faster execution
notably at the last recombination stage
(basically, fixed cost per block).

Assembly inspected with godbolt

On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
2020-07-26 22:24:22 -07:00
Yann Collet
c224367ede ensure workspace is large enough
even when MAX_TABLELOG is reduced
2020-07-16 20:33:50 -07:00
Nick Terrell
1047097dad [superblock] Add defensive assert and bounds check
The bound check condition should always be met because we selected `set_basic` as
our encoding type. But that code is very far away, so assert it is true so if it is
ever false we can catch it, and add a bounds check.

Fixes #2213.
2020-06-22 10:21:38 -07:00
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Felix Handte
2af4e07326
Merge pull request #2133 from felixhandte/single-size-calculation
Consolidate CCtx Size Estimation Code
2020-05-28 13:07:18 -04:00
Nick Terrell
3cc227e90e [ldm][mt] Fix loadedDictEnd 2020-05-19 15:55:03 -07:00
Yann Collet
fdc56baa42
fix 22294 (#2151) 2020-05-18 21:05:10 -07:00
Nick Terrell
b2092c6dc4 [ldm] Reset loadedDictEnd when the context is reset 2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
W. Felix Handte
3bb7992350 Fix Size Estimate for LDM Seq Space 2020-05-14 13:50:53 -04:00
Nick Terrell
70c80e19e6 [greedy] Fix performance instability 2020-05-12 17:51:16 -07:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
W. Felix Handte
d9a1e37aec Nit: Fix Size Type for 32-bit 2020-05-12 18:03:31 -04:00
W. Felix Handte
1aa6c7ccce Assert We Allocated Approximately What We Expected To 2020-05-12 16:55:03 -04:00
W. Felix Handte
27e2482217 Minor Refactor 2020-05-12 16:55:03 -04:00
W. Felix Handte
afc2488973 Handle Non-Static CCtxes in Estimation 2020-05-12 16:54:33 -04:00
W. Felix Handte
7ed996f5a0 Consolidate CCtx Size Estimation Code
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.

This attempts to guarantee that the estimates returned to users are always
correct.
2020-05-12 16:26:53 -04:00
Nick Terrell
3c1eba4d99 [lib] Fix lazy repcode validity checks 2020-05-12 12:25:06 -07:00
Nick Terrell
4e0515916d [lib] Fix repcode validation in no dict mode 2020-05-12 11:57:15 -07:00
Nick Terrell
6d687a8816 [lib] Fix dictionary + repcodes + optimal parser 2020-05-12 10:36:53 -07:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell
80d3585e31 [lib] Fix lazy parser with dictionary + repcodes 2020-05-11 19:04:30 -07:00
Yann Collet
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
W. Felix Handte
c6636afbbb Fix ZSTD_estimateCCtxSize() Under ASAN
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.

This commit fixes this.
2020-05-11 18:58:19 -04:00
Yann Collet
54144285fd small speed improvement for strategy fast
gcc 9.3.0 :
kennedy : 459 -> 466
silesia : 360 -> 365
enwik8  : 267 -> 269

clang 10.0.0 :
kennedy : 436 -> 441
silesia : 364 -> 366
enwik8  : 271 -> 272
2020-05-07 06:15:58 -07:00
Felix Handte
ad8dbae1b7
Merge pull request #2103 from felixhandte/relative-includes
Migrate Includes to Relative Paths
2020-05-06 09:42:23 -07:00
Yann Collet
c29fd7cd8b some more conversion warnings
hunting down some static analyzer warnings
2020-05-05 10:16:59 -07:00
Yann Collet
c1b836f4c3 fix minor conversion warnings 2020-05-04 14:43:09 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Felix Handte
7e9aabd652
Merge pull request #2099 from felixhandte/compile-under-pedantic
Compile Under `-pedantic -Werror` and `-std=c90`
2020-05-04 10:07:13 -07:00
Felix Handte
816ed80774
Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort
Reduce stack usage of HUF_sort()
2020-05-04 08:15:31 -07:00
W. Felix Handte
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
W. Felix Handte
6696933b32 Make All Invocations Start With Literal Format String 2020-05-04 10:59:15 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Meghna Malhotra
0adfc8dfce Fix broken CI; make changes in response to the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
53d76dc20f Remove magic constant and made other changes addressing the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fe8402b522 WIP: Still getting an error 2020-05-01 13:45:48 -07:00
Meghna Malhotra
a084d959bd WIP: Increased wksp size, but it's segfaulting 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fdb2780c47 Move rank table into HUF_buildCTable_wksp() 2020-05-01 13:45:48 -07:00
Bimba Shrestha
1875f616ce passing dictContentType instead of rawContent every time 2020-04-21 22:29:35 -07:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Nick Terrell
5fcbc484c8
Merge pull request #2040 from caoyzh/dev-2
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
Bimba Shrestha
c0d4b2b5a3
Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref
[bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced
2020-04-07 17:37:19 -05:00
Bimba Shrestha
1658ae75cd handling nil case for refprefix 2020-04-07 14:41:53 -07:00
Carl Woffenden
a93fadfcd9 Further replication removed
`CHECK_F` is now in `error_private.h`. Minor tidy.
2020-04-07 11:25:16 +02:00
Carl Woffenden
7af7735fa3 Merge remote-tracking branch 'upstream/dev' into single-file-lib 2020-04-07 11:13:02 +02:00
Carl Woffenden
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Bimba Shrestha
0154866749 moving consts to zstd_internal and reusing them 2020-04-03 14:26:15 -07:00
Carl Woffenden
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
d34204a7b7
Merge pull request #2029 from terrelln/minor-opt
[opt] Update repcodes less often
2020-03-23 18:12:32 -07:00
caoyzh
7201980650 Optimize by prefetching on aarch64 2020-03-14 15:25:59 +08:00
Bimba Shrestha
66607d0eac
Merge pull request #2033 from bimbashrestha/icc
[opt] Small icc level 1 compression speed gain using #pragma vector
2020-03-10 20:42:19 -05:00
Bimba Shrestha
a89c45bdbd Typo 2020-03-10 15:19:48 -05:00
Bimba Shrestha
43fc88f443 Adding comment and remvoing ivdep 2020-03-10 14:57:27 -05:00
Bimba Shrestha
dba3abc95a Missed returns 2020-03-05 12:20:59 -08:00
Bimba Shrestha
a75e5f2ffc bitscan add undef check 2020-03-05 11:52:15 -08:00
Bimba Shrestha
4c72a1a9c2 adding vector to main loop 2020-03-05 09:55:38 -08:00
Nick Terrell
81fda0419e [opt] Only update repcodes upon arrival 2020-03-04 17:57:15 -08:00
Nick Terrell
04744e52dc
Merge pull request #2028 from terrelln/minor-opt
[opt] Don't recompute initial literals price
2020-03-04 17:40:59 -08:00
Nick Terrell
0f9882deb9 [opt] Don't recompute repcodes while emitting sequences 2020-03-04 17:23:00 -08:00
Nick Terrell
c6caa2d04e [opt] Delete ZSTD_litLengthContribution 2020-03-04 16:35:26 -08:00
Nick Terrell
610171ed86 [opt] Explain why we don't include literals price 2020-03-04 16:29:19 -08:00
Nick Terrell
5f49578be7 [opt] Don't recompute initial literals price 2020-03-04 16:27:17 -08:00
Nick Terrell
c836992be1 Dont log errors when ZSTD_fseBitCost() returns an error 2020-03-02 11:13:18 -08:00
Bimba Shrestha
80c26117a9 Line-wrapping 2020-02-03 09:38:16 -08:00
Bimba Shrestha
ee8a712af3 Using appliedParams instead of supplied params 2020-01-31 15:49:07 -08:00
Nick Terrell
a11a9271d6 Fix lowLimit underflow in overflow correction 2020-01-17 12:10:18 -08:00
Nick Terrell
036b30b555
Fix super block compression and stream raw blocks in decompression (#1947)
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.

This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.

* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).

Credit to OSS-Fuzz.
2020-01-10 18:02:11 -08:00
Nick Terrell
d1cc9d2797
[fuzz] Allow zero sized buffers for streaming fuzzers (#1945)
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
  zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
  a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.

Catches the bug fixed in PR #1939
2020-01-09 11:38:50 -08:00
Bimba Shrestha
b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha
56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha
5225dcfc0f Adding bool to check if enough room left for noCompress superblocks 2019-12-13 15:47:28 -08:00
Yann Collet
d73e2fb465
Merge pull request #1891 from bimbashrestha/oss
[fuzz] Superblock fuzz issues
2019-12-10 13:17:00 -08:00
Bimba Shrestha
e1913dc87f Making const, removing unnecessary indent, changing parameter order 2019-12-04 15:51:17 -08:00
Bimba Shrestha
2ec556fec2 Moving init/end functions, moving compressSuperBlock inside body() 2019-12-04 15:23:13 -08:00
Bimba Shrestha
ffb0463041 Refactor 2019-12-04 14:52:27 -08:00
Bimba Shrestha
49c6d49247 [fuzz] msan uninitialized unsigned value (#1908)
Fixes new fuzz issue

Credit to OSS-Fuzz

* Initializing unsigned value

* Initialilzing to 1 instead of 0 because its more conservative

* Unconditionoally setting to check first and then checking zero

* Moving bool to before block for c90

* Move check set before block
2019-12-04 10:02:17 -08:00
Bimba Shrestha
1fc9352f81 Using bss var instead of creating new bool 2019-12-02 21:39:06 -08:00
Bimba Shrestha
1f681d8592 Merge branch 'oss' of https://github.com/bimbashrestha/zstd into oss 2019-11-27 10:56:54 -08:00
Bimba Shrestha
a3a3c62b81 [fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898)
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.

* Only setting loaded dict huf table to valid on non-zero

* Adding hasNoZeroWeights test to fse tables

* Forbiding nbBits != 0 when weight == 0

* Reverting the last commit

* Setting table log to 0 when weight == 0

* Small (invalid) zero weight dict test

* Small (valid) zero weight dict test

* Initializing repeatMode vars to check before zero check

* Removing FSE changes to seperate pr

* Reverting accidentally changed file

* Negating bool, using unsigned, optimization nit
2019-11-26 12:24:19 -08:00
Bimba Shrestha
d4e17d0776 Negating bool, updating bool on inner branches 2019-11-26 12:17:43 -08:00
Bimba Shrestha
826b555463
Merge branch 'dev' into oss 2019-11-22 17:29:33 -08:00
Bimba Shrestha
10bce1919e Mixed declration fix 2019-11-21 13:08:27 -08:00
Bimba Shrestha
0451accab1 Checking noCompressBlock explicitly for rep code confirmation 2019-11-21 13:06:26 -08:00
Nick Terrell
659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Bimba Shrestha
8f0c2d04c8 Going back to original flow but removing else return 2019-11-19 10:03:07 -08:00
Nick Terrell
a839d6852c
Merge pull request #1888 from senhuang42/superblocks_fixed
RLE test and re-enable RLE in main compression loop
2019-11-18 16:09:33 -08:00
Bimba Shrestha
80586f5e80 Reversing condition order and forwarding error 2019-11-18 13:53:55 -08:00
Bimba Shrestha
dade64428f Output regular uncompressed block when compressSequences fails 2019-11-18 08:43:14 -08:00
Bimba Shrestha
2d5d961a60 Typo in comment 2019-11-15 19:00:53 -08:00
Bimba Shrestha
dba767c0bb Leaving room for checksum 2019-11-15 18:44:51 -08:00
Sen Huang
d9646dcbb5 Fixed main compression logic changes 2019-11-14 19:39:09 -05:00
Yann Collet
4b1ac69f19
Merge pull request #1868 from senhuang42/superblocks_fixed
Superblocks rebased for merge
2019-11-14 13:31:34 -08:00
Sen Huang
c26d32c91c Change superblock #include to be last 2019-11-14 13:12:17 -05:00
Yann Collet
d67742bc5d
Merge pull request #1858 from senhuang42/dictionary_header_size
Method to get dictionary header size
2019-11-14 09:44:07 -08:00
Sen Huang
d9c475f3b3 Fix static analyze error, use proper bounds for dictEnd 2019-11-08 13:57:26 -05:00
Sen Huang
d06b90692b Move asserts to loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Sen Huang
b39149e156 Expose ZSTD_reset_compressedBlockState() to shared API 2019-11-08 13:57:26 -05:00
Sen Huang
6ce335371b Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge 2019-11-08 13:57:26 -05:00
Sen Huang
c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang
04fb42b4f3 Integrated refactor into getDictHeaderSize, now passes tests 2019-11-08 13:57:26 -05:00
Sen Huang
0bcaf6db08 First working pass at refactor of loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Nick Terrell
8c474f9845 Fix parameter selection and adjustment with srcSize == 0 2019-11-07 08:58:43 -08:00
Felix Handte
5688447758
Merge pull request #1873 from felixhandte/make-overlap-log-multithread-only
Fix #1861: Restrict overlapLog Parameter When Not Built With Multithreading
2019-11-06 16:56:37 -05:00
Felix Handte
ba4613602f
Merge pull request #1843 from moozzyk/issue-1637
Take ZSTD_parameters as a const pointer
2019-11-06 16:56:14 -05:00
W. Felix Handte
c13f81905a Fix #1861: Restrict overlapLog Parameter When Not Built With Multithreading
This parameter is unused in single-threaded compression. We should make it
behave like the other multithread-only parameters, for which we only accept
zero when we are not built with multithreading.
2019-11-06 16:05:02 -05:00
Sen Huang
13bb7500e8 Fix frame argument to compression 2019-11-05 16:15:55 -05:00
Sen Huang
f2932fb5eb Fix more merge conflicts 2019-11-05 15:54:05 -05:00
Sen Huang
7ce891870c Fix merge conflicts 2019-11-05 15:51:25 -05:00
Bimba Shrestha
3fb5b106da Replacing some literals with constants 2019-11-05 10:26:57 -08:00
Nick Terrell
60205fec02 Fix 2 bugs in dictionary loading
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
  This changes the compressor, which silently skips dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
2019-11-01 16:52:07 -07:00
Sen Huang
b9ede1c8c2 Make sure contentsize is known 2019-10-30 16:03:58 -04:00
Yann Collet
a9a216a846
Merge pull request #1824 from senhuang42/new_path_for_cdict
Avoid using CDict params when input is large.
2019-10-23 12:04:40 -07:00
moozzyk
eda7946a36 Take ZSTD_parameters as a const pointer
Fixes: #1637
2019-10-22 23:21:54 -07:00
Yann Collet
5d5c895b18 fix initCStream_advanced() for fast strategies
Compression ratio of fast strategies (levels 1 & 2)
was seriously reduced, due to accidental disabling of Literals compression.

Credit to @QrczakMK, which perfectly described the issue, and implementation details,
making the fix straightforward.

Example : initCStream with level 1 on synthetic sample P50 :
Before : 5,273,976 bytes
After  : 3,154,678 bytes
ZSTD_compress (for comparison) : 3,154,550

Fix #1787.

To follow : refactor the test which was supposed to catch this issue (and failed)
2019-10-22 15:01:38 -07:00
Sen Huang
c2e1e54f24 ((x or y) or z) == (x or y or z), remove brackets 2019-10-21 19:16:50 -04:00
Sen Huang
0c00455ea6 Merge branch 'dev' of github.com:senhuang42/zstd into new_path_for_cdict 2019-10-21 19:06:51 -04:00
Sen Huang
5b2f4ac1a8 merge 2019-10-21 19:02:52 -04:00
Sen Huang
2ab484a5f9 Fix bad merge 2019-10-21 18:55:17 -04:00
Nick Terrell
919d1d8e93
Merge pull request #1831 from terrelln/zstdmt-bad-memset
[zstdmt] Don't memset the jobDescription
2019-10-21 15:53:57 -07:00
Sen Huang
b6c3459d50 merge 2019-10-21 18:46:17 -04:00
Yann Collet
6cf04c0344
Merge pull request #1834 from facebook/winFix
Windows fixes
2019-10-21 13:45:17 -07:00
Sen Huang
676f89902a Added multiplier, renamed new enum to something more useful 2019-10-21 15:36:12 -04:00
Sen Huang
1f3a51fb52 Updated forceAttachDict param bounds 2019-10-21 15:36:12 -04:00
Sen Huang
8f69c47643 Add enum to decision process 2019-10-21 15:36:12 -04:00
Sen Huang
e4de8b098a Added support for forcing new CDict behavior and updated enum 2019-10-21 15:36:12 -04:00
Sen Huang
9294f4826b Changed to int from BYTE 2019-10-21 15:36:12 -04:00
Sen Huang
f0fccc8847 Changed to int from BYTE 2019-10-21 15:36:12 -04:00
Sen Huang
bb2df8c499 Trailing whitespace 2019-10-21 15:36:12 -04:00
Sen Huang
cf51501d2f Fix test 2019-10-21 15:36:12 -04:00
Sen Huang
ea3cb6988f Cast to BYTE to appease appveyor 2019-10-21 15:36:12 -04:00
Sen Huang
a727a85a7e merge conflicts round 2 2019-10-21 15:36:12 -04:00
Sen Huang
053a35fd64 formatting 2019-10-21 15:35:33 -04:00
Sen Huang
3fa4daaa55 Fix error 2019-10-21 15:35:33 -04:00
Sen Huang
3328348c63 Add compressionlevel to cdict 2019-10-21 15:32:39 -04:00
Felix Handte
cf725630a6
Merge pull request #1795 from felixhandte/workspace-asan
Add Poisoned Redzones to the Workspace When Compiling with ASAN
2019-10-21 12:15:17 -04:00
Sen Huang
e8aa3e486d Updated forceAttachDict param bounds 2019-10-20 22:01:08 -04:00
Sen Huang
6d297265f9 Add enum to decision process 2019-10-20 19:02:47 -04:00
Sen Huang
1daa898c93 Added support for forcing new CDict behavior and updated enum 2019-10-20 14:03:09 -04:00
Nick Terrell
0bc39bc3a0 [zstdmt] Don't memset the jobDescription 2019-10-18 15:05:51 -07:00
Yann Collet
1795133c45 refactored FIO_compressMultipleFilenames() prototype
for consistency
2019-10-17 15:32:03 -07:00
Yann Collet
6323966e53 updated erroneous comments using ZSTD_dm_*
instead of the current ZSTD_dct_*,
reported by @nigeltao (#1822)
2019-10-16 16:14:04 -07:00
Sen Huang
4455f00cb8 Changed to int from BYTE 2019-10-16 15:06:02 -04:00
Sen Huang
4f7d26b0ee Changed to int from BYTE 2019-10-16 15:05:29 -04:00
Sen Huang
cf00ea367a Trailing whitespace 2019-10-16 10:31:27 -04:00
Sen Huang
8cb2174446 Fix test 2019-10-16 10:29:31 -04:00
Sen Huang
5e901b6f32 Cast to BYTE to appease appveyor 2019-10-15 13:58:44 -04:00
Sen Huang
5c010c9d2d merge conflicts round 2 2019-10-15 13:10:05 -04:00
Sen Huang
a06b51879c merge conflict 2019-10-15 12:58:50 -04:00
Sen Huang
23dac23a49 formatting 2019-10-15 12:44:48 -04:00
Sen Huang
0c8df5c928 Fix error 2019-10-15 12:28:23 -04:00
Sen Huang
a65eb39f9d Add compressionlevel to cdict 2019-10-15 10:22:06 -04:00
Yann Collet
fb77afc626
Merge pull request #1760 from bimbashrestha/extract_sequences_api
Adding api for extracting sequences from seqstore
2019-10-10 13:11:18 -07:00
W. Felix Handte
ede31da2ea Fix CCtx Size Estimation 2019-10-10 15:02:08 -04:00
W. Felix Handte
bd6a20b8a0 Expand Default Redzone Size 2019-10-10 13:45:55 -04:00
W. Felix Handte
2c80a9f8ac Check if CCtx in Workspace after Null Check 2019-10-10 13:40:16 -04:00
W. Felix Handte
0ffae7e440 Stop Allocating Extra Space for Table Redzones 2019-10-10 13:40:16 -04:00
W. Felix Handte
a07037b784 Don't Try to Redzone the Tables 2019-10-10 13:40:16 -04:00
W. Felix Handte
0cc481ef66 Fix Workspace Size Calculation 2019-10-10 13:40:16 -04:00
W. Felix Handte
b6c0a02a17 Fix ZSTD_sizeof_matchState() Calculation 2019-10-10 13:40:16 -04:00
W. Felix Handte
8cffd6ed08 Avoid ASAN Failure in ZSTD_cwksp_free() 2019-10-10 13:40:16 -04:00
W. Felix Handte
ef0b5707c5 Refactor Freeing CCtxes / CDicts Inside Workspaces 2019-10-10 13:40:16 -04:00
W. Felix Handte
143b296cf6 Surround Workspace Allocs with Dead Zone 2019-10-10 13:40:16 -04:00
W. Felix Handte
19a0955ec9 Add ZSTD_cwksp_alloc_size() to Help Calculate Needed Workspace Size 2019-10-10 13:40:16 -04:00
W. Felix Handte
da88c35d41 Stop Assuming Tables are Adjacent 2019-10-10 13:40:16 -04:00
W. Felix Handte
35c30d6ca7 Poison Unused Workspace Memory 2019-10-10 13:40:16 -04:00
Bimba Shrestha
36528b96c4 Manually moving instead of memcpy on decoder and using genBuffer() 2019-10-03 09:26:51 -07:00
Bimba Shrestha
61ec4c2e7f Cleaning sequence parsing logic 2019-10-03 06:42:40 -07:00
Bimba Shrestha
c04245b257 Replacing assert with memory_allocation error code throw 2019-09-23 15:42:16 -07:00
Bimba Shrestha
be0bebd24e Adding test and null check for malloc 2019-09-23 15:08:18 -07:00
Nick Terrell
5cb7615f1f Add UNUSED_ATTR to ZSTD_storeSeq() 2019-09-20 21:37:13 -07:00
Nick Terrell
5dc0a1d659 HINT_INLINE ZSTD_storeSeq()
Clang on Mac wasn't inlining `ZSTD_storeSeq()` in level 1, which was
causing a 5% performance regression. This fixes it.
2019-09-20 16:39:27 -07:00
Bimba Shrestha
f3c4fd17e3 Passing in dummy dst buffer of compressbound(srcSize) 2019-09-20 15:50:58 -07:00
Nick Terrell
44c65da97e Remove literals overread in ZSTD_storeSeq() for ~neutral perf 2019-09-20 12:23:25 -07:00
Nick Terrell
fde217df04 Fix bounds check in ZSTD_storeSeq() 2019-09-20 08:25:12 -07:00
Nick Terrell
67b1f5fc72 Fix too strict assert 2019-09-20 01:23:35 -07:00
Nick Terrell
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
Nick Terrell
efd37a64ea Optimize decompression and fix wildcopy overread
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
  unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
  optimized for single long sequences, since that is the important
  case that can end up in `ZSTD_execSequenceEnd()`. Without this
  optimization, decompressing a block with 1 long match goes
  from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
  `ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
  same as `ZSTD_execSequence()`.

clang-8 seeds +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.

TODO: More detailed measurements, and on more datasets.

Crdit to OSS-Fuzz for finding the wildcopy overread.
2019-09-19 21:07:14 -07:00
Bimba Shrestha
ae6d0e64ae Addressing comments 2019-09-19 15:25:20 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Yann Collet
243200e5bf minor refactor of ZSTD_fast
- reduced variables lifetime
- more accurate code comments
2019-09-17 14:02:57 -07:00
Bimba Shrestha
76fea3fb99 Resolving appveyor test failure implicit conversion 2019-09-16 14:02:23 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api 2019-09-16 13:29:59 -07:00
Bimba Shrestha
bff6072e3a Bailing early when collecting sequences and documentation 2019-09-16 08:26:21 -07:00
W. Felix Handte
20c69077d1 Shrink Table Valid End During Alloc Alignment / Phase Change 2019-09-11 17:14:59 -04:00
W. Felix Handte
51d90668ba Add Assertions to Confirm that Workspace Pointers are Correctly Ordered 2019-09-11 17:14:59 -04:00
W. Felix Handte
a10c191613 __msan_poison() Workspace When Preparing for Re-Use 2019-09-11 17:14:45 -04:00
W. Felix Handte
7c57e2b9ca Zero h3size When h3log is 0
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
2019-09-11 13:14:26 -04:00
W. Felix Handte
bc020eec92 Also Shrink Clean Table Area When Reducing Indices 2019-09-11 11:40:57 -04:00
W. Felix Handte
1999b2ed9b Update DEBUGLOG Statements 2019-09-11 11:21:00 -04:00
W. Felix Handte
13e29a56de Shrink Clean Table Area When Copying Table Contents into Context
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.
2019-09-11 11:18:45 -04:00
W. Felix Handte
edb3ad053e Comments 2019-09-10 18:25:45 -04:00
W. Felix Handte
f31ef28ff8 Only Reset Indexing in ZSTD_resetCCtx_internal() When Necessary 2019-09-10 18:25:45 -04:00
W. Felix Handte
9968a53e91 Remove No-Longer-Used Continuation Functions 2019-09-10 18:25:45 -04:00
W. Felix Handte
1b28e80416 Remove Fast Continue Path in ZSTD_resetCCtx_internal() 2019-09-10 18:25:45 -04:00
W. Felix Handte
ad16eda5e4 ZSTD_reset_matchState Optionally Doesn't Restart Indexing 2019-09-10 18:25:45 -04:00
W. Felix Handte
5b10bb5ec3 Rename ZSTD_compResetPolicy_e Values and Add Comment 2019-09-10 18:25:45 -04:00
W. Felix Handte
0492b9a9ec Accept ZSTD_indexResetPolicy_e Param in ZSTD_reset_matchState() 2019-09-10 18:25:45 -04:00
W. Felix Handte
14c5471d5e Introduce ZSTD_indexResetPolicy_e Enum 2019-09-10 18:25:45 -04:00
W. Felix Handte
17b6da2e0f Track Usable Table Space in Compression Workspace 2019-09-10 18:25:37 -04:00
Yann Collet
22bd158e0f
Merge pull request #1712 from felixhandte/workspace-efficiency-2
Allocate Internal Buffers via Workspace Abstraction
2019-09-10 15:20:29 -07:00
Bimba Shrestha
1407919d13 Addressing comments on parsing 2019-09-10 15:10:50 -07:00
Bimba Shrestha
47199480da Cleaning up parsing per suggestion 2019-09-10 13:18:59 -07:00
W. Felix Handte
a9d373f093 Remove Empty lib/compress/zstd_cwksp.c 2019-09-10 16:03:13 -04:00
Yann Collet
41416f0927
Merge pull request #1773 from bimbashrestha/rle_first_block_decompression_fix
Removing redundant condition in decompression, making first block rle…
2019-09-10 11:17:29 -07:00
Bimba Shrestha
e3c5825918 Fizing litLength == 0 case 2019-09-10 10:38:13 -07:00
Bimba Shrestha
9e7bb55e14 Addressing comments 2019-09-09 20:04:46 -07:00
W. Felix Handte
81208fd7c2 Forward Declare ZSTD_cwksp_available_space to Fix Build 2019-09-09 19:10:09 -04:00
W. Felix Handte
91bf1babd1 Inline Workspace Functions 2019-09-09 18:53:53 -04:00
W. Felix Handte
0db3ffe7ee Forward resetCCtx Errors when Using CDict 2019-09-09 16:47:19 -04:00
W. Felix Handte
eb6f69d978 Fix sizeof_CCtx and sizeof_CDict Calculations for Statically Init'ed Objects 2019-09-09 16:45:17 -04:00
W. Felix Handte
e3703825a8 Fix workspaceTooSmall Calculation 2019-09-09 15:12:14 -04:00
W. Felix Handte
0a65a67901 Shorten &zc->workspace -> ws in ZSTD_resetCCtx_internal() 2019-09-09 14:59:09 -04:00
W. Felix Handte
1120e4d962 Clean Up TODOs and Comments pt. II 2019-09-09 14:04:39 -04:00
W. Felix Handte
c60e1c3be5 Nit 2019-09-09 13:34:08 -04:00
W. Felix Handte
7d7b665c90 Pull Phase Advance Logic Out into Internal Function 2019-09-09 13:34:08 -04:00