Commit Graph

2136 Commits

Author SHA1 Message Date
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
3163909d14 Remove unused variable position 2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03 Add support for representing last literals in the extracted seqs 2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd Improve documentation of seqStore_t 2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886 Improve documentation regarding various operations in copyBlockSequences 2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03 Modify ZSTD_copyBlockSequences to agree with new API 2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe Add a function for LDM enable check 2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1 Move ldm enable to compressStream2() 2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72 Include LDM tables size for CCtx size estimation where relevant 2020-10-20 09:21:30 -04:00
senhuang42
b1c7fc5768 Add compatibility for multithreading 2020-10-19 12:07:06 -04:00
senhuang42
590f7f55f0 Add ldm enable condition in ZSTD_resetCCtx_internal 2020-10-19 10:26:17 -04:00
senhuang42
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
Yann Collet
a0ec50c2dc
Merge pull request #2355 from senhuang42/change_ldm_mt_config
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4 Change cycleLog adjustment to +3 from +4 2020-10-15 09:56:05 -04:00
senhuang42
ee84817fe7 Reset posInSequence when using ZSTD_referenceExternalSequences() 2020-10-14 22:06:08 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0 Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config 2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84 [minor] Improve docs and add an assert in response to review 2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a Use cycleLog instead of chainLog to determine LDM jobLog 2020-10-12 16:09:59 -04:00
Nick Terrell
441ce4178f [zstdmt] Clarify a comment 2020-10-12 12:58:13 -07:00
Nick Terrell
efff5d8b2d [zstdmt] Fix determinism issue with rsyncable mode
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
   for the next sync point.

This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.

Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
2020-10-12 12:55:17 -07:00
Nick Terrell
ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
1784c4b4ab [zstdmt] Remove single-pass shortcut
Simplifies the code and removes blocking from zstdmt.

At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.

Fixes #2327.
2020-10-12 12:53:26 -07:00
Nick Terrell
b55ae009ac [zstdmt] Remove singleBlockingThread mode
This is already handled by zstd, so this logic is never used.
2020-10-12 12:53:26 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell
fadaab8c7c [minor improvement] Pass 0 as the content size in the DDS
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
2020-10-12 12:47:21 -07:00
Nick Terrell
48ef15fb47 [minor improvement] Pass dictSize when selecting parameters
When selecting parameters in streaming compression with a dictionary use
the dictionary size to select the parameters.
2020-10-12 12:47:19 -07:00
Nick Terrell
012818df99 [refactor] Remove ZSTD_resetCStream_internal()
This function is only called in one place. It isn't a logical separation
of duties, and it was only obsfucating the code now, so inline it.
2020-10-12 12:46:10 -07:00
Nick Terrell
7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
senhuang42
d6911b86be Require LDM matches to be strictly greater in length 2020-10-09 12:56:18 -04:00
Yann Collet
12541931fa
Merge pull request #2328 from marxin/zstd-pool-api
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
Yann Collet
6fdb0cb8d9
Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority
For ZSTD_compressStream2(), let cdict take compression level priority
2020-10-09 00:48:30 -07:00
senhuang42
b9c8033cde Define kNullRawSeqStore for every file 2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8 Replace offCode of largest match if ldm's offCode is superior 2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
0731b94e7c Use kNullRawSeqStore constant in zstdmt_compress.c 2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2 Remove bubbling down matches with longer offCode and same matchLen 2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9 Enable inclusion of mid-flight LDMs in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942 Correct incorrect offcode calculation 2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202 Add explicit conversion of size_t to U32 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866 Prefix new static ldm helpers with ZSTD_opt 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb Improve documentation of relevant structs 2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7 Correct matchLength calculation and remove unnecessary functions 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df Refactor existing functions to use posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87 Adjustments to ldm_calculateMatchRange() to calculate bounds correctly 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299 Remove rawSeqStore.base and add rawSeqStore.posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84 Prevent duplicate LDMs from being inserted 2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec Add extra bounds check to prevent heap access after free ASAN error 2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5 Address mixed variables C90 warning 2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18 ldm_getNextMatch fixed return values 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808 Fixed sifting algorithm 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04 Add a function ldm_voidSequences() 2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e Fix function argument to getNextMatch() 2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38 Adjustments to no longer segfault on nci 2020-10-07 13:56:25 -04:00
senhuang42
f57c7e6bbf Add base adjustment correction 2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f Add initial getNextMatch() in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3 Added more debugging 2020-10-07 13:56:25 -04:00
senhuang42
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
senhuang42
42395a70c2 Add debug statements, flesh out functions 2020-10-07 13:56:25 -04:00
senhuang42
dd3dd199bb Get zstd to build with new functions and callsites, fix arguments 2020-10-07 13:56:25 -04:00
senhuang42
766c4a8c28 Implement part of ldm_maybeAddLdm() 2020-10-07 13:56:25 -04:00
senhuang42
84777059d2 Implement ldm_getNextMatch() 2020-10-07 13:56:24 -04:00
senhuang42
28c74bf591 Implement basic splitSequence and skipSequence functions 2020-10-07 13:56:24 -04:00
senhuang42
634ab7830d Flesh out required args for ldm_handleLdm() 2020-10-07 13:56:24 -04:00
senhuang42
db70761032 Add callsites to appropriate locations in ..opt_generic() 2020-10-07 13:56:24 -04:00
senhuang42
aea61e3c91 Add ldm helper function declarations into opt parser 2020-10-07 13:56:24 -04:00
senhuang42
35d9f488f5 Modify codepath to use opt parser exclusively if the compression level is high enough 2020-10-07 13:56:24 -04:00
senhuang42
e1ae398ad5 Add rawSeqStore to match state 2020-10-07 13:56:24 -04:00
Martin Liska
b684900a4a Allow external creation of POOLs that can be shared. 2020-10-07 12:44:33 +02:00
Nick Terrell
27c969ed07 Add comments to ZSTD_getLowest{Match,Prefix}Index()
Clarify how we handle dictionaries in each case.
2020-10-01 13:21:46 -07:00
Yann Collet
cc88eb7594
Merge pull request #2317 from animalize/msvc_inline
Let MSVC force inline ZSTD_hashPtr() function
2020-09-30 08:27:53 -07:00
Nick Terrell
f1cbeec039 [superblock] Reduce stack usage by correctly sizing header buffers 2020-09-24 19:42:04 -07:00
Nick Terrell
6a1e526ea7 [lib] Add ZSTD_COMPRESS_HEAPMODE tuning parameter 2020-09-24 19:42:04 -07:00
Nick Terrell
b841387218 [freestanding] Improve macro resolution to handle #if X 2020-09-24 19:42:04 -07:00
Nick Terrell
caecd8c211 Allow user to override ASAN/MSAN detection
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
2020-09-24 19:42:04 -07:00
Nick Terrell
88fac5d514 Remove call to memset
The previous commit fixes the test so it errors on calls to mem*()
functions from <string.h>.
2020-09-24 19:42:04 -07:00
Nick Terrell
9261476b7d [lib] Wrap customMem xor checks in parens for readability
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
2020-09-23 23:26:07 -07:00
animalize
2e5d73dd72 Use MEM_STATIC FORCE_INLINE_ATTR instead of FORCE_INLINE_TEMPLATE
It adds `__attribute__((unused))` for __GNUC__, to eliminate `-Werror=unused-function` error.
2020-09-21 13:26:38 +08:00
animalize
0a69a6b1ca Let MSVC force inline ZSTD_hashPtr() function
ZSTD_hashPtr() function was not expanded by MSVC, led to low performance compared to GCC.
2020-09-21 10:38:55 +08:00
Felix Handte
200c960f1d
Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation
Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes
2020-09-18 14:02:14 -04:00
W. Felix Handte
8930c6e551 Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset()
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
2020-09-17 12:15:33 -04:00
W. Felix Handte
e8a44326fa Avoid Redundancy in ZSTD_initCDict_internal() Args; Don't Take CParams + CCtxParams 2020-09-17 12:08:36 -04:00
W. Felix Handte
eee51a664a Fall Back if Derived CParams are Incompatible with DDSS; Refactor CDict Creation
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
2020-09-15 18:01:08 -04:00
W. Felix Handte
bc6521a6f6 Make ZSTD_createCDict_advanced2() cctxParams Arg Const 2020-09-15 14:06:10 -04:00
W. Felix Handte
26a96a5b35 Do More Complete CParams Deduction in Non-DDSS Path of ZSTD_createCDict_advanced2
Call ZSTD_getCParamsFromCCtxParams() instead of ZSTD_getCParams_internal().
2020-09-15 13:57:43 -04:00
W. Felix Handte
a2af804129 Pull CParam Override Logic into Helper 2020-09-15 13:38:05 -04:00
Yann Collet
c91a0855f8 check endDirective in ZSTD_compressStream2()
fix #2297
also :
- `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode
- add relevant tests
2020-09-14 10:56:08 -07:00
senhuang42
17b56f934e Coding style cleanup 2020-09-11 11:42:12 -04:00
senhuang42
801513b5e7 Modify params rather than cctx->requestedParams 2020-09-11 11:41:10 -04:00
W. Felix Handte
c5fab8848a Document searchFuncs Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
85a95840e4 Further Consolidate Dict Mode Checks 2020-09-10 22:10:02 -04:00
W. Felix Handte
0faefbf1b3 Make DDSS Selection Override ForceCopy Directive 2020-09-10 22:10:02 -04:00
W. Felix Handte
efa33861f2 Attempt to Fix MSVC Warnings 2020-09-10 22:10:02 -04:00
W. Felix Handte
ed43832770 Simplify Match Limit Checks
Seems like a ~1.25% speedup.
2020-09-10 22:10:02 -04:00
W. Felix Handte
06d240b8a7 Use All Available Space in the Hash Table to Extent Chain Table Reach
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
2020-09-10 22:10:02 -04:00
W. Felix Handte
b2b0641ea0 Rewrite Table Fill to Retain Cache Entries Beyond Chain Window 2020-09-10 22:10:02 -04:00
W. Felix Handte
916238d9dc Avoid Malloc in Table Fill; Pack Tmp Structure into Hash Table 2020-09-10 22:10:02 -04:00
W. Felix Handte
f42c5bddd9 Truncate Chain at Last Possible Attempt
Make the chain table denser?
2020-09-10 22:10:02 -04:00
W. Felix Handte
20a020edbc Prefetch Chain Table Matches 2020-09-10 22:10:02 -04:00
W. Felix Handte
9b9feb84f2 Lay Out Chain Table Chains Contiguously
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.

This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
2020-09-10 22:10:02 -04:00
W. Felix Handte
66509c7bf4 Only Insert Positions Inside the Chain Window 2020-09-10 22:10:02 -04:00
W. Felix Handte
13c5ec3e41 Only Allow Dedicated Dict Search for Dicts Loaded in 1 Chunk
The load algorithm requires we do it all in one go.
2020-09-10 22:10:02 -04:00
W. Felix Handte
07793547e6 Fix Bug: Only Use DDSS Insertion on CDict MatchStates
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
2020-09-10 18:51:52 -04:00
W. Felix Handte
d214d8c859 Shorten Dict Mode Conditionals in Order to Improve Readability 2020-09-10 18:51:52 -04:00
W. Felix Handte
f49c1563ff Force-Inline ZSTD_insertAndFindFirstIndex_internal()
Without this, gcc was declining to inline the function in `ZSTD_noDict` mode,
resulting in a ~10% slowdown.
2020-09-10 18:51:52 -04:00
W. Felix Handte
cab86b074f Clean Up Search Function Selection 2020-09-10 18:51:52 -04:00
W. Felix Handte
2ffbde0d95 Fix -Wshorten-64-to-32 Error 2020-09-10 18:51:52 -04:00
W. Felix Handte
7b5d2f72ea Adjust Working Context Table Sizes Back Down 2020-09-10 18:51:52 -04:00
W. Felix Handte
d332f57897 Permit Matching Against Lowest Valid Position
This comparison was previously faulty: the lowest valid position is itself
valid, and we should therefore be allowed to match against it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
a3659fe1ef Make ZSTD_dedicatedDictSearch_getCParams Wrap ZSTD_getCParams
Fixes up bounds-checking, and lets us clean up what is at the moment an
unnecessary duplication of the default cparams tables.
2020-09-10 18:51:52 -04:00
W. Felix Handte
7b9a755ac9 Remove Chain Limit on Hash Cache Entries; Slightly Improve Compression
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)
2020-09-10 18:51:52 -04:00
W. Felix Handte
e8b4011b52 Split Lookups in Hash Cache and Chain Table into Two Loops
Sliiiight speedup.
2020-09-10 18:51:52 -04:00
W. Felix Handte
9e83c782f8 Simplify DDS Hash Table Construction
No need to walk the chainTable; we can just keep shifting the entries in the
hashTable.
2020-09-10 18:51:52 -04:00
W. Felix Handte
5390fee4f7 Rename and Move DD_BLOG Constant to ZSTD_LAZY_DDSS_BUCKET_LOG 2020-09-10 18:51:52 -04:00
W. Felix Handte
5e91ae27eb Prefetch First Batch of Match Positions; +11% Speed in Level 5 w/ 1 Dict 2020-09-10 18:51:52 -04:00
W. Felix Handte
df386b3d8d Fix Off-By-One Error in Counting DDS Search Attempts
This caused us to double-search the first position and fail to search the
last position in the chain, slowing down search and making it less effective.
2020-09-10 18:51:52 -04:00
W. Felix Handte
914bfe7ee4 Init CCtx's Local Dict with CCtxParams 2020-09-10 18:51:52 -04:00
W. Felix Handte
db2aa25252 Decision for Whether to Attach Should be Based on CDict Config, not CCtx 2020-09-10 18:51:52 -04:00
W. Felix Handte
a494111385 Move Prefetch Before Insertion; Speed Up ~6% 2020-09-10 18:51:52 -04:00
W. Felix Handte
eede46a47e Misc Refactor of DDS Search Code 2020-09-10 18:51:52 -04:00
W. Felix Handte
f1b428fdac Rename enableDedicatedDictSearch to dedicatedDictSearch in MatchState
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
41012193ad Always Init CDict's enableDedicatedDictSearch Field 2020-09-10 18:51:52 -04:00
W. Felix Handte
34b545acb0 Add a ZSTD_dedicatedDictSearch ZSTD_dictMode_e to Allow Const Propagation
Speed +1.5%.
2020-09-10 18:51:52 -04:00
W. Felix Handte
beefdb0d3d Fix ZSTD_c_forceAttachDict Bounds 2020-09-10 18:51:52 -04:00
W. Felix Handte
def62e2d3e Fix Compilation Warnings 2020-09-10 18:51:52 -04:00
Bimba Shrestha
9c628238d3 creating ZSTD_createCDict_advanced_internal 2020-09-10 18:51:52 -04:00
Bimba Shrestha
0a9787c3e1 changing to int for consistency 2020-09-10 18:51:52 -04:00
Bimba Shrestha
e29bc3a009 using dict mls instead of src mls 2020-09-10 18:51:52 -04:00
Bimba Shrestha
145c2d12f9 add hashtable head prefetching 2020-09-10 18:51:52 -04:00
Bimba Shrestha
5d5507788d change method name for consistency 2020-09-10 18:51:52 -04:00
Bimba Shrestha
b30f71becf pass correct cparams 2020-09-10 18:51:52 -04:00
Bimba Shrestha
71fda0362f making cctxParams a pointer 2020-09-10 18:51:52 -04:00
Bimba Shrestha
628559d0e4 loading dict using new algorithm 2020-09-10 18:51:52 -04:00
Bimba Shrestha
22705f0c93 adding dedicatedDictSearch algorithm 2020-09-10 18:51:52 -04:00
Bimba Shrestha
31e581bf65 adding enableDedicatedDictSearch to matchState_t 2020-09-10 18:51:52 -04:00
Bimba Shrestha
50550a14ad adding dedicated dict load method to lazy 2020-09-10 18:51:52 -04:00
Bimba Shrestha
75b6360036 adding ZSTD_createCDict_advanced2 to zstd.h 2020-09-10 18:51:52 -04:00
Bimba Shrestha
b7dddbe89b always attach dict when using dedicatedDictSearch 2020-09-10 18:51:52 -04:00
Bimba Shrestha
e36a373df4 adding dedicatedDictSearch cParams helper methods 2020-09-10 18:51:52 -04:00
Bimba Shrestha
f10d4e313c adding ZSTD_dedicatedDictSearch_defaultCParameters variable 2020-09-10 18:51:52 -04:00
Bimba Shrestha
c497cb6716 Add ZSTD_c_enableDedicatedDictSearch Param 2020-09-10 18:51:52 -04:00
senhuang42
64bd68e44b Adjust ZSTD_createCDict_byReference() function, and check for cdict when using compressStream2 2020-09-10 13:42:26 -04:00
Nick Terrell
79ded1b4a9 [lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.

Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
2020-09-09 14:35:39 -07:00
Nick Terrell
ac3a136b0a [lib] Replace 64-bit divisions with ZSTD_div64() 2020-09-09 14:35:39 -07:00
Nick Terrell
a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
Nick Terrell
046aca190f Fix ZSTD_initCStream_advanced() with no dictionary and static allocation 2020-09-09 14:35:39 -07:00
Nick Terrell
f91ed5c766 [lib] s/current/curr because it collides with Linux Kernel macro 2020-09-09 14:35:39 -07:00
Nick Terrell
5e4efd22d4
Merge pull request #2291 from i-do-cpp/fix-compression-level-default
Fix setParameter not falling back to default compression level
2020-09-08 16:42:34 -07:00
Nick Terrell
6da8acd231
Merge pull request #2293 from allanjude/coverity
Resolve Coverity 1432392 Unintentional integer overflow
2020-09-03 13:58:45 -07:00
Allan Jude
8665793164 Resolve Coverity 1432392 Unintentional integer overflow
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression:
cdict->dictContentSize * 6U
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type U64 (64 bits, unsigned).
2020-09-03 19:31:50 +00:00
i-do-cpp
aec8b27fff
Update zstd_compress.c 2020-08-31 09:34:08 +02:00
i-do-cpp
d514281e73
Fix setParameter not falling back to default compression level on 0 value
See documentation for `ZSTD_c_compressionLevel`: `Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT`
2020-08-31 09:25:43 +02:00
Nick Terrell
c465f24457 ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free 2020-08-26 12:26:03 -07:00
Nick Terrell
a686d306d2 Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free} 2020-08-26 12:25:08 -07:00
Nick Terrell
80f577baa2 Move standard includes to zstd_deps.h 2020-08-26 12:25:08 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
8def0e5fd3 Fix up code after reading through 2020-08-24 12:24:45 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell
ba1fd17a9f speed up literal header decoding 2020-08-17 12:17:53 -07:00
Nick Terrell
6004c1117f speed up small blocks 2020-08-16 23:03:38 -07:00
Yann Collet
38e38546a4
Merge pull request #2258 from Niadb/dev
Added STATIC_BMI2 for compile time detection of BMI2 on MSVC, when enabled various intrinsics are used
2020-08-04 09:43:59 -07:00
Niadb
216a63dcf7
Add files via upload 2020-07-28 02:52:52 -06:00
Yann Collet
8b9cdd2597 fixed overlapping count & workspace special case 2020-07-26 22:40:21 -07:00
Yann Collet
051232223f optimized histogram
new version easier to vectorize
leads to smaller code and faster execution
notably at the last recombination stage
(basically, fixed cost per block).

Assembly inspected with godbolt

On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
2020-07-26 22:24:22 -07:00
Yann Collet
c224367ede ensure workspace is large enough
even when MAX_TABLELOG is reduced
2020-07-16 20:33:50 -07:00
Nick Terrell
1047097dad [superblock] Add defensive assert and bounds check
The bound check condition should always be met because we selected `set_basic` as
our encoding type. But that code is very far away, so assert it is true so if it is
ever false we can catch it, and add a bounds check.

Fixes #2213.
2020-06-22 10:21:38 -07:00
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Felix Handte
2af4e07326
Merge pull request #2133 from felixhandte/single-size-calculation
Consolidate CCtx Size Estimation Code
2020-05-28 13:07:18 -04:00
Nick Terrell
3cc227e90e [ldm][mt] Fix loadedDictEnd 2020-05-19 15:55:03 -07:00
Yann Collet
fdc56baa42
fix 22294 (#2151) 2020-05-18 21:05:10 -07:00
Nick Terrell
b2092c6dc4 [ldm] Reset loadedDictEnd when the context is reset 2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
W. Felix Handte
3bb7992350 Fix Size Estimate for LDM Seq Space 2020-05-14 13:50:53 -04:00
Nick Terrell
70c80e19e6 [greedy] Fix performance instability 2020-05-12 17:51:16 -07:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
W. Felix Handte
d9a1e37aec Nit: Fix Size Type for 32-bit 2020-05-12 18:03:31 -04:00
W. Felix Handte
1aa6c7ccce Assert We Allocated Approximately What We Expected To 2020-05-12 16:55:03 -04:00
W. Felix Handte
27e2482217 Minor Refactor 2020-05-12 16:55:03 -04:00
W. Felix Handte
afc2488973 Handle Non-Static CCtxes in Estimation 2020-05-12 16:54:33 -04:00
W. Felix Handte
7ed996f5a0 Consolidate CCtx Size Estimation Code
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.

This attempts to guarantee that the estimates returned to users are always
correct.
2020-05-12 16:26:53 -04:00
Nick Terrell
3c1eba4d99 [lib] Fix lazy repcode validity checks 2020-05-12 12:25:06 -07:00
Nick Terrell
4e0515916d [lib] Fix repcode validation in no dict mode 2020-05-12 11:57:15 -07:00
Nick Terrell
6d687a8816 [lib] Fix dictionary + repcodes + optimal parser 2020-05-12 10:36:53 -07:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell
80d3585e31 [lib] Fix lazy parser with dictionary + repcodes 2020-05-11 19:04:30 -07:00
Yann Collet
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
W. Felix Handte
c6636afbbb Fix ZSTD_estimateCCtxSize() Under ASAN
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.

This commit fixes this.
2020-05-11 18:58:19 -04:00
Yann Collet
54144285fd small speed improvement for strategy fast
gcc 9.3.0 :
kennedy : 459 -> 466
silesia : 360 -> 365
enwik8  : 267 -> 269

clang 10.0.0 :
kennedy : 436 -> 441
silesia : 364 -> 366
enwik8  : 271 -> 272
2020-05-07 06:15:58 -07:00
Felix Handte
ad8dbae1b7
Merge pull request #2103 from felixhandte/relative-includes
Migrate Includes to Relative Paths
2020-05-06 09:42:23 -07:00
Yann Collet
c29fd7cd8b some more conversion warnings
hunting down some static analyzer warnings
2020-05-05 10:16:59 -07:00
Yann Collet
c1b836f4c3 fix minor conversion warnings 2020-05-04 14:43:09 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Felix Handte
7e9aabd652
Merge pull request #2099 from felixhandte/compile-under-pedantic
Compile Under `-pedantic -Werror` and `-std=c90`
2020-05-04 10:07:13 -07:00
Felix Handte
816ed80774
Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort
Reduce stack usage of HUF_sort()
2020-05-04 08:15:31 -07:00
W. Felix Handte
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
W. Felix Handte
6696933b32 Make All Invocations Start With Literal Format String 2020-05-04 10:59:15 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Meghna Malhotra
0adfc8dfce Fix broken CI; make changes in response to the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
53d76dc20f Remove magic constant and made other changes addressing the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fe8402b522 WIP: Still getting an error 2020-05-01 13:45:48 -07:00
Meghna Malhotra
a084d959bd WIP: Increased wksp size, but it's segfaulting 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fdb2780c47 Move rank table into HUF_buildCTable_wksp() 2020-05-01 13:45:48 -07:00
Bimba Shrestha
1875f616ce passing dictContentType instead of rawContent every time 2020-04-21 22:29:35 -07:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Nick Terrell
5fcbc484c8
Merge pull request #2040 from caoyzh/dev-2
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
Bimba Shrestha
c0d4b2b5a3
Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref
[bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced
2020-04-07 17:37:19 -05:00
Bimba Shrestha
1658ae75cd handling nil case for refprefix 2020-04-07 14:41:53 -07:00
Carl Woffenden
a93fadfcd9 Further replication removed
`CHECK_F` is now in `error_private.h`. Minor tidy.
2020-04-07 11:25:16 +02:00
Carl Woffenden
7af7735fa3 Merge remote-tracking branch 'upstream/dev' into single-file-lib 2020-04-07 11:13:02 +02:00
Carl Woffenden
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Bimba Shrestha
0154866749 moving consts to zstd_internal and reusing them 2020-04-03 14:26:15 -07:00
Carl Woffenden
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
d34204a7b7
Merge pull request #2029 from terrelln/minor-opt
[opt] Update repcodes less often
2020-03-23 18:12:32 -07:00
caoyzh
7201980650 Optimize by prefetching on aarch64 2020-03-14 15:25:59 +08:00
Bimba Shrestha
66607d0eac
Merge pull request #2033 from bimbashrestha/icc
[opt] Small icc level 1 compression speed gain using #pragma vector
2020-03-10 20:42:19 -05:00
Bimba Shrestha
a89c45bdbd Typo 2020-03-10 15:19:48 -05:00
Bimba Shrestha
43fc88f443 Adding comment and remvoing ivdep 2020-03-10 14:57:27 -05:00
Bimba Shrestha
dba3abc95a Missed returns 2020-03-05 12:20:59 -08:00
Bimba Shrestha
a75e5f2ffc bitscan add undef check 2020-03-05 11:52:15 -08:00
Bimba Shrestha
4c72a1a9c2 adding vector to main loop 2020-03-05 09:55:38 -08:00
Nick Terrell
81fda0419e [opt] Only update repcodes upon arrival 2020-03-04 17:57:15 -08:00
Nick Terrell
04744e52dc
Merge pull request #2028 from terrelln/minor-opt
[opt] Don't recompute initial literals price
2020-03-04 17:40:59 -08:00
Nick Terrell
0f9882deb9 [opt] Don't recompute repcodes while emitting sequences 2020-03-04 17:23:00 -08:00
Nick Terrell
c6caa2d04e [opt] Delete ZSTD_litLengthContribution 2020-03-04 16:35:26 -08:00
Nick Terrell
610171ed86 [opt] Explain why we don't include literals price 2020-03-04 16:29:19 -08:00
Nick Terrell
5f49578be7 [opt] Don't recompute initial literals price 2020-03-04 16:27:17 -08:00
Nick Terrell
c836992be1 Dont log errors when ZSTD_fseBitCost() returns an error 2020-03-02 11:13:18 -08:00
Bimba Shrestha
80c26117a9 Line-wrapping 2020-02-03 09:38:16 -08:00
Bimba Shrestha
ee8a712af3 Using appliedParams instead of supplied params 2020-01-31 15:49:07 -08:00
Nick Terrell
a11a9271d6 Fix lowLimit underflow in overflow correction 2020-01-17 12:10:18 -08:00
Nick Terrell
036b30b555
Fix super block compression and stream raw blocks in decompression (#1947)
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.

This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.

* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).

Credit to OSS-Fuzz.
2020-01-10 18:02:11 -08:00
Nick Terrell
d1cc9d2797
[fuzz] Allow zero sized buffers for streaming fuzzers (#1945)
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
  zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
  a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.

Catches the bug fixed in PR #1939
2020-01-09 11:38:50 -08:00
Bimba Shrestha
b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha
56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha
5225dcfc0f Adding bool to check if enough room left for noCompress superblocks 2019-12-13 15:47:28 -08:00
Yann Collet
d73e2fb465
Merge pull request #1891 from bimbashrestha/oss
[fuzz] Superblock fuzz issues
2019-12-10 13:17:00 -08:00
Bimba Shrestha
e1913dc87f Making const, removing unnecessary indent, changing parameter order 2019-12-04 15:51:17 -08:00
Bimba Shrestha
2ec556fec2 Moving init/end functions, moving compressSuperBlock inside body() 2019-12-04 15:23:13 -08:00
Bimba Shrestha
ffb0463041 Refactor 2019-12-04 14:52:27 -08:00
Bimba Shrestha
49c6d49247 [fuzz] msan uninitialized unsigned value (#1908)
Fixes new fuzz issue

Credit to OSS-Fuzz

* Initializing unsigned value

* Initialilzing to 1 instead of 0 because its more conservative

* Unconditionoally setting to check first and then checking zero

* Moving bool to before block for c90

* Move check set before block
2019-12-04 10:02:17 -08:00
Bimba Shrestha
1fc9352f81 Using bss var instead of creating new bool 2019-12-02 21:39:06 -08:00
Bimba Shrestha
1f681d8592 Merge branch 'oss' of https://github.com/bimbashrestha/zstd into oss 2019-11-27 10:56:54 -08:00
Bimba Shrestha
a3a3c62b81 [fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898)
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.

* Only setting loaded dict huf table to valid on non-zero

* Adding hasNoZeroWeights test to fse tables

* Forbiding nbBits != 0 when weight == 0

* Reverting the last commit

* Setting table log to 0 when weight == 0

* Small (invalid) zero weight dict test

* Small (valid) zero weight dict test

* Initializing repeatMode vars to check before zero check

* Removing FSE changes to seperate pr

* Reverting accidentally changed file

* Negating bool, using unsigned, optimization nit
2019-11-26 12:24:19 -08:00
Bimba Shrestha
d4e17d0776 Negating bool, updating bool on inner branches 2019-11-26 12:17:43 -08:00
Bimba Shrestha
826b555463
Merge branch 'dev' into oss 2019-11-22 17:29:33 -08:00
Bimba Shrestha
10bce1919e Mixed declration fix 2019-11-21 13:08:27 -08:00
Bimba Shrestha
0451accab1 Checking noCompressBlock explicitly for rep code confirmation 2019-11-21 13:06:26 -08:00
Nick Terrell
659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Bimba Shrestha
8f0c2d04c8 Going back to original flow but removing else return 2019-11-19 10:03:07 -08:00
Nick Terrell
a839d6852c
Merge pull request #1888 from senhuang42/superblocks_fixed
RLE test and re-enable RLE in main compression loop
2019-11-18 16:09:33 -08:00
Bimba Shrestha
80586f5e80 Reversing condition order and forwarding error 2019-11-18 13:53:55 -08:00
Bimba Shrestha
dade64428f Output regular uncompressed block when compressSequences fails 2019-11-18 08:43:14 -08:00
Bimba Shrestha
2d5d961a60 Typo in comment 2019-11-15 19:00:53 -08:00
Bimba Shrestha
dba767c0bb Leaving room for checksum 2019-11-15 18:44:51 -08:00
Sen Huang
d9646dcbb5 Fixed main compression logic changes 2019-11-14 19:39:09 -05:00
Yann Collet
4b1ac69f19
Merge pull request #1868 from senhuang42/superblocks_fixed
Superblocks rebased for merge
2019-11-14 13:31:34 -08:00
Sen Huang
c26d32c91c Change superblock #include to be last 2019-11-14 13:12:17 -05:00
Yann Collet
d67742bc5d
Merge pull request #1858 from senhuang42/dictionary_header_size
Method to get dictionary header size
2019-11-14 09:44:07 -08:00
Sen Huang
d9c475f3b3 Fix static analyze error, use proper bounds for dictEnd 2019-11-08 13:57:26 -05:00
Sen Huang
d06b90692b Move asserts to loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Sen Huang
b39149e156 Expose ZSTD_reset_compressedBlockState() to shared API 2019-11-08 13:57:26 -05:00
Sen Huang
6ce335371b Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge 2019-11-08 13:57:26 -05:00
Sen Huang
c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang
04fb42b4f3 Integrated refactor into getDictHeaderSize, now passes tests 2019-11-08 13:57:26 -05:00
Sen Huang
0bcaf6db08 First working pass at refactor of loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Nick Terrell
8c474f9845 Fix parameter selection and adjustment with srcSize == 0 2019-11-07 08:58:43 -08:00
Felix Handte
5688447758
Merge pull request #1873 from felixhandte/make-overlap-log-multithread-only
Fix #1861: Restrict overlapLog Parameter When Not Built With Multithreading
2019-11-06 16:56:37 -05:00
Felix Handte
ba4613602f
Merge pull request #1843 from moozzyk/issue-1637
Take ZSTD_parameters as a const pointer
2019-11-06 16:56:14 -05:00
W. Felix Handte
c13f81905a Fix #1861: Restrict overlapLog Parameter When Not Built With Multithreading
This parameter is unused in single-threaded compression. We should make it
behave like the other multithread-only parameters, for which we only accept
zero when we are not built with multithreading.
2019-11-06 16:05:02 -05:00
Sen Huang
13bb7500e8 Fix frame argument to compression 2019-11-05 16:15:55 -05:00
Sen Huang
f2932fb5eb Fix more merge conflicts 2019-11-05 15:54:05 -05:00
Sen Huang
7ce891870c Fix merge conflicts 2019-11-05 15:51:25 -05:00
Bimba Shrestha
3fb5b106da Replacing some literals with constants 2019-11-05 10:26:57 -08:00
Nick Terrell
60205fec02 Fix 2 bugs in dictionary loading
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
  This changes the compressor, which silently skips dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
2019-11-01 16:52:07 -07:00
Sen Huang
b9ede1c8c2 Make sure contentsize is known 2019-10-30 16:03:58 -04:00
Yann Collet
a9a216a846
Merge pull request #1824 from senhuang42/new_path_for_cdict
Avoid using CDict params when input is large.
2019-10-23 12:04:40 -07:00
moozzyk
eda7946a36 Take ZSTD_parameters as a const pointer
Fixes: #1637
2019-10-22 23:21:54 -07:00
Yann Collet
5d5c895b18 fix initCStream_advanced() for fast strategies
Compression ratio of fast strategies (levels 1 & 2)
was seriously reduced, due to accidental disabling of Literals compression.

Credit to @QrczakMK, which perfectly described the issue, and implementation details,
making the fix straightforward.

Example : initCStream with level 1 on synthetic sample P50 :
Before : 5,273,976 bytes
After  : 3,154,678 bytes
ZSTD_compress (for comparison) : 3,154,550

Fix #1787.

To follow : refactor the test which was supposed to catch this issue (and failed)
2019-10-22 15:01:38 -07:00
Sen Huang
c2e1e54f24 ((x or y) or z) == (x or y or z), remove brackets 2019-10-21 19:16:50 -04:00
Sen Huang
0c00455ea6 Merge branch 'dev' of github.com:senhuang42/zstd into new_path_for_cdict 2019-10-21 19:06:51 -04:00
Sen Huang
5b2f4ac1a8 merge 2019-10-21 19:02:52 -04:00
Sen Huang
2ab484a5f9 Fix bad merge 2019-10-21 18:55:17 -04:00
Nick Terrell
919d1d8e93
Merge pull request #1831 from terrelln/zstdmt-bad-memset
[zstdmt] Don't memset the jobDescription
2019-10-21 15:53:57 -07:00
Sen Huang
b6c3459d50 merge 2019-10-21 18:46:17 -04:00
Yann Collet
6cf04c0344
Merge pull request #1834 from facebook/winFix
Windows fixes
2019-10-21 13:45:17 -07:00
Sen Huang
676f89902a Added multiplier, renamed new enum to something more useful 2019-10-21 15:36:12 -04:00
Sen Huang
1f3a51fb52 Updated forceAttachDict param bounds 2019-10-21 15:36:12 -04:00
Sen Huang
8f69c47643 Add enum to decision process 2019-10-21 15:36:12 -04:00
Sen Huang
e4de8b098a Added support for forcing new CDict behavior and updated enum 2019-10-21 15:36:12 -04:00
Sen Huang
9294f4826b Changed to int from BYTE 2019-10-21 15:36:12 -04:00
Sen Huang
f0fccc8847 Changed to int from BYTE 2019-10-21 15:36:12 -04:00
Sen Huang
bb2df8c499 Trailing whitespace 2019-10-21 15:36:12 -04:00
Sen Huang
cf51501d2f Fix test 2019-10-21 15:36:12 -04:00
Sen Huang
ea3cb6988f Cast to BYTE to appease appveyor 2019-10-21 15:36:12 -04:00
Sen Huang
a727a85a7e merge conflicts round 2 2019-10-21 15:36:12 -04:00
Sen Huang
053a35fd64 formatting 2019-10-21 15:35:33 -04:00
Sen Huang
3fa4daaa55 Fix error 2019-10-21 15:35:33 -04:00
Sen Huang
3328348c63 Add compressionlevel to cdict 2019-10-21 15:32:39 -04:00
Felix Handte
cf725630a6
Merge pull request #1795 from felixhandte/workspace-asan
Add Poisoned Redzones to the Workspace When Compiling with ASAN
2019-10-21 12:15:17 -04:00
Sen Huang
e8aa3e486d Updated forceAttachDict param bounds 2019-10-20 22:01:08 -04:00
Sen Huang
6d297265f9 Add enum to decision process 2019-10-20 19:02:47 -04:00
Sen Huang
1daa898c93 Added support for forcing new CDict behavior and updated enum 2019-10-20 14:03:09 -04:00
Nick Terrell
0bc39bc3a0 [zstdmt] Don't memset the jobDescription 2019-10-18 15:05:51 -07:00
Yann Collet
1795133c45 refactored FIO_compressMultipleFilenames() prototype
for consistency
2019-10-17 15:32:03 -07:00
Yann Collet
6323966e53 updated erroneous comments using ZSTD_dm_*
instead of the current ZSTD_dct_*,
reported by @nigeltao (#1822)
2019-10-16 16:14:04 -07:00
Sen Huang
4455f00cb8 Changed to int from BYTE 2019-10-16 15:06:02 -04:00
Sen Huang
4f7d26b0ee Changed to int from BYTE 2019-10-16 15:05:29 -04:00
Sen Huang
cf00ea367a Trailing whitespace 2019-10-16 10:31:27 -04:00
Sen Huang
8cb2174446 Fix test 2019-10-16 10:29:31 -04:00
Sen Huang
5e901b6f32 Cast to BYTE to appease appveyor 2019-10-15 13:58:44 -04:00
Sen Huang
5c010c9d2d merge conflicts round 2 2019-10-15 13:10:05 -04:00
Sen Huang
a06b51879c merge conflict 2019-10-15 12:58:50 -04:00
Sen Huang
23dac23a49 formatting 2019-10-15 12:44:48 -04:00
Sen Huang
0c8df5c928 Fix error 2019-10-15 12:28:23 -04:00
Sen Huang
a65eb39f9d Add compressionlevel to cdict 2019-10-15 10:22:06 -04:00
Yann Collet
fb77afc626
Merge pull request #1760 from bimbashrestha/extract_sequences_api
Adding api for extracting sequences from seqstore
2019-10-10 13:11:18 -07:00
W. Felix Handte
ede31da2ea Fix CCtx Size Estimation 2019-10-10 15:02:08 -04:00
W. Felix Handte
bd6a20b8a0 Expand Default Redzone Size 2019-10-10 13:45:55 -04:00
W. Felix Handte
2c80a9f8ac Check if CCtx in Workspace after Null Check 2019-10-10 13:40:16 -04:00
W. Felix Handte
0ffae7e440 Stop Allocating Extra Space for Table Redzones 2019-10-10 13:40:16 -04:00
W. Felix Handte
a07037b784 Don't Try to Redzone the Tables 2019-10-10 13:40:16 -04:00
W. Felix Handte
0cc481ef66 Fix Workspace Size Calculation 2019-10-10 13:40:16 -04:00
W. Felix Handte
b6c0a02a17 Fix ZSTD_sizeof_matchState() Calculation 2019-10-10 13:40:16 -04:00
W. Felix Handte
8cffd6ed08 Avoid ASAN Failure in ZSTD_cwksp_free() 2019-10-10 13:40:16 -04:00
W. Felix Handte
ef0b5707c5 Refactor Freeing CCtxes / CDicts Inside Workspaces 2019-10-10 13:40:16 -04:00
W. Felix Handte
143b296cf6 Surround Workspace Allocs with Dead Zone 2019-10-10 13:40:16 -04:00
W. Felix Handte
19a0955ec9 Add ZSTD_cwksp_alloc_size() to Help Calculate Needed Workspace Size 2019-10-10 13:40:16 -04:00
W. Felix Handte
da88c35d41 Stop Assuming Tables are Adjacent 2019-10-10 13:40:16 -04:00
W. Felix Handte
35c30d6ca7 Poison Unused Workspace Memory 2019-10-10 13:40:16 -04:00
Bimba Shrestha
36528b96c4 Manually moving instead of memcpy on decoder and using genBuffer() 2019-10-03 09:26:51 -07:00
Bimba Shrestha
61ec4c2e7f Cleaning sequence parsing logic 2019-10-03 06:42:40 -07:00
Bimba Shrestha
c04245b257 Replacing assert with memory_allocation error code throw 2019-09-23 15:42:16 -07:00
Bimba Shrestha
be0bebd24e Adding test and null check for malloc 2019-09-23 15:08:18 -07:00
Nick Terrell
5cb7615f1f Add UNUSED_ATTR to ZSTD_storeSeq() 2019-09-20 21:37:13 -07:00
Nick Terrell
5dc0a1d659 HINT_INLINE ZSTD_storeSeq()
Clang on Mac wasn't inlining `ZSTD_storeSeq()` in level 1, which was
causing a 5% performance regression. This fixes it.
2019-09-20 16:39:27 -07:00
Bimba Shrestha
f3c4fd17e3 Passing in dummy dst buffer of compressbound(srcSize) 2019-09-20 15:50:58 -07:00
Nick Terrell
44c65da97e Remove literals overread in ZSTD_storeSeq() for ~neutral perf 2019-09-20 12:23:25 -07:00
Nick Terrell
fde217df04 Fix bounds check in ZSTD_storeSeq() 2019-09-20 08:25:12 -07:00
Nick Terrell
67b1f5fc72 Fix too strict assert 2019-09-20 01:23:35 -07:00
Nick Terrell
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
Nick Terrell
efd37a64ea Optimize decompression and fix wildcopy overread
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
  unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
  optimized for single long sequences, since that is the important
  case that can end up in `ZSTD_execSequenceEnd()`. Without this
  optimization, decompressing a block with 1 long match goes
  from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
  `ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
  same as `ZSTD_execSequence()`.

clang-8 seeds +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.

TODO: More detailed measurements, and on more datasets.

Crdit to OSS-Fuzz for finding the wildcopy overread.
2019-09-19 21:07:14 -07:00
Bimba Shrestha
ae6d0e64ae Addressing comments 2019-09-19 15:25:20 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Yann Collet
243200e5bf minor refactor of ZSTD_fast
- reduced variables lifetime
- more accurate code comments
2019-09-17 14:02:57 -07:00
Bimba Shrestha
76fea3fb99 Resolving appveyor test failure implicit conversion 2019-09-16 14:02:23 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api 2019-09-16 13:29:59 -07:00
Bimba Shrestha
bff6072e3a Bailing early when collecting sequences and documentation 2019-09-16 08:26:21 -07:00
W. Felix Handte
20c69077d1 Shrink Table Valid End During Alloc Alignment / Phase Change 2019-09-11 17:14:59 -04:00
W. Felix Handte
51d90668ba Add Assertions to Confirm that Workspace Pointers are Correctly Ordered 2019-09-11 17:14:59 -04:00
W. Felix Handte
a10c191613 __msan_poison() Workspace When Preparing for Re-Use 2019-09-11 17:14:45 -04:00
W. Felix Handte
7c57e2b9ca Zero h3size When h3log is 0
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
2019-09-11 13:14:26 -04:00
W. Felix Handte
bc020eec92 Also Shrink Clean Table Area When Reducing Indices 2019-09-11 11:40:57 -04:00
W. Felix Handte
1999b2ed9b Update DEBUGLOG Statements 2019-09-11 11:21:00 -04:00
W. Felix Handte
13e29a56de Shrink Clean Table Area When Copying Table Contents into Context
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.
2019-09-11 11:18:45 -04:00
W. Felix Handte
edb3ad053e Comments 2019-09-10 18:25:45 -04:00
W. Felix Handte
f31ef28ff8 Only Reset Indexing in ZSTD_resetCCtx_internal() When Necessary 2019-09-10 18:25:45 -04:00
W. Felix Handte
9968a53e91 Remove No-Longer-Used Continuation Functions 2019-09-10 18:25:45 -04:00
W. Felix Handte
1b28e80416 Remove Fast Continue Path in ZSTD_resetCCtx_internal() 2019-09-10 18:25:45 -04:00
W. Felix Handte
ad16eda5e4 ZSTD_reset_matchState Optionally Doesn't Restart Indexing 2019-09-10 18:25:45 -04:00
W. Felix Handte
5b10bb5ec3 Rename ZSTD_compResetPolicy_e Values and Add Comment 2019-09-10 18:25:45 -04:00
W. Felix Handte
0492b9a9ec Accept ZSTD_indexResetPolicy_e Param in ZSTD_reset_matchState() 2019-09-10 18:25:45 -04:00
W. Felix Handte
14c5471d5e Introduce ZSTD_indexResetPolicy_e Enum 2019-09-10 18:25:45 -04:00
W. Felix Handte
17b6da2e0f Track Usable Table Space in Compression Workspace 2019-09-10 18:25:37 -04:00
Yann Collet
22bd158e0f
Merge pull request #1712 from felixhandte/workspace-efficiency-2
Allocate Internal Buffers via Workspace Abstraction
2019-09-10 15:20:29 -07:00
Bimba Shrestha
1407919d13 Addressing comments on parsing 2019-09-10 15:10:50 -07:00
Bimba Shrestha
47199480da Cleaning up parsing per suggestion 2019-09-10 13:18:59 -07:00
W. Felix Handte
a9d373f093 Remove Empty lib/compress/zstd_cwksp.c 2019-09-10 16:03:13 -04:00
Yann Collet
41416f0927
Merge pull request #1773 from bimbashrestha/rle_first_block_decompression_fix
Removing redundant condition in decompression, making first block rle…
2019-09-10 11:17:29 -07:00
Bimba Shrestha
e3c5825918 Fizing litLength == 0 case 2019-09-10 10:38:13 -07:00
Bimba Shrestha
9e7bb55e14 Addressing comments 2019-09-09 20:04:46 -07:00
W. Felix Handte
81208fd7c2 Forward Declare ZSTD_cwksp_available_space to Fix Build 2019-09-09 19:10:09 -04:00
W. Felix Handte
91bf1babd1 Inline Workspace Functions 2019-09-09 18:53:53 -04:00
W. Felix Handte
0db3ffe7ee Forward resetCCtx Errors when Using CDict 2019-09-09 16:47:19 -04:00
W. Felix Handte
eb6f69d978 Fix sizeof_CCtx and sizeof_CDict Calculations for Statically Init'ed Objects 2019-09-09 16:45:17 -04:00
W. Felix Handte
e3703825a8 Fix workspaceTooSmall Calculation 2019-09-09 15:12:14 -04:00
W. Felix Handte
0a65a67901 Shorten &zc->workspace -> ws in ZSTD_resetCCtx_internal() 2019-09-09 14:59:09 -04:00
W. Felix Handte
1120e4d962 Clean Up TODOs and Comments pt. II 2019-09-09 14:04:39 -04:00
W. Felix Handte
c60e1c3be5 Nit 2019-09-09 13:34:08 -04:00
W. Felix Handte
7d7b665c90 Pull Phase Advance Logic Out into Internal Function 2019-09-09 13:34:08 -04:00
W. Felix Handte
8549ae9f1d Hide Workspace Movement Behind Helper Function 2019-09-09 13:34:08 -04:00
W. Felix Handte
2405c03bcd Fix DEBUGLOG Statement Levels 2019-09-09 13:34:08 -04:00
W. Felix Handte
7100d24221 Fix Rescale Continue Special Case 2019-09-09 13:34:08 -04:00
W. Felix Handte
7321e4c9f3 Remove Unused noRealloc CRP Value 2019-09-09 13:34:08 -04:00
W. Felix Handte
901bba4ca6 Re-Implement Workspace Shrinking when Oversized 2019-09-09 13:34:08 -04:00
W. Felix Handte
881bcd80ca Cleanup from Move 2019-09-09 13:34:08 -04:00
W. Felix Handte
b511a84adc Move Workspace Functions to Their Own File 2019-09-09 13:34:08 -04:00
W. Felix Handte
077a2d7dc9 Rename 2019-09-09 13:34:08 -04:00
W. Felix Handte
ebd162194f Clean Up TODOs and Comments 2019-09-09 13:34:08 -04:00
W. Felix Handte
2abe0145b1 Improve Comments a Bit 2019-09-09 13:34:08 -04:00
W. Felix Handte
7a2416a863 Allocate CDict in Workspace (Rather than in Separate Allocation) 2019-09-09 13:34:08 -04:00
W. Felix Handte
65057cf009 Rewrite ZSTD_initStaticCCtx to Alloc CCtx in Workspace 2019-09-09 13:34:08 -04:00
W. Felix Handte
58b69ab15c Only the CCtx Itself Needs to be Cleared during Static CCtx Init 2019-09-09 13:34:08 -04:00
W. Felix Handte
88c2fcd0ee Align Alloc Pointer When Transitioning from Buffers to Aligned Allocs 2019-09-09 13:34:08 -04:00
W. Felix Handte
e936b73889 Remove Overly-Restrictive Assert 2019-09-09 13:34:08 -04:00
W. Felix Handte
75d574368b When Loading Dict By Copy, Always Put it in the Workspace 2019-09-09 13:34:08 -04:00
W. Felix Handte
e69b67e33a Alloc Tables Separately 2019-09-09 13:34:08 -04:00
W. Felix Handte
6177354b36 Begin Introducing Phases 2019-09-09 13:34:08 -04:00
W. Felix Handte
786f2266bb TMP 2019-09-09 13:34:08 -04:00
W. Felix Handte
c25283cf00 Disambiguate 'workspace' and 'entropyWorkspace' 2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8 Normalize Case 'workSpace' -> 'workspace' 2019-09-09 13:27:18 -04:00
Bimba Shrestha
44e122053b Mentioning cli only in the comment as suggested 2019-09-06 14:48:41 -07:00
Bimba Shrestha
a917cd597d Put back omission for first rle block and updated comment as suggested 2019-09-06 13:44:25 -07:00
Bimba Shrestha
d687d603e4 Removing redundant condition in decompression, making first block rles valid to deocmpress 2019-09-06 10:46:19 -07:00
Varun S Nair
9816560649 Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer 2019-09-05 15:47:17 +05:30
Varun S Nair
771645471f Passing ZSTD_CCtx_params by const pointer 2019-09-05 15:28:30 +05:30
Bimba Shrestha
5f8b0f6890 Changing api to get sequences across all blocks 2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
Bimba Shrestha
623b90f85d Fixing ci-circle test complaints 2019-08-29 13:09:42 -07:00
Bimba Shrestha
ece465644b Adding api for extracting sequences from seqstore 2019-08-29 12:29:39 -07:00
mgrice
b830599582 Improvements in zstd decode performance
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3).  This change takes that further by exploiting some properties:
1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.

Speedup on Xeon E5-2680v4 is in the range of 3-5%.

Measured wildcopy length distributions on silesia.tar:

level	<=8	<=16	<=24	>24
1	78.05%	11.49%	3.52%	6.94%
3	82.14%	8.99%	2.44%	6.43%
6	85.81%	6.51%	2.92%	4.76%
8	83.02%	7.31%	3.64%	6.03%
10	84.13%	6.67%	3.29%	5.91%
15	77.58%	7.55%	5.21%	9.66%
16	80.07%	7.20%	3.98%	8.75%

Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
Bimba Shrestha
c3e3c8bf32 Undoing the last commit (that was an accident) 2019-08-29 12:05:47 -07:00
bimbashrestha
4a1ca5e0a8 Adding method for extracting sequences. 2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774 Added bool to cctx and fixed some comment nits 2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024 Fixing mixed declaration compiler complaint 2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b Forbiding emission of RLE when its the first block 2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7 Removing accidental method call 2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e Removing assert for rle that doesn't always hold 2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a Using typedef U32 instead of int 2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492 Removing implicit conversion from const void* to const BYTE* and added constant for threshold 2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1 Fixing more test falure errors 2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911 Fixing ci-circle test failure issues 2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88 Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert 2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df Fix merge conflicts 2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3 Generate RLE blocks in the encoder 2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14 Define ZSTD_SRCSIZEHINT_MIN as 0 2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906 Use int for srcSizeHint when sensible 2019-08-19 16:49:25 -07:00
Nick Magerko
edf2abf106 Fix fall-through case 2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89 Add --size-hint=# option 2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
facbe8b2c2 factored the logic selecting lowest match index
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd fixed strategies btopt+ 2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7 fixed strategy btlazy2 2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca fixed strategies greedy, lazy & lazy2
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279 Restructure the source files 2019-07-15 17:39:18 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size (#1678)
* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5 updated the _extDict variant of double fast 2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce double-fast: changed the trade-off for a smaller positive change
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
 via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2 updated double_fast complementary insertion
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e Factor out the logic to build sequences 2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670 Adding targetCBlockSize param 2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
621adde3b2 changed naming to ZSTD_indexTooCloseToMax()
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.

Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
Yann Collet
45c9fbd6d9 prefer memset() rather than reduceIndex() when close to index range limit
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Yann Collet
944e2e9e12 benchfn : added macro macro CONTROL()
like assert() but cannot be disabled.
proper separation of user contract errors (CONTROL())
and invariant verification (assert()).
2019-06-21 15:58:55 -07:00
Nick Terrell
674534a700 [zstd] Fix data corruption in niche use case
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
  and overflow correct between each chunk.

Data corruption could happen when all these conditions are true:

* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB

The problem is that when loading a large dictionary we don't do
overflow correction. We can only load 512 MB at a time, and may
need to do overflow correction before each chunk.
2019-06-21 15:47:31 -07:00
Nick Terrell
4156060ca4 [zstdmt] Update assert to use ZSTD_WINDOWLOG_MAX 2019-06-21 15:39:33 -07:00
Nick Terrell
95e2b430ea [opt] Add asserts for corruption in ZSTD_updateTree() 2019-06-21 15:22:29 -07:00
Yann Collet
9af909bf35
Merge pull request #1624 from facebook/smallwlog
Improves compression ratio for small windowLog
2019-06-14 17:28:21 -07:00
Nick Terrell
cdb9481e38 [libzstd] Optimize ZSTD_insertBt1() for repetitive data
We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.

This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.

```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```

At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.

Fixes Issue #1634.
2019-06-05 20:34:00 -07:00
Yann Collet
80d6ccea79 removed UINT32_MAX
apparently not guaranteed on all platforms,
replaced by UINT_MAX.
2019-05-31 17:27:07 -07:00
Yann Collet
fce4df3ab7 fixed wrong assert in double_fast 2019-05-31 17:06:28 -07:00
Yann Collet
a968099038 minor code cleaning for new index invalidation strategy 2019-05-31 16:52:37 -07:00
Yann Collet
d605f482c7 make double_fast compatible with new index invalidation strategy 2019-05-31 16:50:04 -07:00
Yann Collet
a30febaeeb Made fast strategy compatible with new offset validation strategy
fast mode does the same thing as before :
it pre-emptively invalidates any index that could lead to offset > maxDistance.
It's supposed to help speed.

But this logic is performed inside zstd_fast,
so that other strategies can select a different behavior.
2019-05-31 16:34:55 -07:00
Yann Collet
58adb1059f extended exact window size to greedy/lazy modes 2019-05-31 16:08:48 -07:00
Yann Collet
bc601bdc6d first implementation of small window size for btopt
noticeably improves compression ratio
when window size is small (< 18).

enwik7	level 19

windowLog	`dev`	`smallwlog`	improvement
23	3.577	3.577	0.02%
22	3.536	3.538	0.06%
21	3.462	3.467	0.14%
20	3.364	3.377	0.39%
19	3.244	3.272	0.86%
18	3.110	3.166	1.80%
17	2.843	3.057	7.53%
16	2.724	2.943	8.04%
15	2.594	2.822	8.79%
14	2.456	2.686	9.36%
13	2.312	2.523	9.13%
12	2.162	2.361	9.20%
11	2.003	2.182	8.94%
2019-05-31 15:55:12 -07:00
Yann Collet
b13a9207f9
Merge pull request #1623 from facebook/fullbench
fullbench minor improvements
2019-05-31 14:40:19 -07:00
Yann Collet
ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet
9719fd616c removed nextToUpdate3 from ZSTD_window
it's now a local variable of ZSTD_compressBlock_opt()
2019-05-28 16:18:12 -07:00
Yann Collet
33dabc8c80 get bt matches : made it a bit clearer which parameters are input and output 2019-05-28 16:11:32 -07:00
Yann Collet
327cf6fac1 nextToUpdate3 does not need to be maintained outside of zstd_opt.c
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.

Made the logic clear, so that no code tried to maintain this variable.

An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.

This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f complementary code comments
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Nick Terrell
a17fe4c9e5 [visual] Fix unreachable code warning 2019-04-16 11:32:35 -07:00
Nick Terrell
de0499f7fa [libzstd] Require ZSTD_MULTITHREAD to create a ZSTDMT_CCtx
ZSTDMT was broken when compiled without ZSTD_MULTITHREAD defined,
because `ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkerss)`
failed. It was detected by the MSVC test which runs the fuzzer with
multithreading disabled.

This is a very niche use case of a deprecated API, because the API is
inefficient and synchronous, since `threading.h` will be synchronous.
Users almost certainly don't want this, and anyone who tested their code
should realize that it is broken. Therefore, I think it is safe to
require `ZSTD_MULTITHREAD` to be defined to use ZSTDMT.
2019-04-15 23:04:46 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
641e594309 [libzstd] Remove ZSTDMT from the shared object
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00
Nick Terrell
72a3fbc0e4
Merge pull request #1562 from terrelln/2fast
[libzstd] Speed up single segment zstd_fast by 5%
2019-04-03 18:08:15 -07:00
Nick Terrell
95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller.  If you don't think this is the right trade off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00