Sen Huang
e34332834a
Clean up various functions, add debuglogging for estimate vs. actual sizes
2021-03-24 08:21:29 -07:00
Sen Huang
41c3eae6d9
Fix various fuzzer failures: repcode history, superblocks
2021-03-24 08:21:29 -07:00
senhuang42
0633bf17c3
Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression
2021-03-24 08:21:29 -07:00
senhuang42
eb1ee8686d
Refactor buildSequencesStatistics() to avoid pointer increment for superblocks
2021-03-24 08:21:29 -07:00
senhuang42
e2bb215117
Add unit tests and fuzzer param
2021-03-24 08:21:09 -07:00
senhuang42
de52de1347
Add recursive block split algorithm
2021-03-24 08:21:09 -07:00
senhuang42
f06f6626ed
Update function names for consistency
2021-03-24 08:20:54 -07:00
senhuang42
c56d6e49e8
Add block splitter to experimental params
2021-03-24 08:20:54 -07:00
senhuang42
2949a95224
Refactor block compression logic into single function
2021-03-24 08:20:54 -07:00
senhuang42
c05c090cc2
Centralize entropy statistics calculations to zstd_compress.c
2021-03-24 08:20:29 -07:00
sen
c48889f097
Merge pull request #2538 from senhuang42/monotonicity_test
...
Add memory monotonicity test over srcSize
2021-03-22 16:54:34 -04:00
Sen Huang
dff4a0e867
Make ZSTD_estimateCCtxSize_internal() loop through all srcSize parameter sets as well
2021-03-21 16:15:31 -07:00
ihsinme
a5bf09d764
simple fix for using bit operator.
...
good day.
It seems to me that the developer intended to use a logical operator.
so I suggest a simple fix.
2021-03-17 11:37:42 +03:00
Sen Huang
77ae664ba6
Fix ZSTD_dedicatedDictSearch_isSupported() requirements
2021-03-16 17:36:05 -07:00
senhuang42
386111adec
Add a nbSeq argument to compressSequences()
...
Refactor ZSTD_compressBlock_internal() to do the block header write within and add nbSeq argument to compressSequences()
2021-03-16 14:04:22 -07:00
senhuang42
98764493cf
Move block header write into compressBlock_internal()
2021-03-16 14:04:22 -07:00
Nick Terrell
cd1551d261
[lib][tracing] Add ZSTD_NO_TRACE macro
...
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
b5fd348a85
Merge pull request #2523 from terrelln/huf-stack-reduction
...
Add HUF_writeCTable_wksp() function
2021-03-05 12:35:09 -08:00
Nick Terrell
5df2a21f1e
Add HUF_writeCTable_wksp() function
...
This saves ~700 bytes of stack space in HUF_writeCTable.
2021-03-05 10:29:18 -08:00
Nick Terrell
27498ff00f
Reduce stack usage of ZSTD_buildCTable()
...
It is a stack high-point for some compression strategies and has an easy
fix. This moves the normalized count into the entropy workspace.
2021-03-04 16:12:11 -08:00
Nick Terrell
7736549bea
[bug-fix] Make simple single-pass functions ignore advanced parameters
...
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.
This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.
* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`
It also adds a test case that ensures that each of these functions
ignore the advanced parameters.
2021-02-12 19:11:23 -08:00
Nick Terrell
c62eb05964
[lib] Set appliedParams.compressionLevel correctly
...
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.
This compression level is unused by the library, but is set so that the
tracing framework can consume it.
2021-02-12 15:00:14 -08:00
Nick Terrell
f520f6dfbe
[trace] Minor fixes found during integration
...
* Mark `ZSTD_CCtx_getParameter()` as const
* Add `extern "C"` guards to `zstd_trace.h`
2021-02-11 16:20:04 -08:00
Yann Collet
8884cb887d
Merge pull request #2483 from mpu/ldmgear
...
New algorithms for the long distance matcher
2021-02-11 08:38:23 -08:00
Quentin Carbonneaux
552efcac2d
relocate large arrays from the stack to ldmState_t
2021-02-10 16:16:54 +01:00
Nick Terrell
e59c9459a5
[trace] Keep track of a uint64_t tracing context
...
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.
This keeps the simple case simple.
2021-02-09 11:37:05 -08:00
Quentin Carbonneaux
e2ad174d73
fix some compiler warnings
2021-02-08 20:19:16 +01:00
Nick Terrell
54a4998a80
Add basic tracing functionality
2021-02-05 16:28:52 -08:00
Yann Collet
b9748757b0
fixed minor cast warning
2021-02-05 09:55:54 -08:00
Quentin Carbonneaux
874a590e5c
deal safely with short inputs in ZSTD_ldm_generateSequences
...
The fuzzer CI found this bug.
2021-02-04 11:15:24 +01:00
Quentin Carbonneaux
9f327c02fd
new core ldm algorithm
2021-02-03 22:24:07 +01:00
Quentin Carbonneaux
aee3dc877f
fix a variable name to reflect its nature
2021-01-22 02:24:19 -08:00
Quentin Carbonneaux
d6e3de77dc
fix warning and remove one more occurrence of makeEntryAndInsertByTag
2021-01-20 01:39:16 -08:00
Quentin Carbonneaux
e0d5eca8fa
fix forgotten numTagBits in getTagMask
2021-01-20 00:54:20 -08:00
Quentin Carbonneaux
1e65711ca5
a couple performance improvement changes for ldm
2021-01-20 00:54:20 -08:00
Thomas Waldmann
92a2b5ccc9
fixup: lits means literals
2021-01-07 23:30:42 +01:00
Thomas Waldmann
f9802d80a0
fix typos (work done by Andrea Gelmini)
2021-01-07 18:47:23 +01:00
Nick Terrell
58476bcf7f
Don't shrink window log in ZSTD_getCParams()
...
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unkown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
2021-01-04 15:54:09 -08:00
Nick Terrell
9d31c704d5
Don't shrink window log when streaming with a dictionary
...
Fixes #2442 .
1. When creating a dictionary keep the same behavior as before.
Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
the dictionary size. When streaming this will select the
largest parameters and not adjust them down. But, the CDict
will use the correctly sized parameters, which seems like the
right tradeoff.
4. When not attaching a dictionary (either forced not to, or
using a prefix dictionary) we select parameters based on the
dictionary size + source size, and assume the source size is
small, which is the same behavior as before. But, now we don't
adjust the window log (and hash and chain log) down when the
source size is unknown.
When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.
TODO: Add a streaming + dictionary regression test case.
2021-01-04 15:54:09 -08:00
Nick Terrell
66e811d782
[license] Update year to 2021
2021-01-04 17:53:52 -05:00
senhuang42
5c41490bfe
Use pre-defined constants
2020-12-21 11:52:05 -05:00
senhuang42
7e11bd012b
Implement skippable frame function
2020-12-21 11:13:22 -05:00
Yann Collet
a7cb4af573
added emphasis on the alignment condition of workspace
...
and made it a programming mistake (`assert()`)
rather than a runtime error.
2020-12-18 15:04:09 -08:00
Nick Terrell
ae85676d44
Fix alignment of scratchBuffer in HUF_compressWeights()
...
The scratch buffer must be 4-byte aligned. This causes test failures in
32-bit systems, where the stack isn't aligned.
Fixes Issue #2428 .
2020-12-17 14:30:27 -08:00
Yann Collet
0b39531d75
moving all references to release
branch
...
was previously `master`
2020-12-16 23:00:35 -08:00
Yann Collet
b8c3a473ec
Merge pull request #2420 from terrelln/huf-comment
...
[huf_compress] Refactor and comment HUF_buildCTable()
2020-12-14 16:14:07 -08:00
W. Felix Handte
9dab03db90
Create Enum to Represent Static/Dynamic Allocation Distinction in cwksp
2020-12-09 14:57:37 -05:00
W. Felix Handte
db9e73cb07
Don't ASAN-Poison Statically-Allocated Workspaces
...
Addresses #2286 .
2020-12-09 13:00:47 -05:00
Nick Terrell
1bbcf07bd5
[huf_compress] Refactor and comment HUF_buildCTable()
...
Comment and refactor `HUF_buildCTable()` and the helper functions
it calls as I read and understand the code. Hopefully this refactor
makes the code a bit more clear.
2020-12-08 13:57:01 -08:00
Yann Collet
b86e3c9304
Merge pull request #2415 from facebook/fix_aliasing
...
fix gcc-10 strict aliasing warnings
2020-12-04 21:30:57 -08:00
Yann Collet
6132df8dd3
fix gcc-10 strict aliasing warnings
...
by exposing HUF_CElt declaration.
2020-12-04 16:43:19 -08:00
Yann Collet
68c14bdff2
minor speed improvement to HUF_readCTable()
...
faster by ~+1-2%
2020-12-04 16:33:39 -08:00
Nick Terrell
c238db046f
Merge pull request #2414 from terrelln/mt-progress
...
[lib] Ensure that multithreaded compression always makes some progress
2020-12-04 16:30:08 -08:00
Nick Terrell
4c58cb8383
[lib] Ensure that multithreaded compression always makes some progress
2020-12-03 20:25:14 -08:00
Nick Terrell
6672689e7e
Merge pull request #2406 from terrelln/linux-wrapper-api
...
[linux] Add the linux wrapper API
2020-12-02 16:49:03 -08:00
Nick Terrell
894ae36675
Merge pull request #2390 from animalize/clamp_level
...
Clamp compression level
2020-12-02 14:35:58 -08:00
senhuang42
2cbd038528
Move max nb seq check to per-block
2020-12-02 12:11:32 -05:00
Nick Terrell
3cda5fae77
[minor][lib] Remove double semicolon
2020-12-02 01:08:08 -08:00
senhuang42
3efe9c902b
Add sequence nb validation to compressSequences(), adjust minMatch comparisons
2020-12-01 10:54:45 -05:00
senhuang42
4c5f337248
Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation
2020-11-30 15:41:20 -05:00
sen
c5fbd55dac
Merge pull request #2387 from senhuang42/compress_sequence_API
...
[RFC] New sequence compression API
2020-11-20 16:54:20 -05:00
senhuang42
7742f076b4
Add experimental param for sequence validation
2020-11-20 11:57:41 -05:00
senhuang42
0e32928b7d
Remove unnecessary repcode backup, apply style choices, use function pointer
2020-11-20 11:02:19 -05:00
sen
e924a0fa51
Explicit cast for visual warnings
...
Github has automatic commits now! Cool
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2020-11-19 17:32:40 -05:00
senhuang42
dcbbf7c09f
Unroll isRLE loop
2020-11-19 12:38:13 -05:00
senhuang42
05c0229668
Clean up visual conversion warnings
2020-11-18 15:36:29 -05:00
senhuang42
d6d7ba2a1f
Modification to offset validation to include entire sequence
2020-11-17 10:13:22 -05:00
senhuang42
8f3136a9c7
Fix assert edge case, improve documentation in zstd.h
2020-11-16 18:05:35 -05:00
senhuang42
f6baad87d6
Fix warnings and make validation enabled by default
2020-11-16 12:00:06 -05:00
senhuang42
55b90ef010
Fix unit tests to agree with new changes
2020-11-16 11:36:37 -05:00
senhuang42
7f563b0519
Add new sequence format as an experimental CCtx param
2020-11-16 10:49:17 -05:00
senhuang42
347824ad73
Overhaul logic to simplify, add in proper validations, fix match splitting
2020-11-16 10:49:17 -05:00
senhuang42
46824cb018
Add new sequence compress api params to cctx
2020-11-16 10:49:17 -05:00
senhuang42
48405b4633
Fix srcSize=0 edge case
2020-11-16 10:49:17 -05:00
senhuang42
022e6d81e7
Fix literals length calculation
2020-11-16 10:49:17 -05:00
senhuang42
dad20b5ccb
Remove dstCapacity error check
2020-11-16 10:49:17 -05:00
senhuang42
b8e16a2057
Remove extraneous function in this API
2020-11-16 10:49:17 -05:00
senhuang42
f29507c4fc
Add check comparing offset to window size
2020-11-16 10:49:17 -05:00
senhuang42
7a6e46a92f
Fix MSAN errors
2020-11-16 10:49:17 -05:00
senhuang42
cc2642bd17
Address edge case with endPosInSequence
2020-11-16 10:49:17 -05:00
senhuang42
fd10007174
Change debug levels to appropriate ones
2020-11-16 10:49:17 -05:00
senhuang42
2db8441245
Add RLE support
2020-11-16 10:49:17 -05:00
senhuang42
dfef298336
Fix various build warnings
2020-11-16 10:49:17 -05:00
senhuang42
2bbdddf24e
Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences()
2020-11-16 10:49:16 -05:00
senhuang42
5fd69f8173
Add documentation for new api functions
2020-11-16 10:49:16 -05:00
senhuang42
e8b7fdb64b
Refactor for enhanced code clarity
2020-11-16 10:49:16 -05:00
senhuang42
c675fb46f1
Rename internal function compressSequences(), and promote new *_ext() functions to their actual name
2020-11-16 10:49:16 -05:00
senhuang42
013434e1e4
Add another API function to compress with existing CCTX
2020-11-16 10:49:16 -05:00
senhuang42
c44ce29013
More adjustments to improve code clarity
2020-11-16 10:49:16 -05:00
senhuang42
48f67da854
Pull compressStream2() transparent initialization into its own function
2020-11-16 10:49:16 -05:00
senhuang42
c86151f53c
Add initial support for new ZSTD_Sequence mode
2020-11-16 10:49:16 -05:00
senhuang42
e0f26afce9
Add sequence compression format param
2020-11-16 10:49:16 -05:00
senhuang42
f51af9a609
Always ensure sequenceRange updates properly, add more error forwarding
2020-11-16 10:49:16 -05:00
senhuang42
1a449688fd
Various minor logical refactors to improve clarity
2020-11-16 10:49:16 -05:00
senhuang42
e5fe485dcc
Fix cSize calculation for noCompressBlocks
2020-11-16 10:49:16 -05:00
senhuang42
6145ebb400
Rebased, roundtrips silesia.tar
2020-11-16 10:49:16 -05:00
senhuang42
b5b61cc216
Refactor for better debugging info
2020-11-16 10:49:16 -05:00
senhuang42
293fad6b45
Corrections and edge-case fixes to be able to roundtrip dickens
2020-11-16 10:49:16 -05:00
senhuang42
7eb6fa7be4
Multi-block compression scaffolding - works on single-block files
2020-11-16 10:49:16 -05:00
senhuang42
75b01f34b9
Add support for uncompressible blocks
2020-11-16 10:49:16 -05:00
senhuang42
e04da68157
Enable usage of ZSTD_sequenceRange for single-block compression
2020-11-16 10:49:16 -05:00
senhuang42
337fac216d
Add logic to handle ZSTD_sequenceRange
2020-11-16 10:49:16 -05:00
senhuang42
85822ddd53
Add last literals handling like getSequences()
2020-11-16 10:49:16 -05:00
senhuang42
2cff8df1a2
Pull block compression out of main compressSequences() function
2020-11-16 10:49:16 -05:00
senhuang42
cfced9344a
Implement ZSTD_updateSequenceRange
2020-11-16 10:49:16 -05:00
senhuang42
b116e1f211
Modify SequenceRange to have posInSequence
2020-11-16 10:49:16 -05:00
senhuang42
d99b675112
Add function definition for sequenceRange updater
2020-11-16 10:49:16 -05:00
senhuang42
74e95c05cc
Add ZSTD_SequenceRange to count ranges in array of ZSTD_Sequence
2020-11-16 10:49:16 -05:00
senhuang42
89f3848310
Add support for repcodes
2020-11-16 10:49:16 -05:00
senhuang42
3e930fd044
Code cleanup, add debuglog statments
2020-11-16 10:49:16 -05:00
senhuang42
086513b5b9
Implement first pass at compressSequences()
2020-11-16 10:49:16 -05:00
senhuang42
a9327b1e9b
Add initial function prototype for ZSTD_compressSequences_ext (to be renamed later)
2020-11-16 10:33:35 -05:00
animalize
52f8c07a3f
Clamp compression level in ZSTD_getCParams_internal() function
2020-11-14 13:26:08 +08:00
senhuang42
9d936d61d2
Reduce number of memcpy() calls
2020-11-13 19:43:30 -05:00
senhuang42
be4ac6c5bc
Use existing repcode update function to implement updates
2020-11-12 16:51:12 -05:00
senhuang42
674c9b9235
Add in proper block repcode histories
2020-11-12 15:34:37 -05:00
senhuang42
06c7f14066
Let block reps persist
2020-11-12 12:24:44 -05:00
senhuang42
396275068c
Fix incorrect repcode setting
2020-11-12 11:57:01 -05:00
senhuang42
1a8af0de73
Improve unit test
2020-11-12 11:09:09 -05:00
senhuang42
4d4fd2c55f
Overhaul repcode handling logic
2020-11-12 10:59:35 -05:00
sen
f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
...
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42
7d1dea070c
Update unit tests
2020-11-06 11:10:37 -05:00
senhuang42
779df995c6
Implement mergeGeneratedSequences()
2020-11-06 10:55:46 -05:00
senhuang42
51abd58208
Rename getSequences() to generateSequences()
2020-11-06 10:53:22 -05:00
Luke Pitt
eac309c71b
Add ZSTD_getDictID_fromCDict function to experimental section
2020-11-04 11:37:37 +00:00
senhuang42
f782cac3d4
Change block delimiter removing to linear time approach
2020-11-02 17:06:20 -05:00
senhuang42
3434049c1f
Use ZSTD_memmove() instead of memmove()
2020-11-02 11:43:19 -05:00
senhuang42
d4d0346b40
Update name of enum, clarify documentation
2020-11-02 11:38:17 -05:00
senhuang42
e6178f837f
Revert unnecessary seqCollector adjustment
2020-11-02 10:59:20 -05:00
senhuang42
e8501e00b8
Fix incorrect index increment in merge algorithm
2020-11-02 10:58:41 -05:00
senhuang42
a36fdada57
Add algorithm to remove all delimiters
2020-11-02 10:46:52 -05:00
senhuang42
435a3a0428
Update seqCollector definition
2020-11-02 10:19:26 -05:00
senhuang42
3327932609
Update ZSTD_getSequences function signature
2020-11-02 10:17:59 -05:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
...
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
...
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
d4e021fe35
[lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
...
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
Nick Terrell
24f72789e2
[lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set
...
Compress directly from the `ZSTD_inBuffer`. We still allocate the input
buffer. A following commit will remove that allocation.
2020-10-30 10:55:34 -07:00
Nick Terrell
6bd6b6f7d3
[cwksp] Return NULL when 0 bytes are requested
...
This ensures that the buffer is never used.
2020-10-30 10:55:34 -07:00
Nick Terrell
fcf81cee5e
[lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set
...
We compress directly to the `ZSTD_outBuffer` so we don't need to
allocate it.
2020-10-30 10:55:34 -07:00
Nick Terrell
6d5dc93d4e
[lib] Compress directly into output when ZSTD_c_stableOutBuffer is set
...
When we have a stable output buffer always compress directly into the
`ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.
2020-10-30 10:55:34 -07:00
Nick Terrell
987cb4ca6a
[lib] Take the shortcut when ZSTD_c_stableOutBuffer is set
...
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
2020-10-30 10:55:34 -07:00
Nick Terrell
809b2f2071
[lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2()
...
Sets these parameters in ZSTD_compress2() then resets them to their
orignal values after the compression call.
An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
2020-10-30 10:55:34 -07:00
Nick Terrell
c74be3f6de
[lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
...
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8
[API] Add ZSTD_c_stable{In,Out}Buffer parameters
...
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
e2581d9572
[lib] Set appliedParams in zstdmt mode
...
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
2020-10-30 10:54:38 -07:00
senhuang42
536e89c723
Sequence extractor should update CBlockState
2020-10-30 12:13:19 -04:00
senhuang42
32cac2627a
Emit last literals of 0 size as well, to indicate block boundary
2020-10-29 16:41:17 -04:00
senhuang42
69bd5f0654
Correct literalsRead calculation to include longLength
2020-10-29 14:49:37 -04:00
senhuang42
59624f3163
Remove implicit typecast to appease appVeyor windows build
2020-10-28 16:25:09 -04:00
senhuang42
3ed5d053d8
Clarify comments in zstd.h some more
2020-10-28 09:53:09 -04:00
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
...
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
...
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
...
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
3163909d14
Remove unused variable position
2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9
Add test compatibility with last literals in sequences
2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03
Add support for representing last literals in the extracted seqs
2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd
Improve documentation of seqStore_t
2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886
Improve documentation regarding various operations in copyBlockSequences
2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03
Modify ZSTD_copyBlockSequences to agree with new API
2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe
Add a function for LDM enable check
2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1
Move ldm enable to compressStream2()
2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72
Include LDM tables size for CCtx size estimation where relevant
2020-10-20 09:21:30 -04:00
senhuang42
b1c7fc5768
Add compatibility for multithreading
2020-10-19 12:07:06 -04:00
senhuang42
590f7f55f0
Add ldm enable condition in ZSTD_resetCCtx_internal
2020-10-19 10:26:17 -04:00
senhuang42
4d01979b62
Expose and call ZSTD_ldm_skipRawSeqStoreBytes()
2020-10-16 20:30:00 -04:00
Yann Collet
a0ec50c2dc
Merge pull request #2355 from senhuang42/change_ldm_mt_config
...
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4
Change cycleLog adjustment to +3 from +4
2020-10-15 09:56:05 -04:00
senhuang42
ee84817fe7
Reset posInSequence when using ZSTD_referenceExternalSequences()
2020-10-14 22:06:08 -04:00
senhuang42
d0550bb18f
Clarify argument names, fix DEBUGLOG() statements
2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d
Adjust match backwards count args
2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449
Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments()
2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0
Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config
2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
...
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84
[minor] Improve docs and add an assert in response to review
2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a
Use cycleLog instead of chainLog to determine LDM jobLog
2020-10-12 16:09:59 -04:00
Nick Terrell
441ce4178f
[zstdmt] Clarify a comment
2020-10-12 12:58:13 -07:00
Nick Terrell
efff5d8b2d
[zstdmt] Fix determinism issue with rsyncable mode
...
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
for the next sync point.
This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.
Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
2020-10-12 12:55:17 -07:00
Nick Terrell
ede4f97153
[zstdmt] Fix bug where extra empty blocks are emitted
...
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9
[zstdmt] Rip out the zstdmt API
...
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
1784c4b4ab
[zstdmt] Remove single-pass shortcut
...
Simplifies the code and removes blocking from zstdmt.
At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.
Fixes #2327 .
2020-10-12 12:53:26 -07:00
Nick Terrell
b55ae009ac
[zstdmt] Remove singleBlockingThread mode
...
This is already handled by zstd, so this logic is never used.
2020-10-12 12:53:26 -07:00
Nick Terrell
d5c688e8ae
Fix ZSTD_adjustCParams_internal() to handle dictionary logic
...
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.
Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.
Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.
Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell
fadaab8c7c
[minor improvement] Pass 0 as the content size in the DDS
...
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
2020-10-12 12:47:21 -07:00
Nick Terrell
48ef15fb47
[minor improvement] Pass dictSize when selecting parameters
...
When selecting parameters in streaming compression with a dictionary use
the dictionary size to select the parameters.
2020-10-12 12:47:19 -07:00
Nick Terrell
012818df99
[refactor] Remove ZSTD_resetCStream_internal()
...
This function is only called in one place. It isn't a logical separation
of duties, and it was only obsfucating the code now, so inline it.
2020-10-12 12:46:10 -07:00
Nick Terrell
7083f79008
[bug] Fix dictContentType when reprocessing cdict
...
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).
Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.
Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.
Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
senhuang42
d6911b86be
Require LDM matches to be strictly greater in length
2020-10-09 12:56:18 -04:00
Yann Collet
12541931fa
Merge pull request #2328 from marxin/zstd-pool-api
...
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
Yann Collet
6fdb0cb8d9
Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority
...
For ZSTD_compressStream2(), let cdict take compression level priority
2020-10-09 00:48:30 -07:00
senhuang42
b9c8033cde
Define kNullRawSeqStore for every file
2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28
Change matchState_t::ldmSeqStore to pointer
2020-10-07 14:13:57 -04:00
senhuang42
abce708a56
Move posInSequence correction to correct location
2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8
Replace offCode of largest match if ldm's offCode is superior
2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1
Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes
2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af
Refactor separate ldm variables all into one struct
2020-10-07 13:56:25 -04:00
senhuang42
0731b94e7c
Use kNullRawSeqStore constant in zstdmt_compress.c
2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2
Remove bubbling down matches with longer offCode and same matchLen
2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f
Disable LDM minMatch adjustment when using opt parser
2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9
Enable inclusion of mid-flight LDMs in opt parser
2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942
Correct incorrect offcode calculation
2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202
Add explicit conversion of size_t to U32
2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d
Add cSize regression test to fuzzer.c
2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866
Prefix new static ldm helpers with ZSTD_opt
2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42
Add DEBUGLOG() calls in ldm helpers
2020-10-07 13:56:25 -04:00
senhuang42
10647924f1
Make function descriptions more accurate
2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb
Improve documentation of relevant structs
2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7
Correct matchLength calculation and remove unnecessary functions
2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287
Reset ldmSeqStore after initStats_ultra() pass for btultra2
2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df
Refactor existing functions to use posInSequence
2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87
Adjustments to ldm_calculateMatchRange() to calculate bounds correctly
2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2
Add ldm_calculateMatchRange() function
2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299
Remove rawSeqStore.base and add rawSeqStore.posInSequence
2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84
Prevent duplicate LDMs from being inserted
2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec
Add extra bounds check to prevent heap access after free ASAN error
2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5
Address mixed variables C90 warning
2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18
ldm_getNextMatch fixed return values
2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68
Cleanups, add comments and explanations
2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808
Fixed sifting algorithm
2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96
Fixed end of match boundary update issues
2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2
Add proper bounds check on adding ldms
2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04
Add a function ldm_voidSequences()
2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e
Fix function argument to getNextMatch()
2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38
Adjustments to no longer segfault on nci
2020-10-07 13:56:25 -04:00
senhuang42
f57c7e6bbf
Add base adjustment correction
2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f
Add initial getNextMatch() in opt parser
2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3
Added more debugging
2020-10-07 13:56:25 -04:00
senhuang42
84009a076a
Add re-copying of ldmSeqStore after processing
2020-10-07 13:56:25 -04:00
senhuang42
42395a70c2
Add debug statements, flesh out functions
2020-10-07 13:56:25 -04:00
senhuang42
dd3dd199bb
Get zstd to build with new functions and callsites, fix arguments
2020-10-07 13:56:25 -04:00
senhuang42
766c4a8c28
Implement part of ldm_maybeAddLdm()
2020-10-07 13:56:25 -04:00
senhuang42
84777059d2
Implement ldm_getNextMatch()
2020-10-07 13:56:24 -04:00
senhuang42
28c74bf591
Implement basic splitSequence and skipSequence functions
2020-10-07 13:56:24 -04:00
senhuang42
634ab7830d
Flesh out required args for ldm_handleLdm()
2020-10-07 13:56:24 -04:00
senhuang42
db70761032
Add callsites to appropriate locations in ..opt_generic()
2020-10-07 13:56:24 -04:00
senhuang42
aea61e3c91
Add ldm helper function declarations into opt parser
2020-10-07 13:56:24 -04:00
senhuang42
35d9f488f5
Modify codepath to use opt parser exclusively if the compression level is high enough
2020-10-07 13:56:24 -04:00
senhuang42
e1ae398ad5
Add rawSeqStore to match state
2020-10-07 13:56:24 -04:00
Martin Liska
b684900a4a
Allow external creation of POOLs that can be shared.
2020-10-07 12:44:33 +02:00
Nick Terrell
27c969ed07
Add comments to ZSTD_getLowest{Match,Prefix}Index()
...
Clarify how we handle dictionaries in each case.
2020-10-01 13:21:46 -07:00
Yann Collet
cc88eb7594
Merge pull request #2317 from animalize/msvc_inline
...
Let MSVC force inline ZSTD_hashPtr() function
2020-09-30 08:27:53 -07:00
Nick Terrell
f1cbeec039
[superblock] Reduce stack usage by correctly sizing header buffers
2020-09-24 19:42:04 -07:00
Nick Terrell
6a1e526ea7
[lib] Add ZSTD_COMPRESS_HEAPMODE tuning parameter
2020-09-24 19:42:04 -07:00
Nick Terrell
b841387218
[freestanding] Improve macro resolution to handle #if X
2020-09-24 19:42:04 -07:00
Nick Terrell
caecd8c211
Allow user to override ASAN/MSAN detection
...
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
2020-09-24 19:42:04 -07:00
Nick Terrell
88fac5d514
Remove call to memset
...
The previous commit fixes the test so it errors on calls to mem*()
functions from <string.h>.
2020-09-24 19:42:04 -07:00
Nick Terrell
9261476b7d
[lib] Wrap customMem xor checks in parens for readability
...
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
2020-09-23 23:26:07 -07:00
animalize
2e5d73dd72
Use MEM_STATIC FORCE_INLINE_ATTR
instead of FORCE_INLINE_TEMPLATE
...
It adds `__attribute__((unused))` for __GNUC__, to eliminate `-Werror=unused-function` error.
2020-09-21 13:26:38 +08:00
animalize
0a69a6b1ca
Let MSVC force inline ZSTD_hashPtr() function
...
ZSTD_hashPtr() function was not expanded by MSVC, led to low performance compared to GCC.
2020-09-21 10:38:55 +08:00
Felix Handte
200c960f1d
Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation
...
Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes
2020-09-18 14:02:14 -04:00
W. Felix Handte
8930c6e551
Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset()
...
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
2020-09-17 12:15:33 -04:00
W. Felix Handte
e8a44326fa
Avoid Redundancy in ZSTD_initCDict_internal() Args; Don't Take CParams + CCtxParams
2020-09-17 12:08:36 -04:00
W. Felix Handte
eee51a664a
Fall Back if Derived CParams are Incompatible with DDSS; Refactor CDict Creation
...
Rewrite ZSTD_createCDict_advanced() as a wrapper around
ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode *after* fully
resolving cparams. If not, fall back.
2020-09-15 18:01:08 -04:00
W. Felix Handte
bc6521a6f6
Make ZSTD_createCDict_advanced2() cctxParams Arg Const
2020-09-15 14:06:10 -04:00
W. Felix Handte
26a96a5b35
Do More Complete CParams Deduction in Non-DDSS Path of ZSTD_createCDict_advanced2
...
Call ZSTD_getCParamsFromCCtxParams() instead of ZSTD_getCParams_internal().
2020-09-15 13:57:43 -04:00
W. Felix Handte
a2af804129
Pull CParam Override Logic into Helper
2020-09-15 13:38:05 -04:00
Yann Collet
c91a0855f8
check endDirective in ZSTD_compressStream2()
...
fix #2297
also :
- `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode
- add relevant tests
2020-09-14 10:56:08 -07:00
senhuang42
17b56f934e
Coding style cleanup
2020-09-11 11:42:12 -04:00
senhuang42
801513b5e7
Modify params rather than cctx->requestedParams
2020-09-11 11:41:10 -04:00
W. Felix Handte
c5fab8848a
Document searchFuncs Table
2020-09-10 22:10:02 -04:00
W. Felix Handte
85a95840e4
Further Consolidate Dict Mode Checks
2020-09-10 22:10:02 -04:00
W. Felix Handte
0faefbf1b3
Make DDSS Selection Override ForceCopy Directive
2020-09-10 22:10:02 -04:00
W. Felix Handte
efa33861f2
Attempt to Fix MSVC Warnings
2020-09-10 22:10:02 -04:00
W. Felix Handte
ed43832770
Simplify Match Limit Checks
...
Seems like a ~1.25% speedup.
2020-09-10 22:10:02 -04:00
W. Felix Handte
06d240b8a7
Use All Available Space in the Hash Table to Extent Chain Table Reach
...
Rather than restrict our temp chain table to 2 ** chainLog entries, this
commit uses all available space to reach further back to gather longer
chains to pack into the DDSS chain table.
2020-09-10 22:10:02 -04:00
W. Felix Handte
b2b0641ea0
Rewrite Table Fill to Retain Cache Entries Beyond Chain Window
2020-09-10 22:10:02 -04:00
W. Felix Handte
916238d9dc
Avoid Malloc in Table Fill; Pack Tmp Structure into Hash Table
2020-09-10 22:10:02 -04:00
W. Felix Handte
f42c5bddd9
Truncate Chain at Last Possible Attempt
...
Make the chain table denser?
2020-09-10 22:10:02 -04:00
W. Felix Handte
20a020edbc
Prefetch Chain Table Matches
2020-09-10 22:10:02 -04:00
W. Felix Handte
9b9feb84f2
Lay Out Chain Table Chains Contiguously
...
Rather than interleave all of the chain table entries, tying each entry's
position to the corresponding position in the input, this commit changes the
layout so that all the entries in a single chain are laid out next to each
other. The last entry in the hash table's bucket for this hash is now a packed
pointer of position + length of this chain.
This cannot be merged as written, since it allocates temporary memory inside
ZSTD_dedicatedDictSearch_lazy_loadDictionary().
2020-09-10 22:10:02 -04:00
W. Felix Handte
66509c7bf4
Only Insert Positions Inside the Chain Window
2020-09-10 22:10:02 -04:00
W. Felix Handte
13c5ec3e41
Only Allow Dedicated Dict Search for Dicts Loaded in 1 Chunk
...
The load algorithm requires we do it all in one go.
2020-09-10 22:10:02 -04:00
W. Felix Handte
07793547e6
Fix Bug: Only Use DDSS Insertion on CDict MatchStates
...
Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into
the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing
segfaults etc. This changes the check to use whether the MatchState is marked
as using the DDSS (which is only ever set for CDict MatchStates), rather than
looking at the CCtxParams.
2020-09-10 18:51:52 -04:00
W. Felix Handte
d214d8c859
Shorten Dict Mode Conditionals in Order to Improve Readability
2020-09-10 18:51:52 -04:00
W. Felix Handte
f49c1563ff
Force-Inline ZSTD_insertAndFindFirstIndex_internal()
...
Without this, gcc was declining to inline the function in `ZSTD_noDict` mode,
resulting in a ~10% slowdown.
2020-09-10 18:51:52 -04:00
W. Felix Handte
cab86b074f
Clean Up Search Function Selection
2020-09-10 18:51:52 -04:00
W. Felix Handte
2ffbde0d95
Fix -Wshorten-64-to-32
Error
2020-09-10 18:51:52 -04:00
W. Felix Handte
7b5d2f72ea
Adjust Working Context Table Sizes Back Down
2020-09-10 18:51:52 -04:00
W. Felix Handte
d332f57897
Permit Matching Against Lowest Valid Position
...
This comparison was previously faulty: the lowest valid position is itself
valid, and we should therefore be allowed to match against it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
a3659fe1ef
Make ZSTD_dedicatedDictSearch_getCParams Wrap ZSTD_getCParams
...
Fixes up bounds-checking, and lets us clean up what is at the moment an
unnecessary duplication of the default cparams tables.
2020-09-10 18:51:52 -04:00
W. Felix Handte
7b9a755ac9
Remove Chain Limit on Hash Cache Entries; Slightly Improve Compression
...
Entries in the hashTable chain cache aren't subject to the same aliasing that
the circular chain table is subject to. As such, we don't need to stop when we
cross the chain limit. We can delve deeper. :)
2020-09-10 18:51:52 -04:00
W. Felix Handte
e8b4011b52
Split Lookups in Hash Cache and Chain Table into Two Loops
...
Sliiiight speedup.
2020-09-10 18:51:52 -04:00
W. Felix Handte
9e83c782f8
Simplify DDS Hash Table Construction
...
No need to walk the chainTable; we can just keep shifting the entries in the
hashTable.
2020-09-10 18:51:52 -04:00
W. Felix Handte
5390fee4f7
Rename and Move DD_BLOG Constant to ZSTD_LAZY_DDSS_BUCKET_LOG
2020-09-10 18:51:52 -04:00
W. Felix Handte
5e91ae27eb
Prefetch First Batch of Match Positions; +11% Speed in Level 5 w/ 1 Dict
2020-09-10 18:51:52 -04:00
W. Felix Handte
df386b3d8d
Fix Off-By-One Error in Counting DDS Search Attempts
...
This caused us to double-search the first position and fail to search the
last position in the chain, slowing down search and making it less effective.
2020-09-10 18:51:52 -04:00
W. Felix Handte
914bfe7ee4
Init CCtx's Local Dict with CCtxParams
2020-09-10 18:51:52 -04:00
W. Felix Handte
db2aa25252
Decision for Whether to Attach Should be Based on CDict Config, not CCtx
2020-09-10 18:51:52 -04:00
W. Felix Handte
a494111385
Move Prefetch Before Insertion; Speed Up ~6%
2020-09-10 18:51:52 -04:00
W. Felix Handte
eede46a47e
Misc Refactor of DDS Search Code
2020-09-10 18:51:52 -04:00
W. Felix Handte
f1b428fdac
Rename enableDedicatedDictSearch to dedicatedDictSearch in MatchState
...
This makes it clear that not only is the feature allowed here, we're actually
using it, as opposed to the CCtxParam field, in which it's enabled, but we may
or may not be using it.
2020-09-10 18:51:52 -04:00
W. Felix Handte
41012193ad
Always Init CDict's enableDedicatedDictSearch Field
2020-09-10 18:51:52 -04:00
W. Felix Handte
34b545acb0
Add a ZSTD_dedicatedDictSearch ZSTD_dictMode_e to Allow Const Propagation
...
Speed +1.5%.
2020-09-10 18:51:52 -04:00
W. Felix Handte
beefdb0d3d
Fix ZSTD_c_forceAttachDict Bounds
2020-09-10 18:51:52 -04:00
W. Felix Handte
def62e2d3e
Fix Compilation Warnings
2020-09-10 18:51:52 -04:00
Bimba Shrestha
9c628238d3
creating ZSTD_createCDict_advanced_internal
2020-09-10 18:51:52 -04:00
Bimba Shrestha
0a9787c3e1
changing to int for consistency
2020-09-10 18:51:52 -04:00
Bimba Shrestha
e29bc3a009
using dict mls instead of src mls
2020-09-10 18:51:52 -04:00
Bimba Shrestha
145c2d12f9
add hashtable head prefetching
2020-09-10 18:51:52 -04:00
Bimba Shrestha
5d5507788d
change method name for consistency
2020-09-10 18:51:52 -04:00
Bimba Shrestha
b30f71becf
pass correct cparams
2020-09-10 18:51:52 -04:00
Bimba Shrestha
71fda0362f
making cctxParams a pointer
2020-09-10 18:51:52 -04:00
Bimba Shrestha
628559d0e4
loading dict using new algorithm
2020-09-10 18:51:52 -04:00
Bimba Shrestha
22705f0c93
adding dedicatedDictSearch algorithm
2020-09-10 18:51:52 -04:00
Bimba Shrestha
31e581bf65
adding enableDedicatedDictSearch to matchState_t
2020-09-10 18:51:52 -04:00
Bimba Shrestha
50550a14ad
adding dedicated dict load method to lazy
2020-09-10 18:51:52 -04:00
Bimba Shrestha
75b6360036
adding ZSTD_createCDict_advanced2 to zstd.h
2020-09-10 18:51:52 -04:00
Bimba Shrestha
b7dddbe89b
always attach dict when using dedicatedDictSearch
2020-09-10 18:51:52 -04:00
Bimba Shrestha
e36a373df4
adding dedicatedDictSearch cParams helper methods
2020-09-10 18:51:52 -04:00
Bimba Shrestha
f10d4e313c
adding ZSTD_dedicatedDictSearch_defaultCParameters variable
2020-09-10 18:51:52 -04:00
Bimba Shrestha
c497cb6716
Add ZSTD_c_enableDedicatedDictSearch Param
2020-09-10 18:51:52 -04:00
senhuang42
64bd68e44b
Adjust ZSTD_createCDict_byReference() function, and check for cdict when using compressStream2
2020-09-10 13:42:26 -04:00
Nick Terrell
79ded1b4a9
[lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions
...
The unused function definitions are hidden behind a
`#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check.
Initially hiding all functions which are unused and take up more than
2KB of stack space, because these will show up as warnings in the
Linux Kernel build system.
2020-09-09 14:35:39 -07:00
Nick Terrell
ac3a136b0a
[lib] Replace 64-bit divisions with ZSTD_div64()
2020-09-09 14:35:39 -07:00
Nick Terrell
a90779397a
[lib] Reduce zstd stack usage by 1KB
2020-09-09 14:35:39 -07:00
Nick Terrell
046aca190f
Fix ZSTD_initCStream_advanced() with no dictionary and static allocation
2020-09-09 14:35:39 -07:00
Nick Terrell
f91ed5c766
[lib] s/current/curr because it collides with Linux Kernel macro
2020-09-09 14:35:39 -07:00
Nick Terrell
5e4efd22d4
Merge pull request #2291 from i-do-cpp/fix-compression-level-default
...
Fix setParameter not falling back to default compression level
2020-09-08 16:42:34 -07:00
Nick Terrell
6da8acd231
Merge pull request #2293 from allanjude/coverity
...
Resolve Coverity 1432392 Unintentional integer overflow
2020-09-03 13:58:45 -07:00
Allan Jude
8665793164
Resolve Coverity 1432392 Unintentional integer overflow
...
Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression:
cdict->dictContentSize * 6U
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type U64 (64 bits, unsigned).
2020-09-03 19:31:50 +00:00
i-do-cpp
aec8b27fff
Update zstd_compress.c
2020-08-31 09:34:08 +02:00
i-do-cpp
d514281e73
Fix setParameter not falling back to default compression level on 0 value
...
See documentation for `ZSTD_c_compressionLevel`: `Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT`
2020-08-31 09:25:43 +02:00
Nick Terrell
c465f24457
ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free
2020-08-26 12:26:03 -07:00
Nick Terrell
a686d306d2
Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free}
2020-08-26 12:25:08 -07:00
Nick Terrell
80f577baa2
Move standard includes to zstd_deps.h
2020-08-26 12:25:08 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
...
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
8def0e5fd3
Fix up code after reading through
2020-08-24 12:24:45 -07:00
Nick Terrell
575731b6db
Use ncount=1 when < 4096 symbols
2020-08-18 16:47:53 -07:00
Nick Terrell
ba1fd17a9f
speed up literal header decoding
2020-08-17 12:17:53 -07:00
Nick Terrell
6004c1117f
speed up small blocks
2020-08-16 23:03:38 -07:00
Yann Collet
38e38546a4
Merge pull request #2258 from Niadb/dev
...
Added STATIC_BMI2 for compile time detection of BMI2 on MSVC, when enabled various intrinsics are used
2020-08-04 09:43:59 -07:00
Niadb
216a63dcf7
Add files via upload
2020-07-28 02:52:52 -06:00
Yann Collet
8b9cdd2597
fixed overlapping count & workspace special case
2020-07-26 22:40:21 -07:00
Yann Collet
051232223f
optimized histogram
...
new version easier to vectorize
leads to smaller code and faster execution
notably at the last recombination stage
(basically, fixed cost per block).
Assembly inspected with godbolt
On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
2020-07-26 22:24:22 -07:00
Yann Collet
c224367ede
ensure workspace is large enough
...
even when MAX_TABLELOG is reduced
2020-07-16 20:33:50 -07:00
Nick Terrell
1047097dad
[superblock] Add defensive assert and bounds check
...
The bound check condition should always be met because we selected `set_basic` as
our encoding type. But that code is very far away, so assert it is true so if it is
ever false we can catch it, and add a bounds check.
Fixes #2213 .
2020-06-22 10:21:38 -07:00
Nick Terrell
08981d2638
[lib] Allow compression dictionaries with missing symbols
...
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.
Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.
Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.
Fixes #2174 .
2020-06-12 17:57:19 -07:00
Felix Handte
2af4e07326
Merge pull request #2133 from felixhandte/single-size-calculation
...
Consolidate CCtx Size Estimation Code
2020-05-28 13:07:18 -04:00
Nick Terrell
3cc227e90e
[ldm][mt] Fix loadedDictEnd
2020-05-19 15:55:03 -07:00
Yann Collet
fdc56baa42
fix 22294 ( #2151 )
2020-05-18 21:05:10 -07:00
Nick Terrell
b2092c6dc4
[ldm] Reset loadedDictEnd when the context is reset
2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a
[lib] Fix bug in loading LDM dictionary in MT mode
...
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.
Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```
TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
W. Felix Handte
3bb7992350
Fix Size Estimate for LDM Seq Space
2020-05-14 13:50:53 -04:00
Nick Terrell
70c80e19e6
[greedy] Fix performance instability
2020-05-12 17:51:16 -07:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
...
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
W. Felix Handte
d9a1e37aec
Nit: Fix Size Type for 32-bit
2020-05-12 18:03:31 -04:00
W. Felix Handte
1aa6c7ccce
Assert We Allocated Approximately What We Expected To
2020-05-12 16:55:03 -04:00
W. Felix Handte
27e2482217
Minor Refactor
2020-05-12 16:55:03 -04:00
W. Felix Handte
afc2488973
Handle Non-Static CCtxes in Estimation
2020-05-12 16:54:33 -04:00
W. Felix Handte
7ed996f5a0
Consolidate CCtx Size Estimation Code
...
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.
This attempts to guarantee that the estimates returned to users are always
correct.
2020-05-12 16:26:53 -04:00
Nick Terrell
3c1eba4d99
[lib] Fix lazy repcode validity checks
2020-05-12 12:25:06 -07:00