sen
84ccb81e7c
Merge pull request #2561 from senhuang42/longlength_enum
...
Add enum for representing long length ID
2021-03-26 15:55:12 -04:00
Sen Huang
b1a43455f8
Add enum for representing long length ID
2021-03-26 10:41:09 -07:00
sen
4fe2e7ae14
Merge pull request #2558 from senhuang42/msan_block_splitter_fix
...
Fix block splitter minor MSAN warning.
2021-03-25 13:51:43 -04:00
sen
b0407b9f0e
Merge pull request #2555 from senhuang42/default_clevel_func
...
Add ZSTD_defaultCLevel() function to public API
2021-03-25 13:07:28 -04:00
Sen Huang
2a907bf4aa
Move lastCountSize into a returned struct, fix MSAN error
2021-03-25 09:11:15 -07:00
Sen Huang
e398744a35
Add ZSTD_defaultCLevel() function to public API
2021-03-25 08:04:00 -07:00
Nick Terrell
f8ac0ea7ef
Merge pull request #2539 from terrelln/linux-kernel-fixes
...
Fixes for the next linux kernel patch version
2021-03-24 10:34:29 -07:00
sen
bf542c8a8d
Merge pull request #2447 from senhuang42/block_splitter_v2
...
Recursive block splitting
2021-03-24 12:27:22 -04:00
Sen Huang
5b566ebe08
Rename *compressSequences*() functions for clarity
2021-03-24 08:21:29 -07:00
Sen Huang
0ef1f935b7
Add a fallback in case the total blocksize of split blocks exceeds raw block size
2021-03-24 08:21:29 -07:00
Sen Huang
c90e81a692
Enable block splitter by default when applicable
2021-03-24 08:21:29 -07:00
Sen Huang
e34332834a
Clean up various functions, add debuglogging for estimate vs. actual sizes
2021-03-24 08:21:29 -07:00
Sen Huang
41c3eae6d9
Fix various fuzzer failures: repcode history, superblocks
2021-03-24 08:21:29 -07:00
senhuang42
0633bf17c3
Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression
2021-03-24 08:21:29 -07:00
senhuang42
eb1ee8686d
Refactor buildSequencesStatistics() to avoid pointer increment for superblocks
2021-03-24 08:21:29 -07:00
senhuang42
e2bb215117
Add unit tests and fuzzer param
2021-03-24 08:21:09 -07:00
senhuang42
de52de1347
Add recursive block split algorithm
2021-03-24 08:21:09 -07:00
senhuang42
f06f6626ed
Update function names for consistency
2021-03-24 08:20:54 -07:00
senhuang42
c56d6e49e8
Add block splitter to experimental params
2021-03-24 08:20:54 -07:00
senhuang42
2949a95224
Refactor block compression logic into single function
2021-03-24 08:20:54 -07:00
senhuang42
c05c090cc2
Centralize entropy statistics calculations to zstd_compress.c
2021-03-24 08:20:29 -07:00
sen
c48889f097
Merge pull request #2538 from senhuang42/monotonicity_test
...
Add memory monotonicity test over srcSize
2021-03-22 16:54:34 -04:00
Sen Huang
dff4a0e867
Make ZSTD_estimateCCtxSize_internal() loop through all srcSize parameter sets as well
2021-03-21 16:15:31 -07:00
Sen Huang
77ae664ba6
Fix ZSTD_dedicatedDictSearch_isSupported() requirements
2021-03-16 17:36:05 -07:00
senhuang42
386111adec
Add a nbSeq argument to compressSequences()
...
Refactor ZSTD_compressBlock_internal() to do the block header write within and add nbSeq argument to compressSequences()
2021-03-16 14:04:22 -07:00
senhuang42
98764493cf
Move block header write into compressBlock_internal()
2021-03-16 14:04:22 -07:00
Nick Terrell
cd1551d261
[lib][tracing] Add ZSTD_NO_TRACE macro
...
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
7736549bea
[bug-fix] Make simple single-pass functions ignore advanced parameters
...
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.
This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.
* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`
It also adds a test case that ensures that each of these functions
ignores the advanced parameters.
2021-02-12 19:11:23 -08:00
Nick Terrell
c62eb05964
[lib] Set appliedParams.compressionLevel correctly
...
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.
This compression level is unused by the library, but is set so that the
tracing framework can consume it.
2021-02-12 15:00:14 -08:00
Nick Terrell
f520f6dfbe
[trace] Minor fixes found during integration
...
* Mark `ZSTD_CCtx_getParameter()` as const
* Add `extern "C"` guards to `zstd_trace.h`
2021-02-11 16:20:04 -08:00
Yann Collet
8884cb887d
Merge pull request #2483 from mpu/ldmgear
...
New algorithms for the long distance matcher
2021-02-11 08:38:23 -08:00
Quentin Carbonneaux
552efcac2d
relocate large arrays from the stack to ldmState_t
2021-02-10 16:16:54 +01:00
Nick Terrell
e59c9459a5
[trace] Keep track of a uint64_t tracing context
...
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.
This keeps the simple case simple.
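A minimal sketch of the simple case described above, assuming the tracing library only wants to measure call duration. The names `example_trace_ctx`, `example_now_ns()`, `example_trace_begin()` and `example_trace_end()` are hypothetical and are not the library's API; the point is only that a begin timestamp fits in a single 64-bit value, so no global map is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the tracing context: a single 64-bit value. */
typedef uint64_t example_trace_ctx;

/* Wall-clock timestamp in nanoseconds (helper invented for this sketch). */
static uint64_t example_now_ns(void)
{
    struct timespec ts;
    int const rc = timespec_get(&ts, TIME_UTC);
    assert(rc == TIME_UTC);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* begin(): the returned context is simply the begin timestamp. */
static example_trace_ctx example_trace_begin(void)
{
    return example_now_ns();
}

/* end(): recover the duration from the context alone.
 * Returns 0 if the clock appears to have gone backwards. */
static uint64_t example_trace_end(example_trace_ctx ctx, uint64_t end_ns)
{
    return (end_ns >= ctx) ? (end_ns - ctx) : 0;
}
```

If more state than one timestamp were needed, the same 64-bit value could instead carry a unique key into a map owned by the tracing library, as the message above describes.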
2021-02-09 11:37:05 -08:00
Nick Terrell
54a4998a80
Add basic tracing functionality
2021-02-05 16:28:52 -08:00
Quentin Carbonneaux
1e65711ca5
a couple performance improvement changes for ldm
2021-01-20 00:54:20 -08:00
Nick Terrell
58476bcf7f
Don't shrink window log in ZSTD_getCParams()
...
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unknown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
2021-01-04 15:54:09 -08:00
Nick Terrell
9d31c704d5
Don't shrink window log when streaming with a dictionary
...
Fixes #2442.
1. When creating a dictionary keep the same behavior as before.
Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
the dictionary size. When streaming this will select the
largest parameters and not adjust them down. But, the CDict
will use the correctly sized parameters, which seems like the
right tradeoff.
4. When not attaching a dictionary (either forced not to, or
using a prefix dictionary) we select parameters based on the
dictionary size + source size, and assume the source size is
small, which is the same behavior as before. But, now we don't
adjust the window log (and hash and chain log) down when the
source size is unknown.
When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.
TODO: Add a streaming + dictionary regression test case.
2021-01-04 15:54:09 -08:00
Nick Terrell
66e811d782
[license] Update year to 2021
2021-01-04 17:53:52 -05:00
senhuang42
5c41490bfe
Use pre-defined constants
2020-12-21 11:52:05 -05:00
senhuang42
7e11bd012b
Implement skippable frame function
2020-12-21 11:13:22 -05:00
Yann Collet
0b39531d75
moving all references to release branch
...
was previously `master`
2020-12-16 23:00:35 -08:00
W. Felix Handte
9dab03db90
Create Enum to Represent Static/Dynamic Allocation Distinction in cwksp
2020-12-09 14:57:37 -05:00
W. Felix Handte
db9e73cb07
Don't ASAN-Poison Statically-Allocated Workspaces
...
Addresses #2286.
2020-12-09 13:00:47 -05:00
Nick Terrell
c238db046f
Merge pull request #2414 from terrelln/mt-progress
...
[lib] Ensure that multithreaded compression always makes some progress
2020-12-04 16:30:08 -08:00
Nick Terrell
4c58cb8383
[lib] Ensure that multithreaded compression always makes some progress
2020-12-03 20:25:14 -08:00
Nick Terrell
6672689e7e
Merge pull request #2406 from terrelln/linux-wrapper-api
...
[linux] Add the linux wrapper API
2020-12-02 16:49:03 -08:00
Nick Terrell
894ae36675
Merge pull request #2390 from animalize/clamp_level
...
Clamp compression level
2020-12-02 14:35:58 -08:00
senhuang42
2cbd038528
Move max nb seq check to per-block
2020-12-02 12:11:32 -05:00
Nick Terrell
3cda5fae77
[minor][lib] Remove double semicolon
2020-12-02 01:08:08 -08:00
senhuang42
3efe9c902b
Add sequence nb validation to compressSequences(), adjust minMatch comparisons
2020-12-01 10:54:45 -05:00
senhuang42
4c5f337248
Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation
2020-11-30 15:41:20 -05:00
sen
c5fbd55dac
Merge pull request #2387 from senhuang42/compress_sequence_API
...
[RFC] New sequence compression API
2020-11-20 16:54:20 -05:00
senhuang42
7742f076b4
Add experimental param for sequence validation
2020-11-20 11:57:41 -05:00
senhuang42
0e32928b7d
Remove unnecessary repcode backup, apply style choices, use function pointer
2020-11-20 11:02:19 -05:00
sen
e924a0fa51
Explicit cast for visual warnings
...
Github has automatic commits now! Cool
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2020-11-19 17:32:40 -05:00
senhuang42
dcbbf7c09f
Unroll isRLE loop
2020-11-19 12:38:13 -05:00
senhuang42
05c0229668
Clean up visual conversion warnings
2020-11-18 15:36:29 -05:00
senhuang42
d6d7ba2a1f
Modification to offset validation to include entire sequence
2020-11-17 10:13:22 -05:00
senhuang42
8f3136a9c7
Fix assert edge case, improve documentation in zstd.h
2020-11-16 18:05:35 -05:00
senhuang42
f6baad87d6
Fix warnings and make validation enabled by default
2020-11-16 12:00:06 -05:00
senhuang42
55b90ef010
Fix unit tests to agree with new changes
2020-11-16 11:36:37 -05:00
senhuang42
7f563b0519
Add new sequence format as an experimental CCtx param
2020-11-16 10:49:17 -05:00
senhuang42
347824ad73
Overhaul logic to simplify, add in proper validations, fix match splitting
2020-11-16 10:49:17 -05:00
senhuang42
46824cb018
Add new sequence compress api params to cctx
2020-11-16 10:49:17 -05:00
senhuang42
48405b4633
Fix srcSize=0 edge case
2020-11-16 10:49:17 -05:00
senhuang42
022e6d81e7
Fix literals length calculation
2020-11-16 10:49:17 -05:00
senhuang42
dad20b5ccb
Remove dstCapacity error check
2020-11-16 10:49:17 -05:00
senhuang42
b8e16a2057
Remove extraneous function in this API
2020-11-16 10:49:17 -05:00
senhuang42
f29507c4fc
Add check comparing offset to window size
2020-11-16 10:49:17 -05:00
senhuang42
7a6e46a92f
Fix MSAN errors
2020-11-16 10:49:17 -05:00
senhuang42
cc2642bd17
Address edge case with endPosInSequence
2020-11-16 10:49:17 -05:00
senhuang42
fd10007174
Change debug levels to appropriate ones
2020-11-16 10:49:17 -05:00
senhuang42
2db8441245
Add RLE support
2020-11-16 10:49:17 -05:00
senhuang42
dfef298336
Fix various build warnings
2020-11-16 10:49:17 -05:00
senhuang42
2bbdddf24e
Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences()
2020-11-16 10:49:16 -05:00
senhuang42
5fd69f8173
Add documentation for new api functions
2020-11-16 10:49:16 -05:00
senhuang42
e8b7fdb64b
Refactor for enhanced code clarity
2020-11-16 10:49:16 -05:00
senhuang42
c675fb46f1
Rename internal function compressSequences(), and promote new *_ext() functions to their actual name
2020-11-16 10:49:16 -05:00
senhuang42
013434e1e4
Add another API function to compress with existing CCTX
2020-11-16 10:49:16 -05:00
senhuang42
c44ce29013
More adjustments to improve code clarity
2020-11-16 10:49:16 -05:00
senhuang42
48f67da854
Pull compressStream2() transparent initialization into its own function
2020-11-16 10:49:16 -05:00
senhuang42
c86151f53c
Add initial support for new ZSTD_Sequence mode
2020-11-16 10:49:16 -05:00
senhuang42
e0f26afce9
Add sequence compression format param
2020-11-16 10:49:16 -05:00
senhuang42
f51af9a609
Always ensure sequenceRange updates properly, add more error forwarding
2020-11-16 10:49:16 -05:00
senhuang42
1a449688fd
Various minor logical refactors to improve clarity
2020-11-16 10:49:16 -05:00
senhuang42
e5fe485dcc
Fix cSize calculation for noCompressBlocks
2020-11-16 10:49:16 -05:00
senhuang42
6145ebb400
Rebased, roundtrips silesia.tar
2020-11-16 10:49:16 -05:00
senhuang42
b5b61cc216
Refactor for better debugging info
2020-11-16 10:49:16 -05:00
senhuang42
293fad6b45
Corrections and edge-case fixes to be able to roundtrip dickens
2020-11-16 10:49:16 -05:00
senhuang42
7eb6fa7be4
Multi-block compression scaffolding - works on single-block files
2020-11-16 10:49:16 -05:00
senhuang42
75b01f34b9
Add support for uncompressible blocks
2020-11-16 10:49:16 -05:00
senhuang42
e04da68157
Enable usage of ZSTD_sequenceRange for single-block compression
2020-11-16 10:49:16 -05:00
senhuang42
337fac216d
Add logic to handle ZSTD_sequenceRange
2020-11-16 10:49:16 -05:00
senhuang42
85822ddd53
Add last literals handling like getSequences()
2020-11-16 10:49:16 -05:00
senhuang42
2cff8df1a2
Pull block compression out of main compressSequences() function
2020-11-16 10:49:16 -05:00
senhuang42
cfced9344a
Implement ZSTD_updateSequenceRange
2020-11-16 10:49:16 -05:00
senhuang42
b116e1f211
Modify SequenceRange to have posInSequence
2020-11-16 10:49:16 -05:00
senhuang42
d99b675112
Add function definition for sequenceRange updater
2020-11-16 10:49:16 -05:00
senhuang42
74e95c05cc
Add ZSTD_SequenceRange to count ranges in array of ZSTD_Sequence
2020-11-16 10:49:16 -05:00
senhuang42
89f3848310
Add support for repcodes
2020-11-16 10:49:16 -05:00
senhuang42
3e930fd044
Code cleanup, add debuglog statements
2020-11-16 10:49:16 -05:00
senhuang42
086513b5b9
Implement first pass at compressSequences()
2020-11-16 10:49:16 -05:00
senhuang42
a9327b1e9b
Add initial function prototype for ZSTD_compressSequences_ext (to be renamed later)
2020-11-16 10:33:35 -05:00
animalize
52f8c07a3f
Clamp compression level in ZSTD_getCParams_internal() function
2020-11-14 13:26:08 +08:00
senhuang42
9d936d61d2
Reduce number of memcpy() calls
2020-11-13 19:43:30 -05:00
senhuang42
be4ac6c5bc
Use existing repcode update function to implement updates
2020-11-12 16:51:12 -05:00
senhuang42
674c9b9235
Add in proper block repcode histories
2020-11-12 15:34:37 -05:00
senhuang42
06c7f14066
Let block reps persist
2020-11-12 12:24:44 -05:00
senhuang42
396275068c
Fix incorrect repcode setting
2020-11-12 11:57:01 -05:00
senhuang42
1a8af0de73
Improve unit test
2020-11-12 11:09:09 -05:00
senhuang42
4d4fd2c55f
Overhaul repcode handling logic
2020-11-12 10:59:35 -05:00
sen
f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
...
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42
7d1dea070c
Update unit tests
2020-11-06 11:10:37 -05:00
senhuang42
779df995c6
Implement mergeGeneratedSequences()
2020-11-06 10:55:46 -05:00
senhuang42
51abd58208
Rename getSequences() to generateSequences()
2020-11-06 10:53:22 -05:00
Luke Pitt
eac309c71b
Add ZSTD_getDictID_fromCDict function to experimental section
2020-11-04 11:37:37 +00:00
senhuang42
f782cac3d4
Change block delimiter removing to linear time approach
2020-11-02 17:06:20 -05:00
senhuang42
3434049c1f
Use ZSTD_memmove() instead of memmove()
2020-11-02 11:43:19 -05:00
senhuang42
d4d0346b40
Update name of enum, clarify documentation
2020-11-02 11:38:17 -05:00
senhuang42
e6178f837f
Revert unnecessary seqCollector adjustment
2020-11-02 10:59:20 -05:00
senhuang42
e8501e00b8
Fix incorrect index increment in merge algorithm
2020-11-02 10:58:41 -05:00
senhuang42
a36fdada57
Add algorithm to remove all delimiters
2020-11-02 10:46:52 -05:00
senhuang42
435a3a0428
Update seqCollector definition
2020-11-02 10:19:26 -05:00
senhuang42
3327932609
Update ZSTD_getSequences function signature
2020-11-02 10:17:59 -05:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
...
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
...
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
d4e021fe35
[lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
...
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
Nick Terrell
24f72789e2
[lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set
...
Compress directly from the `ZSTD_inBuffer`. We still allocate the input
buffer. A following commit will remove that allocation.
2020-10-30 10:55:34 -07:00
Nick Terrell
fcf81cee5e
[lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set
...
We compress directly to the `ZSTD_outBuffer` so we don't need to
allocate it.
2020-10-30 10:55:34 -07:00
Nick Terrell
6d5dc93d4e
[lib] Compress directly into output when ZSTD_c_stableOutBuffer is set
...
When we have a stable output buffer always compress directly into the
`ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.
2020-10-30 10:55:34 -07:00
Nick Terrell
987cb4ca6a
[lib] Take the shortcut when ZSTD_c_stableOutBuffer is set
...
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
2020-10-30 10:55:34 -07:00
Nick Terrell
809b2f2071
[lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2()
...
Sets these parameters in ZSTD_compress2() then resets them to their
original values after the compression call.
An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
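The set-then-reset approach described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `example_params`, `example_ctx`, `example_compress_stream()` and `example_compress_single_pass()` are hypothetical names, and the inner call is a stub that only records which flags it observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set holding only the two flags of interest. */
typedef struct {
    int stableInBuffer;
    int stableOutBuffer;
} example_params;

typedef struct {
    example_params requested; /* what the user configured */
    example_params seen;      /* what the inner call observed (illustration only) */
} example_ctx;

/* Stub standing in for the streaming compression call. */
static size_t example_compress_stream(example_ctx* ctx)
{
    ctx->seen = ctx->requested;
    return 0;
}

/* Single-pass wrapper: force both flags on for one call,
 * then restore the caller's original values. */
static size_t example_compress_single_pass(example_ctx* ctx)
{
    assert(ctx != NULL);
    example_params const saved = ctx->requested;
    ctx->requested.stableInBuffer = 1;
    ctx->requested.stableOutBuffer = 1;
    size_t const result = example_compress_stream(ctx);
    ctx->requested = saved; /* reset to the original values */
    return result;
}
```

The caller's settings are unchanged after the call, which is the property the smaller change relies on.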
2020-10-30 10:55:34 -07:00
Nick Terrell
c74be3f6de
[lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
...
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8
[API] Add ZSTD_c_stable{In,Out}Buffer parameters
...
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
e2581d9572
[lib] Set appliedParams in zstdmt mode
...
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
2020-10-30 10:54:38 -07:00
senhuang42
536e89c723
Sequence extractor should update CBlockState
2020-10-30 12:13:19 -04:00
senhuang42
32cac2627a
Emit last literals of 0 size as well, to indicate block boundary
2020-10-29 16:41:17 -04:00
senhuang42
69bd5f0654
Correct literalsRead calculation to include longLength
2020-10-29 14:49:37 -04:00
senhuang42
59624f3163
Remove implicit typecast to appease appVeyor windows build
2020-10-28 16:25:09 -04:00
senhuang42
3ed5d053d8
Clarify comments in zstd.h some more
2020-10-28 09:53:09 -04:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
...
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
senhuang42
3163909d14
Remove unused variable position
2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9
Add test compatibility with last literals in sequences
2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03
Add support for representing last literals in the extracted seqs
2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd
Improve documentation of seqStore_t
2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886
Improve documentation regarding various operations in copyBlockSequences
2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03
Modify ZSTD_copyBlockSequences to agree with new API
2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe
Add a function for LDM enable check
2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1
Move ldm enable to compressStream2()
2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72
Include LDM tables size for CCtx size estimation where relevant
2020-10-20 09:21:30 -04:00