Commit Graph

2136 Commits

Author SHA1 Message Date
Stephen Kitt
e81d567547
Distinguish static symbols, allow hiding them
Even with -fvisibility=hidden added to CFLAGS, any symbol which is
given a default visibility attribute ends up exported in the dynamic
library. This happens through zstd_internal.h which defines
..._STATIC_LINKING_ONLY before including various header files, and is
included for example in lib/common/pool.c.

To avoid this, this patch distinguishes static and non-static APIs, by
using ZSTDLIB_API only for the latter, and introducing
ZSTDLIB_STATIC_API for the former. For now, both are exported, but
static APIs can be hidden by overriding the definition of
ZSTDLIB_STATIC_API. lib/Makefile is modified to allow this using

	make CPPFLAGS_DYNLIB=-DZSTDLIB_STATIC_API=ZSTDLIB_HIDDEN

In addition, API declarations are dropped from zstd_compress.c (they
aren't needed there).

Signed-off-by: Stephen Kitt <steve@sk2.org>
2021-05-14 19:41:59 +02:00
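The macro split described in e81d567547 can be sketched roughly as follows. This is a minimal illustration assuming GCC/Clang visibility attributes; every `DEMO_*` and `demo_*` name is a hypothetical stand-in, not the actual zstd definition.

```c
/* Hypothetical stand-ins for the macros described above (GCC/Clang assumed). */
#if defined(__GNUC__)
#  define DEMO_VISIBLE __attribute__((visibility("default")))
#  define DEMO_HIDDEN  __attribute__((visibility("hidden")))
#else
#  define DEMO_VISIBLE
#  define DEMO_HIDDEN
#endif

/* Stable API: always exported from the dynamic library. */
#define DEMO_API DEMO_VISIBLE

/* Static-linking-only API: exported by default, but a build may override
 * it on the command line, e.g. -DDEMO_STATIC_API=DEMO_HIDDEN. */
#ifndef DEMO_STATIC_API
#  define DEMO_STATIC_API DEMO_VISIBLE
#endif

DEMO_API int demo_stable(void);
DEMO_STATIC_API int demo_experimental(void);

int demo_stable(void) { return 1; }
int demo_experimental(void) { return 2; }
```

Both functions stay callable inside the library either way; the override only changes whether `demo_experimental` appears in the dynamic symbol table.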
Nick Terrell
03c4111299 [lib] Fix dictionary invalidation logic
Call `ZSTD_enforceMaxDist()` before each block with the beginning of the
block. This ensures that `lowLimit` is updated to `dictLimit` whenever
the ext-dict is out of range, so we can use prefix mode for speed.

Without this update, compression can be non-deterministic, because prefix
mode and ext-dict mode match finders can return different results. It can
also be slower, because ext-dict match finders are slower.

The scenario is:
1. Compress large data with a dictionary.
2. The dictionary goes out of bounds, so we invalidate it.
3. However, we still have `lowLimit < dictLimit`, since it is
   never updated.
4. We will call the ext-dict match finder instead of the prefix one.
2021-05-13 17:05:59 -07:00
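The scenario in 03c4111299 can be modeled with a toy window. This is a simplified sketch, not the library's `ZSTD_enforceMaxDist()`; the struct, the function names, and the exact update rule are assumptions made for illustration.

```c
#include <stdint.h>

/* Toy model of the window limits named in the commit message. */
typedef struct {
    uint32_t lowLimit;   /* lowest index still considered valid */
    uint32_t dictLimit;  /* first index of the contiguous prefix */
} demo_window;

/* Invalidate everything farther than maxDist behind blockStart. */
void demo_enforce_max_dist(demo_window* w, uint32_t blockStart, uint32_t maxDist)
{
    if (blockStart > maxDist) {
        uint32_t const newLow = blockStart - maxDist;
        if (w->lowLimit < newLow) w->lowLimit = newLow;
        if (w->dictLimit < w->lowLimit) w->dictLimit = w->lowLimit;
    }
}

/* Prefix mode is usable once no ext-dict index remains valid. */
int demo_can_use_prefix_mode(const demo_window* w)
{
    return w->lowLimit >= w->dictLimit;
}
```

With `lowLimit = 0` and `dictLimit = 1000`, the model starts in ext-dict mode; calling `demo_enforce_max_dist(&w, 10000, 4096)` raises both limits to 5904, after which prefix mode is allowed.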
Nick Terrell
10b35b312b [lib] Fix off-by-one error in repcode checks
The repcode checks disallowed repcodes that are equal to `windowLow`.
This is slightly inefficient, but isn't a problem on its own. Together
with the next commit, it causes non-determinism.
2021-05-13 17:05:59 -07:00
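The off-by-one in 10b35b312b amounts to a strict comparison where an inclusive one was intended. A toy illustration follows; the function names are hypothetical and the real checks in the match finders carry additional conditions.

```c
#include <stdint.h>

/* Before: a repcode pointing exactly at windowLow was rejected. */
int demo_rep_in_window_old(uint32_t repIndex, uint32_t windowLow)
{
    return repIndex > windowLow;
}

/* After: index windowLow is itself inside the window, so accept it. */
int demo_rep_in_window_new(uint32_t repIndex, uint32_t windowLow)
{
    return repIndex >= windowLow;
}
```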
Nick Terrell
91c9a247b6 [lib] Fix determinism bug in the optimal parser
`ZSTD_insertBt1()` has a speed optimization that skips the prefix of
very long matches.

40def70387/lib/compress/zstd_opt.c (L476)

This optimization is based off the length of the longest match found. However,
when indices are reset, we only ensure that we can reference the whole
window starting from `ip`. If the previous block ended with a long match
then `nextToUpdate` could be much less than `ip`. It might be far enough
back that `nextToUpdate < maxDist`, so it doesn't have a full window of
data to reference. This can cause non-determinism bugs, because we may
find a match that is beyond `ip - maxDist`, and may sometimes be
un-referencable, and that match triggers the speed optimization.

The fix is to base the `windowLow` off of the `target` of
`ZSTD_updateTree_internal()`, because anything below that value will be
obsolete by the time `ZSTD_updateTree_internal()` completes.
2021-05-13 17:05:59 -07:00
Yann Collet
cb0cad9b79 reduce Max nb Workers to 64 in 32-bit mode
and restored limit to 256 when in 64-bit mode
(it was reduced to 200 to give more room for 32-bit).

This should fix test instability issues
when using a lot of threads in 32-bit environments.
2021-05-12 13:10:25 -07:00
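The limits in cb0cad9b79 depend on pointer width. A small sketch of that selection is shown below; the helper name is hypothetical, and it takes the pointer size as a parameter only so that both cases can be exercised on one machine.

```c
#include <stddef.h>

/* Hypothetical helper: maximum worker count for a given pointer size.
 * 4-byte pointers (32-bit mode) get 64 workers, everything else 256. */
unsigned demo_max_workers(size_t pointerSize)
{
    return (pointerSize == 4) ? 64u : 256u;
}
```

A caller on the current platform would pass `sizeof(void*)`.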
Nick Terrell
66772efe73
Merge pull request #2627 from terrelln/timeout-fix
[lib] Fix fuzzer timeouts by backing off overflow correction
2021-05-07 10:55:26 -07:00
sen
9e94b7cac5
Assert no division by 0, correct superblocks 0 sequences case (#2592) 2021-05-07 13:26:56 -04:00
Nick Terrell
c2555f8c6f [lib] Fix fuzzer timeouts by backing off overflow correction
Linearly back off the frequency of overflow correction based on the
number of times the `ZSTD_window_t` has been overflow corrected. This
will still allow the fuzzer to quickly find overflow correction bugs,
while also keeping good speed for larger inputs.

Additionally, the `nbOverflowCorrections` variable can be useful for
debugging coredumps, since we can inspect the `ZSTD_CCtx` to see if
overflow correction has happened yet.

I've verified this fixes the timeouts in OSS-Fuzz (176 seconds -> 6
seconds). I've also verified that the fuzzers, `fuzzer`, and `zstreamtest`
still catch the row-hash overflow correction bug.
2021-05-06 22:03:41 -07:00
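The linear back-off from c2555f8c6f can be sketched as a threshold that grows with the number of corrections already performed. Only the field name `nbOverflowCorrections` comes from the commit message; the struct, the functions, and the `baseDist` parameter are assumptions for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t nbOverflowCorrections;  /* corrections performed so far */
} demo_window;

/* Index growth required before the next correction: grows linearly,
 * so early corrections happen often and later ones become rarer. */
uint32_t demo_correction_threshold(const demo_window* w, uint32_t baseDist)
{
    return baseDist * (w->nbOverflowCorrections + 1);
}

int demo_should_correct(const demo_window* w, uint32_t indexGrowth, uint32_t baseDist)
{
    return indexGrowth >= demo_correction_threshold(w, baseDist);
}
```

Small inputs still hit the first correction quickly, which is what lets a fuzzer find bugs, while the total number of corrections on a large input stays modest.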
sen
698f261b35
[1.5.0] Deprecate some functions (#2582)
* Add deprecated macro to zstd.h, mark certain functions as deprecated

* Remove ZSTD_compress.c dependencies on deprecated functions
2021-05-06 17:59:32 -04:00
Nick Terrell
207e33bb61
Merge pull request #2616 from terrelln/deterministic-dict
[lib] Add ZSTD_c_deterministicRefPrefix
2021-05-06 11:09:22 -07:00
Nick Terrell
172b4b6ac4 [lib] Add ZSTD_c_deterministicRefPrefix
This flag forces zstd to always load the prefix in ext-dict mode, even
if it happens to be contiguous, to force determinism. It also applies to
dictionaries that are re-processed.

A determinism test case is also added, which fails without
`ZSTD_c_deterministicRefPrefix` and passes with it set.

Question: Should this be the default behavior? It isn't in this PR.
2021-05-05 18:49:56 -07:00
Nick Terrell
eb7e74ccb7 [tests] Set DEBUGLEVEL=2 by default
This allows us to quickly check for compile errors in debug log
messages, which are compiled out when `DEBUGLEVEL < 2`.
2021-05-05 13:29:06 -07:00
Nick Terrell
c2183d7cdf [lib] Move some ZSTD_CCtx_params off the stack
* Take `params` by const reference in `ZSTD_resetCCtx_internal()`.
* Add `simpleApiParams` to the CCtx and use them in the simple API
  functions, instead of creating those parameters on the stack.

I think this is a good direction to move in, because we shouldn't need
to worry about adding parameters to `ZSTD_CCtx_params`, since it should
always be on the heap (unless they become absolutely gigantic).

Some `ZSTD_CCtx_params` are still on the stack in the CDict functions,
but I've left them for now, because it was a little more complex, and we
don't currently use those functions in stack-constrained environments.
2021-05-05 13:25:16 -07:00
Yann Collet
c077f257b4
Merge pull request #2611 from facebook/smallerJobs
allow jobSize to be as low as 512 KB
2021-05-05 00:03:29 -07:00
Nick Terrell
8389a5122b
Merge pull request #2602 from terrelln/ldm-opt
[LDM] Speed optimization on repetitive data
2021-05-04 23:13:09 -07:00
Nick Terrell
d40f55cd95
Merge pull request #2610 from senhuang42/lazy_underflow_fix
Fix bad integer wraparound in repcode index for fast, dfast, lazy
2021-05-04 23:10:23 -07:00
Nick Terrell
0b88c2582c [test] Add large dict/data --patch-from test
Dictionary size must be > `ZSTD_CHUNKSIZE_MAX`.
2021-05-04 17:31:32 -07:00
Sen Huang
e6c8a5dd40 Fix incorrect usages of repIndex across all strategies 2021-05-04 19:50:55 -04:00
Nick Terrell
94db4398a0 [lib] Always load the dictionary in one go
Dictionaries larger than `ZSTD_CHUNKSIZE_MAX` used to have to be loaded
in multiple segments. Instead, when we detect large dictionaries, ensure
that we reset the context's indices. Then, for dictionaries larger than
`ZSTD_CURRENT_MAX - 1`, only load the suffix of the dictionary. Finally,
enable DDS for large dictionaries, since we no longer load in multiple
segments.

This simplifies the dictionary loading code, and reduces opportunities
for non-determinism to slip in.
2021-05-04 16:45:25 -07:00
Yann Collet
1026b9fa10 fix rsyncable mode 2021-05-04 15:59:27 -07:00
Nick Terrell
1ffa80a09e [easy] Rewrite rowHashLog computation
`ZSTD_highbit32(1u << x) == x` when it isn't undefined behavior.
2021-05-04 11:43:20 -07:00
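The identity quoted in 1ffa80a09e can be checked with a portable highest-set-bit routine. The loop below is a stand-in written for this illustration, not zstd's `ZSTD_highbit32()` implementation. The "when it isn't undefined behavior" caveat refers to the shift: in C, `1u << x` is undefined once `x` reaches the width of the type.

```c
#include <stdint.h>

/* Position of the highest set bit of v; v must be non-zero. */
unsigned demo_highbit32(uint32_t v)
{
    unsigned r = 0;
    while (v >>= 1) r++;
    return r;
}
```

For every `x` from 0 to 31, `demo_highbit32((uint32_t)1 << x)` returns `x`, so the round trip through a shift and a bit scan can be replaced by `x` itself.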
Yann Collet
8f86c29c06 allow jobSize to be as low as 512 KB
previous lower limit was 1 MB.

Note : by default, the lowest job size is 2 MB, achieved at level 1.
Even lower job sizes can be achieved by manipulating this value directly,
or manually modifying window sizes to lower amounts.

Updated unit test to ensure that this new limit works fine
(test would fail with previous 1 MB limit).
2021-05-04 11:02:55 -07:00
Nick Terrell
32823bc150 [LDM] Speed optimization on repetitive data
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`, either because `stopMask == 0` or by
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.

```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
      21.187881087 seconds time elapsed

head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
       1.149707921 seconds time elapsed

```
2021-05-04 10:57:42 -07:00
Nick Terrell
34aff7ea06 Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
  got the correction wrong in this case, and our chain tables and binary
  trees would be corrupted. Now, we work as long as `maxDist` is a power
  of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
  run overflow correction as frequently as allowed without impacting
  compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
  `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
  speed penalty at most, which seems reasonable.
2021-05-03 15:21:47 -07:00
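The arithmetic behind the first bullet of 34aff7ea06 can be sketched as below. This is an illustration under stated assumptions, not the library's code: the function name is hypothetical, `cycleLog` is assumed to be below 32, `maxDist` is assumed to be a power of two, and `curr` is assumed to be larger than the new index.

```c
#include <stdint.h>

#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Returns the amount to subtract from every index so that curr becomes
 * small again while keeping its position within the table cycle. */
uint32_t demo_overflow_correction(uint32_t curr, uint32_t cycleLog, uint32_t maxDist)
{
    uint32_t const cycleSize = (uint32_t)1 << cycleLog;
    uint32_t const cycleMask = cycleSize - 1;
    uint32_t const currentCycle = curr & cycleMask;
    /* Both values are powers of two, so the larger one is a multiple of
     * cycleSize and the position within the cycle is preserved. */
    uint32_t const newCurrent = currentCycle + DEMO_MAX(maxDist, cycleSize);
    return curr - newCurrent;
}
```

Because the correction is a multiple of `cycleSize`, indices keep the same value modulo the cycle, and because the new index is at least `maxDist`, a full window of history remains addressable.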
senhuang42
61fe571af6 Fix chaintable check to include rowhash in ZSTD_reduceIndex() 2021-04-30 19:52:04 -04:00
Nick Terrell
6cee3c2c4f [trace] Remove default definitions of weak symbols
Instead of providing a default no-op implementation, check the symbols
for `NULL` before accessing them. Providing a default implementation
doesn't reliably work with dynamic linking. Depending on link order the
default implementations may not be overridden. By skipping the default
implementation, all link order issues are resolved. If the symbols
aren't provided the weak function will be `NULL`.
2021-04-26 16:05:39 -07:00
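The pattern described in 6cee3c2c4f, checking a weak symbol for `NULL` instead of shipping a default no-op, might look like the sketch below. It assumes GCC or Clang on an ELF platform such as Linux; `demo_trace_hook` and `demo_finish` are hypothetical names, and the hook is deliberately left undefined here.

```c
#include <stddef.h>

/* Weak declaration only: if no other object file defines the hook,
 * its address resolves to NULL at link time. */
extern void demo_trace_hook(int result) __attribute__((weak));

/* Returns 1 if the hook was present and called, 0 if it was absent. */
int demo_finish(int result)
{
    if (demo_trace_hook != NULL) {
        demo_trace_hook(result);
        return 1;
    }
    return 0;
}
```

Since no default definition exists, there is nothing for link order to pick over a user-supplied one.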
felixhandte
efa6dfa729 Apply DDS adjustments to avoid assert failures 2021-04-23 16:41:00 -04:00
sen
12c045f74d
Merge pull request #2574 from senhuang42/repcode_mismatch_detector_fix
Correct the block splitter mismatched repcodes detection.
2021-04-12 23:27:43 -04:00
Sen Huang
8844f93957 Adjust nb elements to prefetch in ZSTD_row_fillHashCache() 2021-04-12 14:24:58 -04:00
Sen Huang
550f76f131 Correct the detection of mismatched repcodes 2021-04-09 09:08:51 -07:00
Sen Huang
4d63d6e8aa Update results.csv, add Row hash to regression test 2021-04-07 10:31:41 -07:00
Nick Terrell
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
sen
f71aabb5b5
Move clevel override to after initLocalDict() (#2571) 2021-04-06 21:05:37 -04:00
sen
f1e8b565c2
Maintain two repcode histories for block splitting, replace invalid repcodes (#2569) 2021-04-06 17:25:55 -04:00
sen
e38124555e
Fix dictionary force reloading clevel selection (#2570)
* Move cdict clevel override to before localdict init

* Update results.csv after dict load changes
2021-04-06 15:35:09 -04:00
Nick Terrell
8383fc828d
Merge pull request #2541 from ihsinme/patch-1
simple fix for using bit operator.
2021-04-02 13:01:09 -07:00
sen
980f3bbf83
[cwksp] Align all allocated "tables" and "aligneds" to 64 bytes (#2546)
* Perform 64-byte alignment of wksp tables and aligneds internally

* Clean up cwksp_finalize() function to only do two allocs

* Refactor aligned/buffer reservation code, remove ASAN req for alignment reservations

* Change from allocating 128 bytes always to allocating only buffer space as needed for tables/aligned

* Back out aligned/table reservation order restriction

* Add stricter bounds for new/resized wksps, fix comment in zstd_cwksp.h
2021-04-01 20:07:19 -04:00
sen
255925c231
Fix repcode-related OSS-fuzz issues in block splitter (#2560)
* Do not emit last partitions of blocks as RLE/uncompressed

* Fix repcode updates within block splitter

* Add an entropytables confirm function, redo ZSTD_confirmRepcodesAndEntropyTables() for better function signature

* Add a repcode updater to block splitter, no longer need to force emit compressed blocks
2021-03-31 15:14:59 -04:00
Nick Terrell
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
sen
84ccb81e7c
Merge pull request #2561 from senhuang42/longlength_enum
Add enum for representing long length ID
2021-03-26 15:55:12 -04:00
Sen Huang
b1a43455f8 Add enum for representing long length ID 2021-03-26 10:41:09 -07:00
sen
4fe2e7ae14
Merge pull request #2558 from senhuang42/msan_block_splitter_fix
Fix block splitter minor MSAN warning.
2021-03-25 13:51:43 -04:00
sen
b0407b9f0e
Merge pull request #2555 from senhuang42/default_clevel_func
Add ZSTD_defaultCLevel() function to public API
2021-03-25 13:07:28 -04:00
Sen Huang
2a907bf4aa Move lastCountSize into a returned struct, fix MSAN error 2021-03-25 09:11:15 -07:00
Sen Huang
e398744a35 Add ZSTD_defaultCLevel() function to public API 2021-03-25 08:04:00 -07:00
Nick Terrell
f8ac0ea7ef
Merge pull request #2539 from terrelln/linux-kernel-fixes
Fixes for the next linux kernel patch version
2021-03-24 10:34:29 -07:00
sen
bf542c8a8d
Merge pull request #2447 from senhuang42/block_splitter_v2
Recursive block splitting
2021-03-24 12:27:22 -04:00
Sen Huang
5b566ebe08 Rename *compressSequences*() functions for clarity 2021-03-24 08:21:29 -07:00
Sen Huang
0ef1f935b7 Add a fallback in case the total blocksize of split blocks exceeds raw block size 2021-03-24 08:21:29 -07:00
Sen Huang
c90e81a692 Enable block splitter by default when applicable 2021-03-24 08:21:29 -07:00
Sen Huang
e34332834a Clean up various functions, add debuglogging for estimate vs. actual sizes 2021-03-24 08:21:29 -07:00
Sen Huang
41c3eae6d9 Fix various fuzzer failures: repcode history, superblocks 2021-03-24 08:21:29 -07:00
senhuang42
0633bf17c3 Change 1.3.4 bugfix to be cross-compatible with superblocks and normal compression 2021-03-24 08:21:29 -07:00
senhuang42
eb1ee8686d Refactor buildSequencesStatistics() to avoid pointer increment for superblocks 2021-03-24 08:21:29 -07:00
senhuang42
e2bb215117 Add unit tests and fuzzer param 2021-03-24 08:21:09 -07:00
senhuang42
de52de1347 Add recursive block split algorithm 2021-03-24 08:21:09 -07:00
senhuang42
f06f6626ed Update function names for consistency 2021-03-24 08:20:54 -07:00
senhuang42
c56d6e49e8 Add block splitter to experimental params 2021-03-24 08:20:54 -07:00
senhuang42
2949a95224 Refactor block compression logic into single function 2021-03-24 08:20:54 -07:00
senhuang42
c05c090cc2 Centralize entropy statistics calculations to zstd_compress.c 2021-03-24 08:20:29 -07:00
sen
c48889f097
Merge pull request #2538 from senhuang42/monotonicity_test
Add memory monotonicity test over srcSize
2021-03-22 16:54:34 -04:00
Sen Huang
dff4a0e867 Make ZSTD_estimateCCtxSize_internal() loop through all srcSize parameter sets as well 2021-03-21 16:15:31 -07:00
ihsinme
a5bf09d764
simple fix for using bit operator.
Good day.
It seems to me that the developer intended to use a logical operator,
so I suggest a simple fix.
2021-03-17 11:37:42 +03:00
Sen Huang
77ae664ba6 Fix ZSTD_dedicatedDictSearch_isSupported() requirements 2021-03-16 17:36:05 -07:00
senhuang42
386111adec Add a nbSeq argument to compressSequences()
Refactor ZSTD_compressBlock_internal() to do the block header write within and add nbSeq argument to compressSequences()
2021-03-16 14:04:22 -07:00
senhuang42
98764493cf Move block header write into compressBlock_internal() 2021-03-16 14:04:22 -07:00
Nick Terrell
cd1551d261 [lib][tracing] Add ZSTD_NO_TRACE macro
When defined, it disables tracing, and avoids including the header.
2021-03-16 11:47:27 -07:00
Nick Terrell
b5fd348a85
Merge pull request #2523 from terrelln/huf-stack-reduction
Add HUF_writeCTable_wksp() function
2021-03-05 12:35:09 -08:00
Nick Terrell
5df2a21f1e Add HUF_writeCTable_wksp() function
This saves ~700 bytes of stack space in HUF_writeCTable.
2021-03-05 10:29:18 -08:00
Nick Terrell
27498ff00f Reduce stack usage of ZSTD_buildCTable()
It is a stack high-point for some compression strategies and has an easy
fix. This moves the normalized count into the entropy workspace.
2021-03-04 16:12:11 -08:00
Nick Terrell
7736549bea [bug-fix] Make simple single-pass functions ignore advanced parameters
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.

This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.

* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`

It also adds a test case that ensures that each of these functions
ignore the advanced parameters.
2021-02-12 19:11:23 -08:00
Nick Terrell
c62eb05964 [lib] Set appliedParams.compressionLevel correctly
Forward the correct compressionLevel to the appliedParams in all cases.
It was already correct for the advanced API, so only the old single-pass
functions needed to be fixed.

This compression level is unused by the library, but is set so that the
tracing framework can consume it.
2021-02-12 15:00:14 -08:00
Nick Terrell
f520f6dfbe [trace] Minor fixes found during integration
* Mark `ZSTD_CCtx_getParameter()` as const
* Add `extern "C"` guards to `zstd_trace.h`
2021-02-11 16:20:04 -08:00
Yann Collet
8884cb887d
Merge pull request #2483 from mpu/ldmgear
New algorithms for the long distance matcher
2021-02-11 08:38:23 -08:00
Quentin Carbonneaux
552efcac2d relocate large arrays from the stack to ldmState_t 2021-02-10 16:16:54 +01:00
Nick Terrell
e59c9459a5 [trace] Keep track of a uint64_t tracing context
The most common information that you want to track between begin() and
end() is the timestamp of the begin function, so you can measure the
duration of the (de)compression call. Allow the tracing library to put
this information inside the `ZSTD_TraceCtx`, so it doesn't need to keep
a global map in this case. If a single uint64_t is not enough, the
tracing library can return a unique identifier (like the context
pointer) instead, and use it as a key in a map.

This keeps the simple case simple.
2021-02-09 11:37:05 -08:00
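The "simple case" from e59c9459a5 can be sketched as a begin hook that returns its timestamp as the context and an end hook that turns it into a duration. All names below are hypothetical, the clock is a fake counter so the example is deterministic, and treating a context of 0 as "tracing disabled" is an assumption of this sketch.

```c
#include <stdint.h>

typedef uint64_t demo_trace_ctx;   /* sketch convention: 0 means disabled */

uint64_t demo_fake_clock = 0;      /* stand-in for a real monotonic clock */
uint64_t demo_last_duration = 0;   /* what a tracing library would log */

uint64_t demo_now(void) { return demo_fake_clock; }

/* Begin hook: the timestamp itself is the context, no global map needed. */
demo_trace_ctx demo_trace_begin(void)
{
    return demo_now();
}

/* End hook: recover the duration directly from the context. */
void demo_trace_finish(demo_trace_ctx ctx)
{
    if (ctx == 0) return;
    demo_last_duration = demo_now() - ctx;
}
```

A tracing library that needs more than 64 bits of state would instead return a unique key and look its state up in a map, as the commit message notes.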
Quentin Carbonneaux
e2ad174d73 fix some compiler warnings 2021-02-08 20:19:16 +01:00
Nick Terrell
54a4998a80 Add basic tracing functionality 2021-02-05 16:28:52 -08:00
Yann Collet
b9748757b0 fixed minor cast warning 2021-02-05 09:55:54 -08:00
Quentin Carbonneaux
874a590e5c deal safely with short inputs in ZSTD_ldm_generateSequences
The fuzzer CI found this bug.
2021-02-04 11:15:24 +01:00
Quentin Carbonneaux
9f327c02fd new core ldm algorithm 2021-02-03 22:24:07 +01:00
Quentin Carbonneaux
aee3dc877f fix a variable name to reflect its nature 2021-01-22 02:24:19 -08:00
Quentin Carbonneaux
d6e3de77dc fix warning and remove one more occurrence of makeEntryAndInsertByTag 2021-01-20 01:39:16 -08:00
Quentin Carbonneaux
e0d5eca8fa fix forgotten numTagBits in getTagMask 2021-01-20 00:54:20 -08:00
Quentin Carbonneaux
1e65711ca5 a couple performance improvement changes for ldm 2021-01-20 00:54:20 -08:00
Thomas Waldmann
92a2b5ccc9 fixup: lits means literals 2021-01-07 23:30:42 +01:00
Thomas Waldmann
f9802d80a0 fix typos (work done by Andrea Gelmini) 2021-01-07 18:47:23 +01:00
Nick Terrell
58476bcf7f Don't shrink window log in ZSTD_getCParams()
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unknown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
2021-01-04 15:54:09 -08:00
Nick Terrell
9d31c704d5 Don't shrink window log when streaming with a dictionary
Fixes #2442.

1. When creating a dictionary keep the same behavior as before.
   Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
   the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
   the dictionary size. When streaming this will select the
   largest parameters and not adjust them down. But, the CDict
   will use the correctly sized parameters, which seems like the
   right tradeoff.
4. When not attaching a dictionary (either forced not to, or
   using a prefix dictionary) we select parameters based on the
   dictionary size + source size, and assume the source size is
   small, which is the same behavior as before. But, now we don't
   adjust the window log (and hash and chain log) down when the
   source size is unknown.

When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.

TODO: Add a streaming + dictionary regression test case.
2021-01-04 15:54:09 -08:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
senhuang42
5c41490bfe Use pre-defined constants 2020-12-21 11:52:05 -05:00
senhuang42
7e11bd012b Implement skippable frame function 2020-12-21 11:13:22 -05:00
Yann Collet
a7cb4af573 added emphasis on the alignment condition of workspace
and made it a programming mistake (`assert()`)
rather than a runtime error.
2020-12-18 15:04:09 -08:00
Nick Terrell
ae85676d44 Fix alignment of scratchBuffer in HUF_compressWeights()
The scratch buffer must be 4-byte aligned. A misaligned buffer causes test
failures on 32-bit systems, where the stack isn't aligned.

Fixes Issue #2428.
2020-12-17 14:30:27 -08:00
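One common way to obtain the alignment that ae85676d44 requires is to declare the scratch space with a 4-byte element type instead of as raw bytes. The sketch below illustrates that general technique with hypothetical names; it is not taken from the patch itself.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SCRATCH_BYTES 64

/* Declaring the buffer as uint32_t makes the compiler give it the
 * alignment of uint32_t, even when it lives on the stack. */
int demo_scratch_is_aligned(void)
{
    uint32_t scratch[DEMO_SCRATCH_BYTES / sizeof(uint32_t)];
    return ((uintptr_t)(void*)scratch % sizeof(uint32_t)) == 0;
}
```

A plain `unsigned char` array of the same size carries no such guarantee, which is why it can end up misaligned on some 32-bit targets.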
Yann Collet
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00
Yann Collet
b8c3a473ec
Merge pull request #2420 from terrelln/huf-comment
[huf_compress] Refactor and comment HUF_buildCTable()
2020-12-14 16:14:07 -08:00
W. Felix Handte
9dab03db90 Create Enum to Represent Static/Dynamic Allocation Distinction in cwksp 2020-12-09 14:57:37 -05:00
W. Felix Handte
db9e73cb07 Don't ASAN-Poison Statically-Allocated Workspaces
Addresses #2286.
2020-12-09 13:00:47 -05:00
Nick Terrell
1bbcf07bd5 [huf_compress] Refactor and comment HUF_buildCTable()
Comment and refactor `HUF_buildCTable()` and the helper functions
it calls as I read and understand the code. Hopefully this refactor
makes the code a bit more clear.
2020-12-08 13:57:01 -08:00
Yann Collet
b86e3c9304
Merge pull request #2415 from facebook/fix_aliasing
fix gcc-10 strict aliasing warnings
2020-12-04 21:30:57 -08:00
Yann Collet
6132df8dd3 fix gcc-10 strict aliasing warnings
by exposing HUF_CElt declaration.
2020-12-04 16:43:19 -08:00
Yann Collet
68c14bdff2 minor speed improvement to HUF_readCTable()
faster by ~+1-2%
2020-12-04 16:33:39 -08:00
Nick Terrell
c238db046f
Merge pull request #2414 from terrelln/mt-progress
[lib] Ensure that multithreaded compression always makes some progress
2020-12-04 16:30:08 -08:00
Nick Terrell
4c58cb8383 [lib] Ensure that multithreaded compression always makes some progress 2020-12-03 20:25:14 -08:00
Nick Terrell
6672689e7e
Merge pull request #2406 from terrelln/linux-wrapper-api
[linux] Add the linux wrapper API
2020-12-02 16:49:03 -08:00
Nick Terrell
894ae36675
Merge pull request #2390 from animalize/clamp_level
Clamp compression level
2020-12-02 14:35:58 -08:00
senhuang42
2cbd038528 Move max nb seq check to per-block 2020-12-02 12:11:32 -05:00
Nick Terrell
3cda5fae77 [minor][lib] Remove double semicolon 2020-12-02 01:08:08 -08:00
senhuang42
3efe9c902b Add sequence nb validation to compressSequences(), adjust minMatch comparisons 2020-12-01 10:54:45 -05:00
senhuang42
4c5f337248 Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation 2020-11-30 15:41:20 -05:00
sen
c5fbd55dac
Merge pull request #2387 from senhuang42/compress_sequence_API
[RFC] New sequence compression API
2020-11-20 16:54:20 -05:00
senhuang42
7742f076b4 Add experimental param for sequence validation 2020-11-20 11:57:41 -05:00
senhuang42
0e32928b7d Remove unnecessary repcode backup, apply style choices, use function pointer 2020-11-20 11:02:19 -05:00
sen
e924a0fa51
Explicit cast for visual warnings
Github has automatic commits now! Cool

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2020-11-19 17:32:40 -05:00
senhuang42
dcbbf7c09f Unroll isRLE loop 2020-11-19 12:38:13 -05:00
senhuang42
05c0229668 Clean up visual conversion warnings 2020-11-18 15:36:29 -05:00
senhuang42
d6d7ba2a1f Modification to offset validation to include entire sequence 2020-11-17 10:13:22 -05:00
senhuang42
8f3136a9c7 Fix assert edge case, improve documentation in zstd.h 2020-11-16 18:05:35 -05:00
senhuang42
f6baad87d6 Fix warnings and make validation enabled by default 2020-11-16 12:00:06 -05:00
senhuang42
55b90ef010 Fix unit tests to agree with new changes 2020-11-16 11:36:37 -05:00
senhuang42
7f563b0519 Add new sequence format as an experimental CCtx param 2020-11-16 10:49:17 -05:00
senhuang42
347824ad73 Overhaul logic to simplify, add in proper validations, fix match splitting 2020-11-16 10:49:17 -05:00
senhuang42
46824cb018 Add new sequence compress api params to cctx 2020-11-16 10:49:17 -05:00
senhuang42
48405b4633 Fix srcSize=0 edge case 2020-11-16 10:49:17 -05:00
senhuang42
022e6d81e7 Fix literals length calculation 2020-11-16 10:49:17 -05:00
senhuang42
dad20b5ccb Remove dstCapacity error check 2020-11-16 10:49:17 -05:00
senhuang42
b8e16a2057 Remove extraneous function in this API 2020-11-16 10:49:17 -05:00
senhuang42
f29507c4fc Add check comparing offset to window size 2020-11-16 10:49:17 -05:00
senhuang42
7a6e46a92f Fix MSAN errors 2020-11-16 10:49:17 -05:00
senhuang42
cc2642bd17 Address edge case with endPosInSequence 2020-11-16 10:49:17 -05:00
senhuang42
fd10007174 Change debug levels to appropriate ones 2020-11-16 10:49:17 -05:00
senhuang42
2db8441245 Add RLE support 2020-11-16 10:49:17 -05:00
senhuang42
dfef298336 Fix various build warnings 2020-11-16 10:49:17 -05:00
senhuang42
2bbdddf24e Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences() 2020-11-16 10:49:16 -05:00
senhuang42
5fd69f8173 Add documentation for new api functions 2020-11-16 10:49:16 -05:00
senhuang42
e8b7fdb64b Refactor for enhanced code clarity 2020-11-16 10:49:16 -05:00
senhuang42
c675fb46f1 Rename internal function compressSequences(), and promote new *_ext() functions to their actual name 2020-11-16 10:49:16 -05:00
senhuang42
013434e1e4 Add another API function to compress with existing CCTX 2020-11-16 10:49:16 -05:00
senhuang42
c44ce29013 More adjustments to improve code clarity 2020-11-16 10:49:16 -05:00
senhuang42
48f67da854 Pull compressStream2() transparent initialization into its own function 2020-11-16 10:49:16 -05:00
senhuang42
c86151f53c Add initial support for new ZSTD_Sequence mode 2020-11-16 10:49:16 -05:00
senhuang42
e0f26afce9 Add sequence compression format param 2020-11-16 10:49:16 -05:00
senhuang42
f51af9a609 Always ensure sequenceRange updates properly, add more error forwarding 2020-11-16 10:49:16 -05:00
senhuang42
1a449688fd Various minor logical refactors to improve clarity 2020-11-16 10:49:16 -05:00
senhuang42
e5fe485dcc Fix cSize calculation for noCompressBlocks 2020-11-16 10:49:16 -05:00
senhuang42
6145ebb400 Rebased, roundtrips silesia.tar 2020-11-16 10:49:16 -05:00
senhuang42
b5b61cc216 Refactor for better debugging info 2020-11-16 10:49:16 -05:00
senhuang42
293fad6b45 Corrections and edge-case fixes to be able to roundtrip dickens 2020-11-16 10:49:16 -05:00
senhuang42
7eb6fa7be4 Multi-block compression scaffolding - works on single-block files 2020-11-16 10:49:16 -05:00
senhuang42
75b01f34b9 Add support for uncompressible blocks 2020-11-16 10:49:16 -05:00
senhuang42
e04da68157 Enable usage of ZSTD_sequenceRange for single-block compression 2020-11-16 10:49:16 -05:00
senhuang42
337fac216d Add logic to handle ZSTD_sequenceRange 2020-11-16 10:49:16 -05:00
senhuang42
85822ddd53 Add last literals handling like getSequences() 2020-11-16 10:49:16 -05:00
senhuang42
2cff8df1a2 Pull block compression out of main compressSequences() function 2020-11-16 10:49:16 -05:00
senhuang42
cfced9344a Implement ZSTD_updateSequenceRange 2020-11-16 10:49:16 -05:00
senhuang42
b116e1f211 Modify SequenceRange to have posInSequence 2020-11-16 10:49:16 -05:00
senhuang42
d99b675112 Add function definition for sequenceRange updater 2020-11-16 10:49:16 -05:00
senhuang42
74e95c05cc Add ZSTD_SequenceRange to count ranges in array of ZSTD_Sequence 2020-11-16 10:49:16 -05:00
senhuang42
89f3848310 Add support for repcodes 2020-11-16 10:49:16 -05:00
senhuang42
3e930fd044 Code cleanup, add debuglog statements 2020-11-16 10:49:16 -05:00
senhuang42
086513b5b9 Implement first pass at compressSequences() 2020-11-16 10:49:16 -05:00
senhuang42
a9327b1e9b Add initial function prototype for ZSTD_compressSequences_ext (to be renamed later) 2020-11-16 10:33:35 -05:00
animalize
52f8c07a3f Clamp compression level in ZSTD_getCParams_internal() function 2020-11-14 13:26:08 +08:00
senhuang42
9d936d61d2 Reduce number of memcpy() calls 2020-11-13 19:43:30 -05:00
senhuang42
be4ac6c5bc Use existing repcode update function to implement updates 2020-11-12 16:51:12 -05:00
senhuang42
674c9b9235 Add in proper block repcode histories 2020-11-12 15:34:37 -05:00
senhuang42
06c7f14066 Let block reps persist 2020-11-12 12:24:44 -05:00
senhuang42
396275068c Fix incorrect repcode setting 2020-11-12 11:57:01 -05:00
senhuang42
1a8af0de73 Improve unit test 2020-11-12 11:09:09 -05:00
senhuang42
4d4fd2c55f Overhaul repcode handling logic 2020-11-12 10:59:35 -05:00
sen
f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42
7d1dea070c Update unit tests 2020-11-06 11:10:37 -05:00
senhuang42
779df995c6 Implement mergeGeneratedSequences() 2020-11-06 10:55:46 -05:00
senhuang42
51abd58208 Rename getSequences() to generateSequences() 2020-11-06 10:53:22 -05:00
Luke Pitt
eac309c71b Add ZSTD_getDictID_fromCDict function to experimental section 2020-11-04 11:37:37 +00:00
senhuang42
f782cac3d4 Change block delimiter removing to linear time approach 2020-11-02 17:06:20 -05:00
senhuang42
3434049c1f Use ZSTD_memmove() instead of memmove() 2020-11-02 11:43:19 -05:00
senhuang42
d4d0346b40 Update name of enum, clarify documentation 2020-11-02 11:38:17 -05:00
senhuang42
e6178f837f Revert unnecessary seqCollector adjustment 2020-11-02 10:59:20 -05:00
senhuang42
e8501e00b8 Fix incorrect index increment in merge algorithm 2020-11-02 10:58:41 -05:00
senhuang42
a36fdada57 Add algorithm to remove all delimiters 2020-11-02 10:46:52 -05:00
senhuang42
435a3a0428 Update seqCollector definition 2020-11-02 10:19:26 -05:00
senhuang42
3327932609 Update ZSTD_getSequences function signature 2020-11-02 10:17:59 -05:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
d4e021fe35 [lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set
We don't use it when we have a stable input buffer, so don't allocate
it. I had to slightly modify `ZSTD_copyCCtx()` by storing the
`ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is
no longer the correct signal for the buffered mode.
2020-10-30 10:55:34 -07:00
Nick Terrell
24f72789e2 [lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set
Compress directly from the `ZSTD_inBuffer`. We still allocate the input
buffer. A following commit will remove that allocation.
2020-10-30 10:55:34 -07:00
Nick Terrell
6bd6b6f7d3 [cwksp] Return NULL when 0 bytes are requested
This ensures that the buffer is never used.
2020-10-30 10:55:34 -07:00
Nick Terrell
fcf81cee5e [lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set
We compress directly to the `ZSTD_outBuffer` so we don't need to
allocate it.
2020-10-30 10:55:34 -07:00
Nick Terrell
6d5dc93d4e [lib] Compress directly into output when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer always compress directly into the
`ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.
2020-10-30 10:55:34 -07:00
Nick Terrell
987cb4ca6a [lib] Take the shortcut when ZSTD_c_stableOutBuffer is set
When we have a stable output buffer take the single-pass shortcut.
It is okay to return `dstSizeTooSmall` if the output buffer isn't
big enough, because we know it will never grow.
2020-10-30 10:55:34 -07:00
Nick Terrell
809b2f2071 [lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2()
Sets these parameters in ZSTD_compress2() then resets them to their
original values after the compression call.

An alternative design could be to add a flush mode `ZSTD_e_singlePass`
which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single
compression call, by directly setting the applied parameters. I've opted
for the smaller change, but this is open for discussion.
2020-10-30 10:55:34 -07:00
Nick Terrell
c74be3f6de [lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set
Adds the validation of the input/output buffers only. They are still
unused.
2020-10-30 10:55:34 -07:00
Nick Terrell
e3e0775cc8 [API] Add ZSTD_c_stable{In,Out}Buffer parameters
This commit adds the parameters and sets the value in the CCtxParams
but it does not do anything with the value.
2020-10-30 10:54:39 -07:00
Nick Terrell
e2581d9572 [lib] Set appliedParams in zstdmt mode
Previously only `nbWorkers` was set. Set all parameters, because that is
what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer`
parameters.
2020-10-30 10:54:38 -07:00
senhuang42
536e89c723 Sequence extractor should update CBlockState 2020-10-30 12:13:19 -04:00
senhuang42
32cac2627a Emit last literals of 0 size as well, to indicate block boundary 2020-10-29 16:41:17 -04:00
senhuang42
69bd5f0654 Correct literalsRead calculation to include longLength 2020-10-29 14:49:37 -04:00
senhuang42
59624f3163 Remove implicit typecast to appease appVeyor windows build 2020-10-28 16:25:09 -04:00
senhuang42
3ed5d053d8 Clarify comments in zstd.h some more 2020-10-28 09:53:09 -04:00
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
Nick Terrell
0953645837
Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
senhuang42
3163909d14 Remove unused variable position 2020-10-27 12:58:12 -04:00
senhuang42
dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
senhuang42
1d221ecc03 Add support for representing last literals in the extracted seqs 2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd Improve documentation of seqStore_t 2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886 Improve documentation regarding various operations in copyBlockSequences 2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03 Modify ZSTD_copyBlockSequences to agree with new API 2020-10-27 10:31:40 -04:00
senhuang42
8bdb32aebe Add a function for LDM enable check 2020-10-20 13:46:02 -04:00
senhuang42
578e889ec1 Move ldm enable to compressStream2() 2020-10-20 13:04:45 -04:00
senhuang42
d28d8a1d72 Include LDM tables size for CCtx size estimation where relevant 2020-10-20 09:21:30 -04:00
senhuang42
b1c7fc5768 Add compatibility for multithreading 2020-10-19 12:07:06 -04:00
senhuang42
590f7f55f0 Add ldm enable condition in ZSTD_resetCCtx_internal 2020-10-19 10:26:17 -04:00
senhuang42
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
Yann Collet
a0ec50c2dc
Merge pull request #2355 from senhuang42/change_ldm_mt_config
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4 Change cycleLog adjustment to +3 from +4 2020-10-15 09:56:05 -04:00
senhuang42
ee84817fe7 Reset posInSequence when using ZSTD_referenceExternalSequences() 2020-10-14 22:06:08 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0 Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config 2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84 [minor] Improve docs and add an assert in response to review 2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a Use cycleLog instead of chainLog to determine LDM jobLog 2020-10-12 16:09:59 -04:00
Nick Terrell
441ce4178f [zstdmt] Clarify a comment 2020-10-12 12:58:13 -07:00
Nick Terrell
efff5d8b2d [zstdmt] Fix determinism issue with rsyncable mode
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attempt to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
   for the next sync point.

The fix is to detect if we're currently paused at a sync point, and if
we are, then don't load any more input.

Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
2020-10-12 12:55:17 -07:00
Nick Terrell
ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
1784c4b4ab [zstdmt] Remove single-pass shortcut
Simplifies the code and removes blocking from zstdmt.

At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.

Fixes #2327.
2020-10-12 12:53:26 -07:00
Nick Terrell
b55ae009ac [zstdmt] Remove singleBlockingThread mode
This is already handled by zstd, so this logic is never used.
2020-10-12 12:53:26 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell
fadaab8c7c [minor improvement] Pass 0 as the content size in the DDS
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
2020-10-12 12:47:21 -07:00
Nick Terrell
48ef15fb47 [minor improvement] Pass dictSize when selecting parameters
When selecting parameters in streaming compression with a dictionary use
the dictionary size to select the parameters.
2020-10-12 12:47:19 -07:00
Nick Terrell
012818df99 [refactor] Remove ZSTD_resetCStream_internal()
This function is only called in one place. It isn't a logical separation
of duties, and it was only obfuscating the code now, so inline it.
2020-10-12 12:46:10 -07:00
Nick Terrell
7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
senhuang42
d6911b86be Require LDM matches to be strictly greater in length 2020-10-09 12:56:18 -04:00
Yann Collet
12541931fa
Merge pull request #2328 from marxin/zstd-pool-api
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
Yann Collet
6fdb0cb8d9
Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority
For ZSTD_compressStream2(), let cdict take compression level priority
2020-10-09 00:48:30 -07:00
senhuang42
b9c8033cde Define kNullRawSeqStore for every file 2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8 Replace offCode of largest match if ldm's offCode is superior 2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
0731b94e7c Use kNullRawSeqStore constant in zstdmt_compress.c 2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2 Remove bubbling down matches with longer offCode and same matchLen 2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9 Enable inclusion of mid-flight LDMs in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942 Correct incorrect offcode calculation 2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202 Add explicit conversion of size_t to U32 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866 Prefix new static ldm helpers with ZSTD_opt 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb Improve documentation of relevant structs 2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7 Correct matchLength calculation and remove unnecessary functions 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df Refactor existing functions to use posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87 Adjustments to ldm_calculateMatchRange() to calculate bounds correctly 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299 Remove rawSeqStore.base and add rawSeqStore.posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84 Prevent duplicate LDMs from being inserted 2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec Add extra bounds check to prevent heap access after free ASAN error 2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5 Address mixed variables C90 warning 2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18 ldm_getNextMatch fixed return values 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808 Fixed sifting algorithm 2020-10-07 13:56:25 -04:00
senhuang42
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
senhuang42
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
senhuang42
a2f2b58d04 Add a function ldm_voidSequences() 2020-10-07 13:56:25 -04:00
senhuang42
9c3c7cd20e Fix function argument to getNextMatch() 2020-10-07 13:56:25 -04:00
senhuang42
c8b8572b38 Adjustments to no longer segfault on nci 2020-10-07 13:56:25 -04:00
senhuang42
f57c7e6bbf Add base adjustment correction 2020-10-07 13:56:25 -04:00
senhuang42
5df9b5e05f Add initial getNextMatch() in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
f8ce7cabc3 Added more debugging 2020-10-07 13:56:25 -04:00
senhuang42
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
senhuang42
42395a70c2 Add debug statements, flesh out functions 2020-10-07 13:56:25 -04:00
senhuang42
dd3dd199bb Get zstd to build with new functions and callsites, fix arguments 2020-10-07 13:56:25 -04:00
senhuang42
766c4a8c28 Implement part of ldm_maybeAddLdm() 2020-10-07 13:56:25 -04:00
senhuang42
84777059d2 Implement ldm_getNextMatch() 2020-10-07 13:56:24 -04:00
senhuang42
28c74bf591 Implement basic splitSequence and skipSequence functions 2020-10-07 13:56:24 -04:00
senhuang42
634ab7830d Flesh out required args for ldm_handleLdm() 2020-10-07 13:56:24 -04:00
senhuang42
db70761032 Add callsites to appropriate locations in ..opt_generic() 2020-10-07 13:56:24 -04:00
senhuang42
aea61e3c91 Add ldm helper function declarations into opt parser 2020-10-07 13:56:24 -04:00
senhuang42
35d9f488f5 Modify codepath to use opt parser exclusively if the compression level is high enough 2020-10-07 13:56:24 -04:00
senhuang42
e1ae398ad5 Add rawSeqStore to match state 2020-10-07 13:56:24 -04:00
Martin Liska
b684900a4a Allow external creation of POOLs that can be shared. 2020-10-07 12:44:33 +02:00
Nick Terrell
27c969ed07 Add comments to ZSTD_getLowest{Match,Prefix}Index()
Clarify how we handle dictionaries in each case.
2020-10-01 13:21:46 -07:00
Yann Collet
cc88eb7594
Merge pull request #2317 from animalize/msvc_inline
Let MSVC force inline ZSTD_hashPtr() function
2020-09-30 08:27:53 -07:00
Nick Terrell
f1cbeec039 [superblock] Reduce stack usage by correctly sizing header buffers 2020-09-24 19:42:04 -07:00
Nick Terrell
6a1e526ea7 [lib] Add ZSTD_COMPRESS_HEAPMODE tuning parameter 2020-09-24 19:42:04 -07:00
Nick Terrell
b841387218 [freestanding] Improve macro resolution to handle #if X 2020-09-24 19:42:04 -07:00
Nick Terrell
caecd8c211 Allow user to override ASAN/MSAN detection
Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for
MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined.
This allows the user to override ASAN/MSAN detection for platforms that
don't support it.
2020-09-24 19:42:04 -07:00
Nick Terrell
88fac5d514 Remove call to memset
The previous commit fixes the test so it errors on calls to mem*()
functions from <string.h>.
2020-09-24 19:42:04 -07:00
Nick Terrell
9261476b7d [lib] Wrap customMem xor checks in parens for readability
This clarifies operator precedence, and quiets cppcheck in
the Kernel Test Robot. I think this is a slight bonus to
readability, so I am accepting the suggestion.
2020-09-23 23:26:07 -07:00
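
The parenthesized xor check mentioned in this commit can be sketched as follows. The struct and helper names are hypothetical stand-ins for a custom-allocator descriptor; the idea is that either both callbacks are supplied or neither is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical custom-allocator descriptor. */
typedef void* (*alloc_fn)(void* opaque, size_t size);
typedef void  (*free_fn)(void* opaque, void* address);
typedef struct { alloc_fn customAlloc; free_fn customFree; void* opaque; } custom_mem;

/* Example callbacks that simply forward to the C allocator. */
static void* demo_alloc(void* opaque, size_t size) { (void)opaque; return malloc(size); }
static void  demo_free(void* opaque, void* address) { (void)opaque; free(address); }

/* Each `!` is wrapped in parentheses so the xor clearly applies to the
 * two negated pointers: the result is 1 when exactly one is missing. */
static int custom_mem_is_consistent(custom_mem mem)
{
    return !((!mem.customAlloc) ^ (!mem.customFree));
}

/* Allocate through the custom allocator if present, else fall back to malloc. */
static void* mem_alloc(size_t size, custom_mem mem)
{
    assert(custom_mem_is_consistent(mem));
    if (mem.customAlloc) return mem.customAlloc(mem.opaque, size);
    return malloc(size);
}

/* Free with the matching deallocator. */
static void mem_free(void* ptr, custom_mem mem)
{
    assert(custom_mem_is_consistent(mem));
    if (mem.customFree) mem.customFree(mem.opaque, ptr);
    else free(ptr);
}
```

Rejecting a half-specified descriptor up front avoids allocating with one allocator and freeing with another.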
animalize
2e5d73dd72 Use MEM_STATIC FORCE_INLINE_ATTR instead of FORCE_INLINE_TEMPLATE
It adds `__attribute__((unused))` for __GNUC__, to eliminate `-Werror=unused-function` error.
2020-09-21 13:26:38 +08:00
animalize
0a69a6b1ca Let MSVC force inline ZSTD_hashPtr() function
The ZSTD_hashPtr() function was not inlined by MSVC, which led to lower performance than with GCC.
2020-09-21 10:38:55 +08:00
Felix Handte
200c960f1d
Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation
Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes
2020-09-18 14:02:14 -04:00
W. Felix Handte
8930c6e551 Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset()
Even if the discrepancies are at the moment benign, it's probably better to
standardize on using the one true initializer, rather than trying (and failing)
to correctly duplicate its behavior.
2020-09-17 12:15:33 -04:00