Commit Graph

1736 Commits

Author SHA1 Message Date
sen
ab216bc2c5
Merge pull request #2559 from senhuang42/add_dict_regression_tests_backup
Add different dict modes to compression ratio regression test, update results.csv
2021-03-25 19:26:06 -04:00
Sen Huang
bbbd578f45 Update results.csv 2021-03-25 11:16:37 -07:00
Sen Huang
f27e326456 Restrict dictmode regression tests only to advanced API, fix some compiler warnings 2021-03-25 10:39:08 -07:00
Sen Huang
1cadf86b39 Add tests to regression tests for dict 2021-03-25 10:39:08 -07:00
sen
b0407b9f0e
Merge pull request #2555 from senhuang42/default_clevel_func
Add ZSTD_defaultCLevel() function to public API
2021-03-25 13:07:28 -04:00
Sen Huang
e398744a35 Add ZSTD_defaultCLevel() function to public API 2021-03-25 08:04:00 -07:00
sen
bf542c8a8d
Merge pull request #2447 from senhuang42/block_splitter_v2
Recursive block splitting
2021-03-24 12:27:22 -04:00
senhuang42
e2bb215117 Add unit tests and fuzzer param 2021-03-24 08:21:09 -07:00
sen
c48889f097
Merge pull request #2538 from senhuang42/monotonicity_test
Add memory monotonicity test over srcSize
2021-03-22 16:54:34 -04:00
Sen Huang
dff4a0e867 Make ZSTD_estimateCCtxSize_internal() loop through all srcSize parameter sets as well 2021-03-21 16:15:31 -07:00
Sen Huang
77ae664ba6 Fix ZSTD_dedicatedDictSearch_isSupported() requirements 2021-03-16 17:36:05 -07:00
Sen Huang
b9dd821441 Add mem monotonicity test over srcSize 2021-03-16 08:24:26 -07:00
Felix Handte
aec1e8c715
Merge pull request #2513 from felixhandte/fix-2493
Avoid Using `stat -c` on NetBSD
2021-02-26 18:02:38 -05:00
W. Felix Handte
221e4659cd Avoid Using stat -c on NetBSD
Addresses #2493. I think. I don't have a NetBSD system to test on.
2021-02-26 13:05:39 -05:00
W. Felix Handte
9b7f9d26d5 Cover These Edge Cases in Tests 2021-02-26 13:01:20 -05:00
Nick Terrell
04139c3ff2 [regression] Update results.csv
Fixes the update from PR #2508. I had accidentally forgotten to rebuild
the library, and the regression test suite isn't hooked up to the new
fancy build system yet.

I've double checked that the results are deterministic.
2021-02-24 19:11:38 -08:00
Yann Collet
61b63e9060
Merge pull request #2492 from niacat/dev
Use standard md5 tool on NetBSD.
2021-02-24 16:38:10 -08:00
Nick Terrell
59b2c596d7 [regression] Update results.csv
9f327c02fd changed the compression method
for LDM, so the results are slightly different.

I've re-tested LDM on some larger inputs and everything seems fine.
These ratio changes just seem to be noise. There is generally a 0.01%
swing in ratio, sometimes better sometimes worse, but never large.
2021-02-23 15:23:08 -08:00
Nick Terrell
91e6480458 [fuzz] Fix compiler detection & update ubsan flags
* Fix compiler version regex, which was broken for multi-digit
  versions.
* Fix compiler detection for gcc.
* Disable `pointer-overflow` instead of `integer-overflow` for gcc
  versions newer than 8.0.0.
2021-02-19 13:19:18 -08:00
Nick Terrell
7736549bea [bug-fix] Make simple single-pass functions ignore advanced parameters
The simple compression functions are intended to ignore the advanced
parameters, but they were accidentally using them. All the
`ZSTD_parameters` were set correctly, but any extra parameters were
used as-is. E.g. `ZSTD_c_format`.

This PR makes all the simple single-pass functions listed below ignore
the advanced parameters, as intended.

* `ZSTD_compressCCtx()`
* `ZSTD_compress_usingDict()`
* `ZSTD_compress_usingCDict()`
* `ZSTD_compress_advanced()`
* `ZSTD_compress_usingCDict_advanced()`

It also adds a test case that ensures that each of these functions
ignore the advanced parameters.
2021-02-12 19:11:23 -08:00
nia
74f85818a6 Use standard md5 tool on NetBSD.
This avoids a GNU coreutils dependency.

-n is used to match the output format of coreutils:
http://man.netbsd.org/md5.1
2021-02-11 10:50:11 +01:00
Nick Terrell
54a4998a80 Add basic tracing functionality 2021-02-05 16:28:52 -08:00
senhuang42
9ae0dd9336 Fix Visual and staticanalyze warnings 2021-01-07 17:58:37 -05:00
senhuang42
17222654bf Add streaming decompression to unit test 2021-01-07 12:29:12 -05:00
senhuang42
22b7bff2bc Add unit test, improve documentation 2021-01-07 12:29:12 -05:00
Nick Terrell
58476bcf7f Don't shrink window log in ZSTD_getCParams()
Treat ZSTD_getCParams() and ZSTD_adjustCParams() in the same way
we treat streaming compression. Choose parameters based on the
dictionary size + source size, and assume the source size is small
if unkown. But, don't shrink the window log down in
ZSTD_adjustCParams_internal().
2021-01-04 15:54:09 -08:00
Nick Terrell
9d31c704d5 Don't shrink window log when streaming with a dictionary
Fixes #2442.

1. When creating a dictionary keep the same behavior as before.
   Assume the source size is 513 bytes when adjusting parameters.
2. When calling ZSTD_getCParams() or ZSTD_adjustCParams() keep
   the same behavior as before.
3. When attaching a dictionary keep the same behavior of ignoring
   the dictionary size. When streaming this will select the
   largest parameters and not adjust them down. But, the CDict
   will use the correctly sized parameters, which seems like the
   right tradeoff.
4. When not attaching a dictionary (either forced not to, or
   using a prefix dictionary) we select parameters based on the
   dictionary size + source size, and assume the source size is
   small, which is the same behavior as before. But, now we don't
   adjust the window log (and hash and chain log) down when the
   source size is unknown.

When the source size is unknown all cdicts should attach, except
when the user disables attaching, or `forceWindow` is used. This
means that when streaming with a CDict we end up in the good case
where we get small CDict parameters, and large source parameters.

TODO: Add a streaming + dictionary regression test case.
2021-01-04 15:54:09 -08:00
Nick Terrell
a98a6e2091 [test][regression] Add no source size with dictionary test
* Add a test that runs without a pledgedSrcSize and with a dictionary.
* Add github.tar data with uses the github dictionary while compressing
  github.tar, instead of each file individually.
2021-01-04 15:54:09 -08:00
Nick Terrell
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
Yann Collet
ff2f888d56 fixed one more minor cast issue
can't use address calculation with `void*`
2020-12-29 11:44:37 -08:00
Yann Collet
7f8be046b9 fixed minor warnings introduced in #2439 2020-12-28 14:07:31 -08:00
Yann Collet
cfff4c1cd5
Merge pull request #2439 from senhuang42/skippable_frame_api
Generate skippable frame API
2020-12-28 11:22:07 -08:00
senhuang42
5c41490bfe Use pre-defined constants 2020-12-21 11:52:05 -05:00
senhuang42
339d8ba103 Add unit test 2020-12-21 11:33:27 -05:00
Yann Collet
9648bf027b try to keep libzstd.a "as is" once created
to be compatible with scenarios such as
`make -j allmost`
2020-12-20 17:10:57 -08:00
Yann Collet
3536e9d5ff removing tests using too much resources for 32-bit address space 2020-12-17 15:44:54 -08:00
Yann Collet
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00
Nick Terrell
0be843b200 [tests] Fix playTests.sh with spaces in path 2020-12-10 11:03:47 -08:00
senhuang42
b9ab6bc061 Fix various conversion warnings 2020-12-08 10:07:28 -05:00
Yann Collet
69a04ccf68
Merge pull request #2413 from senhuang42/paramgrill_windows
Paramgrill for windows
2020-12-04 21:38:39 -08:00
Yann Collet
b86e3c9304
Merge pull request #2415 from facebook/fix_aliasing
fix gcc-10 strict aliasing warnings
2020-12-04 21:30:57 -08:00
Yann Collet
5c0a3489a5 fix aliasing warning in decodecorpus 2020-12-04 19:21:40 -08:00
Nick Terrell
c238db046f
Merge pull request #2414 from terrelln/mt-progress
[lib] Ensure that multithreaded compression always makes some progress
2020-12-04 16:30:08 -08:00
Nick Terrell
4c58cb8383 [lib] Ensure that multithreaded compression always makes some progress 2020-12-03 20:25:14 -08:00
senhuang42
260b85acf5 Fix MSVC 2019 warnings 2020-12-03 10:36:45 -05:00
Yann Collet
5de5c1d759 fixed fuzzer multithreading tests 2020-12-02 10:34:12 -08:00
Yann Collet
db21d383b5 fixed fuzzer32 to support multithreading tests
though it still fails on test33:
`test 33: superblock uncompressible data, too many nocompress superblocks`
2020-12-02 09:13:55 -08:00
Yann Collet
f69d8c027d removed fullbench-lib from tests/all
this build works fine on all my systems,
but since to fail on CI environment.
Unclear why there is a difference.
This build test is not relevant anyway.
2020-12-02 00:21:29 -08:00
Yann Collet
9f8b180d5d fixed API documentation 2020-12-02 00:15:07 -08:00
Yann Collet
f8d0b46a9f streamline fuzzer
from fuzzer32
2020-12-01 23:44:16 -08:00
Yann Collet
37165f66b7 better usage of default build rules 2020-12-01 23:36:05 -08:00
Yann Collet
343a75d2ef simplified test makefile
removed gzstd target:
relevant tests are unused and broken anyway
2020-12-01 22:33:45 -08:00
senhuang42
4c5f337248 Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation 2020-11-30 15:41:20 -05:00
Yann Collet
4b5d7e9ddb fix lz4 test messed by console detection 2020-11-30 06:47:16 -08:00
senhuang42
23554ff25f Force CCtx minmatch to be same as generated minmatch 2020-11-23 13:29:20 -05:00
senhuang42
c502cd33e5 Fix generating 1 too few characters in random string generator 2020-11-20 16:58:25 -05:00
senhuang42
5b0c8f0a7c Add appropriate bound to matchlengths, and reduce srcSize max 2020-11-20 16:58:25 -05:00
senhuang42
a73a07b189 Add a bound for matchlength dependent on window size 2020-11-20 16:58:25 -05:00
senhuang42
5c68c5e31e Variety of minor fixups, reduce allocation, make deterministic 2020-11-20 16:58:25 -05:00
senhuang42
59c021f501 Add built binary to .gitignore 2020-11-20 16:58:25 -05:00
senhuang42
26bc0bfdf6 Add new fuzzer to build targets 2020-11-20 16:58:25 -05:00
senhuang42
ed575963c5 Implement new fuzzer for sequence compression 2020-11-20 16:58:25 -05:00
senhuang42
7742f076b4 Add experimental param for sequence validation 2020-11-20 11:57:41 -05:00
senhuang42
05c0229668 Clean up visual conversion warnings 2020-11-18 15:36:29 -05:00
senhuang42
d6d7ba2a1f Modification to offset validation to include entire sequence 2020-11-17 10:13:22 -05:00
senhuang42
55b90ef010 Fix unit tests to agree with new changes 2020-11-16 11:36:37 -05:00
senhuang42
3d26615c84 Adjust unit tests to agree with new sequence generation API 2020-11-16 10:49:17 -05:00
senhuang42
2db8441245 Add RLE support 2020-11-16 10:49:17 -05:00
senhuang42
2bbdddf24e Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences() 2020-11-16 10:49:16 -05:00
senhuang42
9d936d61d2 Reduce number of memcpy() calls 2020-11-13 19:43:30 -05:00
senhuang42
1a8af0de73 Improve unit test 2020-11-12 11:09:09 -05:00
sen
f62edf0fe9
Merge pull request #2381 from senhuang42/expand_sequence_extraction_api
Add enum to define ZSTD_Sequence type and update sequence extraction API
2020-11-06 13:00:31 -05:00
senhuang42
7d1dea070c Update unit tests 2020-11-06 11:10:37 -05:00
senhuang42
51abd58208 Rename getSequences() to generateSequences() 2020-11-06 10:53:22 -05:00
Luke Pitt
eac309c71b Add ZSTD_getDictID_fromCDict function to experimental section 2020-11-04 11:37:37 +00:00
senhuang42
c54a25b666 Revert compressibility change 2020-11-02 11:38:58 -05:00
senhuang42
d4d0346b40 Update name of enum, clarify documentation 2020-11-02 11:38:17 -05:00
senhuang42
9102f30dbf Update unit test 2020-11-02 11:30:31 -05:00
senhuang42
3327932609 Update ZSTD_getSequences function signature 2020-11-02 10:17:59 -05:00
Nick Terrell
37d546c445
Merge pull request #2379 from terrelln/regression-test
[regression] Updates results.csv & add README
2020-10-30 15:09:38 -07:00
Nick Terrell
7205e609a9
Merge pull request #2354 from terrelln/stable-buffer
Add ZSTD_c_stable{In,Out}Buffer and optimize when set
2020-10-30 15:06:56 -07:00
Nick Terrell
a446fa33dc [regression] Add README explaining the test 2020-10-30 13:55:52 -07:00
Nick Terrell
222916a5d3 [regression] Update results.csv
https://github.com/facebook/zstd/pull/2339 removes the single-pass zstdmt API.
This changes the compressed size, because we no longer take the # of threads into
account when deciding the job size.
2020-10-30 13:54:30 -07:00
sen
c37c714ef1
Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api
Refine external ZSTD_Sequence API
2020-10-30 15:47:25 -04:00
Nick Terrell
2ebf6d5588 [test] Add unit tests for ZSTD_c_stable{In,Out}Buffer 2020-10-30 10:55:34 -07:00
sen
ff93440fc6
Merge pull request #2375 from senhuang42/ldm_oss_fuzz_testcase
Add a test case for LDM + opt parser with small uncompressible block
2020-10-29 09:32:05 -04:00
senhuang42
7198ebb213 Un-mix declarations and code 2020-10-28 18:51:03 -04:00
senhuang42
60a52c29e6 Add check for allocation 2020-10-28 16:22:22 -04:00
Nick Terrell
599ff58e08
Merge pull request #2339 from terrelln/zstdmt-stability
Fix zstdmt stability issues and clean up the zstdmt code
2020-10-27 19:43:13 -07:00
senhuang42
169fc07aa1 Move test to appropriate location 2020-10-27 16:59:43 -04:00
senhuang42
db0b5d7d1e Add test to fuzzer.c 2020-10-27 16:57:24 -04:00
sen
17b700d78a
Merge pull request #2366 from senhuang42/enable_ldm_by_default
Enable LDM by default if window size >= 128MB and strategy uses opt parser
2020-10-27 14:59:28 -04:00
senhuang42
dc448563e9 Add test compatibility with last literals in sequences 2020-10-27 12:35:28 -04:00
Yann Collet
d3f1a9b5bd fix partial-build test
sometimes, the scope difference is solely determined by the list of source files,
not by the flags.
2020-10-22 21:36:09 -07:00
Yann Collet
91a8cb9559 fix DEBUGLEVEL redefinition from tests/ 2020-10-22 00:20:40 -07:00
Yann Collet
494f7169ed fix directory creation for Windows' libzstd 2020-10-22 00:15:31 -07:00
Yann Collet
ca75da8fa3 fix test
DEBUGLEVEL redefinition
2020-10-21 23:51:13 -07:00
Nick Terrell
d6dae2000b
Merge pull request #2365 from senhuang42/move_opt_parser_test_to_long_tests
Move ldm + opt parser no regression test to long tests
2020-10-20 11:34:36 -04:00
senhuang42
81a2c02d8f Move ldm no regression test to fuzzer longtests 2020-10-19 15:28:46 -04:00
senhuang42
df470e176b Add unit test for no cctx requested params change 2020-10-19 10:52:41 -04:00
senhuang42
42d037bdba Add libregression build target, also fix make clean and .gitignore 2020-10-15 10:34:50 -04:00
Yann Collet
f5d5cd3b40
Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
ede4f97153 [zstdmt] Fix bug where extra empty blocks are emitted
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
2020-10-12 12:55:17 -07:00
Nick Terrell
9ab9229e11 [zstreamtest] Add compression determinism tests
* Run compression twice and check the compressed data is byte-identical.
  The compression loop had to be rewritten to ensure deteriminism. It is
  guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.

I've run a few hundred thousand iterations of zstreamtest and have seen
no deteriminism issues so far. Before the zstdmt fix that skips the
single-pass shortcut non-determinism showed up in a few hundred
iterations.
2020-10-12 12:55:17 -07:00
Nick Terrell
c51a9e79b9 [zstdmt] Rip out the zstdmt API
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
2020-10-12 12:55:16 -07:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.

Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
Nick Terrell
7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
Yann Collet
b951ad20a2
Merge pull request #2329 from senhuang42/prevent_summary_updates_when_using_stdout
Prevent summary updates when using stdout
2020-10-09 01:01:36 -07:00
Yann Collet
c3ee284ca2
Merge pull request #2319 from facebook/fullbench_stream2
update fullbench for compressStream2()
2020-10-09 00:40:59 -07:00
senhuang42
e96ea5d147 Fix static analyze fuzzer.c error 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
cfd2aec1b7 Add unit tests into playTests.sh 2020-10-07 13:56:25 -04:00
senhuang42
7259b258d1 Add callsites to zstdcli.c and tests to playTests.sh 2020-10-07 13:47:38 -04:00
Nick Terrell
0057c4acf7
Merge pull request #2333 from terrelln/stable-dst
Reset all decompression parameters in ZSTD_DCtx_reset()
2020-10-01 18:56:11 -07:00
Nick Terrell
2e7d174130 Reset all decompression parameters in ZSTD_DCtx_reset()
* Reset all decompression parameters in `ZSTD_DCtx_reset()` when
  resetting parameters.
* Add a test case.
2020-10-01 14:19:21 -07:00
Yann Collet
83461ce963
Merge pull request #2322 from senhuang42/guard_against_stdin_for_warning_prompts
Don't let warning messages consume input from stdin
2020-09-30 08:26:50 -07:00
senhuang42
9f7212a48b Update unit tests 2020-09-24 16:44:33 -04:00
Yann Collet
c6c0a57c53
Merge pull request #2315 from senhuang42/allow_zstd_suffix
Support .zstd suffix only for decompression
2020-09-24 09:44:48 -07:00
senhuang42
21cd640b93 Add unit tests to guard against bad stdin 2020-09-22 14:55:41 -04:00
senhuang42
7aa3da1cd7 Use IS_CONSOLE macro to detect that we're indeed using a console 2020-09-22 14:15:52 -04:00
Nick Terrell
973f2adeec [tests] Don't write to stdout 2020-09-22 00:40:27 -07:00
Yann Collet
5618e000bd update fullbench for compressStream2()
makes it possible to measure scenarios such as #2314
2020-09-21 07:19:20 -07:00
Felix Handte
200c960f1d
Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation
Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes
2020-09-18 14:02:14 -04:00
senhuang42
07034952df Add -f to .zstd decompression CLI test 2020-09-18 13:01:45 -04:00
senhuang42
6b6cc80196 Support .zstd suffix only for decompression 2020-09-18 12:49:51 -04:00
W. Felix Handte
9398acb245 Move Last Two Long Tests in fuzzer.c into Separate --long-tests Section 2020-09-17 13:31:10 -04:00
W. Felix Handte
f23a321781 Update Regression Test Results 2020-09-17 12:23:05 -04:00
Yann Collet
e583e0be8c
Merge pull request #2299 from senhuang42/env_var_num_threads
Allow environment variable to specify number of threads for compression
2020-09-14 14:04:19 -07:00
Yann Collet
dec1a78d3e minor fix casting for Visual 2020-09-14 11:46:23 -07:00
Yann Collet
c91a0855f8 check endDirective in ZSTD_compressStream2()
fix #2297
also :
- `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode
- add relevant tests
2020-09-14 10:56:08 -07:00
W. Felix Handte
d6246d4a0f Print More During Fuzzer Test to Avoid CI Killing it Due to Timeout
This is kind of hacky. And maybe this test doesn't need to be permanently as
exhaustive as it is now. But while we're actively developing the DDSS, we
should ensure it's compatible across many different modes.
2020-09-10 23:35:42 -04:00
W. Felix Handte
6d3f816b3e Test Fewer Dictionary Sizes 2020-09-10 22:30:52 -04:00
W. Felix Handte
b6df3fd438 Fix Debug Logging in 32-bit Build 2020-09-10 22:10:02 -04:00
W. Felix Handte
2cc2b40a1b Test DDSS A Little More Thoroughly 2020-09-10 22:10:02 -04:00
W. Felix Handte
b81f3a37f9 Easy: Fix Test 2020-09-10 18:51:52 -04:00
W. Felix Handte
2cf6cfc55f Add Fuzzer Test for the Various Dict Attachment Strategies 2020-09-10 18:51:52 -04:00
Nick Terrell
a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
senhuang42
a71963c7b8 nbThreads instead of numThreads 2020-09-09 12:40:00 -04:00
senhuang42
0a170b20a8 Add ZSTD_NUMTHREADS tests to playTests.sh 2020-09-08 10:34:50 -04:00
senhuang42
3aec385a10 Fix merge conflicts 2020-08-26 15:43:38 -04:00
Yann Collet
a8c66881e5
Merge pull request #2283 from senhuang42/progress_bars_for_multiple_files
Refreshing progress bar for processing multiple files
2020-08-26 11:54:50 -07:00
Nick Terrell
cf83aceaf3
Merge pull request #2282 from terrelln/ncount-fix
[bug] Fix FSE_readNCount()
2020-08-26 10:31:07 -07:00
senhuang42
a73e131f10 Adjust playTests.sh refuse overwrite test to include -q 2020-08-26 11:40:05 -04:00
Nick Terrell
ae163015b1 [fuzz] Fix stream_decompress timeouts 2020-08-25 17:13:09 -07:00
Nick Terrell
49eeb2d1fc [fuzz] Disable superblock expansion test 2020-08-25 17:13:06 -07:00
Nick Terrell
4193638996 [bug] Fix FSE_readNCount()
* Fix bug introduced in PR #2271
* Fix long-standing bug that is impossible to trigger inside of zstd
* Add a fuzzer that makes sure the normalized count always round trips
  correctly
2020-08-25 15:42:41 -07:00
Yann Collet
f82d9865b9
Merge pull request #2278 from senhuang42/ignore_checksum_advanced_param
New advanced decompression param to ignore checksums
2020-08-25 12:08:53 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
senhuang42
dde97de6c4 Only ask to proceed if using --rm, otherwise just display warning. -f bypasses it all. More robust tests 2020-08-24 20:20:39 -04:00
senhuang42
1acf243540 Add a warning whenever (de)compressing multiple files into one source, or into stdout 2020-08-24 19:10:03 -04:00
Nick Terrell
52f33a1da5 Fix compiler warnings 2020-08-24 16:09:45 -07:00
senhuang42
a030560d62 Add new DCtx param: validateChecksum and update unit tests 2020-08-24 17:28:00 -04:00
Nick Terrell
1302f8d676 [fix] Always return dstSize_tooSmall when it is the case 2020-08-24 13:38:13 -07:00
senhuang42
44c54a3e31 Addressing comments: more comments, cleanup, remove extra function, checksum logic 2020-08-24 16:14:19 -04:00
Nick Terrell
8def0e5fd3 Fix up code after reading through 2020-08-24 12:24:45 -07:00
senhuang42
ffaa0df76d Document change in CLI for --no-check during decompression in --help menu 2020-08-24 09:49:12 -04:00
senhuang42
e3f5f9658a Added CLI tests for --no-check, fixed ignore checksum logic 2020-08-22 16:05:40 -04:00
senhuang42
20eb095882 Added unit test to fuzzer.c, changed definition param name 2020-08-22 13:26:33 -04:00
senhuang42
1b34b15e6b Adding CLI capability to invoke decompression with no checksum 2020-08-21 17:49:30 -04:00
senhuang42
6a8dbdcd1f Modify decompression loop to gnore checksums if flag is enabled 2020-08-21 16:46:46 -04:00
Nick Terrell
8f8bd2d1ac [regression] Update results.csv 2020-08-20 12:41:35 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Nick Terrell
612e947c5e wire up bmi2 support 2020-08-17 16:35:28 -07:00
Nick Terrell
a8006264cf small blocks benchmark 2020-08-14 18:57:20 -07:00
Yann Collet
23941eec04 added tests for newly enabled syntax
for --patch-from origin
and --filelist list

Also : removed some constrained syntax tests,
as the new argument parsing syntax is more permissive.

For example :
    zstd file -of dest
used to be disallowed.

It's now allowed, and understood as:
    zstd file -o dest -f
2020-07-17 13:31:15 -07:00
Xin Xie
9a8ccd4ba3 Add output-dir-mirror option 2020-06-24 22:12:11 -07:00
Bimba Shrestha
de48f35306 adding --patch-from --stream-size test 2020-06-18 10:28:37 -07:00
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Bimba Shrestha
e2838d9eb9 Spelling mistakes 2020-06-05 05:11:26 -05:00
Shaojing Li
847349195f fix the if statements in posix sh env 2020-06-03 11:36:38 -07:00
Shaojing Li
3a3da1712b check env variables and add default values 2020-06-03 10:49:21 -07:00
Bimba Shrestha
b0f851675a [shellcheck] setting if unset 2020-06-02 09:12:50 -07:00
Bimba Shrestha
151deaf143 [shellcheck] adding quotes to expansion 2020-06-02 09:12:13 -07:00
Yann Collet
26b21e481f fix meson playTests.sh 2020-05-21 15:17:22 -07:00
Nick Terrell
651d3d73e0 [test] Update the ldm loadedDictEnd test to cover zstdmt 2020-05-19 16:14:14 -07:00
Nick Terrell
0dcd3eec43
Merge pull request #2152 from terrelln/simple-rt-bound
[fuzz] Expand the allowedExpansion
2020-05-19 12:56:11 -07:00
Nick Terrell
b82bf711fc [fuzz] Expand the allowedExpansion 2020-05-19 11:42:53 -07:00
Yann Collet
fdc56baa42
fix 22294 (#2151) 2020-05-18 21:05:10 -07:00
Nick Terrell
9778f46014
Merge pull request #2150 from terrelln/ldm-dict-reset
[ldm] Reset loadedDictEnd when the context is reset
2020-05-18 18:33:01 -07:00
Nick Terrell
7b317b4876 [test] Test that the ldm dictionary gets invalidated on reset 2020-05-18 16:00:28 -07:00
Nick Terrell
87dbd6d4bf [test] Improve LDM forceMaxWindow test 2020-05-18 15:11:18 -07:00
W. Felix Handte
d37fcf36eb Don't Use [[ in Shell Scripts 2020-05-18 15:06:56 -04:00
Bimba Shrestha
255e5e3f56
[fuzz] Adding dictionary_stream_round_trip fuzzer (#2140)
* Adding dictionary_stream_round_trip

* fixing memory leak
2020-05-15 13:33:31 -07:00
Nick Terrell
608075abb2 [test][regression] Update results.csv 2020-05-14 17:06:39 -07:00
Nick Terrell
bf0591e1e2 [test] Expose the LDM+MT+dict bug in a unit test 2020-05-14 12:06:55 -07:00
Bimba Shrestha
12071467d3 reverting docs and test 2020-05-13 15:22:07 -05:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
Bimba Shrestha
0453cfa8f5 removing -f test (grep usage not supported on mac) 2020-05-12 15:18:43 -05:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Yann Collet
e001715b3d fixed asan test 2020-05-11 20:35:47 -07:00
Yann Collet
20bd246045 blindfix for VS macro redefinition 2020-05-11 19:29:36 -07:00
Nick Terrell
1185dfb8d1 [fuzz] Add raw dictionary content fuzzer 2020-05-11 19:03:33 -07:00
Nick Terrell
301a62fe08 [fuzz] Fix compress bound for dictionary_round_trip 2020-05-11 19:00:52 -07:00
Yann Collet
91ad01218e updated initStatic tests
differentiate small CCtx for small inputs
from full CCtx
from CStream contexts.

Ensure allocation & resize tests are more precise.
2020-05-11 18:50:10 -07:00
Yann Collet
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
Yann Collet
dd026ca505 re-inforced tests for initStaticCCtx
ensure that `estimateCCtxSize()` works as intended.
2020-05-09 11:30:45 -07:00
Nick Terrell
5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
Bimba Shrestha
e7df0d41bb
Merge pull request #2095 from bimbashrestha/grep
[bugs] zstdgrep/grep inconsistencies
2020-05-06 11:18:15 -05:00
Bimba Shrestha
250184adf6 adding tests back 2020-05-05 16:51:06 -07:00
Felix Handte
7e9aabd652
Merge pull request #2099 from felixhandte/compile-under-pedantic
Compile Under `-pedantic -Werror` and `-std=c90`
2020-05-04 10:07:13 -07:00
Felix Handte
816ed80774
Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort
Reduce stack usage of HUF_sort()
2020-05-04 08:15:31 -07:00
W. Felix Handte
2cf72d56a6 Try to Fix MSVC Error
It's complaining about the `memcpy`s, saying:

"warning C4090: 'function': different 'const' qualifiers"

Let's try explicitly casting to the argument types...
2020-05-04 10:59:15 -04:00
W. Felix Handte
dacbcd2cc1 Fix Up Some Pointer Handling in Tests 2020-05-04 10:59:15 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
Meghna Malhotra
cc7c29595d Fixed tests to use correct workspace size 2020-05-01 13:45:48 -07:00
Yann Collet
da2748a855
Merge pull request #2097 from facebook/underlink
Fix underlinked libzstd
2020-04-30 10:16:24 -07:00
Yann Collet
7ea2ae6649 added test linking user program to multi-threaded libzstd 2020-04-28 21:18:29 -07:00
Nick Terrell
0ed07f6dfe
Merge pull request #2094 from terrelln/stable-dst
[lib] Add ZSTD_d_stableOutBuffer + fix single-pass mode for empty frames
2020-04-28 17:53:24 -07:00
Yann Collet
f17ac423b2 new tests created new artifacts
they were not properly ignored
2020-04-28 15:58:22 -07:00
Nick Terrell
108a5572a5
Merge pull request #2048 from nocnokneo/ctest-support
Add CTest support
2020-04-28 11:01:13 -07:00
Nick Terrell
1343b815f8 [fuzz] Fuzz test ZSTD_d_stableOutBuffer 2020-04-27 20:04:04 -07:00
Nick Terrell
f33de06c3e [lib] Fix single-pass mode for empty frames 2020-04-27 20:04:01 -07:00
Nick Terrell
a4ff217baf [lib] Add ZSTD_d_stableOutBuffer 2020-04-27 18:09:44 -07:00
Bimba Shrestha
6b4a3e019f
Merge pull request #2088 from bimbashrestha/bug
[bug] Making compressStream2 fail when passing rawContent but claiming fullDict
2020-04-23 14:16:56 -05:00
Bimba Shrestha
f7a7409a49 adding fail test when passing wrong fullDict using refPrefix 2020-04-21 22:26:48 -07:00
Bimba Shrestha
dba02245bf bash to shell conversion 2020-04-21 20:31:11 -07:00
Bimba Shrestha
0b107188b8 adding test for long mode trigger 2020-04-21 21:09:49 -05:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
Yann Collet
458a1a1723 minor refactor
- fix a few comments
- reorder some parameters, to enforce "mutable references first"
- simplified fwriteSparse()
2020-04-13 14:09:57 -07:00
Bimba Shrestha
794f03459e adding refPrefix 2020-04-06 22:57:49 -07:00
Bimba Shrestha
31e76f1ed4 adding test for dctx size reduction 2020-04-04 08:49:24 -07:00
Nick Terrell
1665462573
Merge pull request #2054 from terrelln/license-fix
Standardize and fix copyright and licenses
2020-03-27 11:00:01 -07:00
Nick Terrell
ef9e6fe227 [test] Fix playTests.sh with space in binary path
playTests.sh didn't work when `ZSTD_BIN` or `DATAGEN_BIN` had a space in
the path name. This happens for me because I split the cmake build
directory by compiler name, like "Clang 9.0.0".

The fix is to replace all instances of `$ZSTD` with the `zstd()`
function, and the replace `$DATAGEN` with `datagen()`. This will allow
us to change how we call zstd/datagen in the future without having to
change every callsite.
2020-03-26 19:52:19 -07:00
Nick Terrell
1f144351b7 [test] Add a test that checks for valid copyright and licenses
Tests all `.h`, `.c`, `.py`, and `Makefile` files for valid copyright
and license lines. Excludes a small number of exceptions (threading, and
divsufsort).

* Copyright does not contains `present`
* Copyright contains `Facebook, Inc`
* Copyright contains the current year
* License contains exactly the lines we expect
2020-03-26 17:02:09 -07:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Taylor Braun-Jones
3cbc3d37e7 Add documentation for -T option 2020-03-23 17:49:04 -04:00
Bimba Shrestha
fae64b3390 Adding test for --[no-]content-size 2020-03-09 14:44:38 -05:00
Nick Terrell
dbd6439bb6 [zstdgrep] Add a simple test 2020-03-02 16:51:34 -08:00
Bimba Shrestha
6a4258a08a
Removing symbols already in unit tests and adding some new unit tests for missing symbols (#1985)
* Removing symbols that are not being tested

* Removing symbols used in zstdcli, fileio, dibio and benchzstd

* Removing symbols used in zbuff and add test-zbuff to travis

* Removing remaining symbols and adding unit tests instead

* Removing symbols test entirely
2020-02-05 16:55:00 -08:00
Bimba Shrestha
2f10019b92 Adding --show-default-cparams (show cparams before compressing 2020-01-30 14:12:03 -08:00
Bimba Shrestha
9b049836c9 Typo baseline_build -> baseline_label 2020-01-28 21:39:20 -08:00
Bimba Shrestha
8fe562a770 [automated_benchmarking] Make arguments optional and add --dict argument (#1968)
* Make arugments optional and add --dict argument

* Removing accidental print statement

* Change to more likely scenario for dictionary compression benchmark
2020-01-28 11:29:43 -08:00
Nick Terrell
009f388457 Fix playTests.sh for 32-bit mode 2020-01-17 14:20:44 -08:00
Nick Terrell
a11a9271d6 Fix lowLimit underflow in overflow correction 2020-01-17 12:10:18 -08:00
Nick Terrell
3ed0f65158 [cmake] Add playTests.sh as a test 2020-01-13 14:16:15 -08:00
Nick Terrell
036b30b555
Fix super block compression and stream raw blocks in decompression (#1947)
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.

This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.

* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).

Credit to OSS-Fuzz.
2020-01-10 18:02:11 -08:00
Bimba Shrestha
f25a6e9f8f Adding new cli endpoint --patch-from= (#1940)
* Adding new cli endpoint --diff-from=

* Appveyor conversion nit

* Using bool set trick instead of direct set

* Removing --diff-from and only leaving --diff-from=#

* Throwing error when both dictFileName vars are set

* Clean up syntax

* Renaming diff-from to patch-from

* Revering comma separated syntax clean up

* Updating playtests with patch-from

* Uncommenting accidentally commented

* Updating remaining docs and var names to be patch-from instead of diff-from

* Constifying

* Using existing log2 function and removing newly created one

* Argument order (moving prefs to end)

* Using comma separated syntax

* Moving to outside #ifndef
2020-01-10 14:25:24 -08:00
Nick Terrell
d1cc9d2797
[fuzz] Allow zero sized buffers for streaming fuzzers (#1945)
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
  zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
  a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.

Catches the bug fixed in PR #1939
2020-01-09 11:38:50 -08:00
Nick Terrell
b77ad810c9
[fuzz] Fix regression_driver.c with directory input (#1944)
The `numFiles` variable wasn't updated, so the fuzzer didn't do anything.
I did two things to fix this:

1. Remove the `numFiles` variable entirely.
2. Error if we can't open a file and print the number of files tested.
2020-01-08 13:20:56 -08:00
Bimba Shrestha
eb76f786bc [bench] Automated benchmarking script (#1906)
* Initial revised automated benchmarking script

* Updating nb_iterations and making loop infinite

* Allowing benchmarking params to be changed from cli

* Renaming old speed test

* Removing numpy dependency for cli

* Change filename and benchmakr on pr level

* Moving build outside loop and adding iterations param

* Moving benchmarking to seperate travis ci test

* Fixing typo and using unused variable

* Added mode labels and updated README accordingly

* Adding new mode 'current' that compraes facebook:dev against current hash

* Typo

* Reverting previous accidental diff

* Typo

* Adding frequency config variable to prevent github from blacklisting

* Added new argument for frequency of fetching new prs

* Updating documentation
2020-01-06 14:19:11 -08:00
Bimba Shrestha
b1f53b1a10 [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936)
* Adding fail logging for superblock flow

* Dividing by targetCBlockSize instead of blockSize

* Adding new const and using more acurate formula for nbBlocks

* Only do dstCapacity check if using superblock

* Remvoing disabling logic

* Updating test to make it catch more extreme case of previou bug

* Also updating comment

* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Felix Handte
6f4341c432 Fix playTests.sh Under QEMU (#1923) 2019-12-26 11:16:23 -08:00
Bimba Shrestha
56415efc76 Constifying, malloc check and naming nit 2019-12-17 17:16:51 -08:00
Bimba Shrestha
989ce13e19 One more type conversion 2019-12-13 16:50:21 -08:00
Bimba Shrestha
4399eed42e Adding explict cast to satisfy appveyor ci 2019-12-13 16:38:11 -08:00
Bimba Shrestha
db5124ef6e More void* issues. Just replacing with BYTE* 2019-12-13 16:24:49 -08:00
Bimba Shrestha
49b2bf7106 'void* size issue' fix 2019-12-13 16:06:57 -08:00
Bimba Shrestha
e3cd2785e2 Add test to catch too many noCompress superblocks on streaming 2019-12-13 15:31:29 -08:00
Yann Collet
d73e2fb465
Merge pull request #1891 from bimbashrestha/oss
[fuzz] Superblock fuzz issues
2019-12-10 13:17:00 -08:00