* adding long support for patch-from
* adding refPrefix to dictionary_decompress
* adding refPrefix to dictionary_loader
* conversion nit
* triggering log mode on chainLog < fileLog and removing old threshold
* adding refPrefix to dictionary_round_trip
* adding docs
* adding enableldm + forceWindow test for dict
* separate patch-from logic into FIO_adjustParamsForPatchFromMode
* moving memLimit adjustment to outside ifdefs (need for decomp)
* removing refPrefix gate on dictionary_round_trip
* rebase on top of dev refPrefix change
* making sure refPrefx + ldm is < 1% of srcSize
* combining notes for patch-from
* moving memlimit logic inside fileio.c
* adding display for optimal parser and long mode trigger
* conversion nit
* fuzzer found heap-overflow fix
* another conversion nit
* moving FIO_adjustMemLimitForPatchFromMode outside ifndef
* making params immutable
* moving memLimit update before createDictBuffer call
* making maxSrcSize unsigned long long
* making dictSize and maxSrcSize params unsigned long long
* error on files larger than 4gb
* extend refPrefix test to include round trip
* conversion to size_t
* making sure ldm is at least 10x better
* removing break
* including zstd_compress_internal and removing redundant macros
* exposing ZSTD_cycleLog()
* using cycleLog instead of chainLog
* add some more docs about user optimizations
* formatting
playTests.sh didn't work when `ZSTD_BIN` or `DATAGEN_BIN` had a space in
the path name. This happens for me because I split the cmake build
directory by compiler name, like "Clang 9.0.0".
The fix is to replace all instances of `$ZSTD` with the `zstd()`
function, and the replace `$DATAGEN` with `datagen()`. This will allow
us to change how we call zstd/datagen in the future without having to
change every callsite.
* Adding new cli endpoint --diff-from=
* Appveyor conversion nit
* Using bool set trick instead of direct set
* Removing --diff-from and only leaving --diff-from=#
* Throwing error when both dictFileName vars are set
* Clean up syntax
* Renaming diff-from to patch-from
* Revering comma separated syntax clean up
* Updating playtests with patch-from
* Uncommenting accidentally commented
* Updating remaining docs and var names to be patch-from instead of diff-from
* Constifying
* Using existing log2 function and removing newly created one
* Argument order (moving prefs to end)
* Using comma separated syntax
* Moving to outside #ifndef
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.
* Only setting loaded dict huf table to valid on non-zero
* Adding hasNoZeroWeights test to fse tables
* Forbiding nbBits != 0 when weight == 0
* Reverting the last commit
* Setting table log to 0 when weight == 0
* Small (invalid) zero weight dict test
* Small (valid) zero weight dict test
* Initializing repeatMode vars to check before zero check
* Removing FSE changes to seperate pr
* Reverting accidentally changed file
* Negating bool, using unsigned, optimization nit
/dev/null permissions were modified when using sudo rights.
This fixes this bug during decompression.
More importantly, this patch adds a test, triggered in TravisCI,
ensuring unaltered /dev/null permissions.
date(1) is used to display the last modification time of a file, which
is not supported on OpenBSD, FreeBSD and Darwin. Instead use stat(1).
Tested on OpenBSD.
* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug
Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
* tests: Fix shellcheck warnings in playTests.sh
* tests: Do not use ../programs which is relative to tests dirs
This commit fixes error when running playTests.sh in Meson.
Mesonbuild runs out of tree, so ./datagen not in `zstd/tests` dir,
it lies in <mesonbuilddir>/tests. This leads to ../programs invalid.
* tests: Replace relative paths for zstd/tests dir
* playTests: Set shell options explicitly, not in shebang
* playTests: Replace echo -e with printf
* meson: Fix test-zstd
Use std=gnu99 to build and test just like `make test`.
* meson: Fix legacy test
* meson: Enable testing in CI
Run build under release mode for faster test time.
* meson: Increase timeout time for test-zstream
Pull request #1499 added a new test, which uses 'head -c'. The '-c'
option is non-portable (not in POSIX). Instead use 'dd'. Similar issue
has been resolved in the past (#1321).
fseek() doesn't indicate when it moves past the end of a file.
Consequently, if a file is truncated within its last block, the error would't be detected.
This PR adds a test scenario that induces this situation using a small compressed file of only one block in size.
This test is added to tests/playTests.sh
Check is implemented by ensuring that the filehandle position is equal to the filesize upon exit.
On Windows, the equivalent of `/dev/null` is `NUL`.
When tests are run under msys2/minGW,
the environment identifies itself as Windows,
hence the script uses `NUL` instead of `/dev/null`
but the environment will consider `NUL` to be a regular file name.
Consequently, `NUL` will be overwritten during tests,
triggering an error.
This patch uses flag `-f` to force such overwrite
passing the test.
Compare the input and output files by their inode number and
refuse to open the output file if the input file is the same.
This doesn't work when (de)compressing multiple files to a single
file, but that is a very uncommon use case, mostly used for
benchmarking by me.
Fixes#1422.
The `--no-progress` flag disables zstd's progress bars, but leaves
the summary.
I've added simple tests to `playTests.sh` to make sure the parsing
works.
Sometimes, it's necessary to test that a certain command fail, as expected.
Such failure is actually a success, and must not stop the flow of tests.
Several tests were prefixed with `!` to invert return code.
This does not work : it effectively makes the tests pass no matter what.
Use instead function die(), which is meant to trap successes, and transform them into errors.
tests/playTests.sh uses 'head -c' in a couple of tests to truncate the
last byte of a file. The '-c' option is non-portable (not in POSIX).
Instead use a wrapper around dd (truncateLastByte).
* Minor fix
* Run non-optimize FASTCOVER 5 times in benchmark
* Merge fastCover into dictBuilder
* Fix mixed declaration issue
* Add fastcover to symbol.c
* Add fastCover.c and cover.h to build
* Change fastCover.c to fastcover.c
* Update benchmark to run FASTCOVER in dictBuilder
* Undo spliting fastcover_param into cover_param and f
* Remove convert param functions
* Assign f to parameter
* Add zdict.h to Makefile in lib
* Add cover.h to BUCK
* Cast 1 to U64 before shifting
* Remove trimming of zero freq head and tail in selectSegment and rebenchmark
* Remove f as a separate parameter of tryParam
* Read 8 bytes when d is 6
* Add trimming off zero frequency head and tail
* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)
* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary
* Change nbDmer to always read 8 bytes even when d=6
* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER
* Update comments and benchmarking result
* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover
* Add dictType enum and fix bug about passing zParam when converting to coverParam
* Combine finalize and skip into a single parameter
* Update acceleration parameters and benchmark on 3 sample sets
* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets
* Initialize variables outside of for loop in benchmark.c
* Update benchmark result for hg-manifest
* Remove cover.h from install-includes
* Add explanation of f
* Set default compression level for trainFromBuffer to 3
* Add assertion of fastCoverParams in DiB_trainFromFiles
* Add checkTotalCompressedSize function + some minor fixes
* Add test for multithreading fastCovr
* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish
* Free segmentFreqs
* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment
* Add FASTCOVER_MEMMULT
* Minor fix
* Update benchmarking result
OpenBSD's port building infrastructure is able to build in a privilege
separated mode. It uses a privilege drop model. Regression tests fail in
this mode as xz/lzma is unable to set file group and errors out with:
xz: tmp.xz: Cannot set the file group: Operation not permitted
gmake[1]: *** [Makefile:307: zstd-playTests] Error 2
Actually it is not a xz/lzma error but a warning causing zstd's
regression test to fail. Proposed fix is to have xz/lzma not set the
exit status to 2 even if a condition worth a warning was detected (-Q
flag).
https://github.com/facebook/zstd/pull/1124 fixes an issue with GNU/Hurd
being unable to write to /dev/zero. Implemented fix is writing to
/dev/random instead.
On OpenBSD a regular user is unable to write to /dev/random because of
permissions set on this device. Result is failing a regression test.
Proposed solution should work for all platforms.
OpenBSD uses md5 instead of md5sum, and has no device called full.
With this patch, make check runs until #1088. With the assumption made
in the issue make check runs succesfully.
Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.
Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.
Reviewers: cyan
Differential Revision: https://phabricator.intern.facebook.com/D7284007
Tasks: T25664120
access negative compression levels from command line
for both compression and benchmark modes.
also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.
added relevant cli tests.
added some test
also updated relevant doc
+ fixed a mistake in `lz4` symlink support :
lz4 utility doesn't remove source files by default (like zstd, but unlike gzip).
The symlink must behave the same.
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
zstd streaming API was adding a null-block at end of frame for small input.
Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.
Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.
The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.
I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.
I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)
In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.
The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.