from overlapSizeLog.
Reasoning :
`overlapLog` is already used everywhere, in the code, command line and documentation.
`ZSTD_c_overlapSizeLog` feels unnecessarily different.
ZSTD_compress_generic() is renamed ZSTD_compressStream2().
Note that, for the time being,
the "stable" API and advanced one use different parameter planes :
setting parameters using the advanced API does not influence ZSTD_compressStream()
and using ZSTD_initCStream() does not influence parameters for ZSTD_compressStream2().
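For illustration, a minimal sketch of the new entry point (buffers `src`/`dst` assumed available; parameter enum spellings as they appear in recent zstd.h revisions, which may differ slightly here) :
```
/* set parameters on the context (advanced parameter plane),
 * then drive compression with ZSTD_compressStream2().
 * assumes dstCapacity >= ZSTD_compressBound(srcSize). */
ZSTD_CCtx* const cctx = ZSTD_createCCtx();
ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1);
{   ZSTD_outBuffer out = { dst, dstCapacity, 0 };
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    size_t remaining;
    do {
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    } while (remaining != 0 && !ZSTD_isError(remaining));
    /* on success, out.pos holds the compressed size */
}
ZSTD_freeCCtx(cctx);
```
Per the note above, ZSTD_compressStream() and ZSTD_initCStream() would simply ignore these settings.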
When multiple -arch flags are used, the compiler invokes itself once for
each architecture. Apparently, input on stdin is consumed by the
compilation of the first arch and is no longer available to the
compilation of the second arch, which results in a build failure and the
potentially incorrect determination that a feature is not available. So
write the feature detection source to a file instead of using stdin.
answering #1407.
Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which has existed for a few revisions)
fix #1379
decodecorpus was generating one extraneous byte when `nbSeq==0`.
This is disallowed by the specification.
The reference decoder was just skipping the extraneous byte.
It is now stricter, and flags such a situation as an error.
gcc-8 on Linux doesn't like usage of strncat :
`warning: ‘strncat’ output truncated before terminating nul copying as many bytes from a string as its length`.
Not sure what was wrong, it might be a false positive,
but the logic is simple enough to be replaced by a simple `memcpy()`,
thus avoiding the shenanigans of null-terminated strings.
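A hedged sketch of the kind of replacement (variable names illustrative, not the actual code) :
```
/* append `suffix` to a buffer tracked by an explicit length;
 * assumes the caller already verified that bufferLen + suffixLen + 1 fits */
size_t const suffixLen = strlen(suffix);
memcpy(buffer + bufferLen, suffix, suffixLen);
bufferLen += suffixLen;
buffer[bufferLen] = '\0';   /* terminate once, explicitly */
```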
following #1356,
only enable backtrace compilation on linux+glibc.
Also, disable backtrace by default from "release" compilation,
so that fewer platforms get impacted by the new requirements.
Can be manually enabled/disabled using BACKTRACE=1/0.
and slightly refactored the affected function.
Honestly, the formula calculating variance should get a second review round,
as it's not clear whether it's correct.
some non-trivial changes to platform.h and util.h,
initially related to compilation for Haiku,
but I used this opportunity to make them cleaner
and add some documentation.
Noticed several tests that could be improved
(too harsh conditions, useless exception, etc.)
but I did not dare modifying too many tests just before release.
note : for some reason,
scan-build version on my laptop found problems within fastcover.c
that scan-build on travisCI does not flag.
They are, as usual, false positives :
the analyzer does not understand that a table (`offset`) is correctly filled before usage.
for Linux and Mac OS-X.
Note : the backtrace fires through a trap
before the sanitizer gets a chance to report.
There are situations where the sanitizer report is actually preferable.
It might be good to consider a kind of build macro
which can disable backtrace
when sanitizer is enabled.
from EXIT_IF() to RETURN_IF()
EXIT could be misunderstood as exit(), which terminates program execution.
But the macro only leaves the function, not the program.
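For reference, a minimal sketch of what such a macro can look like (not the exact fileio.c definition) :
```
#include <stdio.h>
/* leaves the enclosing function with an error code;
 * it does not terminate the program the way exit() would */
#define RETURN_IF(cond, errorCode, msg)  do {                        \
    if (cond) {                                                      \
        fprintf(stderr, "error %d : %s \n", (int)(errorCode), msg);  \
        return (errorCode);                                          \
    }                                                                \
} while (0)
```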
- Print signal name to term
- Add -rdynamic option to generate Linux symbol names in backtrace
- Raise default signal after handler to ensure program termination
For OSX and Linux, add a signal handler to SIGABRT, SIGFPE, SIGILL,
SIGSEGV, and SIGBUS. When the program terminates unexpectedly the
handler will print the current stack to the terminal to help determine
the location of the failure.
On OSX the output will look like:
```
Stack trace:
4 zstd 0x000000010927ed96 main + 16886
5 libdyld.dylib 0x00007fff767d1015 start + 1
6 ??? 0x0000000000000001 0x0 + 1
```
On Linux the output will look like:
```
Stack trace:
./zstd() [0x4b8e1b]
./zstd() [0x4b928a]
./zstd() [0x403dc2]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f5e0fbb0445]
./zstd() [0x405754]
```
As is, the code does not function on WIN32.
See also: https://oroboro.com/stack-trace-on-crash/
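A hedged sketch of the approach on Linux/OSX (glibc/libSystem `backtrace()`; not the exact code added to the cli) :
```
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>     /* STDERR_FILENO */

static void crashHandler(int sig)
{
    void* addrs[64];
    int const depth = backtrace(addrs, 64);
    fprintf(stderr, "Stack trace:\n");
    backtrace_symbols_fd(addrs, depth, STDERR_FILENO);
    signal(sig, SIG_DFL);   /* restore the default handler ... */
    raise(sig);             /* ... and re-raise, to ensure program termination */
}

static void installCrashHandlers(void)
{
    signal(SIGABRT, crashHandler);
    signal(SIGFPE,  crashHandler);
    signal(SIGILL,  crashHandler);
    signal(SIGSEGV, crashHandler);
    signal(SIGBUS,  crashHandler);
}
```
Note : `-rdynamic` is what makes the Linux output show symbol names rather than bare addresses.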
measures :
- compression ratio with / without dictionary
- create one dictionary per block
- memory budget for dictionaries
- decompression speed, using one different dictionary per block
current limitations :
- only one file
- 4K blocks only
- automatic dictionary built with 4K size
dictionary can be selected on command line, with -D
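For reference, a hedged sketch of the per-block measurement, using the public dictionary API (context and buffer names illustrative) :
```
/* compress, then decompress, one 4K block with the dictionary under test */
size_t const cSize = ZSTD_compress_usingDict(cctx,
                        dst, dstCapacity,
                        block, blockSize,
                        dict, dictSize,
                        compressionLevel);
size_t const rSize = ZSTD_decompress_usingDict(dctx,
                        decoded, decodedCapacity,
                        dst, cSize,
                        dict, dictSize);
```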
* Minor fix
* Run non-optimized FASTCOVER 5 times in benchmark
* Merge fastCover into dictBuilder
* Fix mixed declaration issue
* Add fastcover to symbol.c
* Add fastCover.c and cover.h to build
* Change fastCover.c to fastcover.c
* Update benchmark to run FASTCOVER in dictBuilder
* Undo splitting fastcover_param into cover_param and f
* Remove convert param functions
* Assign f to parameter
* Add zdict.h to Makefile in lib
* Add cover.h to BUCK
* Cast 1 to U64 before shifting
* Remove trimming of zero freq head and tail in selectSegment and rebenchmark
* Remove f as a separate parameter of tryParam
* Read 8 bytes when d is 6
* Add trimming off zero frequency head and tail
* Use best functions from COVER and remove trimming part (which leads to worse compression ratio after previous bugs were fixed)
* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary
* Change nbDmer to always read 8 bytes even when d=6
* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER
* Update comments and benchmarking result
* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover
* Add dictType enum and fix bug about passing zParam when converting to coverParam
* Combine finalize and skip into a single parameter
* Update acceleration parameters and benchmark on 3 sample sets
* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets
* Initialize variables outside of for loop in benchmark.c
* Update benchmark result for hg-manifest
* Remove cover.h from install-includes
* Add explanation of f
* Set default compression level for trainFromBuffer to 3
* Add assertion of fastCoverParams in DiB_trainFromFiles
* Add checkTotalCompressedSize function + some minor fixes
* Add test for multithreading fastCover
* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unlock to end of COVER_best_finish
* Free segmentFreqs
* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment
* Add FASTCOVER_MEMMULT
* Minor fix
* Update benchmarking result
Per warnings from flawfinder: "Does not check for buffer overflows when
copying to destination [MS-banned] (CWE-120). Consider using snprintf,
strcpy_s, or strlcpy (warning: strncpy easily misused).".
Replaced calls to strcpy and strcat in `fileio.c` with calls taking a
specified size (`strncpy` and `strncat`).
Tested the changes on OSX, Linux, Windows.
On OSX + Linux, changes were tested with ASAN. The following flags were
used: 'check_initialization_order=1:strict_init_order=1:detect_odr_violation=1:detect_stack_use_after_return=1'
To reproduce warning:
./flawfinder.py ./programs/fileio.c
tells in a non-blocking way if there is something ready to flush right now.
only works with multi-threading for the time being.
Useful to know if flush speed will be limited by lack of production.
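A hedged usage sketch, assuming the query is exposed as ZSTD_toFlushNow() in the experimental section of zstd.h (`cctx` assumed to be an ongoing compression context) :
```
/* non-blocking query : how many bytes are already compressed and waiting ? */
size_t const pending = ZSTD_toFlushNow(cctx);
if (pending > 0) {
    /* a worker has produced data : flush it now instead of stalling on more input */
}
```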
Add different constraint types (decompression speed, compression memory, parameter constraints)
Separate search space by strategy + strategy selection
Memoize results
Real random restarts
Support multiple files
Support Dictionary inputs
Debug Macro for extra printing
Additional constraint checking
Minor fixes
more param parsing
Add Memory
Change paramVariation
work on feasibility
reformat bench
Changed Paramgrill to use bench.c benchmarking
customlevel macro
Printing Flag
Minor changes
Explicit casting
Makefile fix
casting, type fix
Printing Flag
Minor Changes
comments, helper fn's
Separate syntheticTest and fileTableTest (now renamed as benchFiles)
Add incremental display to benchMem
Change to only iterMode for benchFunction
Make Synthetic test's compressibility configurable from cli (using -P#)
In the new advanced API, adjust the parameters even if they are explicitly
set. This mainly applies to the `windowLog`, and accordingly the `hashLog`
and `chainLog`, when the source size is known.
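For illustration, a hedged sketch of the scenario (enum spellings as they later stabilized; they may differ in this revision, and `cctx` is assumed) :
```
ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);   /* explicitly requested */
ZSTD_CCtx_setPledgedSrcSize(cctx, 4096);              /* but the source is known to be tiny */
/* with this change, the effective windowLog (and matching hashLog / chainLog)
 * is adjusted down to fit the known source size, even though it was set explicitly */
```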
-Remove global variables
-Remove gv setting functions
-Add advancedParams struct
-Add defaultAdvancedParams();
-Change return type of bench Files
-Change cli to use new interface
-Changed error returns to own struct value
-Change default compression benchmark to use decompress_generic
-Add CustomBench function
-Add Documentation for new functions
IS_CONSOLE stolen wholesale from Options.cpp
not sure if I should have extracted that code for DRY-ness
tested on OSX and functionality seems appropriate
untested in a Windows environment
unistd.h is for unix standard tools.
There does not appear to be a simple isatty for Windows,
thus we only run the logic and header include in
non-Windows environments
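A hedged sketch of the guard described (not the exact util.h code) :
```
#include <stdio.h>
#ifndef _WIN32
#  include <unistd.h>   /* isatty */
#  define IS_CONSOLE(stdStream) isatty(fileno(stdStream))
#else
#  define IS_CONSOLE(stdStream) 0   /* no simple isatty equivalent relied upon here */
#endif
```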
FIO would keep presenting data after an LZ4F decoding error
resulting in a NULL pointer dereference
when associated with older liblz4 version (< v1.8.1.2)
this makes it possible to specify extremely large negative compression levels,
achieving the same effect as "no compression".
It will also be possible to define a larger targetLength for ultra compression mode.
There is no adverse side effect due to removing this limit.
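A minimal sketch with the simple API (the exact lowest accepted level depends on the build) :
```
/* a negative level trades ratio for speed;
 * an extremely large negative level approaches "no compression" */
size_t const cSize = ZSTD_compress(dst, dstCapacity, src, srcSize, -5);
```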
* Replaced a non-breaking space and an en dash with a plain space and
a hyphen.
* This means the files are simple ASCII and less likely to run into
codepage issues.
- do not test level 0, as it is converted into level 3,
which feels strange when compressing multiple levels
- Use direct synchronous mode when a single worker is requested.
access negative compression levels from command line
for both compression and benchmark modes.
also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.
added relevant cli tests.
zstd bench module can focus on decompression speed _only_.
This is useful when trying to measure performance
on large input data compressed using a high level
as compression time becomes problematic (too long).
This mode is triggered by command : zstd -b -d
Problem was : in such a mode,
measured decoding speed was > 10% slower
than in nominal mode (compression + decompression),
making decompression benchmark mode much less useful.
This patch fixes the issue.
It's not completely clear why, but
moving the `memcpy()` operation sooner in the pipeline fixed it.
I can still measure some difference, but it is in the < 2% range,
so it's much more tolerable.
also : it no longer matters in which order
commands `-b` and `-d` are selected.
The combination always triggers bench_decodeOnly mode.
Silence a Coverity warning about 'windowSize' being uninitialized.
(Yes, nothing that calls this routine actually uses the windowSize
value. Still, appeasing Coverity is pretty harmless in this case.)
by invoking time() once per batch, instead of once per compression / decompression.
Batch is dynamically resized so that each round lasts approximately 1 second.
Also : increases time accuracy to nanosecond
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one worker thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
added some test
also updated relevant doc
+ fixed a mistake in `lz4` symlink support :
lz4 utility doesn't remove source files by default (like zstd, but unlike gzip).
The symlink must behave the same.
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced
Ingested can be larger than consumed due to buffering effect.
For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.
That being said, the update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.
This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
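For reference, a hedged sketch of how these statistics surface (experimental API, field names as described above; `cctx` assumed) :
```
ZSTD_frameProgression const fp = ZSTD_getFrameProgression(cctx);
/* fp.ingested >= fp.consumed, because of buffering;
 * the % display now relates fp.produced to fp.consumed, not to fp.ingested */
```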
this happened on 32-bit builds when requesting a too-large input buffer,
typically on wlog=29, creating jobs of 2 GB size.
also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
The compression % is no longer correct,
since it's no longer possible to make direct correlation
between nb bytes read and nb bytes written
due to large internal buffer inside CCtx
(exacerbated with --long).
The current "fix" is to no longer display the %.
A more complex solution will have to count exactly how much data has been consumed and compressed internally, within CCtx buffers.
when cli is compiled without MT support,
invoking ZSTD_p_nonBlockingMode results in an error code.
This patch only sets ZSTD_p_nonBlockingMode when ZSTD_MULTITHREAD is set, meaning there is MT support.
The error code could also be intentionally ignored (there is no side effect).
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.
It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.
Applied within fileio.c, this patch provides non-negligible gains during compression.
Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9
With traditional single-thread blocking mode :
real 0m9.557s
user 0m8.861s
sys 0m0.538s
With new single-worker non blocking mode :
real 0m7.938s
user 0m8.049s
sys 0m0.514s
=> 20% faster
This fixes the following crash:
$ touch exists
$ programs/zstd -r examples/ -o exists
zstd: exists already exists; not overwritten
Segmentation fault (core dumped)
* programs/fileio.c (FIO_compressMultipleFilenames):
Handle the case where we're not overwriting the destination.
Reported at https://bugzilla.redhat.com/1530049
This patch restores capability for each file to receive adapted compression parameters depending on its size.
The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.
Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
The new macro might be a bit too restrictive.
Systems which do not support new test will simply default to <time.h>'s `clock_t clock()`,
suffering lesser benchmark accuracy.
Should it matter, the detection macro will have to be upgraded.
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-empty run of literals is going to be part of the next sequence,
while if the position ends a previous match and immediately starts another match,
the next sequence will have a litLength of zero.
A litLength of zero still has a non-zero cost.
It follows that the literals cost should be compared to the match cost plus the cost of litLength==0.
Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.
The "proper" formula makes it possible to remove this heuristic.
Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).
It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
Fixes issue where, when `zstd --format=lz4` is fed an input larger than 128KB,
the read overruns the input buffer. This changes Zstd to use LZ4 with chained
64KB blocks. This is technically a breaking change in that some third party
LZ4 implementations may not support linked blocks. However, progress should not
be allowed to be stopped by such petty concerns as backwards compatibility!
adapt accuracy depending on value.
makes it possible to have higher accuracy for small values,
notably low compression speeds.
This capability is expected to be useful while modifying optimal parser.
Currently, all files are joined by default,
they are compressed separately but benchmarked together,
providing a single final result.
Benchmarking files separately makes it possible to accurately measure the difference for each file.
This is expected to be useful while tuning optimal parser.
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)
It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).
warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overridden by srcSize instead.
This override wasn't applied before the compression level was transformed into compression parameters.
As a consequence, small inputs missed compression parameter adaptation.
It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
we lose a warning message :
when a job size is chosen < minimum job size for multithreading,
it is automatically resized to minimum size.
If this information is really useful, it should be present in zstd.h now.
removed the other 2 code paths (single thread, and ZSTDMT ones)
keeping only the new advanced API, for easier code coverage.
It shall also fix identified issue with Visual Studio
which doesn't have ZSTD_NEWAPI defined.
UTIL_getFileSize() used to return zero on failure.
This made it impossible to distinguish a failure from a genuine empty file.
Both cases were coalesced.
Adding UTIL_FILESIZE_UNKNOWN constant has many consequences on user code,
since in many places, the `0` was assumed to mean "error".
This is no longer the case, and the error code must be actively checked.
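A sketch of the new calling convention (constant and return type spelled as in programs/util.h; `fileName` illustrative) :
```
U64 const fileSize = UTIL_getFileSize(fileName);
if (fileSize == UTIL_FILESIZE_UNKNOWN) {
    /* failure, non-regular file, or stdin : handle explicitly,
     * instead of silently treating it like an empty file */
}
```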
There were multiple reasons stacked :
- Visual uses a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN` (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
when determining compression parameters
to compress one file only.
For multiple files, it still "bets" that files are going to be small.
There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced(),
making it unable to use pledgedSrcSize to determine compression parameters.
It's not good to mix old and new API
ZSTD_resetCStream() doesn't just set pledgedSrcSize :
it also sets the CCtx for a single thread compression.
Problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream())
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.
This is an invalid construction
(correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.
At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
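For clarity, a sketch of the valid construction with the new API (`expectedSrcSize` illustrative) :
```
/* declare the expected source size on the context,
 * instead of calling ZSTD_resetCStream() */
ZSTD_CCtx_setPledgedSrcSize(cctx, expectedSrcSize);
/* then drive compression through ZSTD_compress_generic() as usual */
```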
srcSize is read and provided at each file, not at resource creation.
This used to be useful with older API, because it could not re-adapt parameters between sessions.
At some point, it will be better to remove the old code, and only keep the new_api.
It works fine by now.
fixes #874 :
when a frame is not properly terminated by a "last block" signal,
zstd -d used to detect it immediately and error out.
This version will decode and flush the last block, and only then issue an error.
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
`--zstd=wlog=#`. These options also set the window size during
decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
and decompression.
Simple makefile change + quick typename change
Test:
make clean
make
# successfully produces binary without lz4 support
make clean
# with flags to pick up my lz4 build
make MOREFLAGS="-L/home/felixh/prog/lz4/lib -I/home/felixh/prog/lz4/lib"
# successfully produces binary with lz4 support
echo "TEST TEST TEST THIS IS A TEST STRING PLEASE TEST THIS PLEASE OK THANK YOU" | \
./lz4/lz4 | \
LD_LIBRARY_PATH=/home/felixh/prog/lz4/lib ./zstd/zstd -d
# successfully prints TEST TEST TEST THIS IS A TEST STRING PLEASE TEST THIS PLEASE OK THANK YOU
for easier invocation.
- no longer expose frequency timer :
it's either useless, or stored internally in a static variable (init is only necessary once).
- UTIL_getTime() provides result by function return.
The timer used was only accurate up to 0.01 seconds. This timer is accurate up to 1 ns.
It is a monotonic timer that measures the real time difference, not CPU time.
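A hedged sketch of such a clock on POSIX platforms (the actual util.h code also covers Windows and OSX specifics) :
```
#include <stdint.h>
#include <time.h>

static uint64_t getTimeNs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* monotonic : immune to wall-clock changes */
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```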
file-information is dependent on decompression functions.
it should only be enabled when ZSTD_NODECOMPRESS is not set.
also : added zstd-compress compilation test into `make shortest`
Doesn't speed optimize this buffer-to-buffer scenario yet.
Still internally defers to streaming implementation.
Also : fixed a long standing bug in ZSTDMT streaming API.
It makes it more difficult to directly cast the result of a function,
requiring the result to be stored in an intermediate variable.
It does not necessarily help readability,
and this restriction can be difficult to overcome in some constructions,
like some macros.
also : fixed minor Visual conversion warnings in datagencli.c
* upstream/dev: (305 commits)
added test for ZSTD_estimateCStreamSize()
changed variable name, for clarity
fixed ZSTD_estimateCStreamSize()
shortened ZSTD_createCStream_Advanced()
fixed symbols test
added ZSTD_estimateDStreamSize()
changed name frameParams into frameHeader
regroup memory usage function declarations
separated ZSTD_estimateCStreamSize() from ZSTD_estimateCCtxSize()
bumped version number
added ZSTD_estimateCDictSize() and ZSTD_estimateDDictSize()
Updated ZSTD_freeCCtx()
updated ZSTD_estimateCCtxSize()
Updated ZSTD_sizeof_CCtx()
merged CCtx and CStream as a single same object
cli : -d and -t do not stop after a failed decompression
added dev branch CircleCI badge
added dev branch Appveyor badge
keep dev branch status only
creates a binary archive without the `programs` directory
...
It inflates binary sizes, which is negative for the Windows build.
It also makes it impossible to check if 2 different source codes
get nonetheless compiled to the same binary,
since checksum will be different, due to integrated source code.
It now only uses compressionParameters as argument.
It produces many changes throughout user code,
though hopefully they tend to be simple :
just provide the cParams part from existing ZSTD_parameters.
Some programs might depend on ZSTD_createCDict_advanced() to pass frame parameters.
This change will force them to revisit this strategy and fix it,
since frame parameters are effectively silently ignored in current version.
makes it more explicit that it allocates a buffer
and that it's meant to be used for dictionary.
Also : simplified function a bit,
now only works for dictionaries up to DICTSIZE_MAX
now works with the `=` variant, which is the recommended one.
Old variant `--dictID #` still works, for compatibility with existing scripts.
The long-term objective is to remove the old variant.
* If zlib/lzma isn't in the usual spot, it won't be used,
even if `$CFLAGS` and `$LDFLAGS` add the location it is in.
* Update the test code snippets to not trigger any warnings.
zstd-internal was intended to be a helper target, but it doesn't help
at all, what it does in practice is a useless rebuild of zstd every time
"make zstd" is invoked.
Fixes: 030ac243a0 ("Changed Makefile to generate zstd with .gz support by default")
when zstd is called through a link named gzip, gunzip or gzcat,
provides the same behavior as the related program.
gzip compresses using --format=gz
both gzip and gunzip enable --rm by default
Replace fseek() in FIO_fwriteSparse() and FIO_fwriteSparseEnd() with macro expanding to 64-bit fseek version provided by the platform (includes fallback workaround using Win32 API).
* struct _stat64 is not defined by (non-w64) MinGW releases, __stat64 should be everywhere
* proper detection of _stat64() availability (as in MinGW sys/stat.h)
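A hedged sketch of the kind of macro involved (the real one also includes the Win32 API fallback mentioned above) :
```
#include <stdio.h>
#if defined(_MSC_VER)
#  define LONG_SEEK _fseeki64     /* 64-bit seek on MSVC */
#elif defined(__MINGW32__)
#  define LONG_SEEK fseeko64      /* MinGW-specific 64-bit variant */
#else
#  define LONG_SEEK fseeko        /* POSIX : 64-bit when _FILE_OFFSET_BITS=64 */
#endif
```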
- Add ZSTD_findDecompressedSize
- Traverses multiple frames to find total output size
- Add ZSTD_getFrameContentSize
- Gets the decompressed size of a single frame by reading header
- Deprecate ZSTD_getDecompressedSize
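A usage sketch (sentinel names as they appear in zstd.h) :
```
unsigned long long const contentSize = ZSTD_getFrameContentSize(src, srcSize);
if (contentSize == ZSTD_CONTENTSIZE_UNKNOWN) { /* size not recorded in the frame header */ }
else if (contentSize == ZSTD_CONTENTSIZE_ERROR) { /* not a valid zstd frame */ }
else { /* exact decompressed size of this single frame */ }
/* ZSTD_findDecompressedSize() does the same, summed over all concatenated frames */
```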
the minimum size condition is applied transparently (no warning, no error)
like previous minimum section size condition (1 KB) which still applies.
There is a large buffering effect when using zstdmt in MT mode.
Consequently, data is read first, pushed to workers,
and only later will the compressed result come out.
That means there is no longer immediate correlation
between amount of data read, and amount of data written.
This patch disables the displaying of % compression
when multi-threading is enabled.
It adds the displaying of total size when it can be determined
(it usually can be determined for files, but not for stdin)
so the user has a sense of "how far from the end" the compression is.
There is no modification to decompression side,
since decompression is only single-threaded for now.
fileio.c was continually pushing more content without giving a chance to flush compressed one.
It would block the job queue when input data was accumulated too fast (requiring many threads to be defined).
Fixed : fileio flushes whatever it can after each input attempt.
By default, section sizes are 4x window size.
This new setting allows manual selection of section sizes.
The larger they are, the (slightly) better the compression ratio,
but also the higher the memory allocation cost,
and eventually the fewer the number of possible threads,
since each section is compressed by a single thread.
It also introduces a prototype to set generic parameters,
ZSTDMT_setMTCtxParameter()
The idea is that it's possible to add enums
to extend the list of parameters that can be set this way.
This is more long-term oriented than a fixed-size struct.
Consider it as a test.
Also : fixed a corner case, where the nb of completed jobs becomes > jobQueueSize,
which is possible when many flushes are issued
while there is not enough dst buffer to flush completed ones.
The new strategy involves cutting frame at block level.
The result is a single frame, preserving ZSTD_getDecompressedSize()
As a consequence, bench can now make a full round-trip,
since the result is compatible with ZSTD_decompress().
This strategy will not make it possible to decode the frame with multiple threads
since the exact cut between independent blocks is not known.
MT decoding needs further discussions.