Commit Graph

2136 Commits

Author SHA1 Message Date
Bimba Shrestha
36528b96c4 Manually moving instead of memcpy on decoder and using genBuffer() 2019-10-03 09:26:51 -07:00
Bimba Shrestha
61ec4c2e7f Cleaning sequence parsing logic 2019-10-03 06:42:40 -07:00
Bimba Shrestha
c04245b257 Replacing assert with memory_allocation error code throw 2019-09-23 15:42:16 -07:00
Bimba Shrestha
be0bebd24e Adding test and null check for malloc 2019-09-23 15:08:18 -07:00
Nick Terrell
5cb7615f1f Add UNUSED_ATTR to ZSTD_storeSeq() 2019-09-20 21:37:13 -07:00
Nick Terrell
5dc0a1d659 HINT_INLINE ZSTD_storeSeq()
Clang on Mac wasn't inlining `ZSTD_storeSeq()` in level 1, which was
causing a 5% performance regression. This fixes it.
2019-09-20 16:39:27 -07:00
Bimba Shrestha
f3c4fd17e3 Passing in dummy dst buffer of compressbound(srcSize) 2019-09-20 15:50:58 -07:00
Nick Terrell
44c65da97e Remove literals overread in ZSTD_storeSeq() for ~neutral perf 2019-09-20 12:23:25 -07:00
Nick Terrell
fde217df04 Fix bounds check in ZSTD_storeSeq() 2019-09-20 08:25:12 -07:00
Nick Terrell
67b1f5fc72 Fix too strict assert 2019-09-20 01:23:35 -07:00
Nick Terrell
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
Nick Terrell
efd37a64ea Optimize decompression and fix wildcopy overread
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
  unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
  optimized for single long sequences, since that is the important
  case that can end up in `ZSTD_execSequenceEnd()`. Without this
  optimization, decompressing a block with 1 long match goes
  from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
  `ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
  same as `ZSTD_execSequence()`.

clang-8 seeds +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.

TODO: More detailed measurements, and on more datasets.

Crdit to OSS-Fuzz for finding the wildcopy overread.
2019-09-19 21:07:14 -07:00
Bimba Shrestha
ae6d0e64ae Addressing comments 2019-09-19 15:25:20 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Yann Collet
243200e5bf minor refactor of ZSTD_fast
- reduced variables lifetime
- more accurate code comments
2019-09-17 14:02:57 -07:00
Bimba Shrestha
76fea3fb99 Resolving appveyor test failure implicit conversion 2019-09-16 14:02:23 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api 2019-09-16 13:29:59 -07:00
Bimba Shrestha
bff6072e3a Bailing early when collecting sequences and documentation 2019-09-16 08:26:21 -07:00
W. Felix Handte
20c69077d1 Shrink Table Valid End During Alloc Alignment / Phase Change 2019-09-11 17:14:59 -04:00
W. Felix Handte
51d90668ba Add Assertions to Confirm that Workspace Pointers are Correctly Ordered 2019-09-11 17:14:59 -04:00
W. Felix Handte
a10c191613 __msan_poison() Workspace When Preparing for Re-Use 2019-09-11 17:14:45 -04:00
W. Felix Handte
7c57e2b9ca Zero h3size When h3log is 0
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
2019-09-11 13:14:26 -04:00
W. Felix Handte
bc020eec92 Also Shrink Clean Table Area When Reducing Indices 2019-09-11 11:40:57 -04:00
W. Felix Handte
1999b2ed9b Update DEBUGLOG Statements 2019-09-11 11:21:00 -04:00
W. Felix Handte
13e29a56de Shrink Clean Table Area When Copying Table Contents into Context
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.
2019-09-11 11:18:45 -04:00
W. Felix Handte
edb3ad053e Comments 2019-09-10 18:25:45 -04:00
W. Felix Handte
f31ef28ff8 Only Reset Indexing in ZSTD_resetCCtx_internal() When Necessary 2019-09-10 18:25:45 -04:00
W. Felix Handte
9968a53e91 Remove No-Longer-Used Continuation Functions 2019-09-10 18:25:45 -04:00
W. Felix Handte
1b28e80416 Remove Fast Continue Path in ZSTD_resetCCtx_internal() 2019-09-10 18:25:45 -04:00
W. Felix Handte
ad16eda5e4 ZSTD_reset_matchState Optionally Doesn't Restart Indexing 2019-09-10 18:25:45 -04:00
W. Felix Handte
5b10bb5ec3 Rename ZSTD_compResetPolicy_e Values and Add Comment 2019-09-10 18:25:45 -04:00
W. Felix Handte
0492b9a9ec Accept ZSTD_indexResetPolicy_e Param in ZSTD_reset_matchState() 2019-09-10 18:25:45 -04:00
W. Felix Handte
14c5471d5e Introduce ZSTD_indexResetPolicy_e Enum 2019-09-10 18:25:45 -04:00
W. Felix Handte
17b6da2e0f Track Usable Table Space in Compression Workspace 2019-09-10 18:25:37 -04:00
Yann Collet
22bd158e0f
Merge pull request #1712 from felixhandte/workspace-efficiency-2
Allocate Internal Buffers via Workspace Abstraction
2019-09-10 15:20:29 -07:00
Bimba Shrestha
1407919d13 Addressing comments on parsing 2019-09-10 15:10:50 -07:00
Bimba Shrestha
47199480da Cleaning up parsing per suggestion 2019-09-10 13:18:59 -07:00
W. Felix Handte
a9d373f093 Remove Empty lib/compress/zstd_cwksp.c 2019-09-10 16:03:13 -04:00
Yann Collet
41416f0927
Merge pull request #1773 from bimbashrestha/rle_first_block_decompression_fix
Removing redundant condition in decompression, making first block rle…
2019-09-10 11:17:29 -07:00
Bimba Shrestha
e3c5825918 Fizing litLength == 0 case 2019-09-10 10:38:13 -07:00
Bimba Shrestha
9e7bb55e14 Addressing comments 2019-09-09 20:04:46 -07:00
W. Felix Handte
81208fd7c2 Forward Declare ZSTD_cwksp_available_space to Fix Build 2019-09-09 19:10:09 -04:00
W. Felix Handte
91bf1babd1 Inline Workspace Functions 2019-09-09 18:53:53 -04:00
W. Felix Handte
0db3ffe7ee Forward resetCCtx Errors when Using CDict 2019-09-09 16:47:19 -04:00
W. Felix Handte
eb6f69d978 Fix sizeof_CCtx and sizeof_CDict Calculations for Statically Init'ed Objects 2019-09-09 16:45:17 -04:00
W. Felix Handte
e3703825a8 Fix workspaceTooSmall Calculation 2019-09-09 15:12:14 -04:00
W. Felix Handte
0a65a67901 Shorten &zc->workspace -> ws in ZSTD_resetCCtx_internal() 2019-09-09 14:59:09 -04:00
W. Felix Handte
1120e4d962 Clean Up TODOs and Comments pt. II 2019-09-09 14:04:39 -04:00
W. Felix Handte
c60e1c3be5 Nit 2019-09-09 13:34:08 -04:00
W. Felix Handte
7d7b665c90 Pull Phase Advance Logic Out into Internal Function 2019-09-09 13:34:08 -04:00
W. Felix Handte
8549ae9f1d Hide Workspace Movement Behind Helper Function 2019-09-09 13:34:08 -04:00
W. Felix Handte
2405c03bcd Fix DEBUGLOG Statement Levels 2019-09-09 13:34:08 -04:00
W. Felix Handte
7100d24221 Fix Rescale Continue Special Case 2019-09-09 13:34:08 -04:00
W. Felix Handte
7321e4c9f3 Remove Unused noRealloc CRP Value 2019-09-09 13:34:08 -04:00
W. Felix Handte
901bba4ca6 Re-Implement Workspace Shrinking when Oversized 2019-09-09 13:34:08 -04:00
W. Felix Handte
881bcd80ca Cleanup from Move 2019-09-09 13:34:08 -04:00
W. Felix Handte
b511a84adc Move Workspace Functions to Their Own File 2019-09-09 13:34:08 -04:00
W. Felix Handte
077a2d7dc9 Rename 2019-09-09 13:34:08 -04:00
W. Felix Handte
ebd162194f Clean Up TODOs and Comments 2019-09-09 13:34:08 -04:00
W. Felix Handte
2abe0145b1 Improve Comments a Bit 2019-09-09 13:34:08 -04:00
W. Felix Handte
7a2416a863 Allocate CDict in Workspace (Rather than in Separate Allocation) 2019-09-09 13:34:08 -04:00
W. Felix Handte
65057cf009 Rewrite ZSTD_initStaticCCtx to Alloc CCtx in Workspace 2019-09-09 13:34:08 -04:00
W. Felix Handte
58b69ab15c Only the CCtx Itself Needs to be Cleared during Static CCtx Init 2019-09-09 13:34:08 -04:00
W. Felix Handte
88c2fcd0ee Align Alloc Pointer When Transitioning from Buffers to Aligned Allocs 2019-09-09 13:34:08 -04:00
W. Felix Handte
e936b73889 Remove Overly-Restrictive Assert 2019-09-09 13:34:08 -04:00
W. Felix Handte
75d574368b When Loading Dict By Copy, Always Put it in the Workspace 2019-09-09 13:34:08 -04:00
W. Felix Handte
e69b67e33a Alloc Tables Separately 2019-09-09 13:34:08 -04:00
W. Felix Handte
6177354b36 Begin Introducing Phases 2019-09-09 13:34:08 -04:00
W. Felix Handte
786f2266bb TMP 2019-09-09 13:34:08 -04:00
W. Felix Handte
c25283cf00 Disambiguate 'workspace' and 'entropyWorkspace' 2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8 Normalize Case 'workSpace' -> 'workspace' 2019-09-09 13:27:18 -04:00
Bimba Shrestha
44e122053b Mentioning cli only in the comment as suggested 2019-09-06 14:48:41 -07:00
Bimba Shrestha
a917cd597d Put back omission for first rle block and updated comment as suggested 2019-09-06 13:44:25 -07:00
Bimba Shrestha
d687d603e4 Removing redundant condition in decompression, making first block rles valid to deocmpress 2019-09-06 10:46:19 -07:00
Varun S Nair
9816560649 Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer 2019-09-05 15:47:17 +05:30
Varun S Nair
771645471f Passing ZSTD_CCtx_params by const pointer 2019-09-05 15:28:30 +05:30
Bimba Shrestha
5f8b0f6890 Changing api to get sequences across all blocks 2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
Bimba Shrestha
623b90f85d Fixing ci-circle test complaints 2019-08-29 13:09:42 -07:00
Bimba Shrestha
ece465644b Adding api for extracting sequences from seqstore 2019-08-29 12:29:39 -07:00
mgrice
b830599582 Improvements in zstd decode performance
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3).  This change takes that further by exploiting some properties:
1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.

Speedup on Xeon E5-2680v4 is in the range of 3-5%.

Measured wildcopy length distributions on silesia.tar:

level	<=8	<=16	<=24	>24
1	78.05%	11.49%	3.52%	6.94%
3	82.14%	8.99%	2.44%	6.43%
6	85.81%	6.51%	2.92%	4.76%
8	83.02%	7.31%	3.64%	6.03%
10	84.13%	6.67%	3.29%	5.91%
15	77.58%	7.55%	5.21%	9.66%
16	80.07%	7.20%	3.98%	8.75%

Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
Bimba Shrestha
c3e3c8bf32 Undoing the last commit (that was an accident) 2019-08-29 12:05:47 -07:00
bimbashrestha
4a1ca5e0a8 Adding method for extracting sequences. 2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774 Added bool to cctx and fixed some comment nits 2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024 Fixing mixed declaration compiler complaint 2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b Forbiding emission of RLE when its the first block 2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7 Removing accidental method call 2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e Removing assert for rle that doesn't always hold 2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a Using typedef U32 instead of int 2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492 Removing implicit conversion from const void* to const BYTE* and added constant for threshold 2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1 Fixing more test falure errors 2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911 Fixing ci-circle test failure issues 2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88 Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert 2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df Fix merge conflicts 2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3 Generate RLE blocks in the encoder 2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14 Define ZSTD_SRCSIZEHINT_MIN as 0 2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906 Use int for srcSizeHint when sensible 2019-08-19 16:49:25 -07:00
Nick Magerko
edf2abf106 Fix fall-through case 2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89 Add --size-hint=# option 2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
facbe8b2c2 factored the logic selecting lowest match index
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd fixed strategies btopt+ 2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7 fixed strategy btlazy2 2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca fixed strategies greedy, lazy & lazy2
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279 Restructure the source files 2019-07-15 17:39:18 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size (#1678)
* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5 updated the _extDict variant of double fast 2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce double-fast: changed the trade-off for a smaller positive change
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
 via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2 updated double_fast complementary insertion
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e Factor out the logic to build sequences 2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670 Adding targetCBlockSize param 2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
621adde3b2 changed naming to ZSTD_indexTooCloseToMax()
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.

Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
Yann Collet
45c9fbd6d9 prefer memset() rather than reduceIndex() when close to index range limit
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Yann Collet
944e2e9e12 benchfn : added macro macro CONTROL()
like assert() but cannot be disabled.
proper separation of user contract errors (CONTROL())
and invariant verification (assert()).
2019-06-21 15:58:55 -07:00
Nick Terrell
674534a700 [zstd] Fix data corruption in niche use case
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
  and overflow correct between each chunk.

Data corruption could happen when all these conditions are true:

* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB

The problem is that when loading a large dictionary we don't do
overflow correction. We can only load 512 MB at a time, and may
need to do overflow correction before each chunk.
2019-06-21 15:47:31 -07:00
Nick Terrell
4156060ca4 [zstdmt] Update assert to use ZSTD_WINDOWLOG_MAX 2019-06-21 15:39:33 -07:00
Nick Terrell
95e2b430ea [opt] Add asserts for corruption in ZSTD_updateTree() 2019-06-21 15:22:29 -07:00
Yann Collet
9af909bf35
Merge pull request #1624 from facebook/smallwlog
Improves compression ratio for small windowLog
2019-06-14 17:28:21 -07:00
Nick Terrell
cdb9481e38 [libzstd] Optimize ZSTD_insertBt1() for repetitive data
We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.

This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.

```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```

At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.

Fixes Issue #1634.
2019-06-05 20:34:00 -07:00
Yann Collet
80d6ccea79 removed UINT32_MAX
apparently not guaranteed on all platforms,
replaced by UINT_MAX.
2019-05-31 17:27:07 -07:00
Yann Collet
fce4df3ab7 fixed wrong assert in double_fast 2019-05-31 17:06:28 -07:00
Yann Collet
a968099038 minor code cleaning for new index invalidation strategy 2019-05-31 16:52:37 -07:00
Yann Collet
d605f482c7 make double_fast compatible with new index invalidation strategy 2019-05-31 16:50:04 -07:00
Yann Collet
a30febaeeb Made fast strategy compatible with new offset validation strategy
fast mode does the same thing as before :
it pre-emptively invalidates any index that could lead to offset > maxDistance.
It's supposed to help speed.

But this logic is performed inside zstd_fast,
so that other strategies can select a different behavior.
2019-05-31 16:34:55 -07:00
Yann Collet
58adb1059f extended exact window size to greedy/lazy modes 2019-05-31 16:08:48 -07:00
Yann Collet
bc601bdc6d first implementation of small window size for btopt
noticeably improves compression ratio
when window size is small (< 18).

enwik7	level 19

windowLog	`dev`	`smallwlog`	improvement
23	3.577	3.577	0.02%
22	3.536	3.538	0.06%
21	3.462	3.467	0.14%
20	3.364	3.377	0.39%
19	3.244	3.272	0.86%
18	3.110	3.166	1.80%
17	2.843	3.057	7.53%
16	2.724	2.943	8.04%
15	2.594	2.822	8.79%
14	2.456	2.686	9.36%
13	2.312	2.523	9.13%
12	2.162	2.361	9.20%
11	2.003	2.182	8.94%
2019-05-31 15:55:12 -07:00
Yann Collet
b13a9207f9
Merge pull request #1623 from facebook/fullbench
fullbench minor improvements
2019-05-31 14:40:19 -07:00
Yann Collet
ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet
9719fd616c removed nextToUpdate3 from ZSTD_window
it's now a local variable of ZSTD_compressBlock_opt()
2019-05-28 16:18:12 -07:00
Yann Collet
33dabc8c80 get bt matches : made it a bit clearer which parameters are input and output 2019-05-28 16:11:32 -07:00
Yann Collet
327cf6fac1 nextToUpdate3 does not need to be maintained outside of zstd_opt.c
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.

Made the logic clear, so that no code tried to maintain this variable.

An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.

This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f complementary code comments
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Nick Terrell
a17fe4c9e5 [visual] Fix unreachable code warning 2019-04-16 11:32:35 -07:00
Nick Terrell
de0499f7fa [libzstd] Require ZSTD_MULTITHREAD to create a ZSTDMT_CCtx
ZSTDMT was broken when compiled without ZSTD_MULTITHREAD defined,
because `ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkerss)`
failed. It was detected by the MSVC test which runs the fuzzer with
multithreading disabled.

This is a very niche use case of a deprecated API, because the API is
inefficient and synchronous, since `threading.h` will be synchronous.
Users almost certainly don't want this, and anyone who tested their code
should realize that it is broken. Therefore, I think it is safe to
require `ZSTD_MULTITHREAD` to be defined to use ZSTDMT.
2019-04-15 23:04:46 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
641e594309 [libzstd] Remove ZSTDMT from the shared object
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00
Nick Terrell
72a3fbc0e4
Merge pull request #1562 from terrelln/2fast
[libzstd] Speed up single segment zstd_fast by 5%
2019-04-03 18:08:15 -07:00
Nick Terrell
95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller.  If you don't think this is the right trade off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00
Nick Terrell
56682a7709 Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must actualize
the compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.

Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.

I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
2019-04-01 18:02:52 -07:00
Nick Terrell
f00407b640 Split out zstd_fast dict match state function 2019-03-29 10:39:16 -06:00
Nick Terrell
6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
2019-03-21 16:13:53 -07:00
Nick Terrell
e55da9e963 Wrap the new advanced api completely 2019-03-21 10:54:40 -07:00
Nick Terrell
787b76904a [libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once you call `ZSTD_CCtx_refCDict()` the compression parameters
cannot be changed. Remove that restriction, and document what parameters
are used when using a CDict.

If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
2019-03-13 16:10:05 -07:00
Nick Terrell
0594e8135b [libzstd] Free local cdict when referencing cdict
We no longer care about the `cdictLocal` after calling
`ZSTD_CCtx_refCDict()`, so we should free it to save some
memory.
2019-03-13 14:54:31 -07:00
Nick Terrell
7ad7ba3178 [libzstd] Rename ZSTD_CCtxParam_* to ZSTD_CCtxParams_* 2019-02-19 17:44:52 -08:00
Nick Terrell
f4abba02ba [libzstd] Clean up parameter code
* Move all ZSTDMT parameter setting code to ZSTD_CCtxParams_*Parameter().
  ZSTDMT now calls these functions, so we can keep all the logic in the
  same place.
* Clean up `ZSTD_CCtx_setParameter()` to only add extra checks where needed.
* Clean up `ZSTDMT_initJobCCtxParams()` by copying all parameters by default,
  and then zeroing the ones that need to be zeroed. We've missed adding several
  parameters here, and it makes more sense to only have to update it if you
  change something in ZSTDMT.
* Add `ZSTDMT_cParam_clampBounds()` to clamp a parameter into its valid
  range. Use this to keep backwards compatibility when setting ZSTDMT parameters,
  which clamp into the valid range.
2019-02-19 13:22:37 -08:00
Nick Terrell
3d7377b874 [libzstd] Handle uncompressed literals 2019-02-15 14:58:11 -08:00
Nick Terrell
f9513115e4 [libzstd] Add ZSTD_c_literalCompressionMode flag
It controls the literals compression. It is either
`auto`, `huffman`, or `uncompressed`. It defaults to
`auto`, which is the current behavior.
2019-02-13 14:59:22 -08:00
W. Felix Handte
501eb25102 Rename FORWARD_ERROR -> FORWARD_IF_ERROR 2019-01-29 12:56:07 -05:00
W. Felix Handte
03e040a966 Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(... 2019-01-28 17:33:01 -05:00
W. Felix Handte
64bb6640f2 Replace CHECK_F Uses in zstdmt_compress.c and zstd_ddict.c 2019-01-28 17:15:57 -05:00
W. Felix Handte
cafc3b1bcb Also Convert zstd_compress.c 2019-01-28 17:05:18 -05:00
Yann Collet
f9e4f89252 improved comments for adjustCParams() and getCParams() 2019-01-02 12:18:40 -08:00
Yann Collet
e980ba212f
Merge pull request #1471 from facebook/nofloat
guard functions using floating point for debug mode only
2018-12-23 12:35:51 -08:00
Yann Collet
aae5bc538a
Merge pull request #1470 from facebook/U32
fix confusion between unsigned <-> U32
2018-12-23 12:35:39 -08:00
Yann Collet
c9dfb7e445 guard functions using floating point for debug mode only
they are only used to print debug messages.
Requested in #1386,
2018-12-22 09:09:40 -08:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
c8d1fda982 update aarch64 test to xenial
in an attempt to circumvent the `ld` bug
2018-12-21 15:08:48 -08:00
Yann Collet
8f35c7f94c
Merge pull request #1466 from facebook/noDictPresent
fixed : better error message
2018-12-20 19:01:27 -08:00
Yann Collet
41b45b84a1
Merge pull request #1465 from facebook/noFilePresent
fixed : detection of non-existing file
2018-12-20 17:21:04 -08:00
Yann Collet
ed2fb6bd57 fixed : better error message when dictionary missing
during benchmark.
Also : refactored ZSTD_fillHashTable(),
just for readability (it does the same thing)
2018-12-20 17:20:07 -08:00
Yann Collet
95784c654c fixed shadowing of stat variable
some standard lib declares a `stat` variable at global scope
shadowing local declarations ....
2018-12-20 14:56:44 -08:00
Yann Collet
2898afab52 fixed OSSfuzz 11849
The problem was already masked,
due to no longer accepting tiny blocks for statistics.

But in case it could still happen with not-so-tiny blocks,
there is a stricter control which ensures that
nothing was already loaded prior to statistics collection.
2018-12-19 16:54:15 -08:00
Yann Collet
8e0e495ce8 fixed: compression ratio discrepancy
depending on initialization,
the first byte of a new frame was invalidated or not.

As a consequence, one match opportunity was available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).

It impacted ratio comparison between one-shot and streaming modes.

This fix makes the first byte of a new frame always a valid match.
Now compressed size is always the same.
It also improves compressed size by a negligible amount.
2018-12-19 10:11:06 -08:00
Yann Collet
d0e15f8d32
Merge pull request #1458 from terrelln/estimate
[libzstd] Fix estimate with negative levels
2018-12-18 15:12:21 -08:00
Yann Collet
04baecaeed
Merge pull request #1457 from facebook/btultra2.1
btultra2 and very small input
2018-12-18 14:46:55 -08:00
Nick Terrell
d7def456d8 [libzstd] Fix estimate with negative levels
* Fix `ZSTD_estimateCCtxSize()` with negative levels.
* Fix `ZSTD_estimateCStreamSize()` with negative levels.
* Add a unit test to test for this error.
2018-12-18 14:24:49 -08:00
Yann Collet
ef984e7307 fix debug levels
as reported by @terrelln.
2 is reserved for temporary usage only.
2018-12-18 13:40:07 -08:00
Yann Collet
635783da12 btultra2 and very small srcSize
When srcSize is small,
the nb of symbols produced is likely too small to warrant dedicated probability tables.
In which case, predefined distribution tables will be used instead.

There is a cheap algorithm in btultra initialization :
it presumes default distribution will be used if srcSize <= 1024.

btultra2 now uses the same threshold to shut down probability estimation,
since measured frequencies won't be used at entropy stage,
and therefore relying on them to determine sequence cost is misleading,
resulting in worse compression ratios.

This fixes btultra2 performance issue on very small input.

Note that, a proper way should be
to determine which symbol is going to use predefined probaility
and which symbol is going to use dynamic ones.
But the current algorithm is unable to make a "per-symbol" decision.
So this will require significant modifications.
2018-12-18 12:32:58 -08:00
Yann Collet
373ff8b983 play around with rescale weights 2018-12-17 15:48:34 -08:00
Yann Collet
8be145a8c1 fixed default job size 2018-12-13 16:38:08 -08:00
Yann Collet
62180b27d5 zstdmt parameter getter/setter use int 2018-12-13 15:47:34 -08:00
Yann Collet
34f01e600f fixed multiple conversions
from 64-bit to 32-bit
2018-12-13 14:02:22 -08:00
Yann Collet
f2f86d369b Merge branch 'btultra2' into ovlog_def 2018-12-12 20:58:14 -08:00
Yann Collet
9a92ed401d updated compression results.csv
and fixed nit
2018-12-12 20:30:09 -08:00
Yann Collet
9792acda3b Merge branch 'dev' into btultra2 2018-12-12 20:18:27 -08:00
Yann Collet
7bb8dfc62f new overlapLog default values
varies between 6 and 9, depending on strategy
2018-12-11 18:10:29 -08:00
Yann Collet
eee789b7ea continued: changed to overlapLog
in deeper code layer.
for consistency.
2018-12-11 17:41:42 -08:00
Yann Collet
9b784dec7f changed parameter name to ZSTD_c_overlapLog
from overlapSizeLog.

Reasoning :
`overlapLog` is already used everwhere, in the code, command line and documentation.
`ZSTD_c_overlapSizeLog` feels unnecessarily different.
2018-12-11 16:55:33 -08:00
Yann Collet
5e6aaa3abb fixed btultra2 usage with prefix
notably while using multi-threading
2018-12-10 18:45:03 -08:00
Yann Collet
3619c34399 fix assert position within ZSTD_compress2() 2018-12-10 17:42:35 -08:00
Yann Collet
c226a7b9f3 fixed ZSTD_compress2()
as suggested by @terrelln
2018-12-10 17:33:49 -08:00
Yann Collet
37e314a68d updated clevel table for large inputs 2018-12-09 22:38:05 -08:00
Yann Collet
c9c4c7ec8c update clevel table for 256K 2018-12-08 21:40:08 -08:00
Yann Collet
8075d75f9c update clevel table for 128K 2018-12-08 10:42:55 -08:00
Yann Collet
95b152ab33 updated clevel table for 16K
to introduce btultra2
2018-12-07 20:12:43 -08:00
Yann Collet
d613fd9afe linked btultra2 as strategy9
and ensure zstdbench detects out-of-bound parameters
2018-12-06 19:27:37 -08:00
Yann Collet
ae370b0e12 minor bound refinements 2018-12-06 16:51:17 -08:00
Yann Collet
39e28982cf introduced constants ZSTD_STRATEGY_MIN and ZSTD_STRATEGY_MAX 2018-12-06 16:16:16 -08:00
Yann Collet
c3c3488981 fixed c++ assignment to enum 2018-12-06 15:57:55 -08:00
Yann Collet
be9e561da4 changed ZSTD_c_compressionStrategy into ZSTD_c_strategy
also : fixed paramgrill, and limit conditions
2018-12-06 15:00:52 -08:00
Yann Collet
e9448cdf4c introduced strategy btultra2
note : not yet applied on any compression level
2018-12-06 13:38:09 -08:00
Yann Collet
3583d19c4e changed parameter names from ZSTD_p_* to ZSTD_c_*
for naming consistency
2018-12-05 17:26:02 -08:00
Yann Collet
aec945f0dc implemented ZSTD_dParam_getBounds()
and ZSTD_DCtx_setParameter()
2018-12-04 15:35:37 -08:00
Yann Collet
6ced8f7c7c joined normal streaming API with advanced one 2018-12-03 14:22:38 -08:00
Yann Collet
d8e215cbee created ZSTD_compress2() and ZSTD_compressStream2()
ZSTD_compress_generic() is renamed ZSTD_compressStream2().

Note that, for the time being,
the "stable" API and advanced one use different parameter planes :
setting parameters using the advanced API does not influence ZSTD_compressStream()
and using ZSTD_initCStream() does not influence parameters for ZSTD_compressStream2().
2018-11-30 11:25:56 -08:00
Yann Collet
d4d4e109e9 getParameter fills an int*
rather than an unsigned*
for consistency
since type of setParameter() changed to int.
2018-11-21 15:37:26 -08:00
Yann Collet
41c7d0b1e1 changed hashEveryLog into hashRateLog 2018-11-21 14:36:57 -08:00
Yann Collet
5d3592398d fixed fall-through 2018-11-20 16:09:33 -08:00
Yann Collet
5c6d4b18ac completed implementation of ZSTD_cParam_getBounds()
for all parameters
2018-11-20 16:06:00 -08:00
Yann Collet
2e7fd6a2cb fixed remaining searchLength invocations 2018-11-20 15:13:27 -08:00
Yann Collet
e874dacc08 changed searchLength into minMatch
refactored all relevant API and calls
for consistency.
2018-11-20 14:56:07 -08:00
Yann Collet
114bd4346e changed enum type name to ZSTD_ResetDirective
for naming consistency :
types should start with a capital letter (after prefix)
2018-11-20 12:00:20 -08:00
Yann Collet
3b838abf97 ZSTD_CCtx_setParameter : value argument is now int
for compatibility with compression level
2018-11-20 11:53:01 -08:00
Yann Collet
5c68639186 updated ZSTD_DCtx_reset()
signature and behavior is now the same as ZSTD_CCtx_reset()
2018-11-15 16:12:39 -08:00
Yann Collet
06c8d5a4f4 Merge branch 'dev' into advancedAPI
fixed rsyncable
2018-11-15 10:51:24 -08:00
Nick Terrell
b9693d3a49 [lib] Add rsyncable mode
- Add rsyncable mode to multithreaded mode
- Factor out LDM's hash function for reuse
2018-11-14 16:59:57 -08:00
Yann Collet
7b0391e37e finalized retrofit of ZSTD_CCtx_reset()
updated all depending sources
2018-11-14 13:05:35 -08:00
Yann Collet
ff8d371708 modified ZSTD_CCtx_reset()
which now accepts an enum,
to distinguish between resetting the session, or the parameters (or both).

removed ZSTD_CCtx_resetParameters(), which is redundant.

start replacing invocation of ZSTD_CCtx_reset*() functions

Updated advanced API documentation

trimmed down amount of API staged in RC,
in particular, all functions related to ZSTD_CCtxParams()
seem too advanced.
2018-11-14 12:33:57 -08:00
Yann Collet
d7e10a774a added constant ZSTD_WINDOWLOG_LIMIT_DEFAULT
answering #1407.

Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)
2018-11-13 18:12:34 -08:00
Yann Collet
b83d1e7714 removed some static const variables
and replaced by traditional macro constants.

Unfortunately, C doesn't consider `static const` to mean "constant"
2018-11-13 16:56:32 -08:00
Yann Collet
f28af025d9
Merge pull request #1413 from felixhandte/attach-dict-fix-unsigned-compare
Fix #1412: Perform Signed Comparison When Setting Attach Dict Param
2018-11-12 17:53:11 -08:00
Yann Collet
626040ab53 changed PREFETCH() macro into PREFETCH_L2()
which is more accurate
2018-11-12 17:05:32 -08:00
W. Felix Handte
5faef4d378 Const 2018-11-12 14:48:42 -08:00
W. Felix Handte
2d9332eb21 Fix Types 2018-11-12 12:52:31 -08:00
W. Felix Handte
4127de5fa6 Switch Enum to Only Non-Negative Values, Update Comments 2018-11-12 12:47:47 -08:00
W. Felix Handte
596f7d1256 Fix #1412: Perform Signed Comparison When Setting Attach Dict Param 2018-11-12 12:07:57 -08:00
Yann Collet
e0701d3c5d
Merge pull request #1404 from facebook/T36302429
fixed T36302471
2018-11-06 11:53:20 -08:00
Yann Collet
3e5cdf1b6a fixed T36302429 2018-11-05 17:50:30 -08:00
Yann Collet
2caa995558 just add an assert() in ZSTD_insertBtAndGetAllMatches()
to express a condition on ll0 .
May help static analyzer as in #1397
2018-11-05 17:13:32 -08:00
Yann Collet
3a90229616
Merge pull request #1395 from facebook/decompressblock
created zstd_decompress_block module
2018-10-29 16:28:09 -07:00
Yann Collet
8d56f4baee added a few comments for clarifications 2018-10-26 15:21:52 -07:00
Yann Collet
7b74405150 refactor HUF_compress_internal for clarity
changed workspace parameter convention
to always provide workspaceSize,
so that size can be explicitly checked.

Also, use more enum to make the meaning of some parameters more explicit.
2018-10-26 13:21:37 -07:00
W. Felix Handte
b8235be865 Avoid Searching Dictionary in ZSTD_btlazy2 When an Optimal Match is Found
Bailing here is important to avoid reading past the end of the input buffer.
2018-10-08 15:59:32 -07:00
W. Felix Handte
d121b3451c Clean Up Debug Log Statements 2018-10-08 15:59:32 -07:00
W. Felix Handte
08da9ad316 Remove Unused Variable 2018-10-08 15:59:32 -07:00
Yann Collet
22ddf3523a fixed msan warning
on btlazy2 strategy with dictAttach
2018-10-02 18:20:20 -07:00
Yann Collet
228c6e5147
Merge pull request #1317 from felixhandte/split-logs
Independent Dictionary and Working Context Table Logs
2018-10-01 17:20:12 -07:00
W. Felix Handte
5b296869df Revert Ability to Set HashLog and ChainLog on Context When Dict is Attached
This capability is not needed / used in the current unit of work. I'll
re-introduce it later, when we start allowing users to override the deduced
working context logs.
2018-10-01 13:28:13 -07:00
W. Felix Handte
c2369fedc4 Restore Passing CParams to ZSTD_insertAndFindFirstIndex_internal 2018-09-28 17:12:54 -07:00
W. Felix Handte
bad74c4781 Use Working Ctx Logs when not in DMS Mode
We pre-hash the ptr for the dict match state sometimes. When that actually
happens, a hashlog of 0 can produce undefined behavior (right shift a long
long by 64). Only applies to unoptimized compilations, since when
optimizations are applied, those hash operations are dropped when we're not
actually in dms mode.
2018-09-28 17:12:54 -07:00
W. Felix Handte
c38acff94f When Attaching Dictionary, Size Working Tables Based on Input Size Only 2018-09-28 17:12:54 -07:00
W. Felix Handte
9d87d50878 Remove Log Overriding for the Time Being 2018-09-28 17:12:54 -07:00
W. Felix Handte
77fd17d93f Remove Strategy-Dependency in Making Attachment Decision 2018-09-28 17:12:54 -07:00
W. Felix Handte
00c088b32d Support Split Logs in ZSTD_btopt..ZSTD_btultra 2018-09-28 17:12:54 -07:00
W. Felix Handte
0783492178 Bump Split Log Support to ZSTD_btultra 2018-09-28 17:12:54 -07:00
W. Felix Handte
e4ac4a0f16 Support Split Logs in ZSTD_greedy..ZSTD_btlazy2 2018-09-28 17:12:54 -07:00
W. Felix Handte
e710dc3369 Bump Split Log Support to ZSTD_btlazy2 2018-09-28 17:12:54 -07:00
W. Felix Handte
22fcb8d4c7 Support Split Logs in ZSTD_dfast 2018-09-28 17:12:54 -07:00
W. Felix Handte
a232b3bb7c Bump Split Log Support to ZSTD_dfast 2018-09-28 17:12:54 -07:00
W. Felix Handte
fe96e98f81 Support a Separate Hash Log in ZSTD_fast 2018-09-28 17:12:54 -07:00
W. Felix Handte
bc880ebe8f Stop Passing in hashLog and stepSize to ZSTD_compressBlock_fast_generic 2018-09-28 17:12:54 -07:00
W. Felix Handte
b3107c7799 Temporary Commit to Retain Requested Hash and Chain Logs During Dict Attach 2018-09-28 17:12:54 -07:00
W. Felix Handte
34e0193129 Allow Setting Hash and Chain Logs on Contexts with Attached CDict 2018-09-28 17:12:54 -07:00
W. Felix Handte
eae8232f50 For Supported Strategies, Attach Dict Even When Params Don't Match 2018-09-28 17:12:54 -07:00
W. Felix Handte
01ff945eae Split Attach and Copy Reset Strategies into Separate Implementation Functions 2018-09-28 17:12:54 -07:00
W. Felix Handte
a6d6bbeae1 Pull Attachment Decision into Separate Function 2018-09-28 17:12:54 -07:00
W. Felix Handte
b7fba599ae And Then Avoid the Unused Parameter Warning 2018-09-28 17:12:54 -07:00
W. Felix Handte
1f188ae655 Move Asserts into Function to Avoid Unused Function Warning 2018-09-28 17:12:54 -07:00
W. Felix Handte
7212b5e5c2 Move Match State CParams Setting into resetCCtx and continueCCtx 2018-09-28 17:12:54 -07:00
W. Felix Handte
01e34d365b Strengthen Assertion to Assert Equality 2018-09-28 17:12:53 -07:00
W. Felix Handte
50cc1cf4d5 Remove CParams Arg from ZSTD_ldm_blockCompress 2018-09-28 17:12:53 -07:00
W. Felix Handte
14764de49f Stop Separately Passing CParams in ZSTD_lazy Internal Functions 2018-09-28 17:12:53 -07:00
W. Felix Handte
97149f22c3 Stop Separately Passing CParams in ZSTD_opt Internal Functions 2018-09-28 17:10:42 -07:00
W. Felix Handte
dcdf437fed Also Remove CParams from Table Filling Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte
3483f89101 Also Assert Equivalency When Filling MatchState with Prefix 2018-09-28 17:10:42 -07:00
W. Felix Handte
6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte
03103269de Assert ctx and ms cparams Equivalency 2018-09-28 17:10:42 -07:00
W. Felix Handte
4e3ecee9ed Remove cParams from CDict 2018-09-28 17:10:42 -07:00
W. Felix Handte
76ef87ed9d Add ZSTD_compressionParameters to ZSTD_matchState_t 2018-09-28 17:10:42 -07:00
Nick Terrell
6391cd1030 [zstd] Fix newly added test case 2018-09-28 12:09:28 -07:00
Nick Terrell
a180ea07c4 Restore ZSTD_noCompressBlock() for clarity 2018-09-27 16:06:02 -07:00
Nick Terrell
f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet
2a5cd8535a
Merge pull request #1342 from facebook/fixcatyd
fix : huge (>4GB) chain of blocks
2018-09-27 10:20:14 -07:00
Yann Collet
404a7bfed0 moved again overflow correction
cannot work from within ZSTD_compressBlock()
2018-09-26 18:06:53 -07:00
Yann Collet
0e2dbac18a changed overflow correction place
keep one in compress_frameChunk(),
so that it's tested at every loop
in case some user simply some large mulit-GB input in a single invocation.

Add one in ZSTD_compressBlock(),
since compressBlock() explicitly skips frameChunk().
2018-09-26 15:35:38 -07:00
Yann Collet
f98c69d77c fix : huge (>4GB) stream of blocks
experimental function ZSTD_compressBlock() is designed for very small data in mind,
for situation where saving the ~12 bytes of frame header can actually make a difference.

Some systems though may have to deal with small and large data entangled.
If it's larger than a block (> 128KB), compressBlock() cannot compress them in one round.

That's why it's possible to compress in multiple rounds.
This is a chain of compressed blocks.

Some users push this capability to the limit, encoding gigantic chain of blocks.
On crossing the 4GB limit, some internal overflow occurs.

This fix moves the overflow correction mechanism higher in the call chain,
so that it's applied also to gigantic chains of blocks.

Added a test case in fuzzer.c, which crashes before the fix, and pass now.
2018-09-26 14:24:28 -07:00
Yann Collet
04f47bbdd2 Merge branch 'dev' into adapt 2018-09-24 16:56:45 -07:00
Yann Collet
c484345a82 Merge branch 'mingw' into adapt 2018-09-21 16:00:46 -07:00
Yann Collet
bfff4f4809 ensure all writes to job->cSize are mutex protected
even when reporting errors,
using a macro for code brevity, as suggested by @terrelln,
2018-09-21 16:00:39 -07:00
Yann Collet
32b7cf1bcf fixed tautological tests
involving ZSTD_TARGETLENGTH_MIN (== 0)
2018-09-21 15:04:43 -07:00
Yann Collet
c044345f8f Merge branch 'mingw' into minclevel 2018-09-21 14:56:57 -07:00
Yann Collet
de6c75e4e5
Merge pull request #1318 from felixhandte/shadow-dict-matches
Don't Search Dictionary Context When Working Context Search Resulted in Mismatch
2018-09-21 12:15:33 -07:00
Yann Collet
a54c86cfc6 defined a minimum negative level
which can be probed using new function ZSTD_minCLevel().

Also : redefined ZSTD_TARGETLENGTH_MIN/MAX for consistency

used the opportunity to bump version number to v1.3.6
2018-09-20 16:52:03 -07:00
Yann Collet
7992942d66 fixed complex tsan issue
when job->consumed == job->src.size , compression job is presumed completed,
so it must be the very last action done in worker thread.
2018-09-20 13:47:31 -07:00
Yann Collet
6b07a66aec fixed minor reporting discrepancy in MT mode 2018-09-19 16:30:55 -07:00
Yann Collet
ca02ebee07 removed static variables
so that --adapt can work on multiple input files too
2018-09-19 15:25:50 -07:00
Yann Collet
2f78228f65 Merge branch 'dev' into adapt 2018-09-19 12:43:42 -07:00
ko-zu
18b4a1da61 Fix clang build
Fix dixygen comment
Fix clang binary path
2018-09-16 10:27:02 +09:00
W. Felix Handte
b76c888497 ZSTD_dfast: Don't Search Dict Context When Mismatch Was Found 2018-09-14 15:24:25 -07:00
W. Felix Handte
b048af5999 ZSTD_fast: Don't Search Dict Context When Mismatch Was Found 2018-09-14 15:23:35 -07:00
Yann Collet
31ebb26945
Merge pull request #1301 from terrelln/lit-size
[zstd] Fix seqStore growth
2018-08-28 17:10:25 -07:00
Nick Terrell
5e580de6da [zstd] Fix seqStore growth
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
  space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
2018-08-28 13:24:44 -07:00
Yann Collet
b37a0a6bde
Merge pull request #1298 from facebook/bench
Refactored bench.c
2018-08-28 12:25:02 -07:00
Yann Collet
af23d39eb8
Merge pull request #1297 from felixhandte/check-offset-table
Fix Missing Offset Table Check
2018-08-24 17:36:44 -07:00
W. Felix Handte
37f17ee237 Mark Repeated Offset Table as Needing Check 2018-08-24 14:33:34 -07:00
Nick Terrell
e34e917655 Fix compiler warning 2018-08-23 17:48:06 -07:00
Nick Terrell
924944e471 [zstd] Reuse the ZSTD_CCtx more often with small data. 2018-08-23 17:48:06 -07:00
Yann Collet
2e45badff4 refactored bench.c
for clarity and safety, especially at interface level
2018-08-23 14:21:18 -07:00
Yann Collet
c71c4f23d7 fix "unused parameter" in single-thread mode
within newly added ZSD_toFlushNow()
2018-08-20 11:40:10 -07:00
Yann Collet
105677c6db created ZSTDMT_toFlushNow()
tells in a non-blocking way if there is something ready to flush right now.
only works with multi-threading for the time being.

Useful to know if flush speed will be limited by lack of production.
2018-08-17 18:11:54 -07:00
Yann Collet
1515f0bb0d fixed more issues detected by recent version of scan-build
test run on Linux
2018-08-16 15:20:25 -07:00
Yann Collet
3692c31598 Merge branch 'dev' into scanbuild 2018-08-15 13:50:49 -07:00
Yann Collet
6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Yann Collet
3e4617ef54 frameProgression reports nbActiveWorkers and output flushed 2018-08-14 11:49:25 -07:00
Yann Collet
e7a49c6683 introduced command --adapt 2018-08-11 20:48:06 -07:00
Yann Collet
2dd76037be zstd cli can increase level when input is too slow 2018-08-09 15:51:30 -07:00
Yann Collet
79a35ac20d minor code comments improvements 2018-08-09 15:16:31 -07:00
W. Felix Handte
2ca7c69167 Fix CDict Attachment to Handle CDicts with Non-Zero Starts
CDicts were previously guaranteed to be generated with `lowLimit=dictLimit=0`.
This is no longer true, and so the old length and index calculations are no
longer valid. This diff fixes them to handle non-zero start indices in CDicts.
2018-08-07 18:14:14 -07:00
Yann Collet
5808027abf Merge branch 'dev' into fix1241 2018-08-03 16:08:33 -07:00
Nick Terrell
dc5a67cb7b Disallow tableLog == srcLog 2018-08-02 11:12:17 -07:00
cyan4973
aade1e5904 Merge branch 'dev' into fix1241 2018-07-30 16:30:35 +02:00
Nick Terrell
9889bca530 [FSE] Fix division by zero
When the primary normalization method fails, and
`(1 << tableLog) == (maxSymbolValue + 1)`, and every symbol gets assigned
normalized weight 1 or -1 in the first loop, then the next division can
raise `SIGFPE`.
2018-07-27 17:30:03 -07:00
Yann Collet
6e490a2f09
Merge pull request #1237 from terrelln/init-cstream-adv
Set requestedParams in ZSTD_initCStream*()
2018-07-18 16:33:30 +02:00
cyan4973
9597b438e9 fix #1241
Ensure that first input position is valid for a match
even during first usage of context
by starting reference at 1
(avoiding the problematic 0).
2018-07-17 18:52:57 +02:00
cyan4973
53e1f0504e zstdmt debug traces compatibles with mingw
since mingw does not have `sys/times.h`,
remove this path when detecting mingw compilation.
2018-07-17 14:39:44 +02:00
Nick Terrell
6d222c437c Set requestedParams in ZSTD_initCStream*()
The correct parameters are used once, but once `ZSTD_resetCStream()` is
called the default parameters (level 3) are used. Fix this by setting
`requestedParams` in the `ZSTD_initCStream*()` functions.

The added tests both fail before this patch and pass after.
2018-07-12 18:35:55 -07:00
Yann Collet
4489daec09 slightly adjusted default-distribution threshold
depending on strategy.
fast favors faster compression and decompression speeds.
2018-06-26 20:10:45 -07:00
Nick Terrell
b426bcc097
[zstdmt] Fix jobsize bugs (#1205)
[zstdmt] Fix jobsize bugs

* `ZSTDMT_serialState_reset()` should use `targetSectionSize`, not `jobSize` when sizing the seqstore.
  Add an assert that checks that we sized the seqstore using the right job size.
* `ZSTDMT_compressionJob()` should check if `rawSeqStore.seq == NULL`.
* `ZSTDMT_initCStream_internal()` should not adjust `mtctx->params.jobSize` (clamping to MIN/MAX is okay).
2018-06-25 15:21:08 -07:00
Yann Collet
3b53bfe4f3
Merge pull request #1200 from felixhandte/zstd-attach-dict-pref
Add CCtx Param Controlling Dict Attachment Behavior
2018-06-25 12:42:31 -07:00
Yann Collet
3934e010a2
Merge pull request #1197 from facebook/poolResize
Thread Pool resize
2018-06-22 14:20:07 -07:00
Yann Collet
fbd5dfc1b1 changed POOL_resize() return type to int
return is now just en error code.
This guarantee that `ctx` remains valid after POOL_resize().
Gets rid of internal POOL_free() operation.
2018-06-22 12:14:59 -07:00
Yann Collet
1d5648ca10
Merge pull request #1196 from felixhandte/zstd-btopt-in-place-dict
ZSTD_btopt: Support Searching the Dictionary Context In-Place
2018-06-22 11:53:23 -07:00
Yann Collet
f6242d30b7
Merge pull request #1202 from facebook/barelyCompressible
Increase threshold detection of poorly compressible data
2018-06-22 11:52:52 -07:00
Yann Collet
698fd00afb huf: increase threshold detection of poorly compressible data 2018-06-21 18:32:38 -07:00
W. Felix Handte
01bb1c1016 Add CCtx Param Controlling Dict Attachment Behavior 2018-06-21 17:29:25 -04:00
W. Felix Handte
3e91dc4d6a Add Repcode Bounds Check 2018-06-21 15:54:41 -04:00
W. Felix Handte
5bd3d4b7d2 Add Debug Log Statement 2018-06-21 15:54:07 -04:00
W. Felix Handte
3caba150c6 Fix dmsBtLow Test 2018-06-21 15:53:40 -04:00
W. Felix Handte
5da9bbc38e Conceivably Dedup ZSTD_noDict and ZSTD_dictMatchState _insertBt1 Impls
By reverting to the bool extDict flag, we call ZSTD_insertBt1 with the same
const args in both non-extDict dictModes.
2018-06-21 11:20:01 -04:00
W. Felix Handte
5d81f71e83 Consistency in Guarding DMS-Only Variable Initializations 2018-06-20 16:54:53 -04:00
W. Felix Handte
9c14eafe3d Also Use matchLow for HC3 Match 2018-06-20 15:51:14 -04:00
W. Felix Handte
0a6cf7cd1d Minor Changes 2018-06-20 15:27:23 -04:00
W. Felix Handte
ae1f3898a2 Remove Dead(!) HC3 DMS Lookup 2018-06-20 15:27:12 -04:00
Yann Collet
93702a7a62
Merge pull request #1198 from facebook/msdebug
made Visual Studio compatible with DEBUGLEVEL >= 2
2018-06-20 12:26:31 -07:00
cyan4973
ae0b7ffa0a made Visual Studio compatible with DEBUGLEVEL >= 2 2018-06-20 09:45:02 -07:00
Yann Collet
166901dc72 reduced POOL_resize() restriction
It's not necessary to ensure that no job is ongoing.
The pool is only expanded, existing threads are preserved.
In case of error, the only option is to return NULL and terminate the thread pool anyway.
2018-06-19 18:07:18 -07:00
Yann Collet
066fbbfe1c make zstdmt resize its context
when nbThreads change.

Technically, it only expands.
But when instructed to use less threads,
the thread pool will limit nb of concurrent threads.
2018-06-19 17:28:56 -07:00
Yann Collet
6768cf53fd
Merge pull request #1190 from terrelln/ldm-adjust
Adjust advanced parameters to source size
2018-06-19 14:40:56 -07:00
W. Felix Handte
03c39c540b Fix Incorrect Param 2018-06-19 15:36:33 -04:00
W. Felix Handte
de639502aa Update Dict Attachment Cut-Offs 2018-06-19 15:36:13 -04:00
W. Felix Handte
f0a13bcd68 Make Sure Position 0 Gets Into the Tree 2018-06-19 15:10:06 -04:00
W. Felix Handte
87fe4788a3 Fix Compression Ratio Regression #1 2018-06-19 13:01:21 -04:00
W. Felix Handte
4bb79f9c55 Misc Changes 2018-06-19 13:01:21 -04:00
W. Felix Handte
2091f34e9e Find Proper Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte
64348a15f1 Misc Fixes 2018-06-19 13:01:21 -04:00
W. Felix Handte
ade8586ce6 Find mls == 3 Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte
ce743312e2 Fix Typo 2018-06-19 13:01:21 -04:00
W. Felix Handte
a075864756 Switch != ZSTD_extDict to == ZSTD_noDict 2018-06-19 13:01:21 -04:00
W. Felix Handte
1e03377bde Implement RepCode Check 2018-06-19 13:01:21 -04:00
W. Felix Handte
ccbf067973 Add _dictMatchState Functions 2018-06-19 13:01:21 -04:00
W. Felix Handte
d5d8240967 Convert extDict Flag to dictMode Enum 2018-06-19 13:01:21 -04:00
W. Felix Handte
93c3184d44 Attach Dicts when Using ZSTD_btopt and ZSTD_btultra 2018-06-19 13:01:21 -04:00
Nick Terrell
3841dbac84 Adjust advanced parameters to source size
In the new advanced API, adjust the parameters even if they are explicitly
set. This mainly applies to the `windowLog`, and accordingly the `hashLog`
and `chainLog`, when the source size is known.
2018-06-18 15:49:31 -07:00
Yann Collet
e30f13bde0
Merge pull request #1185 from felixhandte/zstd-btlazy-in-place-dict
ZSTD_btlazy2: Support Searching the Dictionary Context In-Place
2018-06-18 13:29:44 -07:00
Yann Collet
9698d2fb72
Merge pull request #1189 from facebook/hist
histogram module
2018-06-14 20:39:52 -04:00
Yann Collet
6901c94cd6 avoid duplicate code comments
when a function is decribed in hist.h,
do not describe it again in hist.c
to avoid future doc synchronization issues.
2018-06-14 19:47:05 -04:00
Yann Collet
a71513bec6
Merge pull request #1184 from facebook/debug
Grouped debug functions into debug.h
2018-06-14 16:21:53 -04:00
W. Felix Handte
0c654d22c8 Force Inline BtFindBestMatch 2018-06-14 14:54:39 -04:00
Yann Collet
2d76defbfe grouped all histogram functions into hist.c
renamed functions with HIST_* prefix
2018-06-13 19:49:31 -04:00
W. Felix Handte
0551de4b5a Search Dict for Matches 2018-06-13 16:06:28 -04:00
W. Felix Handte
ace9cfa950 Attach Dicts when Using ZSTD_btlazy2 2018-06-13 16:06:28 -04:00
Yann Collet
fa41bcc2c2 grouped debug functions into debug.h
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.

There is now a single implementation, shared by both.

Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
2018-06-13 15:43:09 -04:00
W. Felix Handte
d53200a846 Fix Cast Warning 2018-06-13 14:58:36 -04:00
W. Felix Handte
b82063b266 Extend Dictionary Matches Backwards 2018-06-13 14:58:36 -04:00
W. Felix Handte
d53a04211c Update Dictionary Attachment Cutoff Values Again 2018-06-13 14:58:36 -04:00
W. Felix Handte
2162aa9f18 Do Not Inline DMS Search Function 2018-06-13 14:58:36 -04:00
W. Felix Handte
338bede9b5 Also Implement Depth Repcode Checks 2018-06-13 14:58:36 -04:00
W. Felix Handte
555ab9f8cf Apply Match Continuation Bug Fix 2018-06-13 14:58:36 -04:00
W. Felix Handte
c87dd2121d Update Dictionary Attachment Cutoff Values 2018-06-13 14:58:36 -04:00
W. Felix Handte
6204b6d592 Check Dict Match State in ZSTD_HcFindBestMatch_generic 2018-06-13 14:58:36 -04:00
W. Felix Handte
211a61b69b Focus on Non-BT Impls for the Moment 2018-06-13 14:58:36 -04:00
W. Felix Handte
2e93736a77 Remove Pre-Existing Repcode Check 2018-06-13 14:58:36 -04:00
W. Felix Handte
3b82a23a35 Second Repcode Check 2018-06-13 14:58:36 -04:00
W. Felix Handte
a2a24bebec First Repcode Check 2018-06-13 14:58:36 -04:00
W. Felix Handte
f74c2cd673 Disallow Too-Long Repcodes When Using an Attached Dict 2018-06-13 14:58:36 -04:00
W. Felix Handte
c14db94450 Rename base -> prefixLowest 2018-06-13 14:58:36 -04:00
W. Felix Handte
5d90708a0a Go Back to Separate Intermediate Functions for Different Dict Modes 2018-06-13 14:58:36 -04:00
W. Felix Handte
f84fc63a43 Further Templatize Intermediate Functions on dictMode 2018-06-13 14:58:36 -04:00
W. Felix Handte
529d3a5acd Convert Existing U32 extDict Vars to ZSTD_dictMode Enums 2018-06-13 14:58:36 -04:00
W. Felix Handte
33e2240fac Attach Dict When Using ZSTD_lazy Strategies 2018-06-13 14:58:36 -04:00
W. Felix Handte
90cfc799e5 Add _dictMatchState Stubs for ZSTD_lazy Functions 2018-06-13 14:58:36 -04:00
W. Felix Handte
a85ecb32bd Add dictMode Param to ZSTD_compressBlock_lazy_generic 2018-06-13 14:58:36 -04:00
Yann Collet
b2632bcf6c
Merge pull request #1174 from duc0/document_default_level
Expose ZSTD_CLEVEL_DEFAULT and update documentation
2018-06-12 12:09:01 -07:00
Duc Ngo
e34c000e44 Expose ZSTD_CLEVEL_DEFAULT and update documentation 2018-06-08 11:33:44 -07:00
Yann Collet
3050733042 Merge branch 'dev' into negLevels 2018-06-07 15:51:35 -07:00
Yann Collet
c2c47e24e0 support targetlen==0 with strategy==ZSTD_fast
to mean "normal compression",
targetlen >= 1 now means "disable huffman compression of literals"
2018-06-07 15:49:01 -07:00
Yann Collet
a57b4df85f removed literalCompression directive
in this version, literal compression is always disabled for ZSTD_fast strategy.

Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()
2018-06-07 15:24:12 -07:00
Yann Collet
8537bfd85c fuzzer: make negative compression level fail
result of ZSTD_compress_advanced()
is different from ZSTD_compress_generic()
when using negative compression levels
because the disabling of huffman compression is not passed in parameters.
2018-06-07 15:12:13 -07:00
Yann Collet
e3c42c739b clean ZSTD_compress() initialization
The (pretty old) code inside ZSTD_compress()
was making some pretty bold assumptions
on what's inside a CCtx and how to init it.

This is pretty fragile by design.
CCtx content evolve.
Knowledge of how to handle that should be concentrate in one place.

A side effect of this strategy
is that ZSTD_compress() wouldn't check for BMI2 capability,
and is therefore missing out some potential speed opportunity.

This patch makes ZSTD_compress() use
the same initialization and release functions
as the normal creator / destructor ones.

Measured on my laptop, with a custom version of bench
manually modified to use ZSTD_compress() (instead of the advanced API) :
This patch :
 1#silesia.tar       : 211984896 ->  73651053 (2.878), 312.2 MB/s , 723.8 MB/s
 2#silesia.tar       : 211984896 ->  70163650 (3.021), 226.2 MB/s , 649.8 MB/s
 3#silesia.tar       : 211984896 ->  66996749 (3.164), 169.4 MB/s , 636.7 MB/s
 4#silesia.tar       : 211984896 ->  65998319 (3.212), 136.7 MB/s , 619.2 MB/s
dev branch :
 1#silesia.tar       : 211984896 ->  73651053 (2.878), 291.7 MB/s , 727.5 MB/s
 2#silesia.tar       : 211984896 ->  70163650 (3.021), 216.2 MB/s , 655.7 MB/s
 3#silesia.tar       : 211984896 ->  66996749 (3.164), 162.2 MB/s , 633.1 MB/s
 4#silesia.tar       : 211984896 ->  65998319 (3.212), 130.6 MB/s , 618.6 MB/s
2018-06-07 14:05:25 -07:00
Yann Collet
f1ea383f45 context can be sized down even with constant parameters
when parameters are "equivalent",
the context is re-used in continue mode,
hence needed workspace size is not recalculated.
This incidentally also evades the size-down check and action.

This patch intercepts the "continue mode"
so that the size-down check and action is actually triggered.
2018-06-06 15:04:12 -07:00
Yann Collet
e5e17d009f changed member name to workSpaceOversizedDuration 2018-06-06 15:00:27 -07:00
Yann Collet
f7392f3dc9 added test case 2018-06-05 14:53:28 -07:00
Yann Collet
3d523c741b added workSpaceTooLarge and workSpaceWasteful
also :
slightly increased speed of test fuzzer.16
2018-06-05 11:42:48 -07:00
Yann Collet
357c648c3f changed a few variable names
to unify naming convention
2018-06-04 17:10:50 -07:00
Yann Collet
2108decb41 Fixed a nasty corruption bug
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639

error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
2018-06-01 18:54:34 -07:00
Yann Collet
7c33b48221
Merge pull request #1151 from felixhandte/zstd-dfast-in-place-dict-goto
ZSTD_dfast: Support Searching the Dictionary Context In-Place (Alternate `goto` Implementation)
2018-05-31 17:37:09 -07:00
W. Felix Handte
48deab92de Allow Different Dict Attachment Cut-Offs for Different Strategies 2018-05-31 17:37:44 -04:00
W. Felix Handte
f86796639e Remove Incorrect and Extraneous Repcode Bounds Check 2018-05-31 17:02:29 -04:00
Yann Collet
809f2f9322 minor update of literal cost function
just assert() there is no negative cost evaluation for literals
2018-05-29 15:34:50 -07:00
Yann Collet
463a0fe38b simplified optimal parser
removed "cached" structure.
prices are now saved in the optimal table.

Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
2018-05-29 14:07:25 -07:00
Yann Collet
bb6eaf6495
Merge pull request #1153 from facebook/dynThreshold
changed dynamic fse threshold for offset
2018-05-26 08:43:45 -07:00
Yann Collet
e916c365a1 fixed minor visual warning 2018-05-25 20:43:09 -07:00
Yann Collet
a7fdceeccd changed dynamic fse threshold for offset
recent experienced showed that
default distribution table for offset
can get it wrong pretty quickly with the nb of symbols,
while it remains a reasonable choice much longer for lengths symbols.

Changed the formula,
so that dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.

Detection based on defaultNormLog
2018-05-25 17:41:16 -07:00
Yann Collet
4b3a36d5d8 Merge branch 'dev' into lowCompression 2018-05-25 15:45:03 -07:00
Yann Collet
5f177f1c53 btultra accepts blocks with poorer compression ratio
zstd rejects blocks which do not compress by at least a certain amount.
In which case, such block is simply emitted uncompressed (even if a little bit of compression could be achieved).
This is better for decompression speed, hence for energy.

The logic is controlled by ZSTD_minGain().
The rule is applied uniformly, at all compression levels.

This change makes btultra accepts blocks with poor compression ratios.
We presume that users of btultra mode prefers compression ratio over some decompress speed gains.

The threshold for minimum gain is lowered for btultra
from s>>6 (~1.5% minimum gain)
to s>>7 (~0.8% minimum gain).

This is a prudent change.
Not sure if it's large enough.
2018-05-25 15:19:52 -07:00
Yann Collet
e2c0e3d437 slightly nudge choices towards less sequences
also slightly improve some strange detrimental corner cases.
2018-05-25 14:52:21 -07:00
W. Felix Handte
5b292b5685 Check Long + 1 Matches in Both Prefix and Dict in Bothe Short Match Paths 2018-05-25 13:13:57 -04:00
W. Felix Handte
88b733b380 Interleave Prefix and Dict Searches 2018-05-25 13:13:57 -04:00
W. Felix Handte
1850025156 Refactor ZSTD_dfast to Use gotos 2018-05-25 13:13:57 -04:00
W. Felix Handte
43606f9c83 ... When I Said "HashTable", I Meant "ChainTable" 2018-05-25 13:13:28 -04:00
W. Felix Handte
ec7efe88f5 Fix Off-By-One Error 2018-05-25 13:13:28 -04:00
W. Felix Handte
2bfe43267e Disallow Too-Long Repcodes When Using an Attached Dict 2018-05-25 13:13:28 -04:00
W. Felix Handte
b97ad3f457 Port Changes Made to ZSTD_fast to ZSTD_dfast 2018-05-25 13:13:28 -04:00
W. Felix Handte
2313cca1b7 Implement Second Repcode Check 2018-05-25 13:13:28 -04:00
W. Felix Handte
0998f10813 Implement First Repcode Check 2018-05-25 13:13:28 -04:00
W. Felix Handte
50c5b2bb90 Find Dict Hash Table Matches 2018-05-25 13:13:28 -04:00
W. Felix Handte
7a25f7ef5b Existing Repcode Check Only Applies to noDict Case 2018-05-25 13:13:28 -04:00
W. Felix Handte
8b241da4df Properly Initialize Repcode Values 2018-05-25 13:13:28 -04:00
W. Felix Handte
7097a03749 Add Necessary Dict Variables 2018-05-25 13:13:28 -04:00
W. Felix Handte
aacbbf4f9a Rename 'lowest' to 'localLowest' to Prepare to Introduce Dict Indices 2018-05-25 13:13:28 -04:00
W. Felix Handte
c10d1b4011 Skeleton for In-Place Impl for ZSTD_dfast 2018-05-25 13:13:28 -04:00
Yann Collet
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
Yann Collet
b5ef32fea7 Merge branch 'dev' into fracFse 2018-05-24 14:09:49 -07:00
Yann Collet
776128d16f fix corner case when requiring cost of an FSE symbol
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.

Also : enable BIT_DEBUG in /tests
2018-05-24 13:59:11 -07:00
Yann Collet
08c5be5db3
Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
2018-05-23 19:32:25 -07:00
Nick Terrell
06b70179da
Work around bug in zstd decoder (#1147)
Work around bug in zstd decoder

Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
2018-05-23 18:02:30 -07:00
W. Felix Handte
d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
W. Felix Handte
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
W. Felix Handte
7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
W. Felix Handte
582b7f85ed Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
2018-05-23 17:53:03 -04:00
W. Felix Handte
9c92223468 Avoid Undefined Behavior in Match Ptr Calculation 2018-05-23 17:53:03 -04:00
W. Felix Handte
a44ab3b475 Remove Out-of-Date Comment 2018-05-23 17:53:03 -04:00
W. Felix Handte
95bdf20a87 Moar Renames 2018-05-23 17:53:03 -04:00
W. Felix Handte
7e0402e738 Also Attach Dict When Source Size is Unknown 2018-05-23 17:53:03 -04:00
W. Felix Handte
3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
W. Felix Handte
b05ae9b608 Refine ip Initialization to Avoid ARM Weirdness 2018-05-23 17:53:03 -04:00
W. Felix Handte
1a7b34ef28 Use New Index Invariant to Simplify Conditionals 2018-05-23 17:53:03 -04:00
W. Felix Handte
2d598e6fed Force Working Context Indices Greater than Dict Indices 2018-05-23 17:53:03 -04:00
W. Felix Handte
d005e5daf4 Whitespace Fix 2018-05-23 17:53:03 -04:00
W. Felix Handte
154eb09419 Switch to Original Match Calc for noDict Repcode Check 2018-05-23 17:53:03 -04:00
W. Felix Handte
191fc74a51 Rename 'hasDict' to 'dictMode' 2018-05-23 17:53:03 -04:00
W. Felix Handte
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
W. Felix Handte
ca26cecc7a Rename and Reformat 2018-05-23 17:53:03 -04:00
W. Felix Handte
66bc1ca641 Change Cut-Off to 8 KB 2018-05-23 17:53:03 -04:00
W. Felix Handte
c31ee3c7f8 Fix Rep Code Initialization 2018-05-23 17:53:03 -04:00
W. Felix Handte
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
W. Felix Handte
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
W. Felix Handte
6929964d65 Add bounds check in repcode tests 2018-05-23 17:53:03 -04:00
W. Felix Handte
70a537d1d7 Initial Repcode Check Support for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte
8d24ff0353 Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte
d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Yann Collet
7a8b3496b4 Merge branch 'dev' into staticDictCost 2018-05-22 15:10:05 -07:00
Yann Collet
a8ddf1d370 disable 2-passes strategy 2018-05-22 15:06:36 -07:00
Nick Terrell
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
Yann Collet
5381369cb1 Merge branch 'dev' into tableLevels 2018-05-18 18:23:27 -07:00
Yann Collet
b0b3fb517d updated compression levels for blocks of 256KB 2018-05-18 17:17:12 -07:00
Yann Collet
5cbef6e094 Merge branch 'dev' into staticDictCost 2018-05-18 16:03:06 -07:00
Yann Collet
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
Yann Collet
af3da079d1 fixed minor conversion warning 2018-05-17 17:27:27 -07:00
Yann Collet
8572b4d09f fixed a pretty complex bug when combining ldm + btultra 2018-05-17 16:13:53 -07:00
Yann Collet
134388ba6b collect statistics for first block in ultra mode
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.

It translates into a very small compression ratio gain :

enwik7, level 20:
blocks  4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726

On the other hand, the cpu cost is doubled.

The trade off looks bad.
Though, that's ultimately a price to pay to reach better compression ratio.
So it's only enabled when setting btultra.
2018-05-17 12:24:30 -07:00
Yann Collet
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
Yann Collet
63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
Yann Collet
18fc3d3cd5 introduced bit-fractional cost evaluation
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.

Consequently, bit-fractional evaluation is only turned on for btultra.
2018-05-16 14:53:35 -07:00
Nick Terrell
30d9c84b1a Fix failing Travis tests 2018-05-15 09:46:20 -07:00
Yann Collet
0b31304c8d Merge branch 'dev' into staticDictCost 2018-05-14 18:09:26 -07:00
Yann Collet
2c26df0e13 opt: removed static prices
after testing, it's actually always better to use dynamic prices
albeit initialised from dictionary.
2018-05-14 18:04:08 -07:00
Yann Collet
f372ffc64d
Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
2018-05-14 17:45:50 -07:00
Yann Collet
c9227ee16b update table for 128 KB blocks 2018-05-13 17:15:07 -07:00
Yann Collet
b4250489cf update compression levels for large inputs 2018-05-13 01:53:38 -07:00
Yann Collet
761758982e replaced FSE_count by FSE_count_simple
to reduce usage of stack memory.

Also : tweaked a few comments, as suggested by @terrelln
2018-05-11 16:03:37 -07:00
Yann Collet
99ddca43a6 fixed wrong assertion
base can actually overflow
2018-05-10 19:48:09 -07:00
Yann Collet
09d0fa29ee minor adjusting of weights 2018-05-10 18:13:48 -07:00
Yann Collet
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
Yann Collet
74b1c75d64 btopt : minor adjustment of update frequencies 2018-05-10 16:32:36 -07:00
Yann Collet
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)
2018-05-09 15:46:11 -07:00
Yann Collet
c39061cb7b fixed declaration-after-statement warning 2018-05-09 12:07:25 -07:00
Yann Collet
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
Yann Collet
c0da0f5e9e switchable bit-approximation / fractional-bit accuracy modes
also : makes it possible to select nb of fractional bits.
2018-05-09 10:48:09 -07:00
Yann Collet
ba2ad9b6b9 implemented fractional bit cost evaluation
for FSE symbols.

While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
2018-05-08 17:43:13 -07:00
Yann Collet
1aff63b114 opt: shift all costs by 8 bits (* 256)
making it possible to represent fractional bit costs.
2018-05-08 16:19:04 -07:00
Yann Collet
6a3c34aa58 opt: estimate cost of both Hufman and FSE symbols
For FSE symbols : provide an upper bound,
in nb of bits,
since cost function is not able to store fractional bit costs.
2018-05-08 16:11:21 -07:00
Yann Collet
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
Yann Collet
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
Yann Collet
ad4524d605 fix ZSTD_compressBlock() associated with CDict
reported by @let-def.

It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.

Added a test in fuzzer to cover this use case (fails before the patch).
2018-05-07 12:54:13 -07:00
Nick Terrell
ca77822ddf Fix parameter adjustment with dictionary
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.

This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.
2018-04-25 16:32:29 -07:00
Nick Terrell
c0987986e5 Only reset CDict in ZSTD_CCtx_resetParameters() 2018-04-13 11:26:40 -07:00
Nick Terrell
9f76eebd17 Add ZSTD_CCtx_resetParameters() function
* Fix docs for `ZSTD_CCtx_reset()`.
* Add `ZSTD_CCtx_resetParameters()`.

Fixes #1094.
2018-04-12 16:54:07 -07:00
Nick Terrell
3c3f59e68f
Enforce pledgeSrcSize whenever known (#1106)
The test fails before the patch and passes after.

Fixes #1095.
2018-04-12 16:02:03 -07:00
Nick Terrell
280a236e9e Add ZSTD_CCtx(Param)?_getParameter() function
Closes #1096.
2018-04-12 11:50:12 -07:00
Nick Terrell
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
Yann Collet
29b021f9a0
Merge pull request #1067 from facebook/targetLength
removed limit ZSTD_TARGETLENGTH_MAX
2018-03-22 10:38:33 -07:00
Nick Terrell
ad344033df Fix broken assertion
The `avgJobSize` must not be lower than 256 KB for single-pass mode.
In `zstd.h` we say the minimum value for `ZSTD_p_jobSize` is 1 MB,
so ensure that we always pick a size >= 1 MB.

Found by libFuzzer fuzzer tests with large input limits.
2018-03-21 16:20:30 -07:00
Yann Collet
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
Yann Collet
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
Yann Collet
87b0cf05bd
Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
Yann Collet
d1bf609abf
Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
Yann Collet
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
Yann Collet
e1c52faace
Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00
Nick Terrell
a3b76a77ef Quiet appveyor warnings 2018-03-20 15:34:40 -07:00
Yann Collet
6873fec658 changed dictMore for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
Nick Terrell
136b9e2392 Fix external sequence corner cases
* Clear external sequences when we reset the `ZSTD_CCtx`.
* Skip external sequences when a block is too small to compress.
2018-03-20 14:50:28 -07:00
Yann Collet
451357f37f
Merge pull request #1058 from facebook/cctxParams
updated CCtxParams API
2018-03-20 12:36:12 -07:00
Yann Collet
2ed5af0766 merge bmi2 implementation of encodeSequence into zstd_compress.c 2018-03-19 19:10:31 -07:00
Nick Terrell
d19f803a3b Fix window size for 1 worker + flushing 2018-03-19 18:56:39 -07:00
Nick Terrell
24d9edbdd8 Set ldmParams to 0 when disabled 2018-03-19 18:23:54 -07:00
Nick Terrell
4b92574feb Fix corner cases exposed by zstreamtest 2018-03-19 17:54:04 -07:00
Nick Terrell
94c77710a9 Integrate ldm with zstdmt
Integrate ldm into zstdmt by running it in serial and in order in the first
step of each job, in the same place as the hash gets updated. The input
buffer is sized to fit the whole LDM window and 2 full buffers of slack.
Input buffers cannot be reused until the LDM step is done with them.
After the LDM step is finished, the jobs don't actually have access to the
full window, only the overlap.

Tested on a few different multi-GB files with and without sanitizers,
and with different numbers of threads.
2018-03-19 16:29:03 -07:00
Nick Terrell
aa4dbd09a1
Pull job/overlap log logic into common function (#1055)
Prepares for LDM integration by separating the job size and overlap logic
into helper functions.
2018-03-19 15:56:36 -07:00
Yann Collet
c8b3d389fd updated CCtxParams API
to respect naming convention :
ZSTD_CCtxParams_*()
2018-03-19 15:07:26 -07:00
Yann Collet
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00
Nick Terrell
2253d01b27 Move XXH64_update() into worker threads
* Computes the XXH hash in the worker threads.
* Workers get a sequence number and wait until ther number shows up. On
  error, ensures that its sequence is finished, so future threads don't
  get blocked.
* Sets up for ldm integration, which will go in the same spot.
2018-03-19 11:08:27 -07:00
Yann Collet
9618c0c804 make it possible to specify LDM parameters in any order 2018-03-19 11:07:04 -07:00
Yann Collet
ec0959e701
Merge branch 'dev' into mt-single 2018-03-18 01:06:31 -07:00
Nick Terrell
4af1fafeb8 Restore setting loadedDictEnd
Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`,
which means that dictionary compression will only be able to reference the parts of
the dictionary within the window. The spec allows us to reference the entire
dictionary so long as even one byte is in the window.

`ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd`
beyond the window, even once the dictionary was out of range.

When overflow protection kicked in, the check `current > loadedDictEnd + maxDist`
is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset
below the value, which would incorrectly allow references beyond the window. This
bug is present in `master`, but is very hard to trigger, since it requires both
dictionaries and data which triggers overflow correction.
2018-03-16 14:54:06 -07:00
Nick Terrell
f15a17e19f Use a single buffer in zstdmt
Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.

Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
  on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.

Reviewers: cyan

Differential Revision: https://phabricator.intern.facebook.com/D7284007

Tasks: T25664120
2018-03-15 16:21:33 -07:00
Yann Collet
192542b63c
Merge pull request #1047 from facebook/hufCompress
removed huf_compress_impl.h
2018-03-15 14:14:03 -07:00
Nick Terrell
a271399c97 Expose reference external sequence API
Summary:
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.

Expose reference external sequence API

* Expose the reference external sequences API for zstdmt.
* Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.

Test Plan:
* CI
* Test the zstdmt ldm integration stacked on top of this diff

Reviewers: cyan

Differential Revision: https://phabricator.intern.facebook.com/D7283968

Tasks: T25664120
2018-03-14 18:07:53 -07:00
Nick Terrell
1908c92c46 Merge remote-tracking branch 'upstream/dev' into extern-seq
* upstream/dev:
  Fix overflow protection with wlog=31
2018-03-14 17:26:31 -07:00
Yann Collet
a909c293c6 Merge branch 'dev' into hufCompress 2018-03-14 16:11:25 -07:00
Nick Terrell
a9a6dcba63 Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.
2018-03-14 12:29:31 -07:00
Nick Terrell
33fb966e56 Fix overflow protection with wlog=31
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.

Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.

The added test fails before the patch, and passes after.
2018-03-14 11:45:44 -07:00
Yann Collet
50f763ec44 fixed several comments are underlined by @terrelln 2018-03-13 14:23:14 -07:00
Yann Collet
a95a88af57 removed huf_compress_impl.h
re-imported all functions inside huf_compress.c
for easier source editing.

Also updated a bunch of code comments
for clarification.
2018-03-13 14:14:05 -07:00
Yann Collet
2291b85a1e changed ZSTD_p_literalCompression into ZSTD_p_compressLiterals
prefer verb+object construction
2018-03-12 11:44:10 -07:00
Yann Collet
6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
Yann Collet
a146ee04ae added negative compression levels
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.

added associated test in fuzzer

also added : new advanced parameter ZSTD_p_literalCompression
2018-03-11 05:21:53 -07:00
Yann Collet
facc09aa03 minor compression level adaptation
level 12 compresses slightly more and faster
due to better btlazy2 mode
2018-03-11 03:06:52 -07:00
Yann Collet
ccb7184a76
Merge pull request #1026 from terrelln/lrm-window
LDM manages its own window round buffer
2018-02-27 17:09:10 -08:00
Nick Terrell
0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
Yann Collet
2c4d3f339a
Merge pull request #1025 from facebook/huf
Huf
2018-02-27 09:57:01 -08:00
Yann Collet
33a3f18848 fixed wrong size test 2018-02-26 18:27:51 -08:00
Yann Collet
6cdf690441 minor cleaning of huff0
Update code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine size of tables inside workspace.
2018-02-26 14:52:23 -08:00
Nick Terrell
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
Yann Collet
50bc2ce95e
Merge pull request #1021 from terrelln/lrm-split
Split block compresser out of long range matcher
2018-02-23 17:36:51 -08:00
Yann Collet
653383f74a minor nit from Mac XCode 2018-02-22 15:44:26 -08:00
Nick Terrell
7e2bf4ebad Remove long range matcher immediate repcode check
The compression ratio gets about 0.01% worse on the files I tested, but the
code is much simpler.
2018-02-22 15:18:47 -08:00
Nick Terrell
af866b3a58 Split block compresser out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
Yann Collet
9c5a8040a9 fixed huf_compress workspace size 2018-02-21 11:34:49 -08:00
Yann Collet
010ba5f71f
Merge pull request #1017 from terrelln/c-bmi2
[compress] Support BMI2
2018-02-20 15:34:59 -08:00
Nick Terrell
6e128d3534 [BMI2] Add comments to the bmi2 variable in the contexts 2018-02-20 14:12:11 -08:00
Nick Terrell
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
Yann Collet
5cb1144872 fixed --single-thread
was incorrectly set to -T0 (use as many cores as possible) previously
2018-02-13 14:56:35 -08:00
Yann Collet
5f7495371e Merge branch 'dev' into fasterDec 2018-02-10 14:24:44 -08:00
Yann Collet
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
Yann Collet
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00
Yann Collet
95424409ea addBits and baseline into FSE decoding table
note : unfinished
- need new default tables
- need modify long mode
2018-02-09 04:25:15 -08:00
Yann Collet
de68c2ff10 Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).

Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
2018-02-07 14:22:35 -08:00
Yann Collet
0170cf9a7a minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly 2018-02-05 11:46:02 -08:00
Yann Collet
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
Yann Collet
4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
Yann Collet
90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00
Yann Collet
209df52ba2 Changed nbThreads for nbWorkers
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
2018-02-01 19:29:30 -08:00
Yann Collet
60fa90b6c0 zstdmt: added ability to change compression parameters during compression 2018-02-01 16:13:31 -08:00
Nick Terrell
48acaddff9 Test for incorrect pledgeSrcSize earlier 2018-02-01 12:04:05 -08:00
Yann Collet
727bb7f090
Merge pull request #1008 from terrelln/hlog3
Fix hashLog3 size when copying cdict tables
2018-01-31 12:49:07 -08:00
Nick Terrell
ab3346af07 Fix hashLog3 size when copying cdict tables 2018-01-31 11:12:17 -08:00
Yann Collet
823a28a1f4
Merge pull request #1000 from facebook/progressiveFlush
Progressive flush
2018-01-30 22:49:47 -08:00
Yann Collet
2cb0740b6b zstdmt: changed naming convention
to avoid confusion with blocks.

also:
- jobs are cut into chunks of 512KB now, to reduce nb of mutex calls.
- fix function declaration ZSTD_getBlockSizeMax()
- fix outdated comment
2018-01-30 14:43:36 -08:00
Yann Collet
ba0cd8cf78 fixed minor conversion warning for C++ compilation mode 2018-01-26 18:18:42 -08:00
Yann Collet
caf9e96dc3 job mutex creation is checked 2018-01-26 18:09:25 -08:00
Yann Collet
9c40ae7ff1 zstdmt: there is now one mutex/cond per job 2018-01-26 17:55:08 -08:00
Yann Collet
77e36273de zstdmt: minor code refactor for clarity 2018-01-26 17:08:58 -08:00
Yann Collet
27c5853c42 zstdmt: job table correctly cleaned after synchronous ZSTDMT_compress() 2018-01-26 14:35:54 -08:00
Yann Collet
0d426f6b83 zstdmt : refactor a few member names
for clarity
2018-01-26 13:00:14 -08:00
Yann Collet
79b6e28b0a zstdmt : flush() only lock to read shared job members
Other job members are accessed directly.
This avoids a full job copy, which would access everything,
including a few members that are supposed to be used by worker only,
uselessly requiring additional locks to avoid race conditions.
2018-01-26 12:15:43 -08:00
Yann Collet
d2b62b6fa5 minor : ZSTDMT_writeLastEmptyBlock() is a void function
because it cannot fail
2018-01-26 11:06:34 -08:00
Yann Collet
fca13c6855 zstdmt : fixed memory leak
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.

minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
2018-01-26 10:44:09 -08:00
Yann Collet
8e128eaf05 zstdmt : refactor job members
grouped by sharing properties
2018-01-26 10:20:38 -08:00
Yann Collet
777d3c1559 fixed minor declaration-after-statement warning 2018-01-25 17:45:18 -08:00
Yann Collet
a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet
1272d8e760 zstdmt:: renamed mutex and cond to underline they are context-global 2018-01-25 14:52:34 -08:00
Yann Collet
5f349b129c zstdmt : correctly set end of frame 2018-01-23 15:52:40 -08:00
Yann Collet
c1cc57f270 zstdmt : fix end condition (ZSTD_e_end)
When ZSTD_e_end directive is provided,
the question is not only "are internal buffers completely flushed",
it is also "is current frame completed".

In some rare cases,
it was possible for internal buffers to be completely flushed,
triggering a @return == 0,
but frame was not completed as it needed a last null-size block to mark the end,
resulting in an unfinished frame.
2018-01-23 15:19:11 -08:00
Yann Collet
de5e38a7a6 zstdmt: fixed minor race condition
no real consequence, but pollute tsan tests :

job->dstBuff is being modified inside worker,
while main thread might read it accidentally
because it copies whole job.
But since it doesn't used dstBuff, there is no real consequence.

Other potential solution : only copy useful data, instead of whole job
2018-01-23 14:03:07 -08:00
Yann Collet
ebd955e26a zstdmt : fixed ending frame with 0-size block 2018-01-23 13:12:40 -08:00
Yann Collet
6711396d97 zstreamtest : fixed test 32 : multi-thread compression
using ZSTD_compress_generic(,,ZSTD_e_end)
Since it already provides ZSTD_e_end as directive,
it should not be followed by ZSTDMT_endStream().
2018-01-19 22:20:53 -08:00
Yann Collet
a7ef3a219c zstdmt : fixed last job size 2018-01-19 18:19:09 -08:00
Yann Collet
3ad7d4951c zstdmt : finally vanquished an elusive and rare race condition 2018-01-19 17:35:08 -08:00
Yann Collet
940634a610 zstdmt : simplify job creation
job will not be created when not enough room within job Table
2018-01-19 13:25:06 -08:00
Yann Collet
dc69623453 zstdmt: fixed corruption issue in ZSTDMT_endStream()
when invoked directly.
2018-01-19 12:41:56 -08:00
Yann Collet
70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet
d19dc1903c
Merge pull request #995 from facebook/progressiveMT
Progressive mt
2018-01-18 17:59:49 -08:00
Yann Collet
6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet
4f43ef731d Merge branch 'dev' into constCDict 2018-01-18 13:36:43 -08:00
Yann Collet
ef97d5a287 Merge branch 'progressiveMT' into progressiveFlush 2018-01-18 13:35:24 -08:00
Yann Collet
b6ab232f2d Merge branch 'dev' into progressiveMT 2018-01-18 13:34:56 -08:00
Nick Terrell
9d96761520 Set repcodes for empty ZSTD_CDict
When the dictionary is <= 8 bytes, no data is loaded from the dictionary.
In this case the repcodes weren't set, because they were inserted after the
size check. Fix this problem in general by first setting the cdict state to
a clean state of an empty dictionary, then filling the state from there.
2018-01-18 13:28:30 -08:00
Yann Collet
c7190c69cc fixes for @terrelln comments 2018-01-18 11:15:23 -08:00
Yann Collet
1b5d80d633 zstdmt: added ability to flush current job before it's completed
however, zstdmt may still wait on next available worker,
so it's not smooth yet.
2018-01-18 11:03:27 -08:00
Yann Collet
aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet
394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet
f3b8f90b6d changed initStatic?Dict() return type to const ZSTD_?Dict*
ZSTD_create?Dict() is required to produce a ?Dict* return type
because `free()` does not accept a `const type*` argument.
If it wasn't for this restriction, I would have preferred to create a `const ?Dict*` object
to emphasize the fact that, once created, a dictionary never changes
(hence can be shared concurrently until the end of its lifetime).

There is no such limitation with initStatic?Dict() :
as stated in the doc, there is no corresponding free() function,
since `workspace` is provided, hence allocated, externally,
it can only be free() externally.

Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer.

Tested with `make all`, to catch initStatic's users,
which, incidentally, also updated zstd.h documentation.
2018-01-17 14:08:48 -08:00
Yann Collet
b86865323a Merge branch 'dev' into progressiveMT
fixed minor conflict on cdict
2018-01-17 13:51:03 -08:00
Yann Collet
d14cc881b0 zstdmt : fixed very large window sizes
would create too large buffers,
since default job size == window size * 4.

This would crash on 32-bit systems.

Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB,
so the formula was failing for large window sizes >= 1 GB.
Fixed now : max job Size is 2 GB, whatever the window size.
2018-01-17 12:39:58 -08:00
Yann Collet
58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Nick Terrell
16bd0fd4df Reduce size of ZSTD_CDict
Shaves 492,076 B off of the `ZSTD_CDict`.
The size of a `ZSTD_CDict` created from a 112,640 B dictionary is:

| Level | Before (B) | After (B) |
|-------|------------|-----------|
| 1     | 648,448    | 156,412   |
| 3     | 1,140,008  | 647,932   |
2018-01-17 11:50:49 -08:00
Yann Collet
cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet
1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet
6025465e42 ZSTDMT : minor CCtx memory optimization
can be useful when a compression job only has small amount of data to compress.
2018-01-16 15:34:41 -08:00
Yann Collet
2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Yann Collet
8e83c5c910 Merge branch 'dev' into progressiveMT 2018-01-16 12:54:33 -08:00
Nick Terrell
aae267a2e1 Reorganize block state 2018-01-16 11:17:50 -08:00
Nick Terrell
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet
9477f6529d
Merge pull request #984 from terrelln/dict-load
Load more dictionary positions into table if empty
2018-01-13 13:20:42 -08:00
Yann Collet
58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Nick Terrell
9a211d1f05 Load more dictionary positions into table if empty
If the hash table is empty load positions into the hash table
that we would otherwise skip.

| Level | Data Set     | Improvement |
|-------|--------------|-------------|
| 1     | github       | 0.44%       |
| 1     | hg-changelog | 0.13%       |
| 1     | hg-commands  | 1.28%       |
| 1     | hg-manifest  | 0.70%       |
| 3     | github       | 0.74%       |
| 3     | hg-changelog | 0.87%       |
| 3     | hg-commands  | 1.74%       |
| 3     | hg-manifest  | 0.23%       |
2018-01-12 16:17:22 -08:00
Yann Collet
863b2f8db4
Merge pull request #983 from terrelln/dict-wlog
Increase windowLog from CDict based on the srcSize when known
2018-01-12 07:47:43 -08:00
Nick Terrell
b610b777d3 Increase windowLog from CDict based on the srcSize when known 2018-01-11 16:23:21 -08:00
Yann Collet
cacf47cbee Merge branch 'dev' into dubtlazy
and fixed conflicts
2018-01-11 13:25:08 -08:00
Yann Collet
b9a14900ff changed function name to ZSTD_DUBT_findBestMatch() 2018-01-11 12:38:31 -08:00
Yann Collet
e8093dde09 fixed #304
Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
2018-01-11 11:16:32 -08:00
Yann Collet
218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet
3ea156368c API doc : grouped ZSTD_initStatic*() together
within "memory management" category.
2018-01-10 08:49:50 -08:00
Yann Collet
b17fb488b0 fixed msan test
a pointer calculation was wrong in a corner case
2018-01-06 20:50:36 +01:00
Yann Collet
a927fae2a1 fixed ZSTD_reduceIndex()
following suggestions from @terrelln.
Also added some comments to present logic behind ZSTD_preserveUnsortedMark().
2018-01-06 12:31:26 +01:00
Yann Collet
00db4dbbb3 fixed minor argument property for Visual 2017-12-30 15:42:28 +01:00
Yann Collet
f597f55675 improved btlazy2 : list of unsorted candidates can reach extDict
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from beginning.
It's not a large difference though, just several hundreds of bytes on silesia.
This patch fixes it.
2017-12-30 15:12:59 +01:00
Yann Collet
a68b76afef updated compression level table for btlazy2
now selected for levels 13, 14 and 15.

Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
2017-12-30 11:40:35 +01:00
Yann Collet
eb52e2f45e simplify ZSTD_preserveUnsortedMark() implementation
since no compiler attempts to auto-vectorize it.
2017-12-30 11:13:52 +01:00
Yann Collet
d228b6b0d0 btlazy2 : optimization for dictionary compression
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
2017-12-29 19:14:18 +01:00
Yann Collet
02f64ef955 btlazy2: fixed interaction between unsortedMark and reduceTable 2017-12-29 19:08:51 +01:00
Yann Collet
64482c2c97 fixed bug in dubt
the chain of unsorted candidates could grow beyond lowLimit.
2017-12-29 17:04:37 +01:00
Yann Collet
f36da5b4d9 minor speed optimization : index overflow prevention
new code supposed to be easier to auto-vectorize
2017-12-29 14:40:33 +01:00
Yann Collet
5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.

To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
Yann Collet
473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00
Yann Collet
cafedcbbe4 ZSTD_resetCCtx_internal: fixed order of arguments
params1 was swapped with params2.
This used to be a non-issue when testing for strict equality,
but now that some tests look for "sufficient size" `<=`, order matters.
2017-12-19 21:49:04 +01:00
Yann Collet
9096088f45 changed variable name for clarity, suggested by @terrelln 2017-12-19 21:20:46 +01:00
Yann Collet
f299fa39ac fix a subtle issue in continue mode
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`

The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
  `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
  The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
  trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
  but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
  next compression will use this value to cut input into blocks,
  resulting in more blocks and worse performance than necessary performance.

Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.

The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.

To do : create a reliable test case which triggers this scenario for CI tests.
2017-12-19 09:43:03 +01:00
Yann Collet
5c2f2ebfdb zstdmt via compress_generic: reduce opportunity to free/create mtctx
`zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads
resulting in failure of tsan tests,
and potentially connected to the qemu flaky tests.

This is because, at each test, the nb of threads can be changed (random).

The `--no-big-tests` directive reduce this choice to 1/2 threads,
in order to limit memory usage, especially for qemu and 32-bits builds.
Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx.

This patch takes advantage of the following property :
via compress_generic, no internal mtctx is needed for nbThreads < 2.
As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good.

This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests`
(only when parent cctx itself is created, which is randomized to 1/256 tests).

Expected outcome :
- at a minimum : tsan tests shall now work continuously without exploding the thread counter
- at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction

Real world impact :
minimal, I don't expect users to constantly change `nbThreads` between each invocation.
If `nbThreads` remains stable, existing implementation re-uses existing mtctx.

Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much,
since this test can select a random `nbThreads` value between 1 and 4.
The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx)
but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4.
A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size.
This would require a larger change to `POOL_*` api though.
2017-12-16 12:48:13 -08:00
Yann Collet
3cbfac1cdb updated levels 15-20
taking advantage of `btopt` improved speed to tune parameters.
Levels 16-19 are stronger than previous release, making the graph more favorable.

In theory, I should also update small-size tables,
but I got lazy on that one ...
2017-12-14 23:29:00 -08:00
Yann Collet
8c41a9cb1e
Merge pull request #951 from facebook/lastBlock
saves 3-bytes on small input with streaming API
2017-12-14 15:39:50 -08:00
Yann Collet
a0ac8c895c
Merge pull request #950 from facebook/srcSizeAdaptation
fix adaptation on srcSize
2017-12-14 14:48:31 -08:00
Yann Collet
281f06e01f saves 3-bytes on small input with streaming API
zstd streaming API was adding a null-block at end of frame for small input.

Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.

Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.

The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.

I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.

I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)

In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.

The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.
2017-12-14 11:47:02 -08:00
Yann Collet
c005df136f
Merge pull request #947 from facebook/fix944
Fix #944
2017-12-14 10:01:52 -08:00
Yann Collet
2e97a6d464 fixed minor declaration-after-statement warning 2017-12-13 18:50:05 -08:00
Yann Collet
5432ef6921 fixes adaptation on srcSize
This patch restores capability for each file to receive adapted compression parameters depending on its size.

The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.

Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
2017-12-13 17:45:26 -08:00
Yann Collet
d23eb9a098 zstreamtest : added missing CHECK_Z() 2017-12-13 15:35:49 -08:00
Nick Terrell
22727a7467 Fix cdict compressor repcodes 2017-12-13 11:31:20 -08:00
Yann Collet
e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet
03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet
8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet
5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet
ac8e022806
Merge pull request #943 from facebook/fix942
Fix #942
2017-12-08 13:53:08 -05:00
Yann Collet
dfc697e967 comment clarification 2017-12-08 12:16:49 -05:00
Yann Collet
c029ee1f0b ZSTD_initCStream_srcSize() considers "0" to mean "unknown"
to not break existing programs relying on this behavior.
Might be changed to mean "empty" in the future.
2017-12-07 17:13:10 -05:00
Yann Collet
3aa2b27a89 fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.

This patch fixes the issue.
Todo : add a test which traps the issue.
2017-12-07 02:52:50 -05:00
Yann Collet
c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet
7e05ef851a Merge branch 'dev' into qemu32panic 2017-12-03 11:14:36 -08:00
Yann Collet
5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet
78290874a5 fixed Visual warning on minor interface discrepancy 2017-11-29 17:01:14 -08:00
Yann Collet
d3c59edac9 removed long-range-mode tests from zstreamtest --no-big-tests 2017-11-29 16:42:20 -08:00
Yann Collet
998a93b784 simplified ZSTD_CCtx_setParametersUsingCCtxParams()
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
2017-11-29 16:13:05 -08:00
Yann Collet
f98ee994c4 zstd_opt: added comments, as requested by @terrelln 2017-11-29 15:19:00 -08:00
Yann Collet
bc42bc3b1d removed one invocation of SET_PRICE() macro 2017-11-28 16:08:56 -08:00
Yann Collet
0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
Yann Collet
b71405dc51 removed a bunch of code related to cached literal price
optState was used both to evaluate price
and to cache cost of previously calculated literals.
This created a strong dependency, forcing parser to request cost in a strict order.
This limitation is forbids future parser with skipping capabilities.

After this patch, caching literals price still exists,
but is now explicit, in a stack structure.
2017-11-28 12:32:24 -08:00
Yann Collet
03f30d9dcb separate rawLiterals, fullLiterals and match costs
removed one SET_PRICE() macro invocation
2017-11-28 12:14:46 -08:00
Yann Collet
eee87cd6f2 btopt: minor refactor : removed one SET_PRICE() macro invocation
direct assignment makes operation cleaner.
Also allows some (very minor) optimization (non-measurable)
2017-11-27 17:18:57 -08:00
Yann Collet
e9d1987fd7 btopt: minor speed optimization
matchPrice is always right at beginning
2017-11-27 17:01:51 -08:00
Yann Collet
f8d5c478af fixed comment, reported by @gyscos 2017-11-21 10:36:14 -08:00
Yann Collet
4154aec679 fixed comment, as suggested by @terrelln 2017-11-21 10:26:17 -08:00
Yann Collet
899f2a29f6 strategy ZSTD_btopt pinned to (0) variant (faster one) 2017-11-20 11:53:20 -08:00
Yann Collet
3f457264d1 slightly improved compression speed 2017-11-19 14:40:21 -08:00
Yann Collet
42c1e64270 slightly improved ratio at -22
merging of repcode search into btsearch introduced a small compression ratio regressio at max level :
1.3.2 : 52728769
after repMerge patch : 52760789 (+32020)

A few minor changes have produced this difference.
They can be hard to spot.

This patch buys back about half of the difference,
by no longer inserting position at hc3 when a long match is found there.
It feels strangely counter-intuitive, but works :
after this patch : 52742555 (-18234)
2017-11-19 14:00:55 -08:00
Yann Collet
99435dbbab minor : search early-out on sufficient_len for hc3 and rep
very very small speed and ratio increases
2017-11-19 12:58:04 -08:00
Yann Collet
d100670045 btopt0 : a bit faster and weaker 2017-11-19 10:38:02 -08:00
Yann Collet
e6da37c430 created (hidden) new strategy btopt0
about ~+10% faster but losing ~0.01 compression ratio
(note : amplitude vary a lot depending on files, but direction remains the same)
2017-11-19 10:21:21 -08:00
Yann Collet
e717a5b0dd zstd_opt: minor speed optimization
Calculate reference log2sums only once per serie of sequence
(as opposed to once per sequence)

Also: improved code comments
2017-11-18 16:24:02 -08:00
Yann Collet
a4a20a4b2f fix un-initialized memory warning
harmless, but cleaner
2017-11-17 15:51:52 -08:00
Yann Collet
23767e950a fix one UB pointer arithmetic in encoder
Instead of calculating distance between 2 memory objects, which is UB,
we extract the offset from object 1, and transfer it into object 2.
2017-11-17 13:24:51 -08:00
Yann Collet
11e58d9ba4 fixed minor warning
warning: void function returning a value
(even if the return value is void)
2017-11-16 15:21:30 -08:00
Yann Collet
15768cabb5 fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
2017-11-16 15:18:18 -08:00
Yann Collet
05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
Yann Collet
dfc14579f5 removed wrong assertion 2017-11-15 15:35:56 -08:00
Yann Collet
c55e35b2fc removed a few specialized traces 2017-11-15 15:04:53 -08:00
Yann Collet
61c2d70c86 shortened repcode match finder implementation 2017-11-15 14:37:40 -08:00
Yann Collet
d7e9805028 fixed corruption issue 2017-11-15 13:44:24 -08:00
Yann Collet
046ea53bef still fighting data corruption
due to messed up tree.
Seems to happen when reaching end of buffer.
2017-11-15 11:29:24 -08:00
Yann Collet
4202b2e8a6 merged rep search into btMatchSearch
but there is a tree corruption somewhere ...
bug hunt ongoing
2017-11-14 20:38:52 -08:00
Yann Collet
9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet
eb47705b18 reduced scope of multiple variables
renamed some variables for better understanding
2017-11-10 08:31:12 -08:00
Yann Collet
100d8ad6be lib/compress: created ZSTD_LLcode() and ZSTD_MLcode()
transform length into code.
Since transformation is needed in several places throughout the code,
better write the logic in one place.
2017-11-08 12:43:05 -08:00
Yann Collet
5aa0352742 zstd_opt: simplified ZSTD_getPrice() and ZSTD_updatePrice() interface
ZSTD_getPrice() and ZSTD_updatePrice() accept normal matchlength as argument
instead of matchlength-MINMATCH,
which makes them easier / more logical to use and read.
Conversion is simply done internally.
2017-11-08 12:23:27 -08:00
Yann Collet
bf730e2044 zstd_opt: refactor code for improved readability
renamed variables to be more meaningful
reduced scope of multiple variables
removed some useless var attribution
2017-11-08 12:07:39 -08:00
Yann Collet
4191efa993 zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests 2017-11-08 11:24:00 -08:00
Yann Collet
ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, accross lib/compress modules.
2017-11-07 16:15:23 -08:00
Yann Collet
8b6aecf2cb moved a few structures from zstd_internal.h to zstd_compress.h
which is a more precise scope
2017-11-07 16:03:14 -08:00
Yann Collet
150354c5fe minor refactor
added some traces and assert
related to hunting a potential ubsan error in 32-bits more
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).

Modified one pointer arithmetic expression for a more conformant way.
2017-11-01 16:57:48 -07:00
Yann Collet
428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Nick Terrell
86b8134cad [libzstd] Fix parameter selection for empty input
ZSTD_compress() and friends would treat an empty input as an unknown size
when selecting parameters. Thus, they would drastically overallocate the
context. Tell ZSTD_getParams() that the source size is 1 when it is empty.
2017-10-25 17:24:15 -07:00
Yann Collet
1ff8a8c109 Merge pull request #891 from facebook/contentSize
Content size
2017-10-17 17:24:51 -07:00
Yann Collet
32c9f715ae fixed : Visual build compressing stdin with multi-threading enabled fails
It was multiple reasons stacked :
- Visual use a different code path, because ZSTD_NEWAPI is not defined
- fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN`  (fixed)
- ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)
2017-10-17 14:07:43 -07:00
Yann Collet
13bfe885aa edited ZSTD_initCStream_advanced() comment 2017-10-16 14:06:22 -07:00
Nick Terrell
7f961ba6cd Don't allow default tables to repeat
It isn't useful in any case to repeat default tables.
Saves a few bytes on Silesia, since we don't trigger the dictionary
heuristic.

Before: 211988480 => 73651998 bytes
After:  211988480 => 73651721 bytes
2017-10-16 11:37:56 -07:00
Yann Collet
fc8d293460 dictionary compression use correct file size estimation
when determining compression parameters
to compress one file only.

For multiple files, it still "bets" that files are going to be small.

There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced()
making it incapable to use pledgedSrcSize to determine compression parameters.
2017-10-14 01:21:43 -07:00
Yann Collet
beb9b4b398 fixed ZSTDMT_initCStream() when contentSizeFlag==1 by default
and a wrong test in zstreamtest --mt
2017-10-13 19:09:30 -07:00
Yann Collet
213ef3b510 fixed ZSTD_initCStream_advanced() behavior, which depends on contentSizeFlag,
and a stream fuzzer test, which was incorrect
(relied on 0 being unconditionnally transformed into `ZSTD_CONTENTSIZE_UNKNOWN`)
2017-10-13 19:01:58 -07:00
Yann Collet
3c1e3f8ec9 contentSizeFlag enabled by default would also fail for streaming and MT operations
fixed
2017-10-13 18:32:06 -07:00
Yann Collet
fb44516641 ensure fParams.contentSizeFlag starts at 1
such default was failing for ZSTD_compressBegin/ZSTD_compressContinue
fixed too
2017-10-13 17:39:13 -07:00
Yann Collet
dd18d73e7e fileio: content size is enabled by default 2017-10-13 16:32:18 -07:00
Nick Terrell
ced6e6189c Add DEBUGLOG() that prints FSE encoding types 2017-10-13 14:55:23 -07:00
Nick Terrell
24ac2dbd2a Fix invalid use of dictionary offcode table
Fixes #888.
2017-10-13 12:47:03 -07:00
Yann Collet
a9e5705077 minor code formatting
added a trace during sequence encoding
2017-10-13 02:36:16 -07:00
Nick Terrell
a86a7097ec Ensure dictionary Huff table can encode any symbol
* Ensure that the dictionary Huffman CTable has maxSymbolValue 255.
* Fix a stack buffer overflow during compression dictionary loading.
2017-10-03 13:22:13 -07:00
Yann Collet
67478f4cb0 fixed minor conversion warnings for printf
in debug mode
2017-10-02 17:28:57 -07:00
Yann Collet
004fd34fd9 Merge pull request #876 from facebook/srcSize
CLI Fix : srcSize written in frame headers when compressing multiple files
2017-10-02 15:02:05 -07:00
Nick Terrell
86e83e926f [libzstd] Set CLEVEL_CUSTOM correctly
In `ZSTD_compressBegin_advanced()`, `ZSTD_parameters` are used to set the
compression parameters, but the level didn't get set to `CLEVEL_CUSTOM`, so
`ZSTD_compressBlock()` used the wrong parameters when checking the source
size.
2017-10-02 13:43:30 -07:00
Yann Collet
6e930c13d1 Merge branch 'dev' into compressBound 2017-10-01 11:24:02 -07:00
Yann Collet
dc404119e5 ZSTD_adjustCParams_internal : minor optimization 2017-09-30 15:02:40 -07:00
Nick Terrell
c5d6dde502 Don't size -= 1 in ZSTD_adjustCParams()
The window size could end up too small if the source size is 2^n + 1.

Credit to OSS-Fuzz
2017-09-30 14:20:06 -07:00
Yann Collet
5b10345b26 added ZSTD_COMPRESSBOUND() as a macro
ZSTD_compressBound() works fine, but is only useful for dynamic allocation.
For static allocation, only a macro can provide the amount during compilation time.
2017-09-29 23:17:41 -07:00
Yann Collet
8afb151c9b cli: fixed wrong initialization in MT mode
It's not good to mix old and new API
ZSTD_resetCStream() doesn't just set pledgedSrcSize :
it also sets the CCtx for a single thread compression.

Problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream())
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.

This is an invalid construction
(correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.

At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
2017-09-29 22:14:37 -07:00
Yann Collet
fbd5ab7027 minor fix : no longer use fake srcSize during resource creation
srcSize is read and provided at each file, not at resource creation.
This used to be useful with older API, because it could not re-adapt parameters between sessions.

At some point, it will be better to remove the old code, and only keep the new_api.
It works fine by now.
2017-09-29 19:40:27 -07:00
Yann Collet
db1668a43b fix : srcSize written in frame header when multiple files compressed
This information used to be disabled when nbFiles>1.
It was badly initialized later in the code, resulting in an error.
2017-09-29 18:05:18 -07:00
Yann Collet
7c9669f272 Merge pull request #873 from facebook/shorterTests
Leaner tests
2017-09-29 17:26:46 -07:00
Yann Collet
1416bc0f07 erase existence of a buffer when it's sent out of the pool
In some complex scenario,
the buffer would be freed because it's too large,
another buffer would be allocated, but fail,
trigger an error,
and the general buffer pool would then be freed,
where the definition of the already freed buffer would be found
(beyond total index, but still), and freed again, resulting in double-free error.
2017-09-29 16:27:47 -07:00
Yann Collet
e963800e27 zstdmt : fixed : buffer dst0 wasn't properly set to null after usage
now it's possible to unconditionnally invoke ZSTD_releaseAllJobRessources()
wether previous compression was completed correctly or not.
2017-09-28 23:01:31 -07:00
Yann Collet
754ae5cc0b removed ZSTDMT_waitForAllJobsCompleted() from ZSTDMT_freeCCtx()
as per @terrelln comment
2017-09-28 20:45:31 -07:00
Yann Collet
86b4fe5b45 adjustCParams : restored previous behavior
unknowns srcSize presumed small if there is a dictionary (dictSize>0)
and presumed large otherwise.
2017-09-28 18:14:28 -07:00
Yann Collet
b93598d6a4 zstdmt : reduced maximum nb of threads
to avoid memory address space issues on 32-bits systems
(see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=876416#17)
2017-09-28 13:49:12 -07:00
Yann Collet
e4ec427720 Merge branch 'dev' into shorterTests
fixed conflicts
2017-09-28 12:19:28 -07:00
Yann Collet
8074261d00 zstdmt : move on when not enough memory for a new input buffer
just continue operations without input forward progress,
instead of an error that stops current compression session.
2017-09-28 11:46:19 -07:00
Yann Collet
2cd15dd9a4 fixed minor Visual conversion warning 2017-09-28 02:33:41 -07:00
Yann Collet
377abcc02c zstdmt : better behavior when freeing a context right after a memory allocation error
wait for all jobs to be completed, so that freeing can happen safely
2017-09-28 02:23:44 -07:00
Yann Collet
d6770f80af minor : rewrite unit tests using CHECK_Z macro 2017-09-28 02:14:48 -07:00
Yann Collet
9b5b47ac93 ensure adjustCParams adjust hLog and cLog even without srcSize
It would previously exit when srcSize is unknown.
But in the case of custom parameters,
hLog and cLog can still be too large in comparison with windowLog.

Reduces maximum memory allocated during zstreamtest --newapi
2017-09-28 01:25:40 -07:00
Yann Collet
54a827fff0 Merge branch 'dev' into newFormats
Fixed conflicts in zstdmt_compress.c
2017-09-27 16:39:40 -07:00
Yann Collet
e45a2aea9b Merge pull request #869 from terrelln/dev
[libzstd] pthread function prefixed with ZSTD_
2017-09-27 16:35:08 -07:00
Nick Terrell
b555b7ef41 [libzstd][opt] Simplify repcode logic 2017-09-27 15:30:12 -07:00
Yann Collet
c994932788 fixed ZSTD_format_e value validation 2017-09-27 12:22:22 -07:00
Nick Terrell
6c41adfb28 [libzstd] pthread function prefixed with ZSTD_
* `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/*.[hc]`
* Fix up `lib/common/threading.[hc]`
* `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`
2017-09-27 11:48:48 -07:00
Yann Collet
ecf1778e23 updated ZSTD_format_e value validation
also updated manual
2017-09-27 11:19:21 -07:00
Yann Collet
4791561c4a silence minor gcc warning -Wempty-body
also silence fuzz test artefacts
2017-09-26 17:57:38 -07:00
Yann Collet
9f0b8dfbe9 Merge branch 'dev' into newFormats 2017-09-26 14:22:39 -07:00
Nick Terrell
c233bdbaee Increase maximum window size
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
  on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
  power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
  `--zstd=wlog=#`. These options also set the window size during
  decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
  decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
  which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
  window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
  and decompression.
2017-09-26 14:00:01 -07:00
Yann Collet
586df82a78 Merge pull request #862 from terrelln/static
[zstd] Backport kernel patch from @ColinIanKing
2017-09-25 17:02:40 -07:00
Yann Collet
5d8fdd1641 Merge pull request #855 from terrelln/maxoff
[libzstd] Increase MaxOff
2017-09-25 16:34:29 -07:00
Nick Terrell
76cb38d085 [zstd] Backport kernel patch from @ColinIanKing
* Make the U32 table in `FSE_normalizeCount()` static.
* Patch from https://lkml.kernel.org/r/20170922145946.14316-1-colin.king@canonical.com.
* Clang makes non-static tables static anyways. gcc however, does [weird things](https://godbolt.org/g/fvTcED).
* Benchmarks showed no difference in speed.
2017-09-25 16:18:23 -07:00
Yann Collet
6ee05a02b8 added ZSTD_decompress_generic()
same as ZSTD_decompressStream(),
just for a similar feeling as the compression side, which uses ZSTD_compress_generic()
2017-09-25 15:41:48 -07:00
Yann Collet
62568c9a42 added capability to generate magic-less frames
decoder not implemented yet
2017-09-25 14:26:26 -07:00
Nick Terrell
bbe77212ef [libzstd] Increase MaxOff 2017-09-25 13:36:18 -07:00
Yann Collet
96f0cde31a minor function rename
ZSTD_estimateCStreamSize_advanced_usingCParams -> ZSTD_estimateCStreamSize_usingCParams
_usingX is clear.
_advanced feels redundant
2017-09-24 16:47:02 -07:00
Yann Collet
7c3dea42ce added prototypes for advanced parameters for decompression API
required to decode custom formats
2017-09-24 15:57:29 -07:00
Nick Terrell
d6abb28951 Prepare for ZSTD_WINDOWLOG_MAX == 31 2017-09-21 17:18:41 -07:00
Yann Collet
da74aabc00 Merge pull request #850 from terrelln/fse-optimal
[fse] Fix FSE_optimalTableLog() for srcSize==1
2017-09-19 14:59:21 -07:00
Nick Terrell
6c9ed76676 [ldm] Fix corner case where minMatch < 8
There is a potential read buffer overflow when minMatch < 8.

fix-fuzz-failure
2017-09-19 13:49:37 -07:00
Yann Collet
7d1ff3817b fix ZSTD_sizeof_CCtx() / ZSTD_sizeof_CStream()
previous result was over-estimated
by counting streaming buffers twice
2017-09-18 14:47:34 -07:00
Nick Terrell
cae3e3c652 [fse] Fix FSE_optimalTableLog() for srcSize==1 2017-09-18 14:11:18 -07:00
Yann Collet
539b91ee9b minor : added assert in bt 2017-09-16 23:41:58 -07:00
Yann Collet
335780c427 fixed too strong alignment assert in ZSTD_initStaticCCtx()
64-bits fields are only 32-bits aligned on 32-bits CPU
2017-09-13 16:35:29 -07:00
Yann Collet
f1571dad8f Merge pull request #838 from stellamplau/ldm-mergeDev
Add long distance matcher
2017-09-13 13:24:08 -07:00
Yann Collet
3306bcb0e6 fix #820 : GCC v3.x 32-bits doesn't define 64-bits intrinsic
resulting in undefined symbol error.
Push the requirement to GCC 4 for now.
Another solution, proposed by @NWilson, is to use __LONG_MAX__ instead.
__LONG_MAX__ is a GCC-specific constant, which value is supposed to depend on underlying target hardware (32/64 bits)
Might be better, but seems also more complex, hence more prone to side effects.
Keeping the simple solution for now (just rely on __GNUC__)
2017-09-11 15:17:31 -07:00
Stella Lau
eb3327c10a Merge branch 'dev' of https://github.com/facebook/zstd into ldm-mergeDev 2017-09-11 15:00:01 -07:00
Stella Lau
f902bf9676 Merge branch 'ldm-integrate' into ldm-mergeDev 2017-09-11 14:55:29 -07:00
Yann Collet
f325ee4e84 fixed pass-through warning 2017-09-11 14:37:03 -07:00
Stella Lau
0d1b54db61 Explicitly cast raw numerals when left-shifting 2017-09-11 14:28:18 -07:00
Yann Collet
0d6ecc72a3 makes it possible to compile libzstd in single-thread mode without zstdmt_compress.c (#819) 2017-09-11 14:09:34 -07:00
Yann Collet
3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Stella Lau
360428c5d9 Move ldm functions to their own file 2017-09-06 18:09:26 -07:00
Stella Lau
2b99d696de Remove debug code 2017-09-06 15:57:26 -07:00
Stella Lau
eeff55dfa8 Merge remote-tracking branch 'upstream/dev' into ldm-mergeDev 2017-09-06 15:56:32 -07:00
Yann Collet
ad0046244f Merge pull request #831 from terrelln/split-compress
Split parsers out of zstd_compress.c
2017-09-06 10:01:27 -07:00
Stella Lau
9e4060200b Add tests and fix pointer alignment 2017-09-06 09:14:05 -07:00
Stella Lau
c706de5395 Rename and add short ldm parameters in cli 2017-09-05 21:11:18 -07:00
Stella Lau
98b85426f1 Fix setting of nextToUpdate at end of ldm matcher 2017-09-05 20:41:37 -07:00
Nick Terrell
721726d688 Split parsers out of zstd_compress.c 2017-09-05 17:10:25 -07:00
Stella Lau
08d33fe1c9 Fix parameter handling in copyCCtx with cdict 2017-09-05 15:50:20 -07:00
Stella Lau
fd0071da29 Fix parameter handling with ZSTD_copyCCtx 2017-09-05 15:34:17 -07:00
Stella Lau
67d4a6161c Add ldmBucketSizeLog param 2017-09-02 21:55:29 -07:00
Stella Lau
a1f04d518d Move hashEveryLog to cctxParams and update cli 2017-09-01 15:05:47 -07:00
Stella Lau
767a0b3be1 Move ldm hashLog, bucketLog, and mml to cctxParams 2017-09-01 12:24:59 -07:00
Stella Lau
17d8e0bdcc Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate 2017-09-01 10:19:38 -07:00
Stella Lau
8081becadc Add long distance matching as a CCtxParam 2017-09-01 09:18:58 -07:00
Yann Collet
369c29dd1a fixed impact of merge conflict for longRange 2017-08-31 18:25:56 -07:00
Yann Collet
d7ad99b2ab Merge branch 'longRangeMatcher' into dev 2017-08-31 18:08:37 -07:00
Stella Lau
6a546efb8c Add long distance matcher
Move last literals section to ZSTD_block_internal
2017-08-31 12:53:19 -07:00
Stella Lau
90a31bfa16 Pass dictMode to ZSTDMT_initCStream; fix nits
- Return error code in estimate{CCtx,CStream}Size functions
2017-08-30 16:19:07 -07:00
Stella Lau
ee65701720 Minor fixes; remove formatting only changes 2017-08-29 20:27:35 -07:00
Stella Lau
a6e20e1bd7 Add test for raw content starting with dict header 2017-08-29 18:36:18 -07:00
Stella Lau
623e3cd40b Use ZSTD_dm_rawContent in zstdmt_compress 2017-08-29 18:04:32 -07:00
Stella Lau
82d636b76a Rename applyCCtxParams() 2017-08-29 18:03:06 -07:00
Stella Lau
4e835720bf Delay creation of ZSTDMT_CCtx 2017-08-29 17:58:32 -07:00
Stella Lau
c7a18b7c21 Localize 'dictMode' from cctx to function param 2017-08-29 15:52:24 -07:00
Stella Lau
c88fb9267f Replace 'byReference' with enum 2017-08-29 11:55:02 -07:00
Stella Lau
b5b9275e67 Rename estimateCCtxSize_advanced() and estimateCStreamSize_advanced() 2017-08-29 10:49:29 -07:00
Stella Lau
0e56a84a1e Fix getting cParams from CCtxParams 2017-08-28 19:25:17 -07:00
Stella Lau
024098a47d Fix parameter retrieval from cdict 2017-08-25 17:58:28 -07:00
Stella Lau
2adde898c8 Fix typo with ZSTDMT_parameter 2017-08-25 16:13:40 -07:00
Stella Lau
18224608ff Remove ZSTD_setCCtxParameter() 2017-08-25 13:58:41 -07:00
Stella Lau
0744592d38 Add function initializing cctxParams from clevel 2017-08-25 13:36:47 -07:00
Stella Lau
9911153723 Move jobSize and overlapLog in zstdmt to cctxParams 2017-08-25 13:14:51 -07:00
Stella Lau
de5193422d Distinguish between jobParams and cctxParams in zstdmt 2017-08-25 11:36:17 -07:00
Stella Lau
eb7bbab36a Remove ZSTD_p_refDictContent and dictContentByRef 2017-08-25 11:11:45 -07:00
Nick Terrell
db3f5372df [zstdmt] Use POOL_create_advanced() 2017-08-24 18:12:28 -07:00
Stella Lau
15fdeb9e41 Enforce nbThreads<=1 for estimateCCtxSize 2017-08-24 16:28:49 -07:00
Stella Lau
2fbf0285b2 Fix interaction with ZSTD_setCCtxParameter() and cleanup 2017-08-24 11:25:41 -07:00
Stella Lau
fd9bf42516 Fix forceWindow and dictMode setting for zstdmt jobs 2017-08-23 19:16:57 -07:00
Stella Lau
bf3108fb50 Ensure zstdmt uses 'job version' of cctx parameters 2017-08-23 17:03:31 -07:00
Stella Lau
1c81f725ff Remove duplicated testing code 2017-08-23 15:47:15 -07:00
Stella Lau
64ce49426b Fix cstream compression level 2017-08-23 12:30:47 -07:00
Stella Lau
5bc2c1e982 Add prototype support for customMem with cctxParams 2017-08-23 12:03:30 -07:00
Yann Collet
e9ce1208a1 Merge pull request #812 from facebook/longRangeFix
fixed extraordinary scenario where all fields use maximum nbBits
2017-08-23 11:35:28 -07:00
Stella Lau
6f1a21c7e9 Remove formatting-only changes 2017-08-23 10:24:19 -07:00
Stella Lau
11303778d0 Add function to make cctxParams from ZSTD_parameters 2017-08-22 14:53:13 -07:00
Stella Lau
23fc0e41fa Remove 'opaque' naming from internal functions 2017-08-22 14:24:47 -07:00
Stella Lau
8fd1636776 Remove unused functions 2017-08-22 13:33:58 -07:00
Yann Collet
6b2b6a9bd5 fixed extraordinary scenario where all fields use maximum possible nb of bits simultaneously
can only happen if windowLog>=27  (level 22 --ultra)
2017-08-22 12:09:21 -07:00
Stella Lau
e50ed1fa3a Fix undefined behavior when srcSize==1 2017-08-22 11:55:42 -07:00
Stella Lau
60e1bc617c Explicitly create a job cctxParam for multithreading 2017-08-21 15:39:37 -07:00
Stella Lau
5b956f4753 Comment out CCtx_param versions of CDict functions 2017-08-21 14:49:16 -07:00
Stella Lau
fd8a25786e Check parameters are valid in initCCtxParams 2017-08-21 13:23:35 -07:00
Stella Lau
1c0dbe81b1 Add documentation for CCtx_params 2017-08-21 13:18:00 -07:00
Stella Lau
939f954285 Pass ZSTD_CCtx_params as const ptr when possible 2017-08-21 12:57:18 -07:00
Stella Lau
560b34f6d2 Return error code when initializing NULL cctxParams 2017-08-21 11:52:26 -07:00
Stella Lau
25be09c6b4 Set some parameters to zero before initializing cdict 2017-08-21 11:35:46 -07:00
Yann Collet
232d62b637 fixed a few headers that were too hastily copy/pasted during last license change 2017-08-21 11:24:32 -07:00
Stella Lau
502031ca10 Use cctxParam version of createCDict internally 2017-08-21 11:00:44 -07:00
Stella Lau
91b30dbe84 Remove test parameter 2017-08-21 10:09:06 -07:00
Stella Lau
f181f33bdf Disable tests and refactor 2017-08-21 01:59:08 -07:00
Stella Lau
023b24e6d4 Add cctx param tests 2017-08-20 22:55:07 -07:00
Yann Collet
7db552676e reduced pool queue to 0 to save memory
fixed : pool performance when jobs are fires fast and queueSize==0
2017-08-19 15:07:54 -07:00
Stella Lau
6cee6e07e5 Add internal createCDict function 2017-08-18 22:48:31 -07:00
Stella Lau
d775519296 Add cctxParam versions of internal functions 2017-08-18 17:37:58 -07:00
Yann Collet
32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Stella Lau
63b8c98531 Pass cctx parameters to MTCtx 2017-08-18 16:17:24 -07:00
Stella Lau
399ae013d4 Add function to apply cctx params 2017-08-18 13:01:55 -07:00
Stella Lau
81d89d82a6 Move nbThreads to cctx params 2017-08-18 12:08:57 -07:00
Stella Lau
2300c58a6f Move dictContentByRef to cctx params 2017-08-18 12:03:16 -07:00
Stella Lau
b6cb2ed8cb Move dictMode to cctxParams 2017-08-18 11:43:31 -07:00
Stella Lau
97e27affcb Move compression level to cctx params 2017-08-18 11:20:08 -07:00
Stella Lau
c0221124d5 Add function to set opaque parameters 2017-08-17 19:30:22 -07:00
Stella Lau
4169f49171 Add initialization/allocation functions for opaque params 2017-08-17 18:45:04 -07:00
Stella Lau
ade95b8bed Add opaque interfaces for static initialization 2017-08-17 18:13:08 -07:00
Stella Lau
699f11b4f7 Create opaque parameter structure 2017-08-17 17:33:46 -07:00
Yann Collet
f9e6590715 Merge pull request #796 from terrelln/is-error
[FSE][HUF] Inline error checks
2017-08-15 12:37:28 -07:00
Nick Terrell
07c6ff588e [FSE][HUF] Inline error checks
Caught by Clang's optimization remarks.
2017-08-15 11:23:28 -07:00
Nick Terrell
565e925eb7 [libzstd] Fix FORCE_INLINE macro 2017-08-14 21:12:05 -07:00
Nick Terrell
308047eb5d Fix compression failure on incompressible data
If the destination buffer is the minimum allowed size in
`ZSTD_compressSequences()` (2^17), then if the block isn't compressible
compression might fail with `dstSize_tooSmall`, when it should instead emit
a raw uncompressed block.

Additionally, `ZSTD_compressLiterals()` implicitly called
`ZSTD_noCompressLiterals()` if Huffman compression failed. Make that
explicit.
2017-08-07 11:45:24 -07:00
Yann Collet
e1222544be Merge pull request #753 from paulcruz74/adapt-approach-3
adaptive compression v1
2017-07-27 10:00:10 -07:00
Nick Terrell
ae20d413da [libzstd] Fix CHECK_V_F macros 2017-07-25 12:52:01 -07:00
Paul Cruz
6945b3c43d removed previous version of completion for compression 2017-07-19 11:51:50 -07:00
Yann Collet
b71363b967 check pthread_*_init() success condition 2017-07-19 01:05:40 -07:00
Nick Terrell
cc1522351f [libzstd] Fix bug in Huffman encoding
Summary:
Huffman encoding with a bad dictionary can encode worse than the
HUF_BLOCKBOUND(srcSize), since we don't filter out incompressible
input, and even if we did, the dictionaries Huffman table could be
ill suited to compressing actual data.

The fast optimization doesn't seem to improve compression speed,
even when I hard coded fast = 1, the speed didn't improve over hard coding
it to 0.

Benchmarks:
$ ./zstd.dev -b1e5
Benchmarking levels from 1 to 5
 1#Synthetic 50%     :  10000000 ->   3139163 (3.186), 524.8 MB/s ,1890.0 MB/s
 2#Synthetic 50%     :  10000000 ->   3115138 (3.210), 372.6 MB/s ,1830.2 MB/s
 3#Synthetic 50%     :  10000000 ->   3222672 (3.103), 223.3 MB/s ,1400.2 MB/s
 4#Synthetic 50%     :  10000000 ->   3276678 (3.052), 198.0 MB/s ,1280.1 MB/s
 5#Synthetic 50%     :  10000000 ->   3271570 (3.057), 107.8 MB/s ,1200.0 MB/s
$ ./zstd -b1e5
Benchmarking levels from 1 to 5
 1#Synthetic 50%     :  10000000 ->   3139163 (3.186), 524.8 MB/s ,1870.2 MB/s
 2#Synthetic 50%     :  10000000 ->   3115138 (3.210), 370.0 MB/s ,1810.3 MB/s
 3#Synthetic 50%     :  10000000 ->   3222672 (3.103), 223.3 MB/s ,1380.1 MB/s
 4#Synthetic 50%     :  10000000 ->   3276678 (3.052), 196.1 MB/s ,1270.0 MB/s
 5#Synthetic 50%     :  10000000 ->   3271570 (3.057), 106.8 MB/s ,1180.1 MB/s
$ ./zstd.dev -b1e5 ../silesia.tar
Benchmarking levels from 1 to 5
 1#silesia.tar       : 211988480 ->  73651685 (2.878), 429.7 MB/s ,1096.5 MB/s
 2#silesia.tar       : 211988480 ->  70158785 (3.022), 321.2 MB/s ,1029.1 MB/s
 3#silesia.tar       : 211988480 ->  66993813 (3.164), 243.7 MB/s , 981.4 MB/s
 4#silesia.tar       : 211988480 ->  66306481 (3.197), 226.7 MB/s , 972.4 MB/s
 5#silesia.tar       : 211988480 ->  64757852 (3.274), 150.3 MB/s , 963.6 MB/s
$ ./zstd -b1e5 ../silesia.tar
Benchmarking levels from 1 to 5
 1#silesia.tar       : 211988480 ->  73651685 (2.878), 429.7 MB/s ,1087.1 MB/s
 2#silesia.tar       : 211988480 ->  70158785 (3.022), 318.8 MB/s ,1029.1 MB/s
 3#silesia.tar       : 211988480 ->  66993813 (3.164), 246.5 MB/s , 981.4 MB/s
 4#silesia.tar       : 211988480 ->  66306481 (3.197), 229.2 MB/s , 972.4 MB/s
 5#silesia.tar       : 211988480 ->  64757852 (3.274), 149.3 MB/s , 963.6 MB/s

Test Plan:
I added a test case to the fuzzer which crashed with ASAN before the patch
and succeeded after.
2017-07-18 13:20:40 -07:00
Yann Collet
77d67fb167 Merge pull request #766 from terrelln/real-block-split
[libzstd] Pull optimal parser state out of seqStore_t
2017-07-18 08:26:24 -07:00
Yann Collet
14c83b05c7 Merge pull request #765 from terrelln/real-block-split
[libzstd] Remove ZSTD_CCtx* argument of ZSTD_compressSequences()
2017-07-17 19:25:55 -07:00