townforge/zstd - zstd - Townforge git

Author	SHA1	Message	Date
Yann Collet	6132df8dd3	fix gcc-10 strict aliasing warnings by exposing HUF_CElt declaration.	2020-12-04 16:43:19 -08:00
Yann Collet	68c14bdff2	minor speed improvement to HUF_readCTable() faster by ~+1-2%	2020-12-04 16:33:39 -08:00
Nick Terrell	c238db046f	Merge pull request #2414 from terrelln/mt-progress [lib] Ensure that multithreaded compression always makes some progress	2020-12-04 16:30:08 -08:00
Nick Terrell	4c58cb8383	[lib] Ensure that multithreaded compression always makes some progress	2020-12-03 20:25:14 -08:00
Nick Terrell	6672689e7e	Merge pull request #2406 from terrelln/linux-wrapper-api [linux] Add the linux wrapper API	2020-12-02 16:49:03 -08:00
Nick Terrell	894ae36675	Merge pull request #2390 from animalize/clamp_level Clamp compression level	2020-12-02 14:35:58 -08:00
senhuang42	2cbd038528	Move max nb seq check to per-block	2020-12-02 12:11:32 -05:00
Nick Terrell	3cda5fae77	[minor][lib] Remove double semicolon	2020-12-02 01:08:08 -08:00
senhuang42	3efe9c902b	Add sequence nb validation to compressSequences(), adjust minMatch comparisons	2020-12-01 10:54:45 -05:00
senhuang42	4c5f337248	Use cctx's minMatch instead of global MINMATCH, make fuzzer use validation	2020-11-30 15:41:20 -05:00
sen	c5fbd55dac	Merge pull request #2387 from senhuang42/compress_sequence_API [RFC] New sequence compression API	2020-11-20 16:54:20 -05:00
senhuang42	7742f076b4	Add experimental param for sequence validation	2020-11-20 11:57:41 -05:00
senhuang42	0e32928b7d	Remove unnecessary repcode backup, apply style choices, use function pointer	2020-11-20 11:02:19 -05:00
sen	e924a0fa51	Explicit cast for visual warnings Github has automatic commits now! Cool Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2020-11-19 17:32:40 -05:00
senhuang42	dcbbf7c09f	Unroll isRLE loop	2020-11-19 12:38:13 -05:00
senhuang42	05c0229668	Clean up visual conversion warnings	2020-11-18 15:36:29 -05:00
senhuang42	d6d7ba2a1f	Modification to offset validation to include entire sequence	2020-11-17 10:13:22 -05:00
senhuang42	8f3136a9c7	Fix assert edge case, improve documentation in zstd.h	2020-11-16 18:05:35 -05:00
senhuang42	f6baad87d6	Fix warnings and make validation enabled by default	2020-11-16 12:00:06 -05:00
senhuang42	55b90ef010	Fix unit tests to agree with new changes	2020-11-16 11:36:37 -05:00
senhuang42	7f563b0519	Add new sequence format as an experimental CCtx param	2020-11-16 10:49:17 -05:00
senhuang42	347824ad73	Overhaul logic to simplify, add in proper validations, fix match splitting	2020-11-16 10:49:17 -05:00
senhuang42	46824cb018	Add new sequence compress api params to cctx	2020-11-16 10:49:17 -05:00
senhuang42	48405b4633	Fix srcSize=0 edge case	2020-11-16 10:49:17 -05:00
senhuang42	022e6d81e7	Fix literals length calculation	2020-11-16 10:49:17 -05:00
senhuang42	dad20b5ccb	Remove dstCapacity error check	2020-11-16 10:49:17 -05:00
senhuang42	b8e16a2057	Remove extraneous function in this API	2020-11-16 10:49:17 -05:00
senhuang42	f29507c4fc	Add check comparing offset to window size	2020-11-16 10:49:17 -05:00
senhuang42	7a6e46a92f	Fix MSAN errors	2020-11-16 10:49:17 -05:00
senhuang42	cc2642bd17	Address edge case with endPosInSequence	2020-11-16 10:49:17 -05:00
senhuang42	fd10007174	Change debug levels to appropriate ones	2020-11-16 10:49:17 -05:00
senhuang42	2db8441245	Add RLE support	2020-11-16 10:49:17 -05:00
senhuang42	dfef298336	Fix various build warnings	2020-11-16 10:49:17 -05:00
senhuang42	2bbdddf24e	Add test case to roundtrip using ZSTD_getSequences() and ZSTD_compressSequences()	2020-11-16 10:49:16 -05:00
senhuang42	5fd69f8173	Add documentation for new api functions	2020-11-16 10:49:16 -05:00
senhuang42	e8b7fdb64b	Refactor for enhanced code clarity	2020-11-16 10:49:16 -05:00
senhuang42	c675fb46f1	Rename internal function compressSequences(), and promote new *_ext() functions to their actual name	2020-11-16 10:49:16 -05:00
senhuang42	013434e1e4	Add another API function to compress with existing CCTX	2020-11-16 10:49:16 -05:00
senhuang42	c44ce29013	More adjustments to improve code clarity	2020-11-16 10:49:16 -05:00
senhuang42	48f67da854	Pull compressStream2() transparent initialization into its own function	2020-11-16 10:49:16 -05:00
senhuang42	c86151f53c	Add initial support for new ZSTD_Sequence mode	2020-11-16 10:49:16 -05:00
senhuang42	e0f26afce9	Add sequence compression format param	2020-11-16 10:49:16 -05:00
senhuang42	f51af9a609	Always ensure sequenceRange updates properly, add more error forwarding	2020-11-16 10:49:16 -05:00
senhuang42	1a449688fd	Various minor logical refactors to improve clarity	2020-11-16 10:49:16 -05:00
senhuang42	e5fe485dcc	Fix cSize calculation for noCompressBlocks	2020-11-16 10:49:16 -05:00
senhuang42	6145ebb400	Rebased, roundtrips silesia.tar	2020-11-16 10:49:16 -05:00
senhuang42	b5b61cc216	Refactor for better debugging info	2020-11-16 10:49:16 -05:00
senhuang42	293fad6b45	Corrections and edge-case fixes to be able to roundtrip dickens	2020-11-16 10:49:16 -05:00
senhuang42	7eb6fa7be4	Multi-block compression scaffolding - works on single-block files	2020-11-16 10:49:16 -05:00
senhuang42	75b01f34b9	Add support for uncompressible blocks	2020-11-16 10:49:16 -05:00
senhuang42	e04da68157	Enable usage of ZSTD_sequenceRange for single-block compression	2020-11-16 10:49:16 -05:00
senhuang42	337fac216d	Add logic to handle ZSTD_sequenceRange	2020-11-16 10:49:16 -05:00
senhuang42	85822ddd53	Add last literals handling like getSequences()	2020-11-16 10:49:16 -05:00
senhuang42	2cff8df1a2	Pull block compression out of main compressSequences() function	2020-11-16 10:49:16 -05:00
senhuang42	cfced9344a	Implement ZSTD_updateSequenceRange	2020-11-16 10:49:16 -05:00
senhuang42	b116e1f211	Modify SequenceRange to have posInSequence	2020-11-16 10:49:16 -05:00
senhuang42	d99b675112	Add function definition for sequenceRange updater	2020-11-16 10:49:16 -05:00
senhuang42	74e95c05cc	Add ZSTD_SequenceRange to count ranges in array of ZSTD_Sequence	2020-11-16 10:49:16 -05:00
senhuang42	89f3848310	Add support for repcodes	2020-11-16 10:49:16 -05:00
senhuang42	3e930fd044	Code cleanup, add debuglog statements	2020-11-16 10:49:16 -05:00
senhuang42	086513b5b9	Implement first pass at compressSequences()	2020-11-16 10:49:16 -05:00
senhuang42	a9327b1e9b	Add initial function prototype for ZSTD_compressSequences_ext (to be renamed later)	2020-11-16 10:33:35 -05:00
animalize	52f8c07a3f	Clamp compression level in ZSTD_getCParams_internal() function	2020-11-14 13:26:08 +08:00
senhuang42	9d936d61d2	Reduce number of memcpy() calls	2020-11-13 19:43:30 -05:00
senhuang42	be4ac6c5bc	Use existing repcode update function to implement updates	2020-11-12 16:51:12 -05:00
senhuang42	674c9b9235	Add in proper block repcode histories	2020-11-12 15:34:37 -05:00
senhuang42	06c7f14066	Let block reps persist	2020-11-12 12:24:44 -05:00
senhuang42	396275068c	Fix incorrect repcode setting	2020-11-12 11:57:01 -05:00
senhuang42	1a8af0de73	Improve unit test	2020-11-12 11:09:09 -05:00
senhuang42	4d4fd2c55f	Overhaul repcode handling logic	2020-11-12 10:59:35 -05:00
sen	f62edf0fe9	Merge pull request #2381 from senhuang42/expand_sequence_extraction_api Add enum to define ZSTD_Sequence type and update sequence extraction API	2020-11-06 13:00:31 -05:00
senhuang42	7d1dea070c	Update unit tests	2020-11-06 11:10:37 -05:00
senhuang42	779df995c6	Implement mergeGeneratedSequences()	2020-11-06 10:55:46 -05:00
senhuang42	51abd58208	Rename getSequences() to generateSequences()	2020-11-06 10:53:22 -05:00
Luke Pitt	eac309c71b	Add ZSTD_getDictID_fromCDict function to experimental section	2020-11-04 11:37:37 +00:00
senhuang42	f782cac3d4	Change block delimiter removing to linear time approach	2020-11-02 17:06:20 -05:00
senhuang42	3434049c1f	Use ZSTD_memmove() instead of memmove()	2020-11-02 11:43:19 -05:00
senhuang42	d4d0346b40	Update name of enum, clarify documentation	2020-11-02 11:38:17 -05:00
senhuang42	e6178f837f	Revert unnecessary seqCollector adjustment	2020-11-02 10:59:20 -05:00
senhuang42	e8501e00b8	Fix incorrect index increment in merge algorithm	2020-11-02 10:58:41 -05:00
senhuang42	a36fdada57	Add algorithm to remove all delimiters	2020-11-02 10:46:52 -05:00
senhuang42	435a3a0428	Update seqCollector definition	2020-11-02 10:19:26 -05:00
senhuang42	3327932609	Update ZSTD_getSequences function signature	2020-11-02 10:17:59 -05:00
Nick Terrell	7205e609a9	Merge pull request #2354 from terrelln/stable-buffer Add ZSTD_c_stable{In,Out}Buffer and optimize when set	2020-10-30 15:06:56 -07:00
sen	c37c714ef1	Merge pull request #2376 from senhuang42/clarify_sequence_extraction_api Refine external ZSTD_Sequence API	2020-10-30 15:47:25 -04:00
Nick Terrell	d4e021fe35	[lib] Avoid allocating the input buffer when ZSTD_c_stableInBuffer is set We don't use it when we have a stable input buffer, so don't allocate it. I had to slightly modify `ZSTD_copyCCtx()` by storing the `ZSTD_buffered_policy_e` in the `ZSTD_CCtx`, since `inBuffSize > 0` is no longer the correct signal for the buffered mode.	2020-10-30 10:55:34 -07:00
Nick Terrell	24f72789e2	[lib] Skip the input window buffer when ZSTD_c_stableInBuffer is set Compress directly from the `ZSTD_inBuffer`. We still allocate the input buffer. A following commit will remove that allocation.	2020-10-30 10:55:34 -07:00
Nick Terrell	6bd6b6f7d3	[cwksp] Return NULL when 0 bytes are requested This ensures that the buffer is never used.	2020-10-30 10:55:34 -07:00
Nick Terrell	fcf81cee5e	[lib] Avoid allocating output buffer when ZSTD_c_stableOutBuffer is set We compress directly to the `ZSTD_outBuffer` so we don't need to allocate it.	2020-10-30 10:55:34 -07:00
Nick Terrell	6d5dc93d4e	[lib] Compress directly into output when ZSTD_c_stableOutBuffer is set When we have a stable output buffer always compress directly into the `ZSTD_outBuffer`. We are allowed to return `dstSizeTooSmall`.	2020-10-30 10:55:34 -07:00
Nick Terrell	987cb4ca6a	[lib] Take the shortcut when ZSTD_c_stableOutBuffer is set When we have a stable output buffer take the single-pass shortcut. It is okay to return `dstSizeTooSmall` if the output buffer isn't big enough, because we know it will never grow.	2020-10-30 10:55:34 -07:00
Nick Terrell	809b2f2071	[lib] Set ZSTD_c_stable{In,Out}Buffer in ZSTD_compress2() Sets these parameters in ZSTD_compress2() then resets them to their original values after the compression call. An alternative design could be to add a flush mode `ZSTD_e_singlePass` which implies `ZSTD_c_stable{In,Out}Buffer` but only for a single compression call, by directly setting the applied parameters. I've opted for the smaller change, but this is open for discussion.	2020-10-30 10:55:34 -07:00
Nick Terrell	c74be3f6de	[lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set Adds the validation of the input/output buffers only. They are still unused.	2020-10-30 10:55:34 -07:00
Nick Terrell	e3e0775cc8	[API] Add ZSTD_c_stable{In,Out}Buffer parameters This commit adds the parameters and sets the value in the CCtxParams but it does not do anything with the value.	2020-10-30 10:54:39 -07:00
Nick Terrell	e2581d9572	[lib] Set appliedParams in zstdmt mode Previously only `nbWorkers` was set. Set all parameters, because that is what is expected. This is needed for the `ZSTD_c_stable{In,Out}Buffer` parameters.	2020-10-30 10:54:38 -07:00
senhuang42	536e89c723	Sequence extractor should update CBlockState	2020-10-30 12:13:19 -04:00
senhuang42	32cac2627a	Emit last literals of 0 size as well, to indicate block boundary	2020-10-29 16:41:17 -04:00
senhuang42	69bd5f0654	Correct literalsRead calculation to include longLength	2020-10-29 14:49:37 -04:00
senhuang42	59624f3163	Remove implicit typecast to appease appVeyor windows build	2020-10-28 16:25:09 -04:00
senhuang42	3ed5d053d8	Clarify comments in zstd.h some more	2020-10-28 09:53:09 -04:00
Nick Terrell	599ff58e08	Merge pull request #2339 from terrelln/zstdmt-stability Fix zstdmt stability issues and clean up the zstdmt code	2020-10-27 19:43:13 -07:00
sen	17b700d78a	Merge pull request #2366 from senhuang42/enable_ldm_by_default Enable LDM by default if window size >= 128MB and strategy uses opt parser	2020-10-27 14:59:28 -04:00
Nick Terrell	0953645837	Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue Fix long distance matcher OSS-fuzz issue	2020-10-27 11:13:03 -07:00
senhuang42	3163909d14	Remove unused variable position	2020-10-27 12:58:12 -04:00
senhuang42	dc448563e9	Add test compatibility with last literals in sequences	2020-10-27 12:35:28 -04:00
senhuang42	1d221ecc03	Add support for representing last literals in the extracted seqs	2020-10-27 11:19:48 -04:00
senhuang42	9171f920cd	Improve documentation of seqStore_t	2020-10-27 10:50:22 -04:00
senhuang42	96b0ff7886	Improve documentation regarding various operations in copyBlockSequences	2020-10-27 10:36:06 -04:00
senhuang42	3a11c7eb03	Modify ZSTD_copyBlockSequences to agree with new API	2020-10-27 10:31:40 -04:00
senhuang42	8bdb32aebe	Add a function for LDM enable check	2020-10-20 13:46:02 -04:00
senhuang42	578e889ec1	Move ldm enable to compressStream2()	2020-10-20 13:04:45 -04:00
senhuang42	d28d8a1d72	Include LDM tables size for CCtx size estimation where relevant	2020-10-20 09:21:30 -04:00
senhuang42	b1c7fc5768	Add compatibility for multithreading	2020-10-19 12:07:06 -04:00
senhuang42	590f7f55f0	Add ldm enable condition in ZSTD_resetCCtx_internal	2020-10-19 10:26:17 -04:00
senhuang42	4d01979b62	Expose and call ZSTD_ldm_skipRawSeqStoreBytes()	2020-10-16 20:30:00 -04:00
Yann Collet	a0ec50c2dc	Merge pull request #2355 from senhuang42/change_ldm_mt_config Reduce --long mode MT jobsize at higher levels	2020-10-16 13:35:50 -07:00
senhuang42	f49926edf4	Change cycleLog adjustment to +3 from +4	2020-10-15 09:56:05 -04:00
senhuang42	ee84817fe7	Reset posInSequence when using ZSTD_referenceExternalSequences()	2020-10-14 22:06:08 -04:00
senhuang42	d0550bb18f	Clarify argument names, fix DEBUGLOG() statements	2020-10-14 15:45:43 -04:00
senhuang42	3f99c9b38d	Adjust match backwards count args	2020-10-14 15:23:03 -04:00
senhuang42	bf0d559449	Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments()	2020-10-14 12:58:06 -04:00
senhuang42	467e4383b0	Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config	2020-10-14 10:17:50 -04:00
Yann Collet	f5d5cd3b40	Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser Integrate long distance matches into optimal parser	2020-10-13 13:09:07 -07:00
Nick Terrell	7e6f91ed84	[minor] Improve docs and add an assert in response to review	2020-10-12 16:43:17 -07:00
senhuang42	354b5f1c0a	Use cycleLog instead of chainLog to determine LDM jobLog	2020-10-12 16:09:59 -04:00
Nick Terrell	441ce4178f	[zstdmt] Clarify a comment	2020-10-12 12:58:13 -07:00
Nick Terrell	efff5d8b2d	[zstdmt] Fix determinism issue with rsyncable mode The problem occurs in this scenario: 1. We find a synchronization point. 2. We attempt to create the job. 3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`. 4. We call `ZSTDMT_compressStream_generic` again. 5. We forget that we're at a sync point already, and we continue looking for the next sync point. This fix is to detect if we're currently paused at a sync point, and if we are then don't load any more input. Caught by zstreamtest. I modified it to make the bug occur more often (~1/100K -> ~1/200) and verified that it is fixed after. I then ran a few hundred thousand unmodified zstreamtest iterations to verify.	2020-10-12 12:55:17 -07:00
Nick Terrell	ede4f97153	[zstdmt] Fix bug where extra empty blocks are emitted When zstdmt cannot get a buffer and `ZSTD_e_end` is passed, an empty compression job can be created. Additionally, `mtctx->frameEnded` can be set to 1, which could potentially cause problems like unterminated blocks. The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.	2020-10-12 12:55:17 -07:00
Nick Terrell	c51a9e79b9	[zstdmt] Rip out the zstdmt API This commit leaves only the functions used by zstd_compress.c. All other functions have been removed from the API. The ZSTDMT unit tests in fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And the --mt zstreamtest tests have been ripped out.	2020-10-12 12:55:16 -07:00
Nick Terrell	1784c4b4ab	[zstdmt] Remove single-pass shortcut Simplifies the code and removes blocking from zstdmt. At this point we could completely delete `ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because I think we want to do that in the zstd-1.5.0 release, in case anyone is still using the ZSTDMT API, even though it is not installed by default. Fixes #2327.	2020-10-12 12:53:26 -07:00
Nick Terrell	b55ae009ac	[zstdmt] Remove singleBlockingThread mode This is already handled by zstd, so this logic is never used.	2020-10-12 12:53:26 -07:00
Nick Terrell	d5c688e8ae	Fix ZSTD_adjustCParams_internal() to handle dictionary logic Pass in the `ZSTD_cParamMode_e` to select how we define our cparams. Based on the mode we either take the `dictSize` into account or we set it to `0`. See the documentation for `ZSTD_cParamMode_e`. Some of the modes currently share the same behavior. But they have distinct modes because they are drastically different cases. E.g. compression + reprocessing the dictionary and creating a cdict. Additionally, when downsizing the hashLog and chainLog take the (adjusted) dictionary size into account, since the size of the dictionary gets added onto the window size. Adds a simple test to ensure that we aren't downsizing too far.	2020-10-12 12:50:04 -07:00
Nick Terrell	fadaab8c7c	[minor improvement] Pass 0 as the content size in the DDS The DDS structure can't be copied into the working tables like the DMS. So it doesn't need to account for the source size when sizing its parameters, just the dictionary size.	2020-10-12 12:47:21 -07:00
Nick Terrell	48ef15fb47	[minor improvement] Pass dictSize when selecting parameters When selecting parameters in streaming compression with a dictionary use the dictionary size to select the parameters.	2020-10-12 12:47:19 -07:00
Nick Terrell	012818df99	[refactor] Remove ZSTD_resetCStream_internal() This function is only called in one place. It isn't a logical separation of duties, and it was only obfuscating the code now, so inline it.	2020-10-12 12:46:10 -07:00
Nick Terrell	7083f79008	[bug] Fix dictContentType when reprocessing cdict Conditions to trigger: * CDict is loaded as raw content. * CDict starts with the zstd dictionary magic number. * The CDict is reprocessed (not attached or copied). * The new API is used (streaming or `ZSTD_compress2()`). Bug: The dictionary is loaded as a zstd dictionary, not a raw content dictionary, because the dict content type is set to `ZSTD_dct_auto`. Fix: Pass in the dictionary content type from cdict creation to the call to `ZSTD_compress_insertDictionary()`. Test: Added a test case that exposes the bug, and fixed the raw content tests to not modify the `dictBuffer`, which makes all future tests with the `dictBuffer` raw content, which doesn't seem intentional.	2020-10-12 12:46:10 -07:00
senhuang42	d6911b86be	Require LDM matches to be strictly greater in length	2020-10-09 12:56:18 -04:00
Yann Collet	12541931fa	Merge pull request #2328 from marxin/zstd-pool-api Allow external creation of POOLs that can be shared.	2020-10-09 01:00:50 -07:00
Yann Collet	6fdb0cb8d9	Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority For ZSTD_compressStream2(), let cdict take compression level priority	2020-10-09 00:48:30 -07:00
senhuang42	b9c8033cde	Define kNullRawSeqStore for every file	2020-10-07 19:02:41 -04:00
senhuang42	a6165c1b28	Change matchState_t::ldmSeqStore to pointer	2020-10-07 14:13:57 -04:00
senhuang42	abce708a56	Move posInSequence correction to correct location	2020-10-07 13:56:25 -04:00
senhuang42	0c515590d8	Replace offCode of largest match if ldm's offCode is superior	2020-10-07 13:56:25 -04:00
senhuang42	0fac8e07e1	Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes	2020-10-07 13:56:25 -04:00
senhuang42	a5500cf2af	Refactor separate ldm variables all into one struct	2020-10-07 13:56:25 -04:00
senhuang42	0731b94e7c	Use kNullRawSeqStore constant in zstdmt_compress.c	2020-10-07 13:56:25 -04:00
senhuang42	0325d878f2	Remove bubbling down matches with longer offCode and same matchLen	2020-10-07 13:56:25 -04:00
senhuang42	031b7ec15f	Disable LDM minMatch adjustment when using opt parser	2020-10-07 13:56:25 -04:00
senhuang42	ddf8a3f1b9	Enable inclusion of mid-flight LDMs in opt parser	2020-10-07 13:56:25 -04:00
senhuang42	88f72ed942	Correct incorrect offcode calculation	2020-10-07 13:56:25 -04:00
senhuang42	d8b43a4202	Add explicit conversion of size_t to U32	2020-10-07 13:56:25 -04:00
senhuang42	b8bfc4e63d	Add cSize regression test to fuzzer.c	2020-10-07 13:56:25 -04:00
senhuang42	c87d2e5866	Prefix new static ldm helpers with ZSTD_opt	2020-10-07 13:56:25 -04:00
senhuang42	429dec4f42	Add DEBUGLOG() calls in ldm helpers	2020-10-07 13:56:25 -04:00
senhuang42	10647924f1	Make function descriptions more accurate	2020-10-07 13:56:25 -04:00
senhuang42	1a687b3fcb	Improve documentation of relevant structs	2020-10-07 13:56:25 -04:00
senhuang42	37617e23d7	Correct matchLength calculation and remove unnecessary functions	2020-10-07 13:56:25 -04:00
senhuang42	7dee62c287	Reset ldmSeqStore after initStats_ultra() pass for btultra2	2020-10-07 13:56:25 -04:00
senhuang42	0718aa70df	Refactor existing functions to use posInSequence	2020-10-07 13:56:25 -04:00
senhuang42	7348b40a87	Adjustments to ldm_calculateMatchRange() to calculate bounds correctly	2020-10-07 13:56:25 -04:00
senhuang42	a1ef2db5b2	Add ldm_calculateMatchRange() function	2020-10-07 13:56:25 -04:00
senhuang42	ef823e0299	Remove rawSeqStore.base and add rawSeqStore.posInSequence	2020-10-07 13:56:25 -04:00
senhuang42	4793ae3b84	Prevent duplicate LDMs from being inserted	2020-10-07 13:56:25 -04:00
senhuang42	65f9cfeeec	Add extra bounds check to prevent heap access after free ASAN error	2020-10-07 13:56:25 -04:00
senhuang42	bff5785fd5	Address mixed variables C90 warning	2020-10-07 13:56:25 -04:00
senhuang42	724b94ed18	ldm_getNextMatch fixed return values	2020-10-07 13:56:25 -04:00
senhuang42	ea92fb3a68	Cleanups, add comments and explanations	2020-10-07 13:56:25 -04:00
senhuang42	78da2e1808	Fixed sifting algorithm	2020-10-07 13:56:25 -04:00
senhuang42	6ccd97fc96	Fixed end of match boundary update issues	2020-10-07 13:56:25 -04:00
senhuang42	28394b64f2	Add proper bounds check on adding ldms	2020-10-07 13:56:25 -04:00
senhuang42	a2f2b58d04	Add a function ldm_voidSequences()	2020-10-07 13:56:25 -04:00
senhuang42	9c3c7cd20e	Fix function argument to getNextMatch()	2020-10-07 13:56:25 -04:00
senhuang42	c8b8572b38	Adjustments to no longer segfault on nci	2020-10-07 13:56:25 -04:00
senhuang42	f57c7e6bbf	Add base adjustment correction	2020-10-07 13:56:25 -04:00
senhuang42	5df9b5e05f	Add initial getNextMatch() in opt parser	2020-10-07 13:56:25 -04:00
senhuang42	f8ce7cabc3	Added more debugging	2020-10-07 13:56:25 -04:00
senhuang42	84009a076a	Add re-copying of ldmSeqStore after processing	2020-10-07 13:56:25 -04:00
senhuang42	42395a70c2	Add debug statements, flesh out functions	2020-10-07 13:56:25 -04:00
senhuang42	dd3dd199bb	Get zstd to build with new functions and callsites, fix arguments	2020-10-07 13:56:25 -04:00
senhuang42	766c4a8c28	Implement part of ldm_maybeAddLdm()	2020-10-07 13:56:25 -04:00
senhuang42	84777059d2	Implement ldm_getNextMatch()	2020-10-07 13:56:24 -04:00
senhuang42	28c74bf591	Implement basic splitSequence and skipSequence functions	2020-10-07 13:56:24 -04:00
senhuang42	634ab7830d	Flesh out required args for ldm_handleLdm()	2020-10-07 13:56:24 -04:00
senhuang42	db70761032	Add callsites to appropriate locations in ..opt_generic()	2020-10-07 13:56:24 -04:00
senhuang42	aea61e3c91	Add ldm helper function declarations into opt parser	2020-10-07 13:56:24 -04:00
senhuang42	35d9f488f5	Modify codepath to use opt parser exclusively if the compression level is high enough	2020-10-07 13:56:24 -04:00
senhuang42	e1ae398ad5	Add rawSeqStore to match state	2020-10-07 13:56:24 -04:00
Martin Liska	b684900a4a	Allow external creation of POOLs that can be shared.	2020-10-07 12:44:33 +02:00
Nick Terrell	27c969ed07	Add comments to ZSTD_getLowest{Match,Prefix}Index() Clarify how we handle dictionaries in each case.	2020-10-01 13:21:46 -07:00
Yann Collet	cc88eb7594	Merge pull request #2317 from animalize/msvc_inline Let MSVC force inline ZSTD_hashPtr() function	2020-09-30 08:27:53 -07:00
Nick Terrell	f1cbeec039	[superblock] Reduce stack usage by correctly sizing header buffers	2020-09-24 19:42:04 -07:00
Nick Terrell	6a1e526ea7	[lib] Add ZSTD_COMPRESS_HEAPMODE tuning parameter	2020-09-24 19:42:04 -07:00
Nick Terrell	b841387218	[freestanding] Improve macro resolution to handle #if X	2020-09-24 19:42:04 -07:00
Nick Terrell	caecd8c211	Allow user to override ASAN/MSAN detection Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined. This allows the user to override ASAN/MSAN detection for platforms that don't support it.	2020-09-24 19:42:04 -07:00
Nick Terrell	88fac5d514	Remove call to memset The previous commit fixes the test so it errors on calls to mem*() functions from <string.h>.	2020-09-24 19:42:04 -07:00
Nick Terrell	9261476b7d	[lib] Wrap customMem xor checks in parens for readability This clarifies operator precedence, and quiets cppcheck in the Kernel Test Robot. I think this is a slight bonus to readability, so I am accepting the suggestion.	2020-09-23 23:26:07 -07:00
animalize	2e5d73dd72	Use `MEM_STATIC FORCE_INLINE_ATTR` instead of `FORCE_INLINE_TEMPLATE` It adds `__attribute__((unused))` for __GNUC__, to eliminate `-Werror=unused-function` error.	2020-09-21 13:26:38 +08:00
animalize	0a69a6b1ca	Let MSVC force inline ZSTD_hashPtr() function ZSTD_hashPtr() function was not expanded by MSVC, led to low performance compared to GCC.	2020-09-21 10:38:55 +08:00
Felix Handte	200c960f1d	Merge pull request #2311 from felixhandte/ddss-fix-cparam-derivation Fix Compression Parameter Derivation Bugs Introduced by DDSS Changes	2020-09-18 14:02:14 -04:00
W. Felix Handte	8930c6e551	Use ZSTD_CCtxParams_init() to Init CCtxParams, not memset() Even if the discrepancies are at the moment benign, it's probably better to standardize on using the one true initializer, rather than trying (and failing) to correctly duplicate its behavior.	2020-09-17 12:15:33 -04:00
W. Felix Handte	e8a44326fa	Avoid Redundancy in ZSTD_initCDict_internal() Args; Don't Take CParams + CCtxParams	2020-09-17 12:08:36 -04:00
W. Felix Handte	eee51a664a	Fall Back if Derived CParams are Incompatible with DDSS; Refactor CDict Creation Rewrite ZSTD_createCDict_advanced() as a wrapper around ZSTD_createCDict_advanced2(). Evaluate whether to use DDSS mode after fully resolving cparams. If not, fall back.	2020-09-15 18:01:08 -04:00
W. Felix Handte	bc6521a6f6	Make ZSTD_createCDict_advanced2() cctxParams Arg Const	2020-09-15 14:06:10 -04:00
W. Felix Handte	26a96a5b35	Do More Complete CParams Deduction in Non-DDSS Path of ZSTD_createCDict_advanced2 Call ZSTD_getCParamsFromCCtxParams() instead of ZSTD_getCParams_internal().	2020-09-15 13:57:43 -04:00
W. Felix Handte	a2af804129	Pull CParam Override Logic into Helper	2020-09-15 13:38:05 -04:00
Yann Collet	c91a0855f8	check endDirective in ZSTD_compressStream2() fix #2297 also : - `assert()` `endDirective` in `ZSTD_compressStream_internal()`, for debug mode - add relevant tests	2020-09-14 10:56:08 -07:00
senhuang42	17b56f934e	Coding style cleanup	2020-09-11 11:42:12 -04:00
senhuang42	801513b5e7	Modify params rather than cctx->requestedParams	2020-09-11 11:41:10 -04:00
W. Felix Handte	c5fab8848a	Document searchFuncs Table	2020-09-10 22:10:02 -04:00
W. Felix Handte	85a95840e4	Further Consolidate Dict Mode Checks	2020-09-10 22:10:02 -04:00
W. Felix Handte	0faefbf1b3	Make DDSS Selection Override ForceCopy Directive	2020-09-10 22:10:02 -04:00
W. Felix Handte	efa33861f2	Attempt to Fix MSVC Warnings	2020-09-10 22:10:02 -04:00
W. Felix Handte	ed43832770	Simplify Match Limit Checks Seems like a ~1.25% speedup.	2020-09-10 22:10:02 -04:00
W. Felix Handte	06d240b8a7	Use All Available Space in the Hash Table to Extend Chain Table Reach Rather than restrict our temp chain table to 2 ** chainLog entries, this commit uses all available space to reach further back to gather longer chains to pack into the DDSS chain table.	2020-09-10 22:10:02 -04:00
W. Felix Handte	b2b0641ea0	Rewrite Table Fill to Retain Cache Entries Beyond Chain Window	2020-09-10 22:10:02 -04:00
W. Felix Handte	916238d9dc	Avoid Malloc in Table Fill; Pack Tmp Structure into Hash Table	2020-09-10 22:10:02 -04:00
W. Felix Handte	f42c5bddd9	Truncate Chain at Last Possible Attempt Make the chain table denser?	2020-09-10 22:10:02 -04:00
W. Felix Handte	20a020edbc	Prefetch Chain Table Matches	2020-09-10 22:10:02 -04:00
W. Felix Handte	9b9feb84f2	Lay Out Chain Table Chains Contiguously Rather than interleave all of the chain table entries, tying each entry's position to the corresponding position in the input, this commit changes the layout so that all the entries in a single chain are laid out next to each other. The last entry in the hash table's bucket for this hash is now a packed pointer of position + length of this chain. This cannot be merged as written, since it allocates temporary memory inside ZSTD_dedicatedDictSearch_lazy_loadDictionary().	2020-09-10 22:10:02 -04:00
W. Felix Handte	66509c7bf4	Only Insert Positions Inside the Chain Window	2020-09-10 22:10:02 -04:00
W. Felix Handte	13c5ec3e41	Only Allow Dedicated Dict Search for Dicts Loaded in 1 Chunk The load algorithm requires we do it all in one go.	2020-09-10 22:10:02 -04:00
W. Felix Handte	07793547e6	Fix Bug: Only Use DDSS Insertion on CDict MatchStates Previously, if DDSS was enabled on a CCtx and a dictionary was inserted into the CCtx, the CCtx MatchState would be filled as a DDSS struct, causing segfaults etc. This changes the check to use whether the MatchState is marked as using the DDSS (which is only ever set for CDict MatchStates), rather than looking at the CCtxParams.	2020-09-10 18:51:52 -04:00
W. Felix Handte	d214d8c859	Shorten Dict Mode Conditionals in Order to Improve Readability	2020-09-10 18:51:52 -04:00
W. Felix Handte	f49c1563ff	Force-Inline ZSTD_insertAndFindFirstIndex_internal() Without this, gcc was declining to inline the function in `ZSTD_noDict` mode, resulting in a ~10% slowdown.	2020-09-10 18:51:52 -04:00
W. Felix Handte	cab86b074f	Clean Up Search Function Selection	2020-09-10 18:51:52 -04:00
W. Felix Handte	2ffbde0d95	Fix `-Wshorten-64-to-32` Error	2020-09-10 18:51:52 -04:00
W. Felix Handte	7b5d2f72ea	Adjust Working Context Table Sizes Back Down	2020-09-10 18:51:52 -04:00
W. Felix Handte	d332f57897	Permit Matching Against Lowest Valid Position This comparison was previously faulty: the lowest valid position is itself valid, and we should therefore be allowed to match against it.	2020-09-10 18:51:52 -04:00
W. Felix Handte	a3659fe1ef	Make ZSTD_dedicatedDictSearch_getCParams Wrap ZSTD_getCParams Fixes up bounds-checking, and lets us clean up what is at the moment an unnecessary duplication of the default cparams tables.	2020-09-10 18:51:52 -04:00
W. Felix Handte	7b9a755ac9	Remove Chain Limit on Hash Cache Entries; Slightly Improve Compression Entries in the hashTable chain cache aren't subject to the same aliasing that the circular chain table is subject to. As such, we don't need to stop when we cross the chain limit. We can delve deeper. :)	2020-09-10 18:51:52 -04:00
W. Felix Handte	e8b4011b52	Split Lookups in Hash Cache and Chain Table into Two Loops Sliiiight speedup.	2020-09-10 18:51:52 -04:00
W. Felix Handte	9e83c782f8	Simplify DDS Hash Table Construction No need to walk the chainTable; we can just keep shifting the entries in the hashTable.	2020-09-10 18:51:52 -04:00
W. Felix Handte	5390fee4f7	Rename and Move DD_BLOG Constant to ZSTD_LAZY_DDSS_BUCKET_LOG	2020-09-10 18:51:52 -04:00
W. Felix Handte	5e91ae27eb	Prefetch First Batch of Match Positions; +11% Speed in Level 5 w/ 1 Dict	2020-09-10 18:51:52 -04:00
W. Felix Handte	df386b3d8d	Fix Off-By-One Error in Counting DDS Search Attempts This caused us to double-search the first position and fail to search the last position in the chain, slowing down search and making it less effective.	2020-09-10 18:51:52 -04:00
W. Felix Handte	914bfe7ee4	Init CCtx's Local Dict with CCtxParams	2020-09-10 18:51:52 -04:00
W. Felix Handte	db2aa25252	Decision for Whether to Attach Should be Based on CDict Config, not CCtx	2020-09-10 18:51:52 -04:00
W. Felix Handte	a494111385	Move Prefetch Before Insertion; Speed Up ~6%	2020-09-10 18:51:52 -04:00
W. Felix Handte	eede46a47e	Misc Refactor of DDS Search Code	2020-09-10 18:51:52 -04:00
W. Felix Handte	f1b428fdac	Rename enableDedicatedDictSearch to dedicatedDictSearch in MatchState This makes it clear that not only is the feature allowed here, we're actually using it, as opposed to the CCtxParam field, in which it's enabled, but we may or may not be using it.	2020-09-10 18:51:52 -04:00
W. Felix Handte	41012193ad	Always Init CDict's enableDedicatedDictSearch Field	2020-09-10 18:51:52 -04:00
W. Felix Handte	34b545acb0	Add a ZSTD_dedicatedDictSearch ZSTD_dictMode_e to Allow Const Propagation Speed +1.5%.	2020-09-10 18:51:52 -04:00
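Adding an enum value lets const propagation specialize code as follows. A shared body takes the mode as a parameter, and thin wrappers pass it as a compile-time constant, so after inlining the compiler can delete the branches that do not apply. The enum, the function names and the per-mode arithmetic below are all hypothetical stand-ins, not zstd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of a dict-mode enum; the values are illustrative only. */
typedef enum { noDict, extDict, dictMatchState, dedicatedDictSearch } dictMode_e;

/* Generic body: when inlined with a constant dictMode, only one path survives. */
static inline size_t searchDepth(size_t base, dictMode_e const dictMode)
{
    if (dictMode == dedicatedDictSearch) return base * 2;  /* extra dictionary probes */
    if (dictMode == dictMatchState)      return base + 1;
    return base;
}

/* Specialized entry points, each passing a literal mode. */
static size_t searchDepth_noDict(size_t base) { return searchDepth(base, noDict); }
static size_t searchDepth_dds(size_t base)    { return searchDepth(base, dedicatedDictSearch); }
```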
W. Felix Handte	beefdb0d3d	Fix ZSTD_c_forceAttachDict Bounds	2020-09-10 18:51:52 -04:00
W. Felix Handte	def62e2d3e	Fix Compilation Warnings	2020-09-10 18:51:52 -04:00
Bimba Shrestha	9c628238d3	creating ZSTD_createCDict_advanced_internal	2020-09-10 18:51:52 -04:00
Bimba Shrestha	0a9787c3e1	changing to int for consistency	2020-09-10 18:51:52 -04:00
Bimba Shrestha	e29bc3a009	using dict mls instead of src mls	2020-09-10 18:51:52 -04:00
Bimba Shrestha	145c2d12f9	add hashtable head prefetching	2020-09-10 18:51:52 -04:00
Bimba Shrestha	5d5507788d	change method name for consistency	2020-09-10 18:51:52 -04:00
Bimba Shrestha	b30f71becf	pass correct cparams	2020-09-10 18:51:52 -04:00
Bimba Shrestha	71fda0362f	making cctxParams a pointer	2020-09-10 18:51:52 -04:00
Bimba Shrestha	628559d0e4	loading dict using new algorithm	2020-09-10 18:51:52 -04:00
Bimba Shrestha	22705f0c93	adding dedicatedDictSearch algorithm	2020-09-10 18:51:52 -04:00
Bimba Shrestha	31e581bf65	adding enableDedicatedDictSearch to matchState_t	2020-09-10 18:51:52 -04:00
Bimba Shrestha	50550a14ad	adding dedicated dict load method to lazy	2020-09-10 18:51:52 -04:00
Bimba Shrestha	75b6360036	adding ZSTD_createCDict_advanced2 to zstd.h	2020-09-10 18:51:52 -04:00
Bimba Shrestha	b7dddbe89b	always attach dict when using dedicatedDictSearch	2020-09-10 18:51:52 -04:00
Bimba Shrestha	e36a373df4	adding dedicatedDictSearch cParams helper methods	2020-09-10 18:51:52 -04:00
Bimba Shrestha	f10d4e313c	adding ZSTD_dedicatedDictSearch_defaultCParameters variable	2020-09-10 18:51:52 -04:00
Bimba Shrestha	c497cb6716	Add ZSTD_c_enableDedicatedDictSearch Param	2020-09-10 18:51:52 -04:00
senhuang42	64bd68e44b	Adjust ZSTD_createCDict_byReference() function, and check for cdict when using compressStream2	2020-09-10 13:42:26 -04:00
Nick Terrell	79ded1b4a9	[lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions The unused function definitions are hidden behind a `#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check. Initially hiding all functions which are unused and take up more than 2KB of stack space, because these will show up as warnings in the Linux Kernel build system.	2020-09-09 14:35:39 -07:00
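The guard described above follows the usual preprocessor pattern. In this sketch, compiling with `-DZSTD_NO_UNUSED_FUNCTIONS` removes the guarded definition entirely. The macro name comes from the log, while both functions are hypothetical.

```c
#include <assert.h>

int alwaysBuilt(int x) { return x + 1; }

#ifndef ZSTD_NO_UNUSED_FUNCTIONS
/* Hypothetical rarely-used entry point with a large stack frame: the kind of
 * function a kernel build would rather not compile at all. */
int rarelyUsed(int x)
{
    char scratch[4096];          /* > 2KB of stack */
    scratch[0] = (char)x;
    return scratch[0] * 2;
}
#endif
```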
Nick Terrell	ac3a136b0a	[lib] Replace 64-bit divisions with ZSTD_div64()	2020-09-09 14:35:39 -07:00
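A sketch of how such a wrapper can look, assuming a userspace fallback to plain division. In a 32-bit kernel build, a wrapper like this could map to `div_u64()` from `<linux/math64.h>`, because the kernel does not link the compiler's 64-bit division helpers. `averageCost()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace fallback (assumption): ordinary 64-bit division. */
#ifndef ZSTD_div64
#  define ZSTD_div64(dividend, divisor) ((dividend) / (divisor))
#endif

/* Hypothetical caller: every 64-bit division goes through the wrapper. */
static uint64_t averageCost(uint64_t totalCost, uint32_t count)
{
    return ZSTD_div64(totalCost, count);
}
```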
Nick Terrell	a90779397a	[lib] Reduce zstd stack usage by 1KB	2020-09-09 14:35:39 -07:00
Nick Terrell	046aca190f	Fix ZSTD_initCStream_advanced() with no dictionary and static allocation	2020-09-09 14:35:39 -07:00
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Nick Terrell	5e4efd22d4	Merge pull request #2291 from i-do-cpp/fix-compression-level-default Fix setParameter not falling back to default compression level	2020-09-08 16:42:34 -07:00
Nick Terrell	6da8acd231	Merge pull request #2293 from allanjude/coverity Resolve Coverity 1432392 Unintentional integer overflow	2020-09-03 13:58:45 -07:00
Allan Jude	8665793164	Resolve Coverity 1432392 Unintentional integer overflow Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN) overflow_before_widen: Potentially overflowing expression: cdict->dictContentSize * 6U with type unsigned int (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type U64 (64 bits, unsigned).	2020-09-03 19:31:50 +00:00
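This sketch reproduces the OVERFLOW_BEFORE_WIDEN pattern quoted above, assuming a 32-bit `unsigned int`. The first function multiplies in 32-bit arithmetic and only then widens. The second shows one common fix, which is to widen an operand before multiplying. The function names are hypothetical, and the sketch does not claim this is the exact patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t U64;

/* Problematic: dictContentSize * 6U is computed as unsigned int and may wrap
 * before being converted to U64. */
static U64 scaledSize_overflowing(unsigned dictContentSize)
{
    return dictContentSize * 6U;
}

/* Fixed: cast first, so the product is computed in 64 bits. */
static U64 scaledSize_widened(unsigned dictContentSize)
{
    return (U64)dictContentSize * 6;
}
```

With `dictContentSize = 0x40000000`, the widened version returns 6442450944, while the 32-bit product wraps to 2147483648.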
i-do-cpp	aec8b27fff	Update zstd_compress.c	2020-08-31 09:34:08 +02:00
i-do-cpp	d514281e73	Fix setParameter not falling back to default compression level on 0 value See documentation for `ZSTD_c_compressionLevel`: `Special: value 0 means default, which is controlled by ZSTD_CLEVEL_DEFAULT`	2020-08-31 09:25:43 +02:00
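Below is a minimal sketch of the documented rule that 0 selects the default level. zstd.h defines `ZSTD_CLEVEL_DEFAULT` as 3. `CLEVEL_DEFAULT` and `resolveCompressionLevel()` are hypothetical local stand-ins, not the library's setter.

```c
#include <assert.h>

#define CLEVEL_DEFAULT 3   /* stand-in for ZSTD_CLEVEL_DEFAULT */

/* Hypothetical setter logic: 0 means "use the default", anything else
 * (including negative fast levels) is kept as requested. */
static int resolveCompressionLevel(int requested)
{
    return (requested == 0) ? CLEVEL_DEFAULT : requested;
}
```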
Nick Terrell	c465f24457	ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free	2020-08-26 12:26:03 -07:00
Nick Terrell	a686d306d2	Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free}	2020-08-26 12:25:08 -07:00
Nick Terrell	80f577baa2	Move standard includes to zstd_deps.h	2020-08-26 12:25:08 -07:00
Nick Terrell	614e446000	Merge pull request #2271 from terrelln/small-blocks Small block optimizations	2020-08-24 18:54:33 -07:00
Nick Terrell	8def0e5fd3	Fix up code after reading through	2020-08-24 12:24:45 -07:00
Nick Terrell	575731b6db	Use ncount=1 when < 4096 symbols	2020-08-18 16:47:53 -07:00
Nick Terrell	ba1fd17a9f	speed up literal header decoding	2020-08-17 12:17:53 -07:00
Nick Terrell	6004c1117f	speed up small blocks	2020-08-16 23:03:38 -07:00
Yann Collet	38e38546a4	Merge pull request #2258 from Niadb/dev Added STATIC_BMI2 for compile time detection of BMI2 on MSVC, when enabled various intrinsics are used	2020-08-04 09:43:59 -07:00
Niadb	216a63dcf7	Add files via upload	2020-07-28 02:52:52 -06:00
Yann Collet	8b9cdd2597	fixed overlapping count & workspace special case	2020-07-26 22:40:21 -07:00
Yann Collet	051232223f	optimized histogram new version easier to vectorize leads to smaller code and faster execution notably at the last recombination stage (basically, fixed cost per block). Assembly inspected with godbolt On my laptop, with `clang` and `-mavx2` : 2K block : 1280 MB/s -> 1550 MB/s 8K block : 1750 MB/s -> 1860 MB/s	2020-07-26 22:24:22 -07:00
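For context, a common way to write a byte histogram that compilers handle well is to count into several independent tables and sum them at the end. That final sum is the fixed per-call "recombination" cost the message mentions. This is a generic sketch of the technique, not zstd's actual histogram code, and `histogram4()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count byte frequencies with four tables so that back-to-back occurrences of
 * the same byte update different counters, then merge the tables. */
static void histogram4(const unsigned char* src, size_t srcSize, unsigned count[256])
{
    unsigned c0[256], c1[256], c2[256], c3[256];
    size_t i = 0;
    unsigned s;
    memset(c0, 0, sizeof(c0)); memset(c1, 0, sizeof(c1));
    memset(c2, 0, sizeof(c2)); memset(c3, 0, sizeof(c3));
    for (; i + 4 <= srcSize; i += 4) {
        c0[src[i]]++; c1[src[i+1]]++; c2[src[i+2]]++; c3[src[i+3]]++;
    }
    for (; i < srcSize; i++) c0[src[i]]++;          /* leftover tail bytes */
    for (s = 0; s < 256; s++)                       /* recombination stage */
        count[s] = c0[s] + c1[s] + c2[s] + c3[s];
}
```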
Yann Collet	c224367ede	ensure workspace is large enough even when MAX_TABLELOG is reduced	2020-07-16 20:33:50 -07:00
Nick Terrell	1047097dad	[superblock] Add defensive assert and bounds check The bound check condition should always be met because we selected `set_basic` as our encoding type. But that code is very far away, so assert it is true so if it is ever false we can catch it, and add a bounds check. Fixes #2213.	2020-06-22 10:21:38 -07:00
Nick Terrell	08981d2638	[lib] Allow compression dictionaries with missing symbols Allow compression to use dictionaries with missing symbols in their entropy tables. We set the FSE repeat mode to check when there are missing symbols, and set the FSE repeat mode to valid when all symbols are present. Note that when not all symbols are present, the heuristics which favor dictionary tables for lower compression levels won't activate. Tested by manually creating a dictionary with missing symbols of every type, and validating that the compressor rejects it before this change, and accepts it after this change. Also, I ran the `dictionary_loader` fuzzer for >1 hour of CPU time without running into cases where compression succeeds, but decompression fails. Fixes #2174.	2020-06-12 17:57:19 -07:00
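This sketch shows the repeat-mode decision described above. If any symbol up to `maxSymbolValue` has a zero normalized count, the dictionary's table cannot encode it, so the table is marked "check" (verify against each block). Otherwise it is "valid". The enum mirrors FSE's none/check/valid repeat modes, but the names and the helper are hypothetical. A normalized count of -1 marks a present, low-probability symbol.

```c
#include <assert.h>

typedef enum { repeat_none, repeat_check, repeat_valid } repeatMode_e;

static repeatMode_e repeatModeForTable(const short* normalizedCounter, unsigned maxSymbolValue)
{
    unsigned s;
    for (s = 0; s <= maxSymbolValue; s++)
        if (normalizedCounter[s] == 0) return repeat_check;   /* missing symbol */
    return repeat_valid;
}
```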
Felix Handte	2af4e07326	Merge pull request #2133 from felixhandte/single-size-calculation Consolidate CCtx Size Estimation Code	2020-05-28 13:07:18 -04:00
Nick Terrell	3cc227e90e	[ldm][mt] Fix loadedDictEnd	2020-05-19 15:55:03 -07:00
Yann Collet	fdc56baa42	fix 22294 (#2151 )	2020-05-18 21:05:10 -07:00
Nick Terrell	b2092c6dc4	[ldm] Reset loadedDictEnd when the context is reset	2020-05-18 12:35:44 -07:00
Nick Terrell	add7ed2d4a	[lib] Fix bug in loading LDM dictionary in MT mode Exposed when loading a dictionary < LDM minMatch bytes in MT mode. Test Plan: ``` CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address" ./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297 ``` TODO: Add an explicit test that loads a small dictionary in MT mode	2020-05-14 11:52:28 -07:00
W. Felix Handte	3bb7992350	Fix Size Estimate for LDM Seq Space	2020-05-14 13:50:53 -04:00
Nick Terrell	70c80e19e6	[greedy] Fix performance instability	2020-05-12 17:51:16 -07:00
Nick Terrell	c3e921c639	Merge pull request #2131 from terrelln/raw-dict-fuzzer Fix rare scenario with lazy parser, dictionary, and repcodes	2020-05-12 17:44:31 -07:00
W. Felix Handte	d9a1e37aec	Nit: Fix Size Type for 32-bit	2020-05-12 18:03:31 -04:00
W. Felix Handte	1aa6c7ccce	Assert We Allocated Approximately What We Expected To	2020-05-12 16:55:03 -04:00
W. Felix Handte	27e2482217	Minor Refactor	2020-05-12 16:55:03 -04:00
W. Felix Handte	afc2488973	Handle Non-Static CCtxes in Estimation	2020-05-12 16:54:33 -04:00
W. Felix Handte	7ed996f5a0	Consolidate CCtx Size Estimation Code This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams` into a helper. It then migrates two other callsites to use that helper, a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`, which is more invasive. This attempts to guarantee that the estimates returned to users are always correct.	2020-05-12 16:26:53 -04:00
Nick Terrell	3c1eba4d99	[lib] Fix lazy repcode validity checks	2020-05-12 12:25:06 -07:00
Nick Terrell	4e0515916d	[lib] Fix repcode validation in no dict mode	2020-05-12 11:57:15 -07:00
Nick Terrell	6d687a8816	[lib] Fix dictionary + repcodes + optimal parser	2020-05-12 10:36:53 -07:00
Nick Terrell	4b88bd3ee0	[lib][fuzz] Assert sequences are valid in round trip tests	2020-05-11 20:38:49 -07:00
Nick Terrell	80d3585e31	[lib] Fix lazy parser with dictionary + repcodes	2020-05-11 19:04:30 -07:00
Yann Collet	608f1bfc4c	fixed context downsize with initStatic When context is created using initStatic, no resize is possible. fix : only bump oversizeDuration when !initStatic	2020-05-11 18:16:38 -07:00
W. Felix Handte	c6636afbbb	Fix ZSTD_estimateCCtxSize() Under ASAN `ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the buffers, assuming they'll be zero. However, the actual workspace allocation logic always allocates those buffers, and when running under ASAN, the workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized buffers end up consuming 512 bytes of space, which is accounted for in the actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't in the estimation path, since it ignores the buffers entirely. This commit fixes this.	2020-05-11 18:58:19 -04:00
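The accounting mismatch can be modeled as follows, assuming a fixed 256 bytes of redzone overhead per allocation under ASAN, as stated in the message. An estimate that skips zero-sized buffers misses 512 bytes. Sending every buffer through the same size function, including empty ones, keeps estimate and allocation in agreement. `ASAN_REDZONE_OVERHEAD`, `allocSize()` and `estimateBuffers()` are hypothetical, not `ZSTD_cwksp_alloc_size()` itself.

```c
#include <assert.h>
#include <stddef.h>

#define ASAN_REDZONE_OVERHEAD 256   /* per allocation, per the log message */

static size_t allocSize(size_t size, int underAsan)
{
    return size + (underAsan ? ASAN_REDZONE_OVERHEAD : 0);
}

/* Account for the in/out buffers even when they are zero-sized. */
static size_t estimateBuffers(size_t inBuffSize, size_t outBuffSize, int underAsan)
{
    return allocSize(inBuffSize, underAsan) + allocSize(outBuffSize, underAsan);
}
```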
Yann Collet	54144285fd	small speed improvement for strategy fast gcc 9.3.0 : kennedy : 459 -> 466 silesia : 360 -> 365 enwik8 : 267 -> 269 clang 10.0.0 : kennedy : 436 -> 441 silesia : 364 -> 366 enwik8 : 271 -> 272	2020-05-07 06:15:58 -07:00
Felix Handte	ad8dbae1b7	Merge pull request #2103 from felixhandte/relative-includes Migrate Includes to Relative Paths	2020-05-06 09:42:23 -07:00
Yann Collet	c29fd7cd8b	some more conversion warnings hunting down some static analyzer warnings	2020-05-05 10:16:59 -07:00
Yann Collet	c1b836f4c3	fix minor conversion warnings	2020-05-04 14:43:09 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Felix Handte	7e9aabd652	Merge pull request #2099 from felixhandte/compile-under-pedantic Compile Under `-pedantic -Werror` and `-std=c90`	2020-05-04 10:07:13 -07:00
Felix Handte	816ed80774	Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort Reduce stack usage of HUF_sort()	2020-05-04 08:15:31 -07:00
W. Felix Handte	c7da66c9cf	Purge C++-Style Comments (`// ...`), Make Compilation Succeed Under C90	2020-05-04 10:59:15 -04:00
W. Felix Handte	6696933b32	Make All Invocations Start With Literal Format String	2020-05-04 10:59:15 -04:00
W. Felix Handte	5e5f262612	Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations	2020-05-04 10:58:55 -04:00
Nick Terrell	e103d7b4a6	Fix superblock mode (#2100 ) Fixes: Enable RLE blocks for superblock mode Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table. Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table. Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes. Respect disableLiteralsCompression in superblock mode Fix superblock mode to handle a block consisting of only compressed literals Fix an off by 1 error in superblock mode that disabled it whenever there were last literals Fix superblock mode with long literals/matches (> 0xFFFF) Allow superblock mode to repeat Huffman tables Respect ZSTD_minGain(). Tests: Simple check for the condition in #2096. When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much. Remaining limitations: O(targetCBlockSize^2) because we recompute statistics every sequence Unable to split literals of length > targetCBlockSize into multiple sequences Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead. ... Fixes #2096	2020-05-01 16:11:47 -07:00
Meghna Malhotra	0adfc8dfce	Fix broken CI; make changes in response to the comments	2020-05-01 13:45:48 -07:00
Meghna Malhotra	53d76dc20f	Remove magic constant and made other changes addressing the comments	2020-05-01 13:45:48 -07:00
Meghna Malhotra	fe8402b522	WIP: Still getting an error	2020-05-01 13:45:48 -07:00
Meghna Malhotra	a084d959bd	WIP: Increased wksp size, but it's segfaulting	2020-05-01 13:45:48 -07:00
Meghna Malhotra	fdb2780c47	Move rank table into HUF_buildCTable_wksp()	2020-05-01 13:45:48 -07:00
Bimba Shrestha	1875f616ce	passing dictContentType instead of rawContent every time	2020-04-21 22:29:35 -07:00
Bimba Shrestha	5b0a452cac	Adding --long support for --patch-from (#1959 ) * adding long support for patch-from * adding refPrefix to dictionary_decompress * adding refPrefix to dictionary_loader * conversion nit * triggering log mode on chainLog < fileLog and removing old threshold * adding refPrefix to dictionary_round_trip * adding docs * adding enableldm + forceWindow test for dict * separate patch-from logic into FIO_adjustParamsForPatchFromMode * moving memLimit adjustment to outside ifdefs (need for decomp) * removing refPrefix gate on dictionary_round_trip * rebase on top of dev refPrefix change * making sure refPrefix + ldm is < 1% of srcSize * combining notes for patch-from * moving memlimit logic inside fileio.c * adding display for optimal parser and long mode trigger * conversion nit * fuzzer found heap-overflow fix * another conversion nit * moving FIO_adjustMemLimitForPatchFromMode outside ifndef * making params immutable * moving memLimit update before createDictBuffer call * making maxSrcSize unsigned long long * making dictSize and maxSrcSize params unsigned long long * error on files larger than 4gb * extend refPrefix test to include round trip * conversion to size_t * making sure ldm is at least 10x better * removing break * including zstd_compress_internal and removing redundant macros * exposing ZSTD_cycleLog() * using cycleLog instead of chainLog * add some more docs about user optimizations * formatting	2020-04-17 15:58:53 -05:00
Nick Terrell	5fcbc484c8	Merge pull request #2040 from caoyzh/dev-2 Optimize by prefetching on aarch64	2020-04-08 13:14:47 -07:00
Bimba Shrestha	c0d4b2b5a3	Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref [bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced	2020-04-07 17:37:19 -05:00
Bimba Shrestha	1658ae75cd	handling nil case for refprefix	2020-04-07 14:41:53 -07:00
Carl Woffenden	a93fadfcd9	Further replication removed `CHECK_F` is now in `error_private.h`. Minor tidy.	2020-04-07 11:25:16 +02:00
Carl Woffenden	7af7735fa3	Merge remote-tracking branch 'upstream/dev' into single-file-lib	2020-04-07 11:13:02 +02:00
Carl Woffenden	edd9a07322	Code replicated in compression and decompression moved to shared headers `CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.	2020-04-07 11:02:06 +02:00
Bimba Shrestha	0154866749	moving consts to zstd_internal and reusing them	2020-04-03 14:26:15 -07:00
Carl Woffenden	7c420344d2	Single-file decoder script can now (optionally) create an encoder To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.	2020-04-03 19:07:46 +02:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Nick Terrell	d34204a7b7	Merge pull request #2029 from terrelln/minor-opt [opt] Update repcodes less often	2020-03-23 18:12:32 -07:00
caoyzh	7201980650	Optimize by prefetching on aarch64	2020-03-14 15:25:59 +08:00
Bimba Shrestha	66607d0eac	Merge pull request #2033 from bimbashrestha/icc [opt] Small icc level 1 compression speed gain using #pragma vector	2020-03-10 20:42:19 -05:00
Bimba Shrestha	a89c45bdbd	Typo	2020-03-10 15:19:48 -05:00
Bimba Shrestha	43fc88f443	Adding comment and removing ivdep	2020-03-10 14:57:27 -05:00
Bimba Shrestha	dba3abc95a	Missed returns	2020-03-05 12:20:59 -08:00
Bimba Shrestha	a75e5f2ffc	bitscan add undef check	2020-03-05 11:52:15 -08:00
Bimba Shrestha	4c72a1a9c2	adding vector to main loop	2020-03-05 09:55:38 -08:00
Nick Terrell	81fda0419e	[opt] Only update repcodes upon arrival	2020-03-04 17:57:15 -08:00
Nick Terrell	04744e52dc	Merge pull request #2028 from terrelln/minor-opt [opt] Don't recompute initial literals price	2020-03-04 17:40:59 -08:00
Nick Terrell	0f9882deb9	[opt] Don't recompute repcodes while emitting sequences	2020-03-04 17:23:00 -08:00
Nick Terrell	c6caa2d04e	[opt] Delete ZSTD_litLengthContribution	2020-03-04 16:35:26 -08:00
Nick Terrell	610171ed86	[opt] Explain why we don't include literals price	2020-03-04 16:29:19 -08:00
Nick Terrell	5f49578be7	[opt] Don't recompute initial literals price	2020-03-04 16:27:17 -08:00
Nick Terrell	c836992be1	Dont log errors when ZSTD_fseBitCost() returns an error	2020-03-02 11:13:18 -08:00
Bimba Shrestha	80c26117a9	Line-wrapping	2020-02-03 09:38:16 -08:00
Bimba Shrestha	ee8a712af3	Using appliedParams instead of supplied params	2020-01-31 15:49:07 -08:00
Nick Terrell	a11a9271d6	Fix lowLimit underflow in overflow correction	2020-01-17 12:10:18 -08:00
Nick Terrell	036b30b555	Fix super block compression and stream raw blocks in decompression (#1947 ) Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block. This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care. * I added a test case that verifies that the decompression has 1 byte latency. * I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them. * The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all). Credit to OSS-Fuzz.	2020-01-10 18:02:11 -08:00
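The fallback rule from this commit can be sketched as a single comparison. A super block may not exceed the source size plus one block header (3 bytes in the zstd format). If it would, a single uncompressed block is emitted instead, whose size is exactly that bound. The enum and `chooseBlock()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_HEADER_SIZE 3   /* zstd block header size in bytes */

typedef enum { emit_superblock, emit_uncompressed } blockChoice_e;

/* Keep the super block only if it respects the block bound
 * srcSize + BLOCK_HEADER_SIZE; otherwise fall back to a raw block. */
static blockChoice_e chooseBlock(size_t srcSize, size_t superBlockSize)
{
    return (superBlockSize > srcSize + BLOCK_HEADER_SIZE) ? emit_uncompressed
                                                          : emit_superblock;
}
```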
Nick Terrell	d1cc9d2797	[fuzz] Allow zero sized buffers for streaming fuzzers (#1945 ) * Allow zero sized buffers in `stream_decompress`. Ensure that we never have two zero sized buffers in a row so we guarantee forwards progress. * Make case 4 in `stream_round_trip` do a zero sized buffers call followed by a full call to guarantee forwards progress. * Fix `limitCopy()` in legacy decoders. * Fix memcpy in `zstdmt_compress.c`. Catches the bug fixed in PR #1939	2020-01-09 11:38:50 -08:00
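The forward-progress rule ("never two zero-sized buffers in a row") can be sketched as a buffer-size picker for a fuzzer. The helper below is hypothetical and assumes `maxSize >= 1`. It allows a size of 0 but bumps a second consecutive 0 up to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the next buffer size from a random value; 0 is allowed, but never
 * twice in a row, so every pair of calls offers at least one byte. */
static size_t nextBufferSize(unsigned randomValue, size_t maxSize, int* prevWasZero)
{
    size_t size = randomValue % (maxSize + 1);
    if (size == 0 && *prevWasZero) size = 1;
    *prevWasZero = (size == 0);
    return size;
}
```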
Bimba Shrestha	b1f53b1a10	[fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936 ) * Adding fail logging for superblock flow * Dividing by targetCBlockSize instead of blockSize * Adding new const and using more accurate formula for nbBlocks * Only do dstCapacity check if using superblock * Removing disabling logic * Updating test to make it catch more extreme case of previous bug * Also updating comment * Only taking compressEnd shortcut on non-superblock	2020-01-03 16:53:51 -08:00
Bimba Shrestha	56415efc76	Constifying, malloc check and naming nit	2019-12-17 17:16:51 -08:00
Bimba Shrestha	5225dcfc0f	Adding bool to check if enough room left for noCompress superblocks	2019-12-13 15:47:28 -08:00
Yann Collet	d73e2fb465	Merge pull request #1891 from bimbashrestha/oss [fuzz] Superblock fuzz issues	2019-12-10 13:17:00 -08:00
Bimba Shrestha	e1913dc87f	Making const, removing unnecessary indent, changing parameter order	2019-12-04 15:51:17 -08:00
Bimba Shrestha	2ec556fec2	Moving init/end functions, moving compressSuperBlock inside body()	2019-12-04 15:23:13 -08:00
Bimba Shrestha	ffb0463041	Refactor	2019-12-04 14:52:27 -08:00
Bimba Shrestha	49c6d49247	[fuzz] msan uninitialized unsigned value (#1908 ) Fixes new fuzz issue Credit to OSS-Fuzz * Initializing unsigned value * Initializing to 1 instead of 0 because it's more conservative * Unconditionally setting to check first and then checking zero * Moving bool to before block for c90 * Move check set before block	2019-12-04 10:02:17 -08:00
Bimba Shrestha	1fc9352f81	Using bss var instead of creating new bool	2019-12-02 21:39:06 -08:00
Bimba Shrestha	1f681d8592	Merge branch 'oss' of https://github.com/bimbashrestha/zstd into oss	2019-11-27 10:56:54 -08:00
Bimba Shrestha	a3a3c62b81	[fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898 ) Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table. * Only setting loaded dict huf table to valid on non-zero * Adding hasNoZeroWeights test to fse tables * Forbidding nbBits != 0 when weight == 0 * Reverting the last commit * Setting table log to 0 when weight == 0 * Small (invalid) zero weight dict test * Small (valid) zero weight dict test * Initializing repeatMode vars to check before zero check * Removing FSE changes to separate pr * Reverting accidentally changed file * Negating bool, using unsigned, optimization nit	2019-11-26 12:24:19 -08:00
Bimba Shrestha	d4e17d0776	Negating bool, updating bool on inner branches	2019-11-26 12:17:43 -08:00
Bimba Shrestha	826b555463	Merge branch 'dev' into oss	2019-11-22 17:29:33 -08:00
Bimba Shrestha	10bce1919e	Mixed declaration fix	2019-11-21 13:08:27 -08:00
Bimba Shrestha	0451accab1	Checking noCompressBlock explicitly for rep code confirmation	2019-11-21 13:06:26 -08:00
Nick Terrell	659e9f05cf	Fix null pointer addition	2019-11-20 18:36:04 -08:00
Bimba Shrestha	8f0c2d04c8	Going back to original flow but removing else return	2019-11-19 10:03:07 -08:00
Nick Terrell	a839d6852c	Merge pull request #1888 from senhuang42/superblocks_fixed RLE test and re-enable RLE in main compression loop	2019-11-18 16:09:33 -08:00
Bimba Shrestha	80586f5e80	Reversing condition order and forwarding error	2019-11-18 13:53:55 -08:00
Bimba Shrestha	dade64428f	Output regular uncompressed block when compressSequences fails	2019-11-18 08:43:14 -08:00
Bimba Shrestha	2d5d961a60	Typo in comment	2019-11-15 19:00:53 -08:00
Bimba Shrestha	dba767c0bb	Leaving room for checksum	2019-11-15 18:44:51 -08:00
Sen Huang	d9646dcbb5	Fixed main compression logic changes	2019-11-14 19:39:09 -05:00
Yann Collet	4b1ac69f19	Merge pull request #1868 from senhuang42/superblocks_fixed Superblocks rebased for merge	2019-11-14 13:31:34 -08:00
Sen Huang	c26d32c91c	Change superblock #include to be last	2019-11-14 13:12:17 -05:00
Yann Collet	d67742bc5d	Merge pull request #1858 from senhuang42/dictionary_header_size Method to get dictionary header size	2019-11-14 09:44:07 -08:00
Sen Huang	d9c475f3b3	Fix static analyze error, use proper bounds for dictEnd	2019-11-08 13:57:26 -05:00
Sen Huang	d06b90692b	Move asserts to loadZstdDictionary()	2019-11-08 13:57:26 -05:00
Sen Huang	b39149e156	Expose ZSTD_reset_compressedBlockState() to shared API	2019-11-08 13:57:26 -05:00
Sen Huang	6ce335371b	Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge	2019-11-08 13:57:26 -05:00
Sen Huang	c787b351ea	Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy()	2019-11-08 13:57:26 -05:00
Sen Huang	04fb42b4f3	Integrated refactor into getDictHeaderSize, now passes tests	2019-11-08 13:57:26 -05:00
Sen Huang	0bcaf6db08	First working pass at refactor of loadZstdDictionary()	2019-11-08 13:57:26 -05:00
Nick Terrell	8c474f9845	Fix parameter selection and adjustment with srcSize == 0	2019-11-07 08:58:43 -08:00
Felix Handte	5688447758	Merge pull request #1873 from felixhandte/make-overlap-log-multithread-only Fix #1861: Restrict overlapLog Parameter When Not Built With Multithreading	2019-11-06 16:56:37 -05:00
Felix Handte	ba4613602f	Merge pull request #1843 from moozzyk/issue-1637 Take ZSTD_parameters as a const pointer	2019-11-06 16:56:14 -05:00
W. Felix Handte	c13f81905a	Fix #1861 : Restrict overlapLog Parameter When Not Built With Multithreading This parameter is unused in single-threaded compression. We should make it behave like the other multithread-only parameters, for which we only accept zero when we are not built with multithreading.	2019-11-06 16:05:02 -05:00
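This is a sketch of the bounds check described above. In a build without `ZSTD_MULTITHREAD`, a multithread-only parameter such as overlapLog accepts only 0. The multithreaded range 0..9 and the helper name are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical bounds check for the overlapLog parameter. */
static int overlapLogInBounds(int value)
{
#ifdef ZSTD_MULTITHREAD
    return value >= 0 && value <= 9;   /* assumed MT range */
#else
    return value == 0;                 /* single-threaded build: only 0 */
#endif
}
```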
Sen Huang	13bb7500e8	Fix frame argument to compression	2019-11-05 16:15:55 -05:00
Sen Huang	f2932fb5eb	Fix more merge conflicts	2019-11-05 15:54:05 -05:00
Sen Huang	7ce891870c	Fix merge conflicts	2019-11-05 15:51:25 -05:00
Bimba Shrestha	3fb5b106da	Replacing some literals with constants	2019-11-05 10:26:57 -08:00
Nick Terrell	60205fec02	Fix 2 bugs in dictionary loading * Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`. This changes the compressor, which silently skips dictionaries <= 8 bytes. * Allow repcodes that are equal to the dictionary content size, since it is in bounds.	2019-11-01 16:52:07 -07:00
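The second bug is a boundary check. A repcode equal to the dictionary content size points at the dictionary's first byte, which is in bounds, so the test must use `<=`. A repcode of 0 is never valid. `repcodeInBounds()` is a hypothetical helper, not the library's loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accept repcodes in [1, dictContentSize]; the upper end is inclusive. */
static int repcodeInBounds(uint32_t rep, size_t dictContentSize)
{
    return rep != 0 && rep <= dictContentSize;
}
```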
Sen Huang	b9ede1c8c2	Make sure contentsize is known	2019-10-30 16:03:58 -04:00
Yann Collet	a9a216a846	Merge pull request #1824 from senhuang42/new_path_for_cdict Avoid using CDict params when input is large.	2019-10-23 12:04:40 -07:00
moozzyk	eda7946a36	Take ZSTD_parameters as a const pointer Fixes: #1637	2019-10-22 23:21:54 -07:00
Yann Collet	5d5c895b18	fix initCStream_advanced() for fast strategies Compression ratio of fast strategies (levels 1 & 2) was seriously reduced, due to accidental disabling of Literals compression. Credit to @QrczakMK, who perfectly described the issue, and implementation details, making the fix straightforward. Example : initCStream with level 1 on synthetic sample P50 : Before : 5,273,976 bytes After : 3,154,678 bytes ZSTD_compress (for comparison) : 3,154,550 Fix #1787. To follow : refactor the test which was supposed to catch this issue (and failed)	2019-10-22 15:01:38 -07:00
Sen Huang	c2e1e54f24	((x or y) or z) == (x or y or z), remove brackets	2019-10-21 19:16:50 -04:00

2136 Commits