townforge/zstd - zstd - Townforge git

Author	SHA1	Message	Date
Elliot Gorokhovsky	db2f4a6532	Move bitwise builtins into bits.h	2022-02-14 11:16:03 -05:00
Yann Collet	aeff128331	change seqDef.offset into seqDef.offBase to better reflect the value stored in this field.	2021-12-23 17:56:08 -08:00
Nick Terrell	0356e05f07	[zdict] Remove ZDICT_CONTENTSIZE_MIN restriction for ZDICT_finalizeDictionary Allow the `dictContentSize` to be any size. The finalized dictionary content size must be at least as large as the maximum repcode (8). So we add zero bytes to the dictionary to ensure that we meet that requirement. I've removed this restriction because its been causing us headaches when people complain that dictionary training failed. It fails because there isn't enough useful content to put in the dictionary. Either because every sample is exactly the same and less than ZDICT_CONTENTSIZE_MIN bytes, or there isn't enough content. Instead, we should succeed in creating the dictionary, and it is up to the user to decide if it is worthwhile. It is possible that the tables alone provide enough value. NOTE: This allows us to produce dictionaries with finalized `dictContentSize < ZDICT_CONTENTSIZE_MIN`. But, they are still valid zstd dictionaries. We could remove the `ZDICT_CONTENTSIZE_MIN` macro, but I've decided to leave that for now, so we don't break users.	2021-11-30 18:02:26 -08:00
Ma Lin	ae986fcdb8	Use __assume(0) for unreachable code path in msvc msvc will optimize away the condition check.	2021-09-27 19:23:57 +08:00
Ma Lin	e5ba858270	Don't initialize the first parameter of _BitScanForward* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64	2021-09-25 16:36:53 +08:00
Ma Lin	95f492ea17	Don't initialize the first parameter of _BitScanReverse* functions Like the document example, no need to initialize `r` to 0. https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64	2021-09-25 16:36:53 +08:00
Yann Collet	27a8bbe265	new initializer for ll price	2021-09-03 16:07:31 -07:00
Nick Terrell	09149beaf8	[1.5.0] Move `zstd_errors.h` and `zdict.h` to `lib/` root `zstd_errors.h` and `zdict.h` are public headers, so they deserve to be in the root `lib/` directory with `zstd.h`, not mixed in with our private headers.	2021-04-30 15:13:54 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Yann Collet	cefdc023f7	The CLI can be linked to libzstd dynamic library invoking target zstd-dll	2021-01-06 18:00:24 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Gregory Szorc	dd1a7e41ee	Add ifndef guards for _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE This ensures the symbols aren't redefined, which would result in a compiler error. I was getting redefined symbols for _LARGEFILE64_SOURCE when building for 32-bit x86 Linux on an older CentOS release in a CI environment. With this change, I'm able to compile the single file library in this environment. Closes #2443.	2020-12-26 10:02:45 -07:00
animalize	1aec77ea89	use ZSTD_CLEVEL_DEFAULT in zdict.c	2020-12-03 12:46:57 +08:00
Nick Terrell	575731b6db	Use ncount=1 when < 4096 symbols	2020-08-18 16:47:53 -07:00
Carl Woffenden	4a9b7d136f	Initial implementation (files added, macros fixed) Hashing functions still to fix.	2020-06-22 10:31:36 +02:00
Nick Terrell	08981d2638	[lib] Allow compression dictionaries with missing symbols Allow compression to use dictionaries with missing symbols in their entropy tables. We set the FSE repeat mode to check when there are missing symbols, and set the FSE repeat mode to valid when all symbols are present. Note that when not all symbols are present, the heuristics which favor dictionary tables for lower compression levels won't activate. Tested by manually creating a dictionary with missing symbols of every type, and validating that the compressor rejects it before this change, and accepts it after this change. Also, I ran the `dictionary_loader` fuzzer for >1 hour of CPU time without running into cases where compression succeeds, but decompression fails. Fixes #2174.	2020-06-12 17:57:19 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Sen Huang	c85d10d0ea	Remove mixed declarations	2019-11-08 13:57:26 -05:00
Sen Huang	d9c475f3b3	Fix static analyze error, use proper bounds for dictEnd	2019-11-08 13:57:26 -05:00
Sen Huang	d06b90692b	Move asserts to loadZstdDictionary()	2019-11-08 13:57:26 -05:00
Sen Huang	b39149e156	Expose ZSTD_reset_compressedBlockState() to shared API	2019-11-08 13:57:26 -05:00
Sen Huang	6ce335371b	Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge	2019-11-08 13:57:26 -05:00
Sen Huang	4a61aaf368	Remove redundant comment	2019-11-08 13:57:26 -05:00
Sen Huang	c787b351ea	Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy()	2019-11-08 13:57:26 -05:00
Sen Huang	04fb42b4f3	Integrated refactor into getDictHeaderSize, now passes tests	2019-11-08 13:57:26 -05:00
Sen Huang	4b141b63e0	Revert "Move decompress symbols into zstd_internal.h, remove dependency" This reverts commit a152b4c67a5266f611db4a2eac4a79003852a795.	2019-11-08 13:57:26 -05:00
Sen Huang	84404cff6e	Move decompress symbols into zstd_internal.h, remove dependency	2019-11-08 13:57:26 -05:00
Sen Huang	341e0641ed	Checks malloc() for failure, returns 0 if so	2019-11-08 13:57:26 -05:00
Sen Huang	97b7f712f3	Change to heap allocation, remove implicit type conversion	2019-11-08 13:57:25 -05:00
Sen Huang	3c36a7f13a	Add ZDICT_getHeaderSize()	2019-11-08 13:57:08 -05:00
moozzyk	eda7946a36	Take ZSTD_parameters as a const pointer Fixes: #1637	2019-10-22 23:21:54 -07:00
Ed Maste	b81d7cc6a0	remove extraneous doubled ;s	2019-08-15 21:17:06 -04:00
Tyler-Tran	cb47871a0a	[dictBuilder] Be more specific than ERROR(generic) (#1616 ) * Specify errors at a finer granularity than `ERROR(generic)`. * Add tests for bad parameters in the dictionary builder.	2019-05-22 18:57:50 -07:00
Yann Collet	ededcfca57	fix confusion between unsigned <-> U32 as suggested in #1441. generally U32 and unsigned are the same thing, except when they are not ... case : 32-bit compilation for MIPS (uint32_t == unsigned long) A vast majority of transformation consists in transforming U32 into unsigned. In rare cases, it's the other way around (typically for internal code, such as seeds). Among a few issues this patch solves : - some parameters were declared with type `unsigned` in .h, but with type `U32` in their implementation .c . - some parameters have type unsigned*, but the caller uses a pointer to U32 instead. These fixes are useful. However, the bulk of changes is about %u formatting, which requires unsigned type, but generally receives U32 values instead, often just for brevity (U32 is shorter than unsigned). These changes are generally minor, or even annoying. As a consequence, the amount of code changed is larger than I would expect for such a patch. Testing is also a pain : it requires manually modifying `mem.h`, in order to lie about `U32` and force it to be an `unsigned long` typically. On a 64-bit system, this will break the equivalence unsigned == U32. Unfortunately, it will also break a few static_assert(), controlling structure sizes. So it also requires modifying `debug.h` to make `static_assert()` a noop. And then reverting these changes. So it's inconvenient, and as a consequence, this property is currently not checked during CI tests. Therefore, these problems can emerge again in the future. I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests. It's another restriction for coding, adding more frustration during merge tests, since most platforms don't need this distinction (hence contributor will not see it), and while this can matter in theory, the number of platforms impacted seems minimal. Thoughts ?	2018-12-21 18:09:41 -08:00
Yann Collet	2e7fd6a2cb	fixed remaining searchLength invocations	2018-11-20 15:13:27 -08:00
Nick Terrell	f2d6db45cd	[zstd] Add -Wmissing-prototypes	2018-09-27 15:24:48 -07:00
modbw	d14edf259f	Fixed memory leak detected by cppcheck cppcheck (which is run regularly in our CI environment) detected a possible memory leak.	2018-08-28 07:25:05 +02:00
Jennifer Liu	9d6ed9def3	Merge fastCover into DictBuilder (#1274 ) * Minor fix * Run non-optimize FASTCOVER 5 times in benchmark * Merge fastCover into dictBuilder * Fix mixed declaration issue * Add fastcover to symbol.c * Add fastCover.c and cover.h to build * Change fastCover.c to fastcover.c * Update benchmark to run FASTCOVER in dictBuilder * Undo spliting fastcover_param into cover_param and f * Remove convert param functions * Assign f to parameter * Add zdict.h to Makefile in lib * Add cover.h to BUCK * Cast 1 to U64 before shifting * Remove trimming of zero freq head and tail in selectSegment and rebenchmark * Remove f as a separate parameter of tryParam * Read 8 bytes when d is 6 * Add trimming off zero frequency head and tail * Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed) * Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary * Change nbDmer to always read 8 bytes even when d=6 * Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER * Update comments and benchmarking result * Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover * Add dictType enum and fix bug about passing zParam when converting to coverParam * Combine finalize and skip into a single parameter * Update acceleration parameters and benchmark on 3 sample sets * Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets * Initialize variables outside of for loop in benchmark.c * Update benchmark result for hg-manifest * Remove cover.h from install-includes * Add explanation of f * Set default compression level for trainFromBuffer to 3 * Add assertion of fastCoverParams in DiB_trainFromFiles * Add checkTotalCompressedSize function + some minor fixes * Add test for multithreading fastCovr * Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish * Free segmentFreqs * Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment * Add FASTCOVER_MEMMULT * Minor fix * Update benchmarking result	2018-08-23 12:06:20 -07:00
Yann Collet	6e66bbf5dd	fixed several minor issues detected by scan-build only notable one : writeNCount() resists better vs invalid distributions (though it should never happen within zstd anyway)	2018-08-14 16:55:35 -07:00
Jennifer Liu	f5228f2c44	Refactoring	2018-07-31 13:58:54 -07:00
Jennifer Liu	4e29bc2469	Use CDict instead of CCtx in analyzeEntropy	2018-07-31 10:36:45 -07:00
Yann Collet	fa41bcc2c2	grouped debug functions into debug.h There were 2 competing set of debug functions within zstd_internal.h and bitstream.h. They were mostly duplicate, and required care to avoid messing with each other. There is now a single implementation, shared by both. Significant change : The macro variable ZSTD_DEBUG does no longer exist, it has been replaced by DEBUGLEVEL, which required modifying several source files.	2018-06-13 15:43:09 -04:00
Nick Terrell	569e2abccd	Allow negative compression levels in training * Set `dictCLevel` in `zstdcli.c`. * Only set to default level if the compression level `== 0`, not `<= 0`.	2018-04-09 12:12:03 -07:00
Yann Collet	752bae4a48	added warning message when pathological dataset is detected (note : cover_optimize needs -v to display the warning)	2018-01-11 11:29:28 -08:00
Yann Collet	e8093dde09	fixed #304 Pathological samples may result in literal section being incompressible. This case is now detected, and literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	218e9fe0fc	added a test case for dictBuilder failure cyclic data set makes the entropy stage fails now, onto a fix for #304 ...	2018-01-11 09:42:38 -08:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Yann Collet	77c137b3ae	minor comment refactor	2017-09-14 15:12:57 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00


129 Commits