townforge/zstd - zstd - Townforge git

Author	SHA1	Message	Date
Nick Terrell	ae85676d44	Fix alignment of scratchBuffer in HUF_compressWeights() The scratch buffer must be 4-byte aligned. This causes test failures in 32-bit systems, where the stack isn't aligned. Fixes Issue #2428.	2020-12-17 14:30:27 -08:00
Yann Collet	6132df8dd3	fix gcc-10 strict aliasing warnings by exposing HUF_CElt declaration.	2020-12-04 16:43:19 -08:00
Yann Collet	68c14bdff2	minor speed improvement to HUF_readCTable() faster by ~+1-2%	2020-12-04 16:33:39 -08:00
Nick Terrell	7205e609a9	Merge pull request #2354 from terrelln/stable-buffer Add ZSTD_c_stable{In,Out}Buffer and optimize when set	2020-10-30 15:06:56 -07:00
Nick Terrell	c74be3f6de	[lib] Validate buffers when ZSTD_c_stable{In,Out}Buffer is set Adds the validation of the input/output buffers only. They are still unused.	2020-10-30 10:55:34 -07:00
Nick Terrell	e3e0775cc8	[API] Add ZSTD_c_stable{In,Out}Buffer parameters This commit adds the parameters and sets the value in the CCtxParams but it does not do anything with the value.	2020-10-30 10:54:39 -07:00
senhuang42	3ed5d053d8	Clarify comments in zstd.h some more	2020-10-28 09:53:09 -04:00
senhuang42	9171f920cd	Improve documentation of seqStore_t	2020-10-27 10:50:22 -04:00
Nick Terrell	8c46c1d851	Merge pull request #2356 from bsdimp/neon aarch64: use __ARM_NEON instead of __aarch64__ to control use of neon	2020-10-13 15:42:46 -07:00
Warner Losh	43c0054405	aarch64: use __ARM_NEON instead of __aarch64__ to control use of neon There are compilation environments in aarch64 where NEON isn't available. While these environments could define ZSTD_NO_INTRINSICS, it's more fail-safe to use the more specific symbol to know if NEON extensions are available. __ARM_NEON is the proper symbol, defined in ARM C Language Extensions Release 2.1 (https://developer.arm.com/documentation/ihi0053/d/). Some sources suggest __ARM_NEON__, but that's the obsolete spelling from prior versions of the standard. Signed-off-by: Warner Losh <imp@bsdimp.com>	2020-10-13 12:12:46 -06:00
Like Ma	cc907770bd	Fix building on AIX 5.1	2020-10-09 18:34:00 +08:00
Yann Collet	12541931fa	Merge pull request #2328 from marxin/zstd-pool-api Allow external creation of POOLs that can be shared.	2020-10-09 01:00:50 -07:00
Martin Liska	b684900a4a	Allow external creation of POOLs that can be shared.	2020-10-07 12:44:33 +02:00
Nick Terrell	f1cbeec039	[superblock] Reduce stack usage by correctly sizing header buffers	2020-09-24 19:42:04 -07:00
Nick Terrell	caecd8c211	Allow user to override ASAN/MSAN detection Rename ADDRESS_SANITIZER -> ZSTD_ADDRESS_SANITIZER and same for MEMORY_SANITIZER. Also set it to 0/1 instead of checking for defined. This allows the user to override ASAN/MSAN detection for platforms that don't support it.	2020-09-24 19:42:04 -07:00
Nick Terrell	9ae0483858	Reorganize zstd_deps.h and mem.h + replace mem.h for the kernel	2020-09-24 19:41:59 -07:00
Nick Terrell	260fc75028	Move __has_builtin() fallback define to compiler.h	2020-09-24 15:51:08 -07:00
Nick Terrell	4d63ee57f5	Move ASAN/MSAN support declarations to compiler.h	2020-09-24 15:51:08 -07:00
Nick Terrell	b09ec5c2b9	Remove MEM_STATIC_ASSERT and use DEBUG_STATIC_ASSERT instead	2020-09-24 15:51:04 -07:00
Nick Terrell	dec7fb03ec	[lib] Silence -Wunused-const-variable warnings	2020-09-23 12:59:57 -07:00
Nick Terrell	aab4bf7b0d	[linux-kernel] Add test that checks the ifdef hardwiring	2020-09-09 14:36:19 -07:00
Nick Terrell	79ded1b4a9	[lib] Add ZSTD_NO_UNUSED_FUNCTIONS macro to hide unused functions The unused function definitions are hidden behind a `#ifndef ZSTD_NO_UNUSED_FUNCTIONS` check. Initially hiding all functions which are unused and take up more than 2KB of stack space, because these will show up as warnings in the Linux Kernel build system.	2020-09-09 14:35:39 -07:00
Nick Terrell	ac3a136b0a	[lib] Replace 64-bit divisions with ZSTD_div64()	2020-09-09 14:35:39 -07:00
Nick Terrell	a90779397a	[lib] Reduce zstd stack usage by 1KB	2020-09-09 14:35:39 -07:00
Nick Terrell	e975de289c	Add ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics	2020-09-09 14:35:39 -07:00
Nick Terrell	c465f24457	ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free	2020-08-26 12:26:03 -07:00
Nick Terrell	a686d306d2	Rename ZSTD_{malloc,calloc,free} to ZSTD_custom{Malloc,Calloc,Free}	2020-08-26 12:25:08 -07:00
Nick Terrell	80f577baa2	Move standard includes to zstd_deps.h	2020-08-26 12:25:08 -07:00
Nick Terrell	4193638996	[bug] Fix FSE_readNCount() * Fix bug introduced in PR #2271 * Fix long-standing bug that is impossible to trigger inside of zstd * Add a fuzzer that makes sure the normalized count always round trips correctly	2020-08-25 15:42:41 -07:00
Nick Terrell	614e446000	Merge pull request #2271 from terrelln/small-blocks Small block optimizations	2020-08-24 18:54:33 -07:00
Nick Terrell	6d2f750b37	Document the BMI2 default() functions	2020-08-24 14:44:33 -07:00
Nick Terrell	cebe0b5c0b	Improve FSE_normalizeCount() docs	2020-08-24 13:58:34 -07:00
Nick Terrell	8def0e5fd3	Fix up code after reading through	2020-08-24 12:24:45 -07:00
Nick Terrell	8f8bd2d1ac	[regression] Update results.csv	2020-08-20 12:41:35 -07:00
Nick Terrell	575731b6db	Use ncount=1 when < 4096 symbols	2020-08-18 16:47:53 -07:00
Nick Terrell	612e947c5e	wire up bmi2 support	2020-08-17 16:35:28 -07:00
Nick Terrell	ba1fd17a9f	speed up literal header decoding	2020-08-17 12:17:53 -07:00
Nick Terrell	6004c1117f	speed up small blocks	2020-08-16 23:03:38 -07:00
Nick Terrell	e3bda594ae	Prefer __builtin_prefetch over inline asm Reorder the ifdefs for the PREFETCH macros so that the compiler builtin is favored over the inline assembly for aarch64.	2020-08-10 22:17:18 -07:00
Yann Collet	38e38546a4	Merge pull request #2258 from Niadb/dev Added STATIC_BMI2 for compile time detection of BMI2 on MSVC, when enabled various intrinsics are used	2020-08-04 09:43:59 -07:00
helloguo	acb3dd9a68	Use ZSTD_copy16 instead of memcpy	2020-07-28 11:58:46 -07:00
Niadb	a8ebc14035	Update bitstream.h Profiler showed some of these not being inlined on MSVC	2020-07-28 11:17:04 -06:00
Niadb	493fd40dca	Add files via upload	2020-07-28 02:52:15 -06:00
helloguo	82b0cd844f	Optimize ZSTD_wildcopy	2020-07-27 22:08:52 -07:00
helloguo	6de87b3a74	fix preprocessor in ZSTD_wildcopy	2020-07-24 10:53:58 -07:00
Yann Collet	21c273da84	import some minor fixes from FSE project	2020-07-16 20:25:15 -07:00
Yann Collet	a44671b281	Revert "Fix -Wunused-variable under FUZZING_BUILD_MODE..."	2020-07-15 12:42:18 -07:00
Mitch Phillips	23b55d6b3e	Fix -Wunused-variable under FUZZING_BUILD_MODE... Fuzzing build modes (FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) doesn't necessarily imply that assert() is enabled, according to the manual. When the current do-nothing is expanded under -Wunused-variable (-Wall), it results in unused variables in some of the FUZZING_BUILD_MODE... blocks. This patch extends the do-nothing to avoid the unused variable.	2020-07-14 09:03:02 -07:00
yoshihitoh	c6548eac8e	Rename static vars to avoid redefinition error.	2020-06-29 10:51:50 +09:00
Niadb	74f65f624c	Update compiler.h clean wording	2020-06-19 09:51:00 -06:00
Niadb	8c115cbe23	Update compiler.h Added a comment explaining the purpose of the WIN_CDECL macro	2020-06-19 09:48:35 -06:00
Niadb	2962fda93f	Add files via upload	2020-06-19 03:34:05 -06:00
Niadb	a4c8aa5e02	Add files via upload	2020-06-19 03:31:47 -06:00
Yann Collet	20bd246045	blindfix for VS macro redefinition	2020-05-11 19:29:36 -07:00
caoyzh	969ba4f2b9	Change the modification of ZSTD_wildcopy()	2020-05-07 13:10:46 -07:00
caoyzh	9e802ede9c	Modify indent of comments	2020-05-07 13:10:46 -07:00
caoyzh	7f75f05e84	Change "arm_neon.h" to system include <arm_neon.h>	2020-05-07 13:10:46 -07:00
caoyzh	b2e56f7f7f	Optimize compression by using neon function.	2020-05-07 13:10:46 -07:00
Nick Terrell	5717bd39ee	[lib] Fix NULL pointer dereference When the output buffer is `NULL` with size 0, but the frame content size is non-zero, we will write to the NULL pointer because our bounds check underflowed. This was exposed by a recent PR that allowed an empty frame into the single-pass shortcut in streaming mode. * Fix the bug. * Fix another NULL dereference in zstd-v1. * Overflow checks in 32-bit mode. * Add a dedicated test. * Expose the bug in the dedicated simple_decompress fuzzer. * Switch all mallocs in fuzzers to return NULL for size=0. * Fix a new timeout in a fuzzer. Neither clang nor gcc show a decompression speed regression on x86-64. On x86-32 clang is slightly positive and gcc loses 2.5% of speed. Credit to OSS-Fuzz.	2020-05-06 12:09:02 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Felix Handte	7e9aabd652	Merge pull request #2099 from felixhandte/compile-under-pedantic Compile Under `-pedantic -Werror` and `-std=c90`	2020-05-04 10:07:13 -07:00
Felix Handte	816ed80774	Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort Reduce stack usage of HUF_sort()	2020-05-04 08:15:31 -07:00
W. Felix Handte	3764859060	Switch Helper Declaration to Not Force Inline It was causing build issues in ANSI mode.	2020-05-04 10:59:15 -04:00
W. Felix Handte	c7da66c9cf	Purge C++-Style Comments (`// ...`), Make Compilation Succeed Under C90	2020-05-04 10:59:15 -04:00
W. Felix Handte	952427aebf	Avoid inline Keyword in C90 Previously we would use it for all gcc-like compilations, even when a restrictive mode that disallowed it had been selected.	2020-05-04 10:59:15 -04:00
W. Felix Handte	baa4e2e36c	Don't Evaluate Arguments to Dummy Function	2020-05-04 10:59:15 -04:00
W. Felix Handte	450542d3a7	Allow Empty Format Strings in Error Macro Invocations `-Wall` implies `-Wformat-zero-length`, which will cause compilation to fail under `-Werror` when an empty string is passed as the format string to a `printf`-family function. This commit moves us back to prefixing the provided format string, which successfully avoids that warning. However, this removes the failure mode where that `RAWLOG` invocation would fail to compile when no format string was provided at all (which was desirable to avoid having code that would successfully compile normally but fail under `-pedantic`, which does require that a non-zero number of args are provided). So this commit also introduces a function which does nothing at all, but will fail to compile if not provided with at least one argument, which is a string. This successfully links the compilability of pedantic and non-pedantic builds.	2020-05-04 10:59:15 -04:00
W. Felix Handte	2745f7a7d5	Make Error Macro Invocation Without Info String Fail to Compile Even without `-pedantic`, these macros will now fail to compile unless you provide an info string argument. This will prevent us from regressing.	2020-05-04 10:59:15 -04:00
Nick Terrell	e103d7b4a6	Fix superblock mode (#2100 ) Fixes: Enable RLE blocks for superblock mode Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table. Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table. Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes. Respect disableLiteralsCompression in superblock mode Fix superblock mode to handle a block consisting of only compressed literals Fix a off by 1 error in superblock mode that disabled it whenever there were last literals Fix superblock mode with long literals/matches (> 0xFFFF) Allow superblock mode to repeat Huffman tables Respect ZSTD_minGain(). Tests: Simple check for the condition in #2096. When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much. Remaining limitations: O(targetCBlockSize^2) because we recompute statistics every sequence Unable to split literals of length > targetCBlockSize into multiple sequences Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead. ... Fixes #2096	2020-05-01 16:11:47 -07:00
Meghna Malhotra	a084d959bd	WIP: Increased wksp size, but it's segfaulting	2020-05-01 13:45:48 -07:00
Nick Terrell	a4ff217baf	[lib] Add ZSTD_d_stableOutBuffer	2020-04-27 18:09:44 -07:00
Nick Terrell	5fcbc484c8	Merge pull request #2040 from caoyzh/dev-2 Optimize by prefetching on aarch64	2020-04-08 13:14:47 -07:00
Carl Woffenden	7af7735fa3	Merge remote-tracking branch 'upstream/dev' into single-file-lib	2020-04-07 11:13:02 +02:00
Carl Woffenden	edd9a07322	Code replicated in compression and decompression moved to shared headers `CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.	2020-04-07 11:02:06 +02:00
Bimba Shrestha	0154866749	moving consts to zstd_internal and reusing them	2020-04-03 14:26:15 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
caoyzh	7201980650	Optimize by prefetching on aarch64	2020-03-14 15:25:59 +08:00
Bimba Shrestha	dba3abc95a	Missed returns	2020-03-05 12:20:59 -08:00
Bimba Shrestha	a75e5f2ffc	bitscan add undef check	2020-03-05 11:52:15 -08:00
Bimba Shrestha	85d0efd619	Removing no-tree-vectorize for intel	2020-03-05 10:02:48 -08:00
Nick Terrell	e32e3e8662	Improve wildcopy performance across the board	2020-01-28 20:37:04 -08:00
Bimba Shrestha	b1f53b1a10	[fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936) * Adding fail logging for superblock flow * Dividing by targetCBlockSize instead of blockSize * Adding new const and using more accurate formula for nbBlocks * Only do dstCapacity check if using superblock * Removing disabling logic * Updating test to make it catch more extreme case of previous bug * Also updating comment * Only taking compressEnd shortcut on non-superblock	2020-01-03 16:53:51 -08:00
Bimba Shrestha	a3a3c62b81	[fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898) Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table. * Only setting loaded dict huf table to valid on non-zero * Adding hasNoZeroWeights test to fse tables * Forbidding nbBits != 0 when weight == 0 * Reverting the last commit * Setting table log to 0 when weight == 0 * Small (invalid) zero weight dict test * Small (valid) zero weight dict test * Initializing repeatMode vars to check before zero check * Removing FSE changes to separate PR * Reverting accidentally changed file * Negating bool, using unsigned, optimization nit	2019-11-26 12:24:19 -08:00
Nick Terrell	718f00ff6f	Optimize decompression speed for gcc and clang (#1892 ) * Optimize `ZSTD_decodeSequence()` * Optimize Huffman decoding * Optimize `ZSTD_decompressSequences()` * Delete `ZSTD_decodeSequenceLong()`	2019-11-25 18:26:19 -08:00
Sen Huang	7ce891870c	Fix merge conflicts	2019-11-05 15:51:25 -05:00
Nick Terrell	919d1d8e93	Merge pull request #1831 from terrelln/zstdmt-bad-memset [zstdmt] Don't memset the jobDescription	2019-10-21 15:53:57 -07:00
Felix Handte	cf725630a6	Merge pull request #1795 from felixhandte/workspace-asan Add Poisoned Redzones to the Workspace When Compiling with ASAN	2019-10-21 12:15:17 -04:00
Nick Terrell	243824551f	[threading] Add debug utilities	2019-10-18 15:05:34 -07:00
Yann Collet	19741c7d99	Merge pull request #1815 from facebook/zlibwrap make zlibWrapper strict ISO-C90 compatible	2019-10-16 16:45:15 -07:00
Yann Collet	2d5201b0ab	removed wildcopy8() which is no longer used, noticed by @davidbolvansky	2019-10-16 14:51:33 -07:00
W. Felix Handte	b6987acbbf	Declare the ASAN Functions We Need, Don't Include the Header	2019-10-10 13:40:16 -04:00
W. Felix Handte	edb6d884a5	Detect Whether We're Being Compiled with ASAN	2019-10-10 13:40:16 -04:00
W. Felix Handte	dc1fb684bf	Remove Unused MEM_SKIP_MSAN Macro	2019-10-10 13:40:16 -04:00
Yann Collet	cb18fffe65	enforce C90 compatibility for zlibWrapper	2019-09-24 17:50:58 -07:00
Dávid Bolvanský	1ab1a40c9c	Fixed one more place	2019-09-23 21:32:56 +02:00
Dávid Bolvanský	1f7228c040	Use clz ^ 31 instead of 31 - clz; better codegen for GCC	2019-09-23 21:23:09 +02:00
Nick Terrell	5cb7615f1f	Add UNUSED_ATTR to ZSTD_storeSeq()	2019-09-20 21:37:13 -07:00
Nick Terrell	44c65da97e	Remove literals overread in ZSTD_storeSeq() for ~neutral perf	2019-09-20 12:23:25 -07:00
Nick Terrell	cdad7fa512	Widen ZSTD_wildcopy to 32 bytes	2019-09-20 00:52:15 -07:00
Nick Terrell	efd37a64ea	Optimize decompression and fix wildcopy overread * Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread. * Optimize `ZSTD_wildcopy()` by removing unnecessary branches and unrolling the loop. * Extract `ZSTD_overlapCopy8()` into its own function. * Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is optimized for single long sequences, since that is the important case that can end up in `ZSTD_execSequenceEnd()`. Without this optimization, decompressing a block with 1 long match goes from 5.7 GB/s to 800 MB/s. * Refactor `ZSTD_execSequenceEnd()`. * Increase the literal copy shortcut to 16. * Add a shortcut for offset >= 16. * Simplify `ZSTD_execSequence()` by pushing more cases into `ZSTD_execSequenceEnd()`. * Delete `ZSTD_execSequenceLong()` since it is exactly the same as `ZSTD_execSequence()`. clang-8 sees +17.5% on silesia and +21.8% on enwik8. gcc-9 sees +12% on silesia and +15.5% on enwik8. TODO: More detailed measurements, and on more datasets. Credit to OSS-Fuzz for finding the wildcopy overread.	2019-09-19 21:07:14 -07:00
Yann Collet	3cac061db5	Merge pull request #1802 from bimbashrestha/rle_block_bound_fix_pt2 Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from las…	2019-09-18 16:32:37 -07:00
Bimba Shrestha	6e9f6813bb	adding bit container size	2019-09-18 13:49:45 -07:00
Bimba Shrestha	f9b6abb896	Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from last week)	2019-09-18 13:29:05 -07:00
Yann Collet	bfff5b30a4	Merge pull request #1756 from mgrice/dev Improvements in zstd decode performance	2019-09-18 11:35:50 -07:00
Felix Handte	2164a130f3	Merge pull request #1780 from felixhandte/workspace-efficiency-3 Avoid Clearing Tables Even When Changing CParams	2019-09-16 14:37:05 -04:00
W. Felix Handte	72ea79cacd	Don't Include `sanitizer/msan_interface.h`, Since Not All Platforms Provide It Instead, explicitly declare the functions we use.	2019-09-16 12:08:03 -04:00
Yann Collet	09b1844d9b	Merge pull request #1784 from bimbashrestha/fse_block_bound_err Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()	2019-09-12 19:09:27 -07:00
Bimba Shrestha	fe9af338ed	Added assert to BIT_flushBits()	2019-09-12 15:35:27 -07:00
Bimba Shrestha	43da5bf27e	Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()	2019-09-12 14:43:50 -07:00
W. Felix Handte	a10c191613	`__msan_poison()` Workspace When Preparing for Re-Use	2019-09-11 17:14:45 -04:00
mgrice	5d89771529	fix warning: always_inline function might not be inlinable	2019-08-29 12:32:15 -07:00
mgrice	b830599582	Improvements in zstd decode performance Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3). This change takes that further by exploiting some properties: 1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches 2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved. Speedup on Xeon E5-2680v4 is in the range of 3-5%. Measured wildcopy length distributions on silesia.tar: level <=8 <=16 <=24 >24 1 78.05% 11.49% 3.52% 6.94% 3 82.14% 8.99% 2.44% 6.43% 6 85.81% 6.51% 2.92% 4.76% 8 83.02% 7.31% 3.64% 6.03% 10 84.13% 6.67% 3.29% 5.91% 15 77.58% 7.55% 5.21% 9.66% 16 80.07% 7.20% 3.98% 8.75% Test Plan: benchmark silesia, make check	2019-08-29 12:25:56 -07:00
Carl Woffenden	901ea61f83	Tweaks to create a single-file decoder The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fallback on the old-style pragma optimisation flags.	2019-08-21 17:49:17 +02:00
Yann Collet	61936ba42a	Merge pull request #1705 from josepho0918/dev Add support for IAR C/C++ Compiler for Arm	2019-08-05 15:57:28 +02:00
Yann Collet	0b0b83e8f3	fix test 122 it's an unsupported scenario.	2019-08-03 16:51:26 +02:00
Joseph Chen	3855bc4295	Add support for IAR C/C++ Compiler for Arm	2019-07-29 15:25:58 +08:00
mgrice	812e8f2a16	perf improvements for zstd decode (#1668) * perf improvements for zstd decode tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge) Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization. See how GCC autovectorizes the loop here: https://godbolt.org/z/apr0x0 Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on) After: https://godbolt.org/z/OwO4F8 Note that autovectorization still does not do a good job on the optimized version, so it's turned off via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code: file orig compressedratio encode decode change 1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s 2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s 3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s 1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60% 2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01% 3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99% 1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s 2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s 3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s 1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30% 2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20% 3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50% 1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s 2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s 3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s 1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76% 2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54% 3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89% 1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s 2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s 3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s 1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47% 2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72% 3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29% 1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s 2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s 3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s 1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15% 2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98% 3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45% 1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s 2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s 3#osdb 
10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s 1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41% 2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42% 3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68% 1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s 2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s 3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s 1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26% 2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36% 3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38% 1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s 2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s 3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s 1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45% 2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93% 3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04% 1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s 2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s 3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s 1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70% 2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85% 3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31% 1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s 2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s 3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s 1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89% 2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43% 3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33% 1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s 2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s 3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s 1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24% 2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99% 3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05% 1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 
762.6MB/s 2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s 3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s 1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52% 2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00% 3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37% 7.51% * makefile changed to only pass -fno-tree-vectorize to gcc * <Replace this line with a title. Use 1 line only, 67 chars or less> Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__) * fix for warning/error with subtraction of void* pointers * fix c90 conformance issue - ISO C90 forbids mixed declarations and code * Fix assert for negative diff, only when there is no overlap * fix overflow revealed in fuzzing tests * tweak for small speed increase	2019-07-11 18:31:07 -04:00
Josh Soref	a880ca239b	Spelling (#1582 ) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
shakeelrao	0033bb4785	Update documentation for ZSTD_frameSizeInfo	2019-03-17 17:41:27 -07:00
shakeelrao	19b75b6ecb	Test new ZSTD_findFrameCompressedSize and update documentation	2019-03-15 18:04:19 -07:00
W. Felix Handte	501eb25102	Rename FORWARD_ERROR -> FORWARD_IF_ERROR	2019-01-29 12:56:07 -05:00
W. Felix Handte	429987c9a6	Add Comment	2019-01-28 17:35:31 -05:00
W. Felix Handte	2179ce00e1	Remove CHECK_E Macro	2019-01-28 17:33:13 -05:00
W. Felix Handte	7ebd897157	Remove CHECK_F Macro	2019-01-28 17:16:32 -05:00
W. Felix Handte	324e9654d3	Add grep-able String to Error Macros	2019-01-28 12:50:36 -05:00
W. Felix Handte	a3538bbc6f	Add RETURN_ERROR and FORWARD_ERROR Macros	2019-01-28 12:45:26 -05:00
W. Felix Handte	54fa31f03b	Add RETURN_ERROR_IF Macro That Logs Debug Information When Check Fails	2019-01-28 11:43:33 -05:00
Yann Collet	ededcfca57	fix confusion between unsigned <-> U32 as suggested in #1441. generally U32 and unsigned are the same thing, except when they are not ... case : 32-bit compilation for MIPS (uint32_t == unsigned long) A vast majority of transformation consists in transforming U32 into unsigned. In rare cases, it's the other way around (typically for internal code, such as seeds). Among a few issues this patch solves : - some parameters were declared with type `unsigned` in .h, but with type `U32` in their implementation .c . - some parameters have type unsigned*, but the caller uses a pointer to U32 instead. These fixes are useful. However, the bulk of changes is about %u formatting, which requires unsigned type, but generally receives U32 values instead, often just for brevity (U32 is shorter than unsigned). These changes are generally minor, or even annoying. As a consequence, the amount of code changed is larger than I would expect for such a patch. Testing is also a pain : it requires manually modifying `mem.h`, in order to lie about `U32` and force it to be an `unsigned long` typically. On a 64-bit system, this will break the equivalence unsigned == U32. Unfortunately, it will also break a few static_assert(), controlling structure sizes. So it also requires modifying `debug.h` to make `static_assert()` a noop. And then reverting these changes. So it's inconvenient, and as a consequence, this property is currently not checked during CI tests. Therefore, these problems can emerge again in the future. I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests. It's another restriction for coding, adding more frustration during merge tests, since most platforms don't need this distinction (hence contributor will not see it), and while this can matter in theory, the number of platforms impacted seems minimal. Thoughts ?	2018-12-21 18:09:41 -08:00
Yann Collet	e4ae24c229	Merge pull request #1420 from felixhandte/zstd-decompress-minimal Various Macros to Allow Building Extremely Minimal Decoder Library	2018-12-20 15:17:37 -08:00
W. Felix Handte	8e61ac8161	Use Unused Variable in ERR_getErrorString()	2018-12-19 12:36:10 -08:00
W. Felix Handte	c560e34c86	Add HUF_FORCE_DECOMPRESS_X2	2018-12-18 13:36:39 -08:00
W. Felix Handte	432314b58a	Rename HUF_DECOMPRESS_MINIMAL -> HUF_FORCE_DECOMPRESS_X1	2018-12-18 13:36:39 -08:00
W. Felix Handte	605dd576ee	Remove Error Strings with ZSTD_STRIP_ERROR_STRINGS	2018-12-18 13:36:39 -08:00
W. Felix Handte	9d5f3963ff	Add Option to Not Request Inlining with ZSTD_NO_INLINE	2018-12-18 13:36:39 -08:00
W. Felix Handte	f45c9df42e	Totally Hide/Disable X2 Variants when HUF_DECOMPRESS_MINIMAL is Defined	2018-12-18 13:36:39 -08:00
Yann Collet	373ff8b983	play around with rescale weights	2018-12-17 15:48:34 -08:00
Yann Collet	9c3265a53f	Merge pull request #1417 from facebook/advancedAPI Advanced API	2018-12-10 18:48:15 -08:00
Ryan Schmidt	ef4df0df4a	Fix i386 build failure "Junk character 13"	2018-11-16 02:16:21 -06:00
Yann Collet	d7e10a774a	added constant ZSTD_WINDOWLOG_LIMIT_DEFAULT answering #1407. Also : removed obsolete function ZSTD_setDStreamParameter() which could only be used with one parameter (DStream_p_maxWindowSize). Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)	2018-11-13 18:12:34 -08:00
Yann Collet	626040ab53	changed PREFETCH() macro into PREFETCH_L2() which is more accurate	2018-11-12 17:05:32 -08:00
Yann Collet	1b4a9c518b	Merge pull request #1410 from facebook/prefetch_dec improve long-range decoder speed	2018-11-08 18:41:58 -08:00
Yann Collet	9126da5b5c	improve long-range decoder speed on enwik9 at level 22 (which is almost a worst case scenario), speed improves by +7% on my laptop (415 -> 445 MB/s)	2018-11-08 12:47:46 -08:00
Nick Terrell	a8daa2d683	Signal before unlocking in pool.c	2018-11-08 10:45:53 -08:00
Yann Collet	acd75a1448	fixed a second memset() on NULL not sure why it only triggers now, this code has been around for a while. Introduced a new error code : dstBuffer_null, I couldn't express anything even remotely similar with existing error codes set.	2018-10-29 15:03:57 -07:00
Yann Collet	2b4914082e	created zstd_decompress_block module isolate all logic associated with block decompression into its own module. zstd_decompress is still in charge of context creation/destruction, frames, headers, streaming, special blocks, etc. Compressed blocks themselves are now handled within zstd_decompress_block .	2018-10-25 16:28:41 -07:00
Yann Collet	ccd2d426fc	separate DDict logic into its own module created zstd_ddict.c within lib/decompress	2018-10-23 17:25:49 -07:00
Ori Livneh	f31715f5e0	Enable use of bswap intrinsics in clang Necessary because clang disguises itself as an older (__GNUC_MINOR__ = 2) GCC.	2018-10-11 15:01:09 -04:00
Yann Collet	6ed3b526e4	restored bitMask for shift values since corrupted bitstreams can generate too large values. This slightly reduces the benefits from clang on my laptop. gcc results and code generation are not affected.	2018-10-10 18:29:50 -07:00
Yann Collet	c012e9540a	removed one assert() that can be triggered by a corrupted bitstream.	2018-10-10 17:33:04 -07:00
Yann Collet	7791f192ee	removed one assert() which can be triggered when input is corrupted.	2018-10-10 16:39:15 -07:00


684 Commits