This is an attempt to make the boost::atomic<> interface closer to the standard.
It makes a difference in C++17, which mandates copy elision and thus makes this
code possible:
boost::atomic<int> a = 10;
It also makes is_convertible<T, boost::atomic<T>> return true, which has
implications for standard library components such as std::pair.
This removes the workaround for gcc 4.7, which complains that
operator=(value_arg_type) is ambiguous with operator=(atomic const&) in
assignment expressions, even though the conversion to atomic<> is less preferred
than the conversion to value_arg_type. Instead, we work around the problem on
the operator= side.
Added a new compile test to check that the initializing constructor is implicit.
This requires the is_trivially_default_constructible type trait, which is not
available in the older libstdc++ versions up to gcc 5.1. Thus the config macro
is updated to reflect the fact that Boost.Atomic now has more advanced needs.
Also, attempt to work around an Intel compiler problem, which seems to break
(allegedly) because of the noexcept specifiers on the defaulted default
constructors. This may not be the actual cause, so this change will need to be
tested.
Also, use value_arg_type consistently across different specializations of
basic_atomic.
The support includes:
- The standard fetch_add/fetch_sub operations.
- Extra operations: (fetch_/opaque_)negate, (opaque_)add/sub.
- Extra capability macros: BOOST_ATOMIC_FLOAT/DOUBLE/LONG_DOUBLE_LOCK_FREE.
The atomic operations are currently implemented on top of the integer-based
backends and thus are mostly CAS-based. The CAS operations perform binary
comparisons and as such behave differently from normal C++ with respect to
special FP values such as NaN and signed zero.
The support for floating point types is optional and can be disabled by
defining BOOST_ATOMIC_NO_FLOATING_POINT. This can be useful if, on a certain
platform, the parameters of the floating point types cannot be deduced from the
compiler-defined or system macros (in which case the compilation fails).
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0020r6.html
These operations are useful for two reasons. First, they are needed by
atomic<> interface as the pre-increment/decrement and add/subtract operators
need to perform the corresponding arithmetic and return the actual result while
not exhibiting UB in case of overflow. This means that the operation must be
performed on the unsigned storage type in the backend. Second, the (op)_and_test
operations on ARM and PowerPC can be implemented in a more generic way on top of
the operations that return the result. And since we have those operations
internally, why not expose them to users.
Added tests and docs for the new operations. Also, added docs for the recently
added scoped names of the memory_order enum values.
Also, added a specialized "emulated" backend for the extra operations. This
backend makes better use of the fact that the operations are lock-protected
by avoiding any CAS-based loops.
The standard says that arithmetic operations on atomic types must always produce
a well-defined result in terms of two's complement arithmetic
([atomics.types.int]/7), which means integer overflow must wrap around and trap
representations are not allowed. This requires that all internal arithmetic be
done on unsigned integer types, even when the value type is a signed integer.
The implementation now casts between signed and unsigned integers internally,
performing zero or sign extension if the internal storage size is larger than
the stored value. This should have roughly the same performance as before,
although it mostly depends on the optimizer. The casting implementation
currently relies on the signed integer representation being two's complement
on all supported platforms; other representations are not supported.
The compiler complains:
error: can't find a register in class 'GENERAL_REGS' while reloading 'asm'
Try to work around it by explicitly specifying all registers to use.
That change did not fix the problem. The issue seems to be a bug specific to
MinGW gcc compilers up to 4.6 inclusive, and its cause is still unknown.
Alternatives are not supported by all compilers, and for our purposes we can
combine the constraints in a single alternative (i.e. "qm" instead of "q,m").
Additionally, this reduces duplication of other constraints in the asm blocks.
As the names suggest, the methods perform the corresponding operation and test
if the result is not zero.
Also, for the emulated fetch_complement, take care of integral promotion, which
could mess up the storage bits that were not part of the value on backends
where the storage is larger than the value. This could in turn break CAS on
the atomic value as it compares the whole storage.
The compiler, surprisingly, used ebx for memory operands, which messed up the
save/restore logic in the asm blocks, resulting in the memory operand (which
was supposed to be the pointer to the atomic storage) being incorrect.
First, clang (and, apparently, recent gcc as well) are able to deal with ebx
around the asm blocks by themselves, which makes it unnecessary to save/restore
the register in the asm blocks. Therefore, for those compilers we now use the
non-PIC branch in PIC mode as well. This sidesteps the original problem with
clang.
Second, since we can't be sure if other compilers are able to pull the same
trick, the PIC branches of code have been updated to avoid any memory operand
constraints and use the explicitly calculated pointer in a register instead. We
also no longer use a scratch slot on the stack to save ebx but instead use esi
for that, which is also conveniently used for one of the inputs. This should
be slightly faster as well. The downside is that we're possibly wasting one
register for storing the pointer to the storage, but there seems to be no way
around it.
Apparently, gcc versions up to 4.6 inclusive have problems allocating
eax:edx register pairs in asm statements for 32-bit x86 targets. Included those
compilers in the existing workaround.
Also, for clang, removed the __sync-based workaround for the exchange()
implementation in favor of the asm branch with the workaround. This should
produce more efficient code.
Clang failed to compile such code. Given that gcc 7 also complained about
missing displacements in memory operands, this trick is no longer effective
with newer compilers.
Instead, the assembler code has been refactored to avoid having to specify
any displacements at all, offloading this work to the compiler. We hope that
the compiler will be smart enough to not overallocate registers for every
memory operand used in the inline assembler. At least, recent gcc and clang are
able to do this and generate code comparable to what was achieved previously.
Additionally, it was possible to reduce assembler code in several places by
removing mov instructions setting up input registers or handling the results.
Instead, we now rely on the compiler doing this work to satisfy assembler block
constraints.
In 32-bit load and store, improved support for targets with SSE but not SSE2.
It is possible to use SSE to do the loads/stores. Also, the scratch xmm register
is now picked by the compiler.
Clang has a bug of not advertising support for atomics implemented via
cmpxchg8b, even if the instruction is enabled in the command line. We have
to work around the same problem with cmpxchg16b on 64-bit x86 as well, so
we apply the same approach here - we implement all atomic ops through DCAS
ourselves.
This fix adds a check for cmpxchg8b to capabilities definition.
This makes the result of (op)_and_test more consistent with other
methods such as test_and_set and bit_test_and_set, as well as the
methods used in the C++ standard library.
This is a breaking change. Users can define the
BOOST_ATOMIC_HIGHLIGHT_OP_AND_TEST macro to generate warnings on each
use of the changed functions. This will help users port from Boost
1.66 to newer Boost releases.
More info at:
https://github.com/boostorg/atomic/issues/11
http://boost.2283326.n4.nabble.com/atomic-op-and-test-naming-tc4701445.html