GCC 7 removed support for __atomic intrinsics on 16-byte operands on x86-64 and
instead always generates library calls to libatomic, which breaks compilation
of user code that does not link with that library. Also, the assembler tends
to generate warnings when an implicit zero displacement is used in memory operands.
Also made a few wording corrections and added is_always_lock_free and
a section about atomic<> typedefs. Clarified the status quo regarding
memory_order_consume. Removed the obsolete pseudo-header for doxygen
that was not used for docs (if we want doxygen, it's better to
add comments to the real headers anyway).
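As a rough illustration of the two documentation additions mentioned above,
the sketch below contrasts the compile-time `is_always_lock_free` constant
with the run-time `is_lock_free()` query, and checks that an atomic typedef
names the same type as the corresponding `atomic<>` specialization. It uses
the C++17 `std::atomic` as a stand-in for the library's own `atomic<>`, which
is an assumption made to keep the example self-contained; the helper names
`always_lock_free`, `lock_free_now` and `lock_free_example` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// The atomic_int typedef names the same type as atomic<int>.
static_assert(std::is_same<std::atomic_int, std::atomic<int>>::value,
              "atomic_int is a typedef for atomic<int>");

// Compile-time answer: true only if every atomic<T> object is lock-free.
// Requires C++17 (std::atomic is used here as a stand-in, see above).
template <typename T>
constexpr bool always_lock_free() noexcept
{
    return std::atomic<T>::is_always_lock_free;
}

// Run-time answer for one particular object.
template <typename T>
bool lock_free_now() noexcept
{
    std::atomic<T> probe{};
    return probe.is_lock_free();
}

// Usage sketch: the compile-time guarantee implies the run-time one.
inline void lock_free_example()
{
    constexpr bool always = always_lock_free<int>();
    assert(!always || lock_free_now<int>());
}
```

Because `is_always_lock_free` is a constant expression, it can also be used in
`static_assert` or to select an implementation at compile time, which the
run-time query cannot do.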
The new implementation of 8 and 16-bit ops uses the lbarx/stbcx and
lharx/sthcx instructions available in Power8 and later architectures.
This allows the use of smaller storage types, similar to those used by
compiler intrinsics.
Also added detection of 128-bit instructions lqarx/stqcx, which can
later be used to implement 128-bit ops.
Use ldrexb/ldrexh and strexb/strexh on ARMv7 and later to implement byte- and
halfword-wide atomic ops. On older ARM versions we still have to use the 32-bit
widening implementation.
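To make the idea of a 32-bit widening implementation more concrete, here is a
portable sketch, not the actual ARM assembly backend. It assumes that
"widening" means an 8-bit value is kept in 32-bit storage, with the arithmetic
done at 32 bits and the result truncated back to 8 bits. The class
`widened_atomic_u8` and the function `widened_example` are hypothetical names,
and `std::atomic` stands in for the native load-exclusive/store-exclusive loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration: an 8-bit atomic value kept in 32-bit storage,
// as a widening implementation would do.
class widened_atomic_u8
{
public:
    explicit widened_atomic_u8(std::uint8_t v = 0u) noexcept : storage_(v) {}

    std::uint8_t load() const noexcept
    {
        return static_cast<std::uint8_t>(storage_.load(std::memory_order_seq_cst));
    }

    // Adds delta modulo 256 and returns the previous value.
    std::uint8_t fetch_add(std::uint8_t delta) noexcept
    {
        std::uint32_t old_val = storage_.load(std::memory_order_relaxed);
        std::uint32_t new_val;
        do
        {
            // Do the arithmetic in 32 bits, then truncate back to 8 bits so
            // that the upper storage bits always stay zero.
            new_val = (old_val + delta) & 0xFFu;
        }
        while (!storage_.compare_exchange_weak(old_val, new_val,
                   std::memory_order_seq_cst, std::memory_order_relaxed));
        return static_cast<std::uint8_t>(old_val);
    }

private:
    std::atomic<std::uint32_t> storage_; // four bytes of storage for one byte of value
};

// Usage sketch: 0xFF + 2 wraps around to 1 in 8-bit arithmetic.
inline void widened_example()
{
    widened_atomic_u8 counter(0xFFu);
    assert(counter.fetch_add(2u) == 0xFFu);
    assert(counter.load() == 1u);
}
```

With native byte and halfword exclusive instructions the storage can shrink to
the size of the value, and the explicit truncation step is no longer needed.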
Also allowed immediate constants in some of the operations to improve
generated code.
Extracted common ARM code to a separate header so it can be reused by the extra ops.
Fixed incorrect calculation of the min distance limit for arithmetic tests.
Moved some of the arithmetic tests to a separate function because
otherwise MSVC-10 for x64 generated broken code (the code would use
garbage values in registers to pass arguments to
add_and_test/sub_and_test).
Added output operators and employed newer test macros that output compared
values in case of test failure.
This allows for more flexibility in register allocation and potentially
more efficient code. Also, the temporary register was not exactly
customizable in the previous code, so it should have been cleaned up
anyway.
In order to support more flexible definition of the extra operations for
different platforms, define extra_operations as an addon to the existing
operations template. The extra_operations template will be used only by
the non-standard operations added by Boost.Atomic.
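The layering described above might look roughly like the following sketch.
It is only an illustration under stated assumptions: the `operations` template
here is a simplified stand-in built on `std::atomic`, not the real per-platform
backend, and the member signatures are hypothetical. The point is that
`extra_operations` derives from a base set of operations and adds the
non-standard ones on top, so a platform could later specialize it with
hand-written code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Simplified stand-in for the existing operations template (hypothetical).
template <typename Storage>
struct operations
{
    using storage_type = Storage;

    static storage_type fetch_add(std::atomic<storage_type>& s, storage_type v) noexcept
    {
        return s.fetch_add(v);
    }

    static bool compare_exchange_weak(std::atomic<storage_type>& s,
                                      storage_type& expected, storage_type desired) noexcept
    {
        return s.compare_exchange_weak(expected, desired);
    }
};

// Generic add-on: non-standard operations expressed through the base ones.
template <typename Base>
struct extra_operations : Base
{
    using storage_type = typename Base::storage_type;
    static_assert(std::is_unsigned<storage_type>::value,
                  "this sketch assumes unsigned storage");

    // Negates the stored value and returns the previous one (CAS loop).
    static storage_type fetch_negate(std::atomic<storage_type>& s) noexcept
    {
        storage_type old_val = s.load(std::memory_order_relaxed);
        while (!Base::compare_exchange_weak(
                   s, old_val, static_cast<storage_type>(storage_type(0) - old_val)))
        {
        }
        return old_val;
    }

    // Adds a value and discards the result.
    static void opaque_add(std::atomic<storage_type>& s, storage_type v) noexcept
    {
        Base::fetch_add(s, v);
    }
};

// Usage sketch.
inline void extra_operations_example()
{
    using ops = extra_operations<operations<std::uint32_t>>;
    std::atomic<std::uint32_t> s{5u};
    assert(ops::fetch_negate(s) == 5u);
    ops::opaque_add(s, 5u); // (-5) + 5 wraps back to 0
    assert(s.load() == 0u);
}
```

Keeping the extra operations in a separate template means the standard
operations stay untouched, while each platform can override only the extras
for which it has a more efficient instruction sequence.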
This is an attempt to improve the code generated in a calling application that
runs CAS in a tight loop. The necessity to cast between the value type and
the storage type for the `expected` argument results in inefficient code
that copies the expected value and also saves the CAS result on
the stack. This has been observed at least with gcc 6.3 with a tight loop
on the user's side.
When we can ensure that the storage type can safely alias other types, and the
value type has the same size as the storage type, we can simplify CAS by
performing type punning on the `expected` reference instead of copying it back
and forth.
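The difference between the two approaches can be sketched as follows. This is
a minimal illustration, assuming GCC or Clang: it relies on the
`__atomic_compare_exchange_n` builtin and the `__may_alias__` type attribute
to make the storage type safe to alias. The names `storage_type`,
`cas_with_copy`, `cas_with_punning` and `cas_punning_example` are hypothetical
and are not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang. The attribute allows this type to alias other types.
typedef std::uint32_t __attribute__((__may_alias__)) storage_type;

// Copying variant: expected is copied into storage form and back again.
template <typename T>
bool cas_with_copy(storage_type& storage, T& expected, T desired) noexcept
{
    static_assert(sizeof(T) == sizeof(storage_type), "sketch requires equal sizes");
    storage_type e, d;
    std::memcpy(&e, &expected, sizeof(e));
    std::memcpy(&d, &desired, sizeof(d));
    const bool ok = __atomic_compare_exchange_n(
        &storage, &e, d, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    std::memcpy(&expected, &e, sizeof(e));
    return ok;
}

// Punning variant: the CAS reads and updates expected in place.
template <typename T>
bool cas_with_punning(storage_type& storage, T& expected, T desired) noexcept
{
    static_assert(sizeof(T) == sizeof(storage_type), "sketch requires equal sizes");
    static_assert(alignof(T) >= alignof(storage_type), "expected must be suitably aligned");
    storage_type d;
    std::memcpy(&d, &desired, sizeof(d));
    return __atomic_compare_exchange_n(
        &storage, reinterpret_cast<storage_type*>(&expected), d,
        false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

// Usage sketch: a failed CAS writes the current value into expected.
inline void cas_punning_example()
{
    storage_type s;
    const float one = 1.0f;
    std::memcpy(&s, &one, sizeof(s));

    float expected = 2.0f;
    assert(!cas_with_punning(s, expected, 3.0f)); // mismatch: 1.0f is stored
    assert(expected == 1.0f);                     // updated in place
    assert(cas_with_punning(s, expected, 3.0f));  // now succeeds
}
```

Both variants have the same observable behaviour; the punning variant simply
gives the compiler less copying to optimize away inside the caller's loop.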