This is an attempt to improve generated code in the calling application that
involves CAS in a tight loop. The neccessity to cast between the value type and
the storage type for the `expected` argument results in inefficient code
that involves copying of the expected value and also saving the CAS result on
the stack. This has been observed at least with gcc 6.3 with a tight loop
on the user's side.
When we can ensure that the storage type can safely alias other types, and the
value type has the same size as the storage type, we can simplify CAS by
performing type punning on the `expected` reference instead of copying it back
and forth.