This changes the BOOST_COMPUTE_FUNCTION() macro (and the related
BOOST_COMPUTE_CLOSURE() macro) to use custom, user-provided argument
names instead of auto-generating them based on their index.
This is an API-breaking change. Users must now provide argument
names when using the BOOST_COMPUTE_FUNCTION() macro. The examples
and documentation have been updated to reflect the new API.
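
A rough illustration of the idea, not the actual macro: the sketch below
uses a hypothetical SKETCH_FUNCTION() macro (the macro name, the
add_scaled function, and the generated `_source` string are all
assumptions made for this example) to show how a signature with
user-written argument names can be turned into function source text, so
the body refers to the names the user chose.

```cpp
#include <cassert>
#include <string>

// SKETCH_FUNCTION is a hypothetical stand-in, not the Boost.Compute macro.
// It only builds the source text of a function from a return type, a name,
// a parenthesized argument list written by the user, and a body.
#define SKETCH_FUNCTION(ret, name, args, ...) \
    const std::string name##_source =         \
        std::string(#ret " " #name) + #args + " " + #__VA_ARGS__

// The argument names x and y are provided by the user, so the body can
// use them directly instead of names generated from the argument index.
SKETCH_FUNCTION(int, add_scaled, (int x, int y), { return x + 4 * y; });

// Small self-check of the generated text.
inline void check_sketch_function()
{
    assert(add_scaled_source ==
           "int add_scaled(int x, int y) { return x + 4 * y; }");
}
```

The body is taken as a variadic macro argument so that commas inside it
do not split it into several macro arguments.
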
This changes the vector<T> constructors which copy or initialize
data to take a queue argument used for performing the operations.
Previously they just took a context argument used to initialize the
buffer and then created a new command queue to use. This improves
performance by not requiring a new command queue and also fixes issues
when performing operations on a different command queue while the
vector was still being initialized.
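
A host-only model of the difference, under stated assumptions: SketchCtx,
SketchQueue, and the two make_filled_* helpers are hypothetical names
invented for this sketch and are not Boost.Compute types or functions. It
only shows that a constructor which receives the caller's queue does not
need to create a queue of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a context and a command queue.
struct SketchCtx {};

struct SketchQueue {
    static int created;  // number of queues constructed so far
    explicit SketchQueue(SketchCtx &) { ++created; }
};
int SketchQueue::created = 0;

// Old shape: only a context is passed in, so a private queue has to be
// created just to perform the fill.
inline std::vector<int> make_filled_old(std::size_t n, int value, SketchCtx &ctx)
{
    SketchQueue private_queue(ctx);
    (void)private_queue;
    return std::vector<int>(n, value);
}

// New shape: the caller's queue performs the fill, so no extra queue is
// created and the fill shares a queue with the caller's other operations.
inline std::vector<int> make_filled_new(std::size_t n, int value, SketchQueue &queue)
{
    (void)queue;
    return std::vector<int>(n, value);
}

// Small self-check: only the old shape creates an additional queue.
inline void check_queue_counts()
{
    SketchCtx ctx;
    SketchQueue queue(ctx);
    const int before = SketchQueue::created;
    make_filled_new(4, 7, queue);
    assert(SketchQueue::created == before);
    make_filled_old(4, 7, ctx);
    assert(SketchQueue::created == before + 1);
}
```
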
This adds an improved reduce() algorithm implementation for
GPUs. Also adds checks to accumulate() which allow it to
use the higher-performance reduce() algorithm if possible.
This removes the init argument from reduce. This simplifies the
implementation and avoids copying a value from the host to the
device on every call to reduce.
If an initial value is required, the accumulate function can be
called instead.
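
A host-only model of the two call shapes, for illustration only:
reduce_without_init and sum_without_init are hypothetical helpers written
for this sketch, and seeding the result with the first element is an
assumption here, not a description of the device implementation. The
accumulate side uses std::accumulate from the standard library.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Reduction without an init argument: the first element seeds the result,
// so no separate initial value has to be supplied. Requires a non-empty
// range.
template<class It, class Op>
typename std::iterator_traits<It>::value_type
reduce_without_init(It first, It last, Op op)
{
    assert(first != last);
    typename std::iterator_traits<It>::value_type result = *first;
    for (++first; first != last; ++first) {
        result = op(result, *first);
    }
    return result;
}

// Sum of all elements, with no initial value involved.
inline int sum_without_init(const std::vector<int> &v)
{
    return reduce_without_init(v.begin(), v.end(), std::plus<int>());
}

// When an initial value is required, accumulate takes it explicitly.
inline int sum_with_init(const std::vector<int> &v, int init)
{
    return std::accumulate(v.begin(), v.end(), init);
}
```
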
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
Side effect of the change:
Time for all tests to complete dropped from 15.71 to 13.03 sec on a Tesla C2075.
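
A host-only sketch of why shared static setup can help, under a loud
assumption: that a device in exclusive compute mode rejects a second
context while one is still alive. ExclusiveDevice, SketchContext,
shared_device, and shared_context are hypothetical names for this model
and do not reflect the contents of `context_setup.hpp`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of a device in exclusive compute mode.
struct ExclusiveDevice {
    int live_contexts = 0;
};

// Hypothetical context that refuses to be created while another is alive.
struct SketchContext {
    explicit SketchContext(ExclusiveDevice &d) : dev(d)
    {
        if (dev.live_contexts > 0) {
            throw std::runtime_error("device already has a context");
        }
        ++dev.live_contexts;
    }
    ~SketchContext() { --dev.live_contexts; }
    SketchContext(const SketchContext &) = delete;
    SketchContext &operator=(const SketchContext &) = delete;

    ExclusiveDevice &dev;
};

// Shared setup: one statically initialized device and context, reused by
// every test instead of being created once per test.
inline ExclusiveDevice &shared_device()
{
    static ExclusiveDevice device;
    return device;
}

inline SketchContext &shared_context()
{
    static SketchContext context(shared_device());
    return context;
}

// Small self-check: repeated lookups return the same context object.
inline void check_shared_context()
{
    assert(&shared_context() == &shared_context());
    assert(shared_device().live_contexts == 1);
}
```

Creating the objects once would also be consistent with the shorter total
test time reported above, although the sketch does not measure that.
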