The clEnqueueSVMMemcpy() operation does not work on AMD devices due to a
driver bug (https://community.amd.com/thread/190585). This affects the copy()
algorithm when SVM is used and causes the copy_svm_ptr test case to fail.
That test case is now skipped on AMD devices.
This changes the vector<T> constructors which copy or initialize
data to take a queue argument used for performing the operations.
Previously they just took a context argument used to initialize the
buffer and then created a new command queue to use. This improves
performance by not requiring a new command queue and also fixes issues
when performing operations on a different command queue while the
vector was still being initialized.
This adds a copy() specialization for host-to-host transfers
which simply forwards the call to std::copy().
This is useful in templated algorithms which may in certain
circumstances copy() between data ranges on the host.
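
The forwarding described above can be sketched roughly as follows. This is a
minimal illustration using only the standard library, not the actual
implementation: the `sketch` namespace, the `is_device_iterator` trait, and
the `copy_on_host()` helper are hypothetical names introduced for this
example.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

namespace sketch {

// Hypothetical trait: true only for iterators that refer to device memory.
// In this sketch no iterator is a device iterator unless it is specialized.
template<class Iterator>
struct is_device_iterator : std::false_type {};

// Host-to-host case: neither iterator refers to device memory, so the
// call is simply forwarded to std::copy().
template<class InputIterator, class OutputIterator>
typename std::enable_if<
    !is_device_iterator<InputIterator>::value &&
    !is_device_iterator<OutputIterator>::value,
    OutputIterator
>::type
copy(InputIterator first, InputIterator last, OutputIterator result)
{
    return std::copy(first, last, result);
}

// Hypothetical helper showing a call site: both ranges live on the host.
inline std::vector<int> copy_on_host(const std::vector<int>& src)
{
    std::vector<int> dst(src.size());
    // Qualified call so that only sketch::copy() is considered.
    std::vector<int>::iterator end =
        sketch::copy(src.begin(), src.end(), dst.begin());
    assert(end == dst.end());
    return dst;
}

} // namespace sketch
```

A templated algorithm written against such a copy() does not need a separate
code path for the case where both ranges happen to be host containers.
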
This fixes a few issues encountered when using iterators with a
void value_type (e.g. std::insert_iterator<>).
The is_contiguous_iterator meta-function was refactored to always
return false for iterators with a void value_type and avoid
instantiating types for containers with a void value_type
(e.g. std::vector<void>::iterator) which previously resulted
in compilation errors.
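
One way to get this behavior is to branch on the value_type before any
container type is named. The sketch below is an assumption about how such a
trait could look, written with the standard library only; it is not the
refactored meta-function itself, and the `sketch` namespace is hypothetical.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

namespace sketch {

// Primary template, used only when the value_type is not void. An iterator
// counts as contiguous here if it is a raw pointer or an iterator of
// std::vector<Value> (std::vector<bool> is excluded because it is packed).
template<class Iterator,
         class Value = typename std::iterator_traits<Iterator>::value_type,
         bool IsVoid = std::is_void<Value>::value>
struct is_contiguous_iterator
    : std::integral_constant<bool, (
          std::is_pointer<Iterator>::value ||
          (!std::is_same<Value, bool>::value &&
           (std::is_same<Iterator,
                         typename std::vector<Value>::iterator>::value ||
            std::is_same<Iterator,
                         typename std::vector<Value>::const_iterator>::value)))>
{};

// Specialization for a void value_type: always false. Because the primary
// template is not instantiated in this case, std::vector<void> is never named.
template<class Iterator, class Value>
struct is_contiguous_iterator<Iterator, Value, true> : std::false_type {};

} // namespace sketch

// std::insert_iterator<> has a void value_type and now compiles cleanly.
static_assert(
    !sketch::is_contiguous_iterator<
        std::insert_iterator<std::vector<int> > >::value,
    "iterators with a void value_type are never contiguous");
static_assert(
    sketch::is_contiguous_iterator<std::vector<int>::iterator>::value,
    "std::vector iterators are contiguous");
```

The key design choice is that the void check happens in the template
parameter list, so the branch that mentions `std::vector<Value>` is selected
only when `Value` is a usable element type.
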
This adds a new macro for the unit tests which checks a range of
values on the device against an array of values on the host. This
simplifies writing tests and removes the need to explicitly copy
values back to the host for verification.
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change, all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
Side effect of the change:
The time for all tests to complete dropped from 15.71 to 13.03 sec on a
Tesla C2075.