Now all tests work even if Context::context, Context::device,
Context::queue are not default context, device and queue. This
is required for developing better tests in the future.
Note: Some tests may work only for default context/queue/device
since classes that they test work only for default context/q/d.
There are two solutions for this problem: either those tests run
on default queue (no matter what) or they does not run when
Context::context is not the default context. See test_string.cpp.
This changes the vector<T> constructors which copy or initialize
data to take a queue argument used for performing the operations.
Previously they just took a context argument used to initialize the
buffer and then created a new command queue to use. This improves
performance by not requiring a new command queue and also fixes issues
when performing operations on a different command queue while the
vector was still being initialized.
This adds a improved reduce() algorithm implementation for
GPUs. Also adds checks to accumulate() which allow it to
use the higher-performance reduce() algorithm if possible.
This adds adds an overload of the reduce() function which
uses plus<T>() as the reductor. This simplifies the common
case of calculating the sum for a range of values.
This removes the init argument from reduce. This simplifies the
implementation and avoids copying a value from the host to the
device on every call to reduce.
If an initial value is required, the accumulate function can be
called instead.
This adds a test for computing the minimum and maximum
values of a vector simultaneously using reduce() with a
custom reduction function.
Also fixes a bug in reduce() in which inplace_reduce() was
being used even if the input type and result type differed.
This adds an output iterator result argument to the reduce()
algorithm. Now, instead of returning the reduced result, the
result is written to an output iterator. This allows the value
to stay on the device and avoids a device-to-host copy in cases
where the result is not needed on the host (e.g. it is part of
a larger computation).
This is an API breaking change to users of reduce(). Affected code
should now declare a result variable and then pass a pointer to it
as the new result argument.
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
Side effect of the change:
Time for all tests to complete reduced from 15.71 to 13.03 sec Tesla C2075.