This adds an experimental algorithm like copy_if() which copies
the index of the values for which predicate returns true instead
of the values themselves.
This adds a new macro for the unit-tests which checks a range of
values on the device against an array of values on the host. This
simplifies writing tests and removes the need to explicitly copy
values back to the host for verification.
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
Side effect of the change:
Time for all tests to complete reduced from 15.71 to 13.03 sec Tesla C2075.