This adds a random number distribution which generates uniformly
distributed random numbers.
Also adds a convenience algorithm which fills a range with
uniformly distributed random numbers between two values.
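For reference, the sketch below shows what "fill a range with uniformly distributed numbers between two values" computes. It is a host-side analogue built on the standard <random> header, not Boost.Compute's device implementation. The function name fill_uniform_host and its seed parameter are hypothetical.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side analogue of the fill algorithm: writes floats drawn
// from a uniform distribution over [lo, hi) into [first, last). Some standard
// library implementations can occasionally return hi because of float rounding.
template <class Iterator>
void fill_uniform_host(Iterator first, Iterator last,
                       float lo, float hi, std::uint32_t seed)
{
    std::mt19937 engine(seed);                          // seeded, so results are reproducible
    std::uniform_real_distribution<float> dist(lo, hi);
    for (; first != last; ++first) {
        *first = dist(engine);
    }
}
```

Seeding the engine explicitly makes two fills with the same seed produce identical ranges, which is convenient when comparing against device output.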
This adds an enqueue_migrate_memory_objects() method to the
command_queue class which allows memory objects to be migrated
between compute devices and to the host.
This makes a few tweaks to the reduce() algorithm to improve
performance. An unnecessary barrier() has been removed, and
multiple values are now reduced during the initial read.
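The idea of reducing several values on the initial read can be illustrated with a sequential simulation of one work-group. This is a sketch under stated assumptions (a power-of-two local size, integer addition as the operator), not the actual reduce() kernel; simulate_group_reduce is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sequential simulation of a single work-group reduction (hypothetical
// sketch). local_size is assumed to be a power of two.
int simulate_group_reduce(const std::vector<int>& input, std::size_t local_size)
{
    std::vector<int> scratch(local_size, 0); // stands in for local memory

    // Phase 1: each "work-item" accumulates every local_size-th element
    // while reading from "global" memory, so the tree phase only has to
    // combine local_size partial sums rather than input.size() values.
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        int sum = 0;
        for (std::size_t i = lid; i < input.size(); i += local_size) {
            sum += input[i];
        }
        scratch[lid] = sum;
    }

    // Phase 2: tree reduction in "local" memory. On a device a barrier
    // separates the levels; here the outer loop plays that role.
    for (std::size_t stride = local_size / 2; stride > 0; stride /= 2) {
        for (std::size_t lid = 0; lid < stride; ++lid) {
            scratch[lid] += scratch[lid + stride];
        }
    }
    return scratch[0];
}
```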
This moves the meta_kernel::add_arg() overload that takes a name
and a value into a separate method. This fixes a conflict when
using add_arg() with string values.
This adds a specialization of the get<N>() function for use with
zip_iterators. Now only the N-th iterator in the expression is
dereferenced, instead of dereferencing all of the iterators into
a tuple and then extracting the N-th component.
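The difference between the two strategies can be sketched on the host by modelling a zip as a std::tuple of iterators. The helpers naive_get, fast_get and the counting_ptr wrapper are hypothetical; they only count dereferences to show that the specialized path touches a single iterator.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Iterator-like wrapper that counts how many times it is dereferenced.
struct counting_ptr {
    const int* p;
    int* count;
    int operator*() const { ++*count; return *p; }
};

// Naive extraction: dereference every iterator into a tuple of values,
// then keep only component N.
template <std::size_t N, class... Its, std::size_t... I>
auto naive_get_impl(const std::tuple<Its...>& zip, std::index_sequence<I...>)
{
    return std::get<N>(std::make_tuple(*std::get<I>(zip)...));
}

template <std::size_t N, class... Its>
auto naive_get(const std::tuple<Its...>& zip)
{
    return naive_get_impl<N>(zip, std::index_sequence_for<Its...>{});
}

// Specialized extraction: dereference only the N-th iterator.
template <std::size_t N, class... Its>
auto fast_get(const std::tuple<Its...>& zip)
{
    return *std::get<N>(zip);
}
```

On a device, each dereference can be a global memory load, so skipping the unused components avoids wasted reads.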
This removes the cv-qualifiers from the value type returned by
get<N>() expressions. This fixes issues with specializations that
dispatch on that type (e.g. for pair and tuple).
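Why cv-qualifiers break such specializations can be shown with a small trait. The is_pair trait and strip_cv_t alias below are hypothetical stand-ins for code that dispatches on the type returned by get<N>().

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait with a specialization for std::pair.
template <class T>
struct is_pair : std::false_type {};

template <class A, class B>
struct is_pair<std::pair<A, B>> : std::true_type {};

// A const-qualified pair does not match the std::pair<A, B> partial
// specialization, so it silently falls back to the primary template.
static_assert(!is_pair<const std::pair<int, float>>::value,
              "cv-qualified pair misses the specialization");

// Stripping cv-qualifiers first makes the specialization apply again.
template <class T>
using strip_cv_t = typename std::remove_cv<T>::type;

static_assert(is_pair<strip_cv_t<const std::pair<int, float>>>::value,
              "unqualified pair selects the specialization");
```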
This fixes a bug in the meta_kernel streaming operators for float
values. Float scalar and vector literals are now inserted into the
kernel source with the proper 'f' suffix.
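A minimal sketch of formatting floats as OpenCL C literals follows. The helpers float_literal and float4_literal are hypothetical, not meta_kernel's operators. std::showpoint guarantees a decimal point, because a bare "1f" is not a valid floating constant in C.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a float as an OpenCL C literal with an 'f'
// suffix. showpoint forces a decimal point (1.0f -> "1.00000000f", not
// the invalid "1f"), and max_digits10 keeps the value exact when the
// kernel compiler parses it back. Non-finite values are not handled.
std::string float_literal(float value)
{
    std::ostringstream s;
    s << std::showpoint
      << std::setprecision(std::numeric_limits<float>::max_digits10)
      << value << 'f';
    return s.str();
}

// A float4 vector literal built from four scalar literals.
std::string float4_literal(float x, float y, float z, float w)
{
    return "(float4)(" + float_literal(x) + ", " + float_literal(y) + ", " +
           float_literal(z) + ", " + float_literal(w) + ")";
}
```

Without the suffix, a literal such as 0.1 is a double constant in OpenCL C, which can cause compilation errors on devices without double-precision support.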
This makes some improvements to the system::find_default_device()
method. Now, the devices on the system will only be queried once
when searching for the default device. This reduces the number of
calls to clGetPlatformIDs() and clGetDeviceIDs().
Also, if no GPU or CPU devices are found, the first device on the
system is now selected as the default device. This fixes issues
when using Boost.Compute with pocl.
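The selection policy can be sketched over a device list that has already been queried once. The device_info type is a hypothetical stand-in for OpenCL device queries, and preferring a GPU over a CPU is an assumption about the ordering.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for OpenCL device information.
enum class device_type { gpu, cpu, accelerator };

struct device_info {
    std::string name;
    device_type type;
};

// Sketch of the selection policy, assuming a GPU is preferred over a CPU.
// The list is obtained once by the caller and scanned in memory; if
// neither a GPU nor a CPU is present, the first device is used.
// Returns -1 if there are no devices at all.
int pick_default_device(const std::vector<device_info>& devices)
{
    int first_cpu = -1;
    for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
        if (devices[i].type == device_type::gpu) {
            return i; // first GPU wins
        }
        if (devices[i].type == device_type::cpu && first_cpu < 0) {
            first_cpu = i; // remember the first CPU as a fallback
        }
    }
    if (first_cpu >= 0) {
        return first_cpu;
    }
    return devices.empty() ? -1 : 0; // fallback: first device on the system
}
```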
This adds a check to skip tests which use fill() with pair and
tuple types on AMD platforms. On those platforms, struct assignment
in kernel code triggers a bug that crashes the OpenCL compiler with
an "UNREACHABLE executed!" message.
See: http://devgurus.amd.com/thread/166622
This adds a check to the reverse() algorithm to ensure that
the range contains at least two elements. Previously, passing a
range of zero or one elements to reverse() resulted in errors.
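A host-side model of a parallel reverse shows why small ranges need a guard. Work-item i swaps element i with element n - 1 - i, so n / 2 work-items are needed, which is zero for n < 2. One plausible cause of the original errors (an assumption, the message does not say) is that OpenCL 1.x rejects a global work size of zero with CL_INVALID_GLOBAL_WORK_SIZE. reverse_model is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side model of a parallel reverse (sketch, not Boost.Compute's
// kernel). Each loop iteration stands in for one work-item.
template <class T>
void reverse_model(std::vector<T>& v)
{
    const std::size_t n = v.size();
    if (n < 2) {
        return; // nothing to reverse, and n / 2 would be an empty launch
    }
    const std::size_t work_items = n / 2;
    for (std::size_t i = 0; i < work_items; ++i) {
        std::swap(v[i], v[n - 1 - i]); // work-item i swaps a mirrored pair
    }
}
```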
This fixes a compilation error which occurred when assigning a
future<T> to a future<void>. Because they are different class
types, the event member variable of the source future is private
and must be accessed via the get_event() method.
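The access rule behind this fix can be reproduced with a simplified model. The sketch namespace, its event type and both future templates are hypothetical and much smaller than the real classes; they only show that the converting assignment must go through the public accessor.

```cpp
namespace sketch {

// Hypothetical stand-in for an OpenCL event handle.
struct event {
    explicit event(int i = 0) : id(i) {}
    int id;
};

template <class T>
class future {
public:
    future() = default;
    future(const T& result, const event& e) : m_result(result), m_event(e) {}

    const event& get_event() const { return m_event; }
    T get() const { return m_result; }

private:
    T m_result{};
    event m_event;
};

template <>
class future<void> {
public:
    future() = default;

    // future<T> is a different class, so other.m_event would not compile
    // here; the public get_event() accessor is used instead.
    template <class T>
    future(const future<T>& other) : m_event(other.get_event()) {}

    template <class T>
    future& operator=(const future<T>& other)
    {
        m_event = other.get_event();
        return *this;
    }

    const event& get_event() const { return m_event; }

private:
    event m_event;
};

} // namespace sketch
```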
This fixes issues when using char and unsigned char literals in
a meta_kernel. Previously the character values would be directly
inserted without quotes (e.g. c instead of 'c'), which led to
kernel compilation errors.
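A minimal sketch of emitting a character literal into kernel source is shown below. char_literal is a hypothetical helper, not meta_kernel's operator. It also escapes the quote and backslash characters, which would otherwise produce invalid literals. Non-printable characters are not handled.

```cpp
#include <string>

// Hypothetical helper: format a char as an OpenCL C character literal,
// so 'c' is emitted instead of the bare identifier c. Quote and
// backslash are escaped; non-printable characters are not handled.
std::string char_literal(char c)
{
    std::string out = "'";
    if (c == '\'' || c == '\\') {
        out += '\\';
    }
    out += c;
    out += '\'';
    return out;
}
```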
This fixes a bug when creating a temporary vector for use in the
in-place scan() algorithm. Previously, a separate command queue
was used to copy the input values to the temporary vector. Now,
the same command queue is used for copying the input values and
performing the scan.
This removes the timer class. The technique of measuring the time
difference between two OpenCL markers on a command queue is not
portable across OpenCL implementations (it only works on NVIDIA).
A new internal timer class has been added which uses boost::chrono
(or std::chrono if BOOST_COMPUTE_TIMER_USE_STD_CHRONO is defined).
This new timer is used by the benchmarks to measure time elapsed
on the host.
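A host-side timer along these lines can be sketched with std::chrono alone, which corresponds to the BOOST_COMPUTE_TIMER_USE_STD_CHRONO configuration. The class name host_timer is hypothetical. steady_clock is used because, unlike system_clock, it never jumps backwards.

```cpp
#include <chrono>

// Minimal host-side timer sketch (hypothetical class, std::chrono only).
class host_timer {
    using clock_type = std::chrono::steady_clock;

public:
    host_timer() : m_start(clock_type::now()) {}

    void restart() { m_start = clock_type::now(); }

    // Elapsed wall time on the host since construction or restart(), in ms.
    double elapsed_ms() const
    {
        return std::chrono::duration<double, std::milli>(
                   clock_type::now() - m_start).count();
    }

private:
    clock_type::time_point m_start;
};
```

Because this measures host time, a benchmark should wait for the queued work to finish (e.g. by calling finish() on the command queue) before reading the timer.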
This cleans up the example code. All of the examples now use the
"namespace compute = boost::compute" alias, which makes them
shorter and easier to read. This also cleans up a few style
issues.
This adds a simple inplace_merge() algorithm which merges
two contiguous sorted ranges in-place.
For now, the implementation simply copies the ranges to
two temporary vectors and calls merge().
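The copy-then-merge approach looks like this on the host, using std::merge in place of the device merge(). inplace_merge_via_copies is a hypothetical name. The trade-off is O(n) extra memory in exchange for reusing an existing merge.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Host-side sketch: copy both sorted sub-ranges [first, middle) and
// [middle, last) into temporary vectors, then merge them back into the
// original storage starting at first.
template <class Iterator>
void inplace_merge_via_copies(Iterator first, Iterator middle, Iterator last)
{
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    std::vector<value_type> left(first, middle);
    std::vector<value_type> right(middle, last);
    std::merge(left.begin(), left.end(), right.begin(), right.end(), first);
}
```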
This adds support for using the get<N>() function in lambda
expressions to extract a single component of an aggregate type.
Also adds a test which uses boost::tuple<> to store a user-defined
data type on the device and sorts the values by their first
component using a lambda expression as the comparator.
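The shape of that test can be mirrored on the host with std::tuple and an ordinary C++ lambda. This is an analogy, not Boost.Compute's lambda library: std::get<0> here plays the role that get<0>() plays inside the device-side lambda expression. The record layout and sort_by_first are hypothetical.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Host-side analogue: a user-defined record stored as a tuple, sorted by
// its first component with a lambda comparator.
using record = std::tuple<int, float, char>;

void sort_by_first(std::vector<record>& records)
{
    std::sort(records.begin(), records.end(),
              [](const record& a, const record& b) {
                  return std::get<0>(a) < std::get<0>(b);
              });
}
```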