This fixes a few issues encountered when using iterators with a
void value_type (e.g. std::insert_iterator<>).
The is_contiguous_iterator meta-function was refactored to always
return false for iterators with a void value_type and avoid
instantiating types for containers with a void value_type
(e.g. std::vector<void>::iterator) which previously resulted
in compilation errors.
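
The idea can be illustrated with a minimal sketch using only the standard library. This is not the library's implementation: `is_contiguous_iterator_sketch` is a hypothetical name, and for brevity it only recognizes std::vector iterators. The point is that the specialization naming `std::vector<value_type>` is discarded by SFINAE when the value_type is void, so `std::vector<void>` is never instantiated.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical trait (assumption, not the library's code).
// Primary template: anything not recognized is reported as non-contiguous.
template<class Iterator, class Enable = void>
struct is_contiguous_iterator_sketch : std::false_type {};

// This specialization is only viable when value_type is not void, so
// std::vector<void> is never named for iterators such as
// std::insert_iterator<>.
template<class Iterator>
struct is_contiguous_iterator_sketch<
    Iterator,
    typename std::enable_if<
        !std::is_void<
            typename std::iterator_traits<Iterator>::value_type
        >::value
    >::type
> : std::is_same<
        Iterator,
        typename std::vector<
            typename std::iterator_traits<Iterator>::value_type
        >::iterator
    > {};
```

With this sketch, a `std::vector<int>::iterator` is reported as contiguous, while a `std::insert_iterator<std::vector<int> >` falls back to the primary template and yields false without any compilation error.
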
This adds a system-wide default command queue. This queue is
accessible via the new static system::default_queue() method.
The default command queue is created for the default compute
device in the default context and is analogous to the default
stream in CUDA.
This changes how algorithms operate when invoked without an
explicit command queue. Previously, each algorithm had two
overloads: the first expected a command queue to be explicitly
passed, and the second created and used a temporary command
queue. Now, all algorithms take a command queue argument which
has a default value equal to system::default_queue().
This fixes a number of race conditions and performance issues
throughout the library associated with creating, using, and
destroying many separate command queues.
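
The default-argument pattern described above can be sketched without OpenCL. All names here (`queue_sketch`, `default_queue_sketch`, `run_algorithm`) are hypothetical stand-ins, not the library's API; the counter exists only to show that repeated calls share one queue.

```cpp
#include <cassert>

// Hypothetical stand-in for a command queue; counts constructions.
struct queue_sketch {
    static int& created() { static int n = 0; return n; }
    queue_sketch() { ++created(); }
};

// Stand-in for system::default_queue(): built on first use, then reused.
inline queue_sketch& default_queue_sketch() {
    static queue_sketch queue;
    return queue;
}

// An "algorithm" whose queue parameter defaults to the shared queue.
// The queue is returned only so a caller can see which one was used.
inline queue_sketch& run_algorithm(queue_sketch& queue = default_queue_sketch()) {
    return queue;
}
```

Calling `run_algorithm()` twice uses the same queue object both times, whereas a caller who passes its own queue keeps full control over which queue is used.
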
This adds a new macro for the unit-tests which checks a range of
values on the device against an array of values on the host. This
simplifies writing tests and removes the need to explicitly copy
values back to the host for verification.
This fixes a few memory handling issues between device_ptr,
buffer_iterator, buffer_value, allocator, and malloc/free.
Previously, memory buffers that were allocated by allocator and
malloc were being retained (via clRetainMemObject() in buffer's
constructor) by device_ptr, buffer_iterator and buffer_value.
Now, false is passed for the retain parameter to buffer's
constructor so that the buffer's reference count is not
incremented. Furthermore, the classes now set the buffer to
null before being destructed so that they will not decrement its
reference count (which normally occurs in buffer's destructor).
The main effect of this change is that objects which refer to a
memory buffer but do not own it (e.g. device_ptr, buffer_iterator)
will not modify the reference count for the buffer. This fixes a
number of memory leaks which occurred in long-running programs.
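
The ownership rule above can be modelled with a small sketch. The types are hypothetical: `mem_object_sketch` stands in for an OpenCL memory object with a reference count, and `buffer_sketch` and `device_ptr_sketch` only mimic the retain flag and the reset-before-destruction step described in this message.

```cpp
#include <cassert>

// Hypothetical stand-in for a cl_mem object and its reference count.
struct mem_object_sketch { int ref_count = 1; };

// Hypothetical wrapper mirroring the retain parameter described above.
class buffer_sketch {
public:
    buffer_sketch(mem_object_sketch* mem, bool retain) : m_mem(mem) {
        if (m_mem && retain) ++m_mem->ref_count; // like clRetainMemObject()
    }
    buffer_sketch(const buffer_sketch&) = delete;
    buffer_sketch& operator=(const buffer_sketch&) = delete;
    ~buffer_sketch() {
        if (m_mem) --m_mem->ref_count;           // like clReleaseMemObject()
    }
    void reset() { m_mem = nullptr; }            // detach without releasing
private:
    mem_object_sketch* m_mem;
};

// Non-owning handle: never retains, and detaches before its buffer member
// is destroyed, so the reference count is left untouched.
class device_ptr_sketch {
public:
    explicit device_ptr_sketch(mem_object_sketch* mem) : m_buffer(mem, false) {}
    ~device_ptr_sketch() { m_buffer.reset(); }
private:
    buffer_sketch m_buffer;
};
```

In this model, creating and destroying a `device_ptr_sketch` leaves `ref_count` unchanged, while a `buffer_sketch` constructed with `retain == true` increments it and decrements it again on destruction.
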
This adds a new scalar<T> "container" which stores a single
value in a memory buffer. This simplifies memory handling in
algorithms which read and write a single value.
This refactors the system::default_device() method. Now, the
default compute device for the system is only found once and
stored in a static variable. This eliminates many redundant
calls to clGetPlatformIDs() and clGetDeviceIDs().
Also, the default_cpu_device() and default_gpu_device() methods
have been removed and their usages replaced with default_device().
This adds checks to the device test-suite to ensure that the
current device supports the partitioning types before attempting
to use the corresponding device::partition_*() methods.
This fixes a couple of narrowing conversion warnings in the
device partitioning methods which were seen when compiling
VexCL with Boost.Compute in C++11 mode.
This adds a get<N>() function which returns the N'th element
of an aggregate type (e.g. vector type, pair, tuple).
This unifies the functionality of, and replaces, the get_pair()
and vector_component() functions.
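
As a host-side analogy only, one accessor can serve several aggregate types by forwarding to std::get. `get_sketch` is a hypothetical name; the library's get<N>() operates on device-side expressions and is not implemented this way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical host-side analogue: std::get<N> is available for
// std::pair, std::tuple and std::array, so one function covers all three.
template<std::size_t N, class Aggregate>
auto get_sketch(const Aggregate& value) -> decltype(std::get<N>(value)) {
    return std::get<N>(value);
}
```

For example, `get_sketch<1>(std::make_pair(3, 4.5))` selects the second member of a pair, and the same call syntax works for a tuple or a fixed-size array.
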
This changes the vector class to not auto-initialize values
when it is created or resized. This improves performance by
eliminating a call to fill(). If needed, user code can call
fill() explicitly on the newly allocated values.
This increases the work-group size for the copy() kernel to be
up to 32 items based on the size of the input. This increases the
performance of copy() and related algorithms (e.g. transform()).
This changes the clamp_range() test to use float values instead
of int values. The OpenCL clamp() function is only defined for
float values and this test caused kernel compilation errors on
certain platforms.
Also updates the test to use the new global context.
This adds a clamp_range() algorithm which clamps a range
of values between a low and high value. This is based on
the algorithm of the same name in Boost.Algorithm.
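
The clamping semantics can be sketched on the host with std::transform. `clamp_range_sketch` is a hypothetical name and signature chosen for illustration; the actual algorithm runs on the device and its parameter list may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side sketch: copies [first, last) to out, limiting each value
// to the closed interval [lo, hi].
template<class InputIt, class OutputIt, class T>
OutputIt clamp_range_sketch(InputIt first, InputIt last, OutputIt out,
                            const T& lo, const T& hi) {
    assert(!(hi < lo)); // the bounds must form a valid interval
    return std::transform(first, last, out, [&](const T& x) {
        return x < lo ? lo : (hi < x ? hi : x);
    });
}
```

Clamping the float values -1, 0.5 and 2 to the interval [0, 1] gives 0, 0.5 and 1.
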
This removes the documentation for the non-existent platforms()
and platform_count() methods in the platform class. These methods
have been moved to the system class and are documented there.
refs kylelutz/compute#9
device, context, and queue are initialized statically in `context_setup.hpp`.
With this change all tests are able to complete when an NVIDIA GPU is in
exclusive compute mode.
A side effect of the change: the time for all tests to complete dropped
from 15.71 to 13.03 sec on a Tesla C2075.
This adds a test for the enqueue_write_buffer_rect() method
in the command_queue class. This method copies a rectangular
region of memory from the host to a device buffer.
This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.
boost::compute::system::default_device() supports the following
environment variables:
BOOST_COMPUTE_DEFAULT_DEVICE for device name
BOOST_COMPUTE_DEFAULT_PLATFORM for OpenCL platform name
BOOST_COMPUTE_DEFAULT_VENDOR for device vendor name
If one or more of these variables is set, then the device that satisfies
all conditions is selected. If no such device is available, then
the first available GPU is selected. If there are no GPUs in the
system, then the first available CPU is selected. Otherwise,
default_device() returns a null device.
The hello_world example is modified to use default_device() instead
of default_gpu_device().
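
The selection order described above can be sketched as a plain function over a list of device descriptions. Everything here is hypothetical (`device_info`, `device_kind`, `env_or_empty`, `matches`, `select_device`), and substring matching is an assumption; the message does not say how names are compared. An index of -1 stands in for the null device.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

enum class device_kind { cpu, gpu };

// Hypothetical device description; not a Boost.Compute type.
struct device_info {
    std::string name;
    std::string platform;
    std::string vendor;
    device_kind kind;
};

// Reads an environment variable, mapping "unset" to an empty string.
inline std::string env_or_empty(const char* variable) {
    const char* value = std::getenv(variable);
    return value ? std::string(value) : std::string();
}

// True when the filter is unset or occurs in the field.
// Substring matching is an assumption made for this sketch.
inline bool matches(const std::string& field, const std::string& filter) {
    return filter.empty() || field.find(filter) != std::string::npos;
}

// Returns the index of the selected device, or -1 for "null device".
inline int select_device(const std::vector<device_info>& devices,
                         const std::string& name,
                         const std::string& platform,
                         const std::string& vendor) {
    const int count = static_cast<int>(devices.size());
    // 1. If any filter is set, prefer a device satisfying all of them.
    if (!name.empty() || !platform.empty() || !vendor.empty()) {
        for (int i = 0; i < count; ++i) {
            const device_info& d = devices[i];
            if (matches(d.name, name) && matches(d.platform, platform) &&
                matches(d.vendor, vendor)) {
                return i;
            }
        }
    }
    // 2. Otherwise fall back to the first GPU.
    for (int i = 0; i < count; ++i) {
        if (devices[i].kind == device_kind::gpu) return i;
    }
    // 3. Then to the first CPU.
    for (int i = 0; i < count; ++i) {
        if (devices[i].kind == device_kind::cpu) return i;
    }
    return -1;
}
```

A caller would pass `env_or_empty("BOOST_COMPUTE_DEFAULT_DEVICE")`, `env_or_empty("BOOST_COMPUTE_DEFAULT_PLATFORM")` and `env_or_empty("BOOST_COMPUTE_DEFAULT_VENDOR")` as the three filters.
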
This adds a specialization of multiplies<T> for std::complex<T>
which implements complex number multiplication.
Also adds a simple test using transform() to verify the complex
multiplication works correctly.
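
For reference, the complex product is (a + bi)(c + di) = (ac - bd) + (ad + bc)i. A host-side check of the same element-wise operation can be sketched with the standard library; `multiply_all` is a hypothetical helper, not part of the library or its test.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <functional>
#include <vector>

typedef std::complex<float> complex_f;

// Host-side sketch: element-wise complex product of two ranges,
// using std::multiplies just as transform() would use multiplies<T>.
inline std::vector<complex_f> multiply_all(const std::vector<complex_f>& a,
                                           const std::vector<complex_f>& b) {
    assert(a.size() == b.size()); // inputs must have the same length
    std::vector<complex_f> out(a.size());
    std::transform(a.begin(), a.end(), b.begin(), out.begin(),
                   std::multiplies<complex_f>());
    return out;
}
```

For instance, (1 + 2i)(3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i.
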
This fixes an unused variable warning which occurs in the
get_base_iterator_buffer() function when the base iterator
is not a buffer iterator and thus the iter argument is not
used.
This fixes a bug in which boost::result_of() would return the
wrong result type for a function due to the new implementation
using decltype instead of the result_of protocol on compilers
that sufficiently support C++11 (such as clang >= 3.2).
Now, boost::tr1_result_of() is used to explicitly request that
the result_of protocol be used even when decltype is supported
by the compiler.