This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.
Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
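The constructor convention described above can be sketched as follows.
This is an illustration only: mock_handle, mock_retain(), mock_release()
and the wrapper class are hypothetical stand-ins for an OpenCL object, the
clRetain*()/clRelease*() functions and a wrapper class, and the default
value of the retain parameter is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for an OpenCL object with a reference count.
struct mock_handle_t { int ref_count; };
typedef mock_handle_t* mock_handle;

// Hypothetical stand-ins for the clRetain*() / clRelease*() functions.
inline void mock_retain(mock_handle h) { ++h->ref_count; }
inline void mock_release(mock_handle h) { --h->ref_count; }

// Illustrative wrapper: takes the raw handle plus an optional retain flag.
class wrapper
{
public:
    explicit wrapper(mock_handle h, bool retain = true)
        : m_handle(h)
    {
        // Increment the reference count only when asked to.
        if(m_handle && retain){
            mock_retain(m_handle);
        }
    }

    ~wrapper()
    {
        // The wrapper always gives up one reference when destroyed.
        if(m_handle){
            mock_release(m_handle);
        }
    }

    mock_handle get() const { return m_handle; }

    // Copying is disabled to keep the sketch short.
    wrapper(const wrapper&) = delete;
    wrapper& operator=(const wrapper&) = delete;

private:
    mock_handle m_handle;
};

// Returns the reference count left after a wrapper is created and destroyed.
inline int ref_count_after_wrap(bool retain)
{
    mock_handle_t raw = { 1 }; // the caller holds one reference
    {
        wrapper w(&raw, retain);
    }
    return raw.ref_count;
}
```

With retain set to true the caller keeps its own reference; with retain
set to false the wrapper takes over the caller's reference and releases it.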
This updates the constructors for the image2d and image3d
classes to use the new clCreateImage() function instead of
the deprecated clCreateImage2D/3D() functions.
This changes the enqueue_marker() method in the command_queue
class to use clEnqueueMarkerWithWaitList() instead of the
deprecated clEnqueueMarker() function when compiling with
OpenCL 1.2.
This changes the enqueue_barrier() method in the command_queue
class to use clEnqueueBarrierWithWaitList() instead of the
deprecated clEnqueueBarrier() function when compiling with
OpenCL 1.2.
This removes the enqueue_wait_for_event() method from the
command_queue class as the clEnqueueWaitForEvents() function
has been deprecated in OpenCL 1.2.
This moves the unload_compiler() method from the system class
to the platform class. It also changes the method to use the
clUnloadPlatformCompiler() function instead of the deprecated
clUnloadCompiler() when compiling with OpenCL 1.2.
This moves the get_extension_function_address() method from
the system class to the platform class. It also changes the method
to use the clGetExtensionFunctionAddressForPlatform() function
instead of the deprecated clGetExtensionFunctionAddress() when
compiling with OpenCL 1.2.
This fixes a bug in the move-constructor for the vector<T>
class.
Previously, the moved-from object was also deallocating the
memory buffer, leading to an error when the moved-to object
attempted to use it. Now, the destructor checks if the buffer
is non-empty before deallocating it.
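A minimal sketch of the ownership rule described above. The buffer_owner
class and the g_deallocations counter are hypothetical and are not the
library's vector<T>; placing the emptiness check in the destructor is an
assumption about where the check lives.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Counts deallocations so the behaviour can be observed.
static int g_deallocations = 0;

// Illustrative owner of a heap buffer (hypothetical, not vector<T>).
class buffer_owner
{
public:
    explicit buffer_owner(std::size_t n)
        : m_data(n ? new int[n]() : nullptr), m_size(n)
    {
    }

    // Move constructor: take the buffer and leave the source empty.
    buffer_owner(buffer_owner&& other)
        : m_data(other.m_data), m_size(other.m_size)
    {
        other.m_data = nullptr;
        other.m_size = 0;
    }

    ~buffer_owner()
    {
        // Only deallocate when this object still owns a buffer.
        if(m_size > 0){
            delete[] m_data;
            ++g_deallocations;
        }
    }

    buffer_owner(const buffer_owner&) = delete;
    buffer_owner& operator=(const buffer_owner&) = delete;

    std::size_t size() const { return m_size; }
    int* data() const { return m_data; }

private:
    int *m_data;
    std::size_t m_size;
};

// Moves one owner into another and reports how many buffers were freed.
inline int deallocations_after_move()
{
    g_deallocations = 0;
    {
        buffer_owner a(4);
        buffer_owner b(std::move(a));
        b.data()[0] = 42; // the moved-to object can still use the buffer
    }
    return g_deallocations;
}
```

Exactly one deallocation is expected: the moved-from object is empty and
skips it, and the moved-to object frees the buffer once.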
This removes support for cl_half (typedef'd to half_).
The cl_half type is indistinguishable from the cl_ushort type
(both are typedefs for uint16_t). As a result, the cl_khr_fp16
pragma was injected into kernels using cl_ushort, which caused
errors on platforms that do not support the cl_khr_fp16
extension.
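The underlying problem can be shown without OpenCL. In this sketch the
aliases my_half and my_ushort and the kernel_type_name() overloads are
hypothetical; they only mirror the situation in which two typedefs name
the same 16-bit unsigned type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical aliases: both resolve to the same underlying type.
typedef std::uint16_t my_half;
typedef std::uint16_t my_ushort;

// Illustrative type-to-name lookup. Overload resolution only sees
// std::uint16_t, so there is no way to add a separate "half" overload.
inline std::string kernel_type_name(std::uint16_t) { return "ushort"; }
inline std::string kernel_type_name(float) { return "float"; }

// The compiler treats the two aliases as one and the same type.
static_assert(std::is_same<my_half, my_ushort>::value,
              "typedefs do not create distinct types");
```

Any per-type behaviour, such as emitting a pragma, therefore applies to
both aliases or to neither.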
This fixes a bug in the event_profiling test case in the
command_queue test. On AMD platforms, the event object
returned from clEnqueueMarker() has no profiling information
associated with it and returns an error code when accessed.
Now, profiling information for a simple write to a device
buffer is checked instead.
This adds a new set of methods to the device class allowing
device objects to be partitioned into multiple sub-devices
using the clCreateSubDevices() function.
For now, device partitioning is only supported on systems
with OpenCL version 1.2 (or later).
This adds support for returning a std::vector<T> from the
various get_info<T>() methods. This provides a simpler
interface to get the values in an array returned from one
of the clGet*Info() functions.
This also adds a test using the new API to get the maximum
work item sizes in each dimension for a device.
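A sketch of how an array-valued query can be returned as a std::vector<T>.
The mock_get_info() function is a hypothetical stand-in that follows the
usual clGet*Info() calling pattern (query the size, then read the data);
its values are made up, and get_info_vector() is not the library's actual
implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a clGet*Info() function. Returns 0 on success.
inline int mock_get_info(std::size_t value_size,
                         void *value,
                         std::size_t *size_ret)
{
    static const std::size_t sizes[3] = { 1024, 512, 64 }; // made-up data
    if(size_ret){
        *size_ret = sizeof(sizes);
    }
    if(value){
        if(value_size < sizeof(sizes)){
            return -1; // caller's buffer is too small
        }
        std::memcpy(value, sizes, sizeof(sizes));
    }
    return 0;
}

// Illustrative helper: query the size in bytes, then read into a vector.
template<class T>
std::vector<T> get_info_vector()
{
    std::size_t bytes = 0;
    if(mock_get_info(0, nullptr, &bytes) != 0){
        return std::vector<T>();
    }

    std::vector<T> result(bytes / sizeof(T));
    if(!result.empty() &&
       mock_get_info(bytes, &result[0], nullptr) != 0){
        result.clear();
    }
    return result;
}
```

The caller receives a container with the right number of elements and
never has to manage the intermediate byte buffer.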
This fixes a bug in the test for inplace_reduce() in which
the vector was being filled with data from two different
command queues, leaving the data in an undefined state.
This fixes a bug in which the remove_if() function would overwrite
parts of the input before they were properly copied to the output
range. This is fixed by first copying the input values to a temporary
vector and then passing that as the input range to copy_if().
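The copy-then-filter approach can be sketched on the host with the
standard library. This is an analogue under stated assumptions, not the
device implementation: remove_if_via_copy(), is_odd() and remove_odd()
are hypothetical names, and std::copy_if() stands in for the library's
copy_if().

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Removes the elements matching pred and returns the new logical end.
template<class Iterator, class Predicate>
Iterator remove_if_via_copy(Iterator first, Iterator last, Predicate pred)
{
    typedef typename std::iterator_traits<Iterator>::value_type T;

    // Snapshot the input so writes to the output range cannot
    // clobber values that have not been read yet.
    std::vector<T> tmp(first, last);

    // Keep every element that does NOT satisfy the predicate.
    return std::copy_if(tmp.begin(), tmp.end(), first,
                        [&pred](const T &x){ return !pred(x); });
}

inline bool is_odd(int x) { return x % 2 != 0; }

// Convenience wrapper: returns a copy of v without its odd values.
inline std::vector<int> remove_odd(std::vector<int> v)
{
    v.erase(remove_if_via_copy(v.begin(), v.end(), is_odd), v.end());
    return v;
}
```

The temporary vector costs one extra copy of the input but makes the
input and output ranges independent of each other.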
This fixes a bug in which the Intel OpenCL compiler would
fail to compile the count_if() and find_if() kernels for
vector types with the following error:
    error: no matching function for call to 'all'
    note: candidate function not viable: 1st argument ('__global int4')
    is in address space 16776960, but parameter must be in address space 0
This is caused when the predicate compares a value from the input
buffer (in the global memory space) to a literal value (in the
private memory space).
This is fixed by first reading the value into a local variable in
the private memory space and then calling the predicate function.
This removes the check for local_memory_size in test_kernel. The
local memory size differs between platforms and some (e.g. Intel)
don't report any local memory usage.
This fixes a bug in which the fill() algorithm was called by
scan_impl() with an integer zero rather than a zero of the value
type, which caused issues when using scan() with vector values.
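A host-side sketch of the difference, assuming a hypothetical four-component
struct int4_like and using std::fill() as a stand-in for the library's
fill(). A value-initialized T() is a zero of the value type for scalars and
aggregates alike, whereas a plain 0 is only an int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-like value type with no conversion from int.
struct int4_like { int x, y, z, w; };

// Fills a new container with the zero of its own value type.
template<class T>
std::vector<T> zero_filled(std::size_t n)
{
    std::vector<T> v(n);
    // T() is value-initialized: 0 for scalars, all-zero for aggregates.
    // Passing a literal 0 here would not compile for int4_like.
    std::fill(v.begin(), v.end(), T());
    return v;
}
```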
This adds a new method which allows for type definitions and
type pragmas to be added to a meta_kernel.
This provides a more general interface and replaces the
previously used add_pair_type() method along with the special
case handling of half and double types.
This fixes the check for the local memory size in the
get_work_group_info kernel test.
While the kernel only allocates 16 floats, some platforms
will use more local memory. This changes the test to check
for at least 16 floats' worth of local memory.
This fixes a bug in which certain platforms would return
CL_INVALID_VALUE from clCreateProgramWithBinary() if the
binary_status argument was not provided.
This removes the default type_name_trait::value() function
implementation.
Previously, the default implementation would return a null
pointer leading to run-time errors if a type name was not
provided. Now, a compile-time error will occur if type_name()
is called for an unknown type.
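A sketch of the compile-time behaviour described above. The trait layout
shown here is an assumption modelled on the names in this message, and
has_type_name and unregistered_type are hypothetical helpers added only so
the missing value() function can be observed without a failed build.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Primary template: intentionally has no value() function.
template<class T>
struct type_name_trait
{
};

// Known types provide value() through specializations.
template<>
struct type_name_trait<int>
{
    static const char* value() { return "int"; }
};

template<>
struct type_name_trait<float>
{
    static const char* value() { return "float"; }
};

// Fails to compile for any type without a specialization.
template<class T>
inline const char* type_name()
{
    return type_name_trait<T>::value();
}

// Hypothetical detector: true only when value() exists for T.
template<class T, class Enable = void>
struct has_type_name : std::false_type {};

template<class T>
struct has_type_name<T, decltype(void(type_name_trait<T>::value()))>
    : std::true_type {};

// A type for which no name has been registered.
struct unregistered_type {};
```

Calling type_name<unregistered_type>() would be rejected by the compiler
instead of returning a null pointer at run time.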
This adds a test for the sort() method which sorts a container
of 3D vectors by their length. This uses a lambda expression to
generate the compare function for the sort() algorithm.
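A host-side analogue of the test, using std::sort() and a C++11 lambda in
place of the library's sort() and lambda expressions. The vec3 struct and
the helper functions are hypothetical. Squared lengths are compared because
they order the vectors the same way as the lengths themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical 3D vector type.
struct vec3 { float x, y, z; };

// Squared Euclidean length; avoids a square root in the comparison.
inline float length_squared(const vec3 &v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Returns a copy of v sorted from shortest to longest vector.
inline std::vector<vec3> sort_by_length(std::vector<vec3> v)
{
    std::sort(v.begin(), v.end(),
              [](const vec3 &a, const vec3 &b){
                  return length_squared(a) < length_squared(b);
              });
    return v;
}
```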