This adds a new macro which allows the user to adapt a C++ struct
or class for use with OpenCL given its type, name, and members.
This allows for custom user-defined data-types to be used with the
Boost.Compute containers and algorithms.
This adds a new function which will return the named field
from a value. For example, this can be used to return one of
the components of a pair object or to swizzle a vector value.
This adds a new macro to ease the definition of custom user
functions. The BOOST_COMPUTE_FUNCTION() macro creates a new
boost::compute::function<> object with the provided return
type, argument types, function name and OpenCL source code.
This removes the timer class. The technique of measuring the time
difference between two different OpenCL markers on a command queue
is not portable to all OpenCL implementations (only works on NVIDIA).
A new internal timer class has been added which uses boost::chrono
(or std::chrono if BOOST_COMPUTE_TIMER_USE_STD_CHRONO is defined).
This new timer is used by the benchmarks to measure time elapsed
on the host.
This adds a simple inplace_merge() algorithm which merges
two contiguous sorted ranges in-place.
For now, the implementation simply copies the ranges to
two temporary vectors and calls merge().
This adds a system-wide default command queue. This queue is
accessible via the new static system::default_queue() method.
The default command queue is created for the default compute
device in the default context and is analogous to the default
stream in CUDA.
This changes how algorithms operate when invoked without an
explicit command queue. Previously, each algorithm had two
overloads, the first expected a command queue to be explicitly
passed and the second would create and use a temporary command
queue. Now, all algorithms take a command queue argument which
has a default value equal to system::default_queue().
This fixes a number of race-conditions and performance issues
througout the library associated with create, using, and
destroying many separate command queues.
This refactors the system::default_device() method. Now, the
default compute device for the system is only found once and
stored in a static variable. This eliminates many redundant
calls to clGetPlatformIDs() and clGetDeviceIDs().
Also, the default_cpu_device() and default_gpu_device() methods
have been removed and their usages replaced with default_device().
This adds a get<N>() function which returns the n'th element
of an aggregate type (e.g. vector type, pair, tuple).
This unifies the functionality of, and replaces, the get_pair()
and vector_component() functions.
This changes the vector class to not auto-initialize values
when it is created or resized. This improves performance by
eliminating a call to fill(). If needed, user code can call
fill() explicitly on the newly allocated values.
This removes the documentation for the non-existent platforms()
and platform_count() methods in the platform class. These methods
have been moved to the system class and are documented there.
This changes the enqueue_nd_range_kernel() method to return an
event object. This allows clients to monitor the progress of a
kernel executing on a device.
This cleans up the constructor methods for the OpenCL wrapper
classes and unifies the API used for creating a wrapper class
object from the underlying OpenCL objects.
Now, every wrapper class has a constructor taking the OpenCL
object and an optional boolean retain parameter which indicates
whether the constructor should increment the reference count.
This remove the enqueue_wait_for_event() method from the
command_queue class as the clEnqueueWaitForEvents() function
has been deprecated in OpenCL 1.2.
This moves the unload_compiler() method from the system class
to the platform class. Also changes the method to use the
clUnloadPlatformCompiler() function instead of the deprecated
clUnloadCompiler() when compiling with OpenCL 1.2.
This moves the get_extension_function_address() method from
the system class to the platform class. Also changes the method
to use the clGetExtensionFunctionAddressForPlatform() function
instead of the deprecated clGetExtensionFunctionAddress() when
compiling with OpenCL 1.2.
This implements the merge() algorithm which merges two
ranges of sorted values into a single sorted range.
The current implementation uses a simple serial merge
algorithm. A GPU optimized version is coming soon.