This adds third-party performance tests to use in comparing
Boost.Compute with other parallel/GPGPU frameworks like Intel's
TBB and NVIDIA's Thrust along with the C++ STL.
Also refactors the timing and profiling infrastructure and adds
a simple perf.py driver script for running performance tests.
This removes the timer class. The technique of measuring the time
difference between two different OpenCL markers on a command queue
is not portable to all OpenCL implementations (only works on NVIDIA).
A new internal timer class has been added which uses boost::chrono
(or std::chrono if BOOST_COMPUTE_TIMER_USE_STD_CHRONO is defined).
This new timer is used by the benchmarks to measure time elapsed
on the host.