This changes the vector<T> constructors which copy or initialize
data to take a queue argument used for performing the operations.
Previously they just took a context argument used to initialize the
buffer and then created a new command queue to use. This improves
performance by not requiring a new command queue and also fixes issues
when performing operations on a different command queue while the
vector was still being initialized.
This adds a new sort_by_key() algorithm which sorts a range
of values by a range of keys with a comparison operator.
For now this is only implemented by the serial insertion sort
algorithm. In the future it will be ported to the other sorting
algorithms (e.g. radix sort).