poly_collection/doc/poly_collection.qbk
2019-09-04 09:28:30 +02:00

1208 lines
47 KiB
Plaintext

[library Boost.PolyCollection
[quickbook 1.6]
[authors [López Muñoz, Joaquín M]]
[copyright 2016-2019 Joaquín M López Muñoz]
[category containers]
[id poly_collection]
[dirname poly_collection]
[purpose
High-performance containers for polymorphic objects.
]
[license
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
[@http://www.boost.org/LICENSE_1_0.txt])
]
]
[def _Boost.TypeErasure_ [@boost:/libs/type_erasure Boost.TypeErasure]]
[def _Boost.TypeIndex_ [@boost:/libs/type_index Boost.TypeIndex]]
[def _CopyConstructible_ [@http://en.cppreference.com/w/cpp/named_req/CopyConstructible [* `CopyConstructible`]]]
[def _duck_typing_ [@https://en.wikipedia.org/wiki/Duck_typing ['duck typing]]]
[def _EqualityComparable_ [@http://en.cppreference.com/w/cpp/named_req/EqualityComparable [* `EqualityComparable`]]]
[def _ForwardIterator_ [@http://en.cppreference.com/w/cpp/named_req/ForwardIterator [* `ForwardIterator`]]]
[def _MoveConstructible_ [@http://en.cppreference.com/w/cpp/named_req/MoveConstructible [* `MoveConstructible`]]]
[def _MoveAssignable_ [@http://en.cppreference.com/w/cpp/named_req/MoveAssignable [* `MoveAssignable`]]]
[def _std::for_each_ [@http://en.cppreference.com/w/cpp/algorithm/for_each `std::for_each`]]
[def _std::function_ [@http://en.cppreference.com/w/cpp/utility/functional/function `std::function`]]
[def _std::unordered_multiset_ [@http://en.cppreference.com/w/cpp/container/unordered_multiset `std::unordered_multiset`]]
[def _RandomAccessIterator_ [@http://en.cppreference.com/w/cpp/named_req/RandomAccessIterator [* `RandomAccessIterator`]]]
[template ticket[number]'''<ulink url="https://svn.boost.org/trac/boost/ticket/'''[number]'''">'''#[number]'''</ulink>''']
[template github[module number]'''<ulink url="https://github.com/boostorg/'''[module]'''/issues/'''[number]'''">'''#[number]'''</ulink>''']
[template github_pr[module number]'''<ulink url="https://github.com/boostorg/'''[module]'''/pull/'''[number]'''">'''PR#[number]'''</ulink>''']
[section Introduction]
Dynamic polymorphism in C++ requires that objects (such as instances
of classes derived from an abstract base) be accessed through an indirection pointer
because their actual /type/ and /size/ are not known at the point of usage. As a
consequence, regular containers cannot store polymorphic objects directly: the usual
workaround is to have containers of pointers to heap-allocated elements. In modern
computer architectures this pattern incurs two types of inefficiency:
* The lack of memory contiguity produced by heap allocation degrades CPU cache
performance.
* Executing virtual operations on a sequence of polymorphic objects whose
actual types differ from one to the next results in failures in
[@https://en.wikipedia.org/wiki/Branch_predictor branch prediction] and a
lower execution speed.
When the particular traversal order is not relevant to the user application,
Boost.PolyCollection proposes an alternative data structure that restores memory
contiguity and packs elements according to their concrete type. Three container
class templates are provided:
* `boost::base_collection`
* `boost::function_collection`
* `boost::any_collection`
respectively dealing with three different types of dynamic polymorphism
available in C++:
* Classic base/derived or OOP polymorphism.
* Function wrapping in the spirit of _std::function_.
* So-called _duck_typing_ as implemented by _Boost.TypeErasure_.
The interface of these containers closely follows that of standard containers.
Additionally, the library provides versions of many of the standard library
algorithms (including
[@http://en.cppreference.com/w/cpp/algorithm/for_each `std::for_each`])
with improved performance and a special feature called /type restitution/
that allows user code to provide clues on the concrete types of the elements
stored for further opportunities of increased efficiency related to inlining and
[@http://hubicka.blogspot.com.es/2014/01/devirtualization-in-c-part-1.html ['devirtualization]].
[note Boost.PolyCollection is a header-only library. C++11 support is required.
The library has been verified to work with Visual Studio 2015, GCC 4.8 and
Clang 3.3.]
[endsect]
[section An efficient polymorphic data structure]
Suppose we have a `base` abstract class with implementations `derived1`, `derived2`
and `derived3`. The memory layout of a `std::vector<base*>` (or similar constructs
such as `std::vector<std::unique_ptr<base>>` or
[@boost:/libs/ptr_container/ `boost::ptr_vector<base>`]) looks like
the following:
[$poly_collection/img/ptr_vector.png]
Elements that are adjacent in the vector are not necessarily allocated contiguously,
much less so if the vector has undergone mid insertions and deletions. A typical
processing operation
std::vector<base*> v;
...
for(base* b: v){
... // access base's virtual interface
}
is impacted negatively by two factors:
* Scattering of elements throughout memory reduces CPU caching efficiency,
which in general favor regular access loops to contiguous memory areas.
* [@https://en.wikipedia.org/wiki/Branch_predictor Branch prediction] tries
to minimize the effect of running conditional code (such as an
`if`-`else` statement or the invocation of a `base` virtual function) by
speculatively executing a given branch based on past history. This
mechanism is rendered mostly useless when `derived1`, `derived2` and
`derived3` elements are interspersed along the sequence without a definite pattern.
These limitations are imposed by the very nature of dynamic polymorphism:
as the exact types of the elements accessed through `base`'s interface are not
known, an indirection through `base*` (a particular form of /type erasure/)
is required. There is however a critical observation: even though derived types
are not known when traversing a `std::vector<base*>`, the information is typically
available /at compile time/ at the point of insertion in the vector:
std::vector<base*> v;
...
v.insert(new derived2{...}); // the type derived2 is "forgotten" by v
A suitably designed container can take advantage of this information to
arrange elements contiguously according to their exact type, which results
in an internal data structure (a map of pointers to
[@http://en.cppreference.com/w/cpp/types/type_info `std::type_info`] objects,
actually) pointing to as many vectors or /segments/ as there are derived classes:
[$poly_collection/img/segment_map.png]
Traversing such a structure reduces to looping over all the segments one after
another: this is extremely efficient both in terms of caching and branch
prediction. In the process we have however lost the free-order capability of a
`std::vector<base*>` (free order can only be retained at the segment level), but
if this is not relevant to the user application the potential performance gains
of switching to this structure are large.
The discussion has focused on base/derived programming, also known as OOP, but it
also applies to other forms of dynamic polymorphism:
* _std::function_ abstracts callable entities with the same given signature
under a common interface. Internally, pointer indirections and virtual-like
function calls are used. Memory fragmentation is expected to be lower than
with OOP, though, as implementations usually feature the so-called
[@http://talesofcpp.fusionfenix.com/post-16/episode-nine-erasing-the-concrete#a-note-on-small-buffer-optimization [' small buffer optimization]]
to avoid heap allocation in some situations.
* The case of `std::function` can be seen as a particular example of a
more general form of polymorphism called _duck_typing_, where unrelated types
are treated uniformly if they conform to the same given /interface/ (a specified
set of member functions and/or operations). Duck typing provides the power of
OOP while allowing for greater flexibility as polymorphic types need not derive
from a preexisting base class or in general be designed with any particular
interface in mind --in fact, the same object can be duck-typed to different
interfaces. Among other libraries, _Boost.TypeErasure_ provides duck typing
for C++. Under the hood, duck typing requires pointer indirection and virtual
call implementation techniques analogous to those of OOP, and so there are
the same opportunities for efficient container data structures as we have
described.
Boost.PolyCollection provides three different container class templates
dealing with OOP, `std::function`-like polymorphism and duck typing as
implemented by Boost.TypeErasure:
* `boost::base_collection`
* `boost::function_collection`
* `boost::any_collection`
The interfaces of these containers are mostly the same and follow the usual
conventions of standard library containers.
[endsect]
[section Tutorial]
[section Basics]
[import ../example/rolegame.hpp]
[section `boost::base_collection`]
[import ../example/basic_base.cpp]
(Code samples from [@boost:/libs/poly_collection/example/basic_base.cpp `basic_base.cpp`].)
Imagine we are developing a role playing game in C++ where sprites are rendered on
screen; for the purposes of illustration we can model rendering simply as
outputting some information to a `std::ostream`:
[rolegame_1]
The game features warriors, juggernauts (a special type of warrior) and goblins,
each represented by its own class derived (directly or indirectly) from `sprite`:
[rolegame_2]
Let us populate a polymorphic collection with an assortment of sprites:
[basic_base_1]
There are two aspects to notice here:
* `boost::base_collection` does not have a `push_back` member function like, say,
`std::vector`, as the order in which elements are stored cannot be freely
chosen by the user code \u2014we will see more about this later. The insertion mechanisms
are rather those of containers like _std::unordered_multiset_, namely `insert` and
`emplace` with or without a position /hint/.
* Elements are not created with `new` but constructed on the stack and passed
directly much like one would do with a standard non-polymorphic container.
[important Elements inserted into a `boost::base_collection` (or the other
containers of Boost.PolyCollection) must be copyable and assignable; strictly
speaking, they must at least model _MoveConstructible_ and either be
_MoveAssignable_ or not throw on move construction. This might
force you to revisit your code as it is customary to explicitly forbid
copying at the base level of a virtual hierarchy to avoid
[@https://en.wikipedia.org/wiki/Object_slicing ['slicing]].]
Rendering can now be implemented with a simple `for` loop over `c`:
[basic_base_2]
The output being:
[pre
juggernaut 0,juggernaut 4,juggernaut 7,goblin 1,goblin 3,goblin 5,warrior 2,warrior 6
]
As we have forewarned, the traversal order does not correspond to that of insertion.
Instead, the elements are grouped in /segments/ according to their concrete type.
Here we see that juggernauts come first, followed by goblins and warriors. In
general, no assumptions should be made about segment ordering, which may be
different for this very example in your environment. On the other hand, elements
inserted into an already existing segment always come at the end (except if a hint
is provided). For instance, after inserting a latecomer goblin with `id==8`:
[basic_base_3]
the rendering loop outputs (new element in red):
[pre
juggernaut 0,juggernaut 4,juggernaut 7,goblin 1,goblin 3,goblin 5,[role red goblin 8],warrior 2,warrior 6
]
Deletion can be done in the usual way:
[basic_base_4]
Now rendering emits:
[pre
juggernaut 0,juggernaut 4,goblin 1,goblin 3,goblin 5,goblin 8,warrior 2,warrior 6
]
[endsect]
[section `boost::function_collection`]
[import ../example/basic_function.cpp]
(Code samples from [@boost:/libs/poly_collection/example/basic_function.cpp `basic_function.cpp`].
C++14 `std::make_unique` is used.)
Well into the development of the game, the product manager requests that two new
types of entities be added to the rendering loop:
* Overlaid messages, such as scores, modelled as `std::string`s.
* Pop-up windows coming from a third party library that, unfortunately, do not
derive from `sprite` and use their own `display` member functon for rendering:
[rolegame_3]
We then decide to refactor the code so that sprites, messsages and windows are
stored in dedicated containers:
[basic_function_1]
and define our rendering container as a `boost::function_collection` of
/callable entities/ \u2014function pointers or objects with a function
call `operator()`\u2014 accepting a `std::ostream&` as a parameter
[basic_function_2]
which we fill with suitable adaptors for `sprite`s, `std::string`s and
`window`s, respectively. Lambda functions allow for a particularly terse code.
[basic_function_3]
The rendering loop now looks like this:
[basic_function_4]
and produces the following for a particular scenario of sprites, messages and
windows:
[pre
juggernaut 0,goblin 1,warrior 2,goblin 3,"stamina: 10,000","game over",'''[pop-up 1]''','''[pop-up 2]'''
]
The container we have just created works in many respects like a
`std::vector<std::function<render_callback>>`, the main difference being
that with `boost::function_collection` callable entities of the same type
are packed together in memory
[footnote Note that all `sprite`s come into one segment: this is why
goblins #1 and #3 are not adjacent. Exercise for the reader: change
the code of the example so that sprites are further segmented according
to their concrete type.], thus increasing performance (which is the whole
point of using Boost.PolyCollection), while a vector of `std::function`s
results in an individual allocation for each entity stored
[footnote Except when small buffer optimization applies.].
In fact, the `value_type` elements held by a
`boost::function_collection` are actually /not/ `std::function`s,
although they behave analogously and can be converted to `std::function` if needed:
[basic_function_5]
There is a reason for this: elements of a polymorphic collection cannot be freely
assigned to by the user as this could end up trying to insert an object into a
segment of a different type. So, unlike with `std::function`, this will not work:
[basic_function_6]
[endsect]
[section `boost::any_collection`]
[import ../example/basic_any.cpp]
(Code samples from [@boost:/libs/poly_collection/example/basic_any.cpp `basic_any.cpp`].)
[note Here we just touch on the bare essentials of _Boost.TypeErasure_ needed
to present `boost::any_collection`. The reader is advised to read
Boost.TypeErasure documentation for further information.]
After measuring the performance of the latest changes, we find that rendering
is too slow and decide to refactor once again: if we could store all the
entities --sprites, messages and windows-- into one single container, that would
eliminate a level of indirection. The problem is that these types are totally
unrelated to each other.
_Boost.TypeErasure_ provides a class template `boost::type_erasure::any<Concept>`
able to hold an object of whatever type as long as it conforms to the interface
specified by `Concept`. In our case, we find it particularly easy to implement
a common interface for rendering by providing overloads of `operator<<`
[basic_any_1]
so that we can use it to specify the interface all entities have to adhere to:
[basic_any_2]
The collection just created happily accepts insertion of heterogeneous entities
(since interface conformity is silently checked at compile time)
[basic_any_3]
and rendering becomes
[basic_any_4]
with output
[pre
'''[pop-up 1]''','''[pop-up 2]''',juggernaut 0,goblin 1,goblin 3,warrior 2,"stamina: 10,000","game over"
]
As was the case with `boost::function_collection`, this container is similar
to a `std::vector<boost::type_erasure::any<Concept>>` but has better performance
due to packing of same-type elements. Also, the `value_type` of a
`boost::any_collection<Concept>` is /not/ `boost::type_erasure::any<Concept>`,
but a similarly behaving entity
[footnote Actually, it is
`boost::type_erasure::any<Concept2,boost::type_erasure::_self&>` for some
internally defined `Concept2` that extends `Concept`.].
In any case, we are not accessing sprites through an abstract `sprite&`
anymore, so we could as well dismantle the virtual hierarchy and implement
each type autonomously: this is left as an exercise to the reader.
[endsect]
[endsect]
[section Deeper into the segmented nature of Boost.PolyCollection]
[import ../example/segmented_structure.cpp]
(Code samples from [@boost:/libs/poly_collection/example/segmented_structure.cpp `segmented_structure.cpp`].
C++14 `std::make_unique` is used.)
[section Type registration]
Getting back to our [link poly_collection.tutorial.basics.boost_base_collection `boost::base_collection`]
example, suppose we want to refactor the populating code as follows:
[segmented_structure_1]
Unexpectedly, this piece of code throws an exception of type
[link poly_collection.reference.header_boost_poly_collection_exc.class_unregistered_type `boost::poly_collection::unregistered_type`].
What has changed from our original code?
Suppose a `warrior` has been created by `make_sprite`. The statement
`c.insert(*make_sprite())` is passing the object as a `sprite&`: even though
`boost::base_collection` is smart enough to know that the object is actually
derived from `sprite` (by using
[@http://en.cppreference.com/w/cpp/language/typeid `typeid()`]) and
[@https://en.wikipedia.org/wiki/Object_slicing slicing] is to be avoided,
there is no way that a segment for it can be created without accessing
the type `warrior` /at compile time/ for the proper internal class templates
to be instantiated
[footnote If this is conceptually difficult to grasp, consider the potentially
more obvious case where `warrior` is defined in a dynamic module linked to
the main program: the code of `boost::base_collection`, which has been compiled
before linking, cannot even know the size of this as-of-yet unseen class, so
hardly can it allocate a segment for the received object.].
This did not happen in the pre-refactoring code because objects were passed
as references to their true types.
A type is said to be /registered/ into a polymorphic collection if
there is a (potentially empty) segment created for it. This can be checked
with:
[segmented_structure_2]
Registration happens automatically when the object is passed as a reference
to its true type or
[link poly_collection.tutorial.insertion_and_emplacement.emplacement `emplace`]'d,
and explicitly with
`register_types`:
[segmented_structure_3]
Once `T` has been registered into a polymorphic collection, it remains
so regardless of whether objects of type `T` are stored or not, except if
the collection is moved from, assigned to, or swapped.
As slicing is mainly an issue with OOP, `unregistered_type` exceptions are
expected to be much less frequent with `boost::function_collection` and
`boost::any_collection`. Contrived examples can be found, though:
[segmented_structure_4]
[endsect]
[section Segment-specific interface]
For most of the interface of a polymorphic collection, it is possible to only
refer to the elements of a given segment by providing either their type or
`typeid()`. For instance:
[segmented_structure_5]
Note that any of these (except
[link poly_collection.tutorial.deeper_into_the_segmented_nature.reserve `reserve`])
will throw `boost::poly_collection::unregistered_type` if the type is not
registered. Segment-specific addressability also includes traversal:
[endsect]
[section Local iterators]
The following runs only over the `warrior`s of the collection:
[segmented_structure_6]
Output:
[pre
warrior 2,warrior 6
]
`begin|end(typeid(T))` return objects of type `local_base_iterator`
associated to the segment for `T`. These iterators dereference to the same
value as regular iterators (in the case of `boost::base_collection<base>`,
`base&`) but can only be used to traverse a given segment (for instance,
`local_base_iterator`'s from different segments cannot be compared between them).
In exchange, `local_base_iterator` is a _RandomAccessIterator_, whereas
whole-collection iterators only model _ForwardIterator_.
A terser range-based `for` loop can be used with the convenience `segment`
member function:
[segmented_structure_7]
Even more powerful than `local_base_iterator` is `local_iterator<T>`:
[segmented_structure_8]
This appends a `"super"` prefix to the `rank` data member of existing warriors:
[pre
superwarrior 2,superwarrior 6
]
The observant reader will have noticed that in order to access `rank`, which
is a member of `warrior` rather than its base class `sprite`, `first` (or
`x` in the range `for` loop version) has to refer to a `warrior&`, and this
is precisely the difference between `local_iterator<warrior>` (the type
returned by `begin<warrior>()`) and `local_base_iterator`.
`local_iterator<warrior>` is also a _RandomAccessIterator_: for most respects,
\[`begin<T>()`, `end<T>()`) can be regarded as a range over an array of `T`
objects. `local_iterator<T>`s can be explicitly converted to
`local_base_iterator`s. Conversely, if a `local_base_iterator` is associated
to a segment for `T`, it can then be explictly converted to a
`local_iterator<T>` (otherwise the conversion is undefined behavior).
The figure shows the action scopes of all the iterators associated to
a polymorphic collection (`const_` versions not included):
[$ poly_collection/img/poly_collection_iterators.png]
We have yet to describe the bottom part of the diagram.
Whereas `segment(typeid(T))` is used to range over the /elements/
of a particular segment, `segment_traversal()` returns an object for ranging over
/segments/, so that the whole collection can be processed with a nested
segment-element `for` loop like the following:
[segmented_structure_9]
Segment decomposition of traversal loops forms the basis of
[link poly_collection.tutorial.algorithms improved-performance algorithms].
[endsect]
[section Reserve]
Much like `std::vector`, segments can be made to reserve memory in
advance to minimize reallocations:
[segmented_structure_10]
If there is no segment for the indicated type (here, `goblin`), one is
automatically created. This is in contrast with the rest of segment-specific
member functions, which throw
[link poly_collection.tutorial.deeper_into_the_segmented_nature.type_registration
`boost::poly_collection::unregistered_type`].
`reserve` can deal with all segments at once. The following
[segmented_structure_11]
instructs every /existing/ segment to reserve 1,000 elements. If a
segment is created later (through element insertion or with
[link poly_collection.tutorial.deeper_into_the_segmented_nature.type_registration
type registration]), its capacity will be different.
[note Unlike standard containers, collection-level `capacity()` and
`max_size()` are not provided because their usual semantics cannot be
applied to Boost.PolyCollection: for instance, `capacity()` is typically
used to check how many elements can be inserted without reallocation,
but in a segmented structure that depends on the exact types of the
elements and whether they are registered or not.
]
[endsect]
[endsect]
[section Insertion and emplacement]
[import ../example/insertion_emplacement.cpp]
(Code samples from [@boost:/libs/poly_collection/example/insertion_emplacement.cpp `insertion_emplacement.cpp`].)
We already know that `insert(x)`, without further positional information,
stores `x` at the end of its associated segment. When a regular iterator `it`
is provided, insertion happens at the position indicated if both `it` and
`x` belong in the same segment; otherwise, `it` is ignored. For instance,
if our sprite collection has the following entities:
[pre
juggernaut 0,juggernaut 4,juggernaut 7,goblin 1,goblin 3,goblin 5,warrior 2,warrior 6
]
then this code:
[insertion_emplacement_1]
puts the new `juggernaut` at the beginning:
[pre
[role red juggernaut 8],juggernaut 0,juggernaut 4,juggernaut 7,goblin 1,...
]
whereas the position hint at
[insertion_emplacement_2]
is ignored and the new `goblin` gets inserted at the end of its segment:
[pre
juggernaut 8,...,juggernaut 7,goblin 1,goblin 3,goblin 5,[role red goblin 9],warrior 2,...
]
[link poly_collection.tutorial.deeper_into_the_segmented_nature.local_iterators Local iterators]
can also be used to indicate the insertion position:
[insertion_emplacement_3]
[pre
juggernaut 8,juggernaut 0,[role red juggernaut 10],juggernaut 4,juggernaut 7,goblin 1,...
]
There is a caveat, though: when using a local iterator, the element inserted
/must belong to the indicated segment/:
[insertion_emplacement_4]
Member functions are provided for range insertion, with and without a position
hint:
[insertion_emplacement_5]
As [link poly_collection.tutorial.deeper_into_the_segmented_nature.type_registration already explained],
care must be taken that types be pre-registered into the collection if
they are not passed as references to their actual type. So, why did not
this line
[insertion_emplacement_6]
throw `boost::poly_collection::unregistered_type`? As it happens, in the
special case where the inserted range belongs to a polymorphic collection
of the same type, registration is done automatically
[footnote That is, Boost.PolyCollection has enough static information to
do type registration without further assistance from the user.].
[#poly_collection.tutorial.insertion_and_emplacement.emplacement]
Emplacement is slightly different for Boost.PolyCollection than with
standard containers. Consider this attempt at emplacing a `goblin`:
[insertion_emplacement_7]
If examined carefully, it is only natural that the code above not compile:
Boost.PolyCollection cannot possibly know that it is precisely a `goblin`
that we want to emplace and not some other type constructible from an `int`
(like `warrior`, `juggernaut` or an unrelated class). So, the type of
the emplaced element has to be specified explicitly:
[insertion_emplacement_8]
As with `insert`, a position can be indicated for emplacing:
[insertion_emplacement_9]
Note the naming here: `emplace_hint` is used when the position is indicated with
a regular iterator, and for local iterators it is `emplace_pos`.
[endsect]
[section Exceptions]
[import ../example/exceptions.cpp]
(Code samples from [@boost:/libs/poly_collection/example/exceptions.cpp `exceptions.cpp`].)
Besides the usual exceptions like
[@http://en.cppreference.com/w/cpp/memory/new/bad_alloc `std::bad_alloc`] and
those generated by user-provided types, polymorphic collections can throw the following:
* `boost::poly_collection::unregistered_type`
* `boost::poly_collection::not_copy_constructible`
* `boost::poly_collection::not_equality_comparable`
The situations in which the first is raised have been
[link poly_collection.tutorial.deeper_into_the_segmented_nature.type_registration already discussed];
let us focus on the other two.
We have a new type of sprite in our game defined by the non-copyable
class `elf`:
[rolegame_4]
and we use it without problems until we write this:
[exceptions_1]
The first insertion works because the `elf` object passed is /temporary/
and can be /moved/ into the container, but the second statement actually
needs to /copy/ the `elf` elements in `c` to `c2`, hence the exception.
The potentially surprising aspect of this behavior is that standard
containers signal this kind of problems by /failing at compile time/.
Here we cannot afford this luxury because the exact types contained in a
polymorphic collection are not known until run time (for instance,
if `elf` elements had been erased before copying `c` to `c2` everything
would have worked): basically, the deferral of errors from compile time
to run time is an intrinsic feature of dynamic polymorphism.
In the same vein, equality comparison requires that elements themselves
be equality comparable:
[exceptions_2]
The above is unremarkable once we notice we have not defined `operator==`
for any `sprite`. The problem may go unnoticed for quite some time, however,
because determining that two polymorphic collections are equal (i.e.
all their non-empty segments are equal) can return `false` without
comparing anything at all (for instance, if segment sizes differ), in which
case no exception is thrown.
[note Operators for `<`, `<=`, `>` and `>=` comparison are not provided
because /segment/ order is not fixed and may vary across otherwise
identical collections. The situation is similar to that of standard
[@http://en.cppreference.com/w/cpp/named_req/UnorderedAssociativeContainer unordered
associative containers.]
]
These three are all the intrinsic exceptions thrown by Boost.PolyCollection.
The implication is that if a type is _CopyConstructible_,
_MoveAssignable_ (or move construction does not throw) and _EqualityComparable_, then
the entire interface of Boost.PolyCollection is unrestrictedly available for it
[footnote Provided, of course, that the type has the /right/ to be in the collection,
that is, it is derived from the specified base, or callable with the specified
signature, etc.].
[endsect]
[section Algorithms]
[import ../example/algorithms.cpp]
(Code samples from [@boost:/libs/poly_collection/example/algorithms.cpp `algorithms.cpp`].
C++14 generic lambda expressions are used.)
The ultimate purpose of Boost.PolyCollection is to speed up the processing of
large quantities of polymorphic entities, in particular for those operations
that involve linear traversal as implemented with a `for`-loop or using the
quintessential _std::for_each_ algorithm.
[algorithms_1]
Replacing the container used in the program from classic alternatives to
Boost.PolyCollection:
* `std::vector<base*>` (or similar) → `boost::base_collection<base>`
* `std::vector<std::function<signature>>` → `boost::function_collection<signature>`
* `std::vector<boost::type_erasure::any<concept_>>` → `boost::any_collection<concept_>`
is expected to increase performance due to better caching and branch prediction
behavior. But there is still room for improvement.
Consider this transformation of the code above using a double
segment-element loop (based on the
[link poly_collection.tutorial.deeper_into_the_segmented_nature.local_iterators local iterator]
capabilities of Boost.PolyCollection):
[algorithms_2]
Although not obvious at first sight, this version of the same basic operation
is actually /faster/ than the first one: for a segmented structure such as used
by Boost.PolyCollection, each iteration with the non-local iterator passed to
`std::for_each` involves:
# hopping to next position in the segment,
# checking for /end of segment/ (always),
# if at end (infrequent), selecting the next segment,
# checking for /end of range/ (always),
whereas in the second version, iteration on the inner loop, where most
processing happens, is a simple increment-and-check operation, that is,
there is one less check (which happens at the much shorter outer loop). When
the workload of the algorithm (the actually useful stuff done with the elements
themselves) is relatively light, the overhead of looping can be very
significant.
To make it easier for the user to take advantage of faster segment-element
looping, Boost.PolyCollection provides its own version of `for_each` based
on that technique:
[algorithms_3]
`boost::poly_collection::for_each` has the same interface and behavior as
`std::for_each` except for the fact that it only works for (non-local)
iterators of a polymorphic container
[footnote For any other type of iterator, it is guaranteed not to compile.].
Versions of other standard algorithms are provided as well:
[algorithms_4]
In fact, variants are given of most (though not all) of the algorithms in
[@http://en.cppreference.com/w/cpp/algorithm `<algorithm>`]
among those that are compatible with polymorphic collections
[footnote For example, algorithms requiring bidirectional iterators or a higher
category are not provided because polymorphic collections have forward-only
iterators.].
Check the [link poly_collection.reference.header_boost_poly_collection_alg reference]
for details.
[section Type restitution]
By /type restitution/ we mean the generic process of getting a concrete entity
from an abstract one by providing missing type information:
[algorithms_5]
Type restitution has the potential to increase performance. Consider for
instance the following:
[algorithms_6]
Behaviorwise this code is equivalent to simply executing `std::cout<<r<<"\n"`,
but when type restitution succeeds the statement `std::cout<<str<<"\n"` is
skipping a virtual-like call that would have happened if `r` were used instead.
From a general point of view, supplying the compiler with extra type
information allows it to perform more optimizations than in the abstract case,
inlining being the prime example.
All Boost.PolyCollection algorithms accept an optional list of types for
restitution. Let us use the
[link poly_collection.tutorial.basics.boost_any_collection `boost::any_collection`]
scenario to illustrate this point:
[algorithms_7]
Output:
[pre
warrior 2,warrior 6,[pop-up 1],[pop-up 2],juggernaut 0,juggernaut 4,
juggernaut 7,goblin 1,goblin 3,goblin 5,"stamina: 10,000","game over"
]
This rendering loop differs from the non-restituting one in two aspects:
* A list of types is provided as template arguments to the algorithm. This is
an indication that the types /may/ be actually present in the collection, not
a promise. Also, the list of types need not be exhaustive, that is, some
unlisted types could be present in the collection \u2014in the example above,
the loop traverses *all* elements (including `std::string`s and `window`s),
not only those corresponding to restituted types. In general, the more types
are restituted, the greater the potential improvement in performance.
* The lambda function passed is a generic one accepting its argument as
`const auto&`
[footnote This requires C++14, but the same effect can be achieved in C++11
providing an equivalent, if more cumbersome, functor with a templatized
call operator.].
Internally, `boost::poly_collection::for_each` checks for each segment if its
type, say `T`, belongs in the type restitution list: if this is the case, the lambda
function is passed a `const T&` rather than the generic
`const boost::any_collection::value_type&`. For each restituted type we are
saving indirection calls and possibly getting inlining optimizations, etc.
As we see in the
[link poly_collection.performance performance section], the speedup can be
very significant.
Type restitution works equally for the rest of collections of
Boost.PolyCollection. When using `boost::base_collection`, though, the case
of improved performance is more tricky:
[algorithms_8]
The problem here is that, even though the argument to the lambda function will
be restituted (when appropriate) to `warrior&`, `juggernaut&` or `goblin&`,
using it still involves doing a virtual function call in `s.render(std::cout)`.
Whether this call is resolved to a non-virtual one depends on the quality
of implementation of the compiler:
* If the concrete class is marked as
[@http://en.cppreference.com/w/cpp/language/final `final`], the compiler /in
principle/ has enough information to get rid of the virtual function call.
* Other than this,
[@http://hubicka.blogspot.com.es/2014/01/devirtualization-in-c-part-1.html ['devirtualization]]
capabilities /may/ be able to figure out that the type restitution scenario
is actually casting the element to its true type, in which case, again, virtual
calls are not needed.
[endsect]
[endsect]
[endsect]
[section Performance]
[def _vs2015_x86_ Visual Studio 2015 x86]
[def _vs2015_x64_ Visual Studio 2015 x64]
[def _gcc63_x64_ GCC 6.3 x64]
[def _clang40_x64_ Clang 4.0 x64]
[template perf_fig[name environment file]
[section [environment]]
[role aligncenter
'''<inlinemediaobject>
<imageobject><imagedata fileref="'''poly_collection/img/[file].png'''"/></imageobject>
</inlinemediaobject>'''
]
[role aligncenter [*[name], [environment]]]
[endsect]
]
[import ../example/perf.cpp]
(Testing program at [@boost:/libs/poly_collection/example/perf.cpp `perf.cpp`].)
We ran tests to measure the performance of the containers of
Boost.PolyCollection in two scenarios:
* Insertion of elements.
* Linear traversal and processing with `std::for_each` and `boost::poly_collection::for_each`
(with and without
[link poly_collection.tutorial.algorithms.type_restitution type restitution]).
As a comparison baseline we used containers and facilities from the
standard library and Boost (details below). Tests were run in the
following environments:
* *_vs2015_x86_*: Visual C++ 2015 in 32-bit (x86) release mode, Windows 7 64-bit, Intel Core i5-2520M @2.5GHz
* *_vs2015_x64_*: Visual C++ 2015 in 64-bit (x64) release mode, same machine
* *_gcc63_x64_*: GCC 6.3 release mode, Xubuntu 17.04 x64, Intel Core i7-5820 @3.3GHz
* *_clang40_x64_*: Clang 4.0 release mode, same machine
[section Container definitions]
[section `boost::base_collection`]
* Baseline container: `ptr_vector` = `boost::ptr_vector<base>`
* Polymorphic collection: `base_collection` = `boost::base_collection<base>`
* Element types: `T1` = `derived1`, `T2` = `derived2`, `T3` = `derived3`
[perf_base_types]
[endsect]
[section `boost::function_collection`]
* Baseline container: `func_vector` = `std::vector<std::function<int(int)>>`
* Polymorphic collection: `function_collection` = `boost::function_collection<int(int)>`
* Element types: `T1` = `concrete1`, `T2` = `concrete2`, `T3` = `concrete3`
[perf_function_types]
[endsect]
[section `boost::any_collection`]
* Baseline container: `any_vector` = `std::vector<boost::type_erasure::any<concept_>>`
* Polymorphic collection: `any_collection` = `boost::any_collection<concept_>`
* Element types: `T1` = `int`, `T2` = `double`, `T3` = `char`
[perf_any_types]
[endsect]
[endsect]
[section Insertion tests]
Tests measure the time taken to insert /n/ elements (/n/ between
10'''<superscript>2</superscript>''' and 10'''<superscript>7</superscript>''')
from a source of values with types following the cyclic sequence
`T1` `T1` `T2` `T2` `T3` `T1` `T1` `T2` `T2` `T3` ···
No `reserve` operation is done before insertion.
The figures show resulting times in nanoseconds/element.
The horizontal axis is logarithmic.
[section Results for `boost::base_collection`]
[perf_fig Insertion.._vs2015_x86_..insert_base_vs2015_x86]
[perf_fig Insertion.._vs2015_x64_..insert_base_vs2015_x64]
[perf_fig Insertion.._gcc63_x64_..insert_base_gcc63_x64]
[perf_fig Insertion.._clang40_x64_..insert_base_clang40_x64]
[endsect]
[section Results for `boost::function_collection`]
[perf_fig Insertion.._vs2015_x86_..insert_function_vs2015_x86]
[perf_fig Insertion.._vs2015_x64_..insert_function_vs2015_x64]
[perf_fig Insertion.._gcc63_x64_..insert_function_gcc63_x64]
[perf_fig Insertion.._clang40_x64_..insert_function_clang40_x64]
[endsect]
[section Results for `boost::any_collection`]
[perf_fig Insertion.._vs2015_x86_..insert_any_vs2015_x86]
[perf_fig Insertion.._vs2015_x64_..insert_any_vs2015_x64]
[perf_fig Insertion.._gcc63_x64_..insert_any_gcc63_x64]
[perf_fig Insertion.._clang40_x64_..insert_any_clang40_x64]
[endsect]
[endsect]
[section Processing tests]
Tests measure the time taken to traverse a container of size
/n/ (/n/ between
10'''<superscript>2</superscript>''' and 10'''<superscript>7</superscript>''')
and execute an operation on each of its elements. The operation for
`boost::base_collection` and `boost::function_collection` (and the associated
baseline containers) is defined as
[perf_for_each_callable]
whereas for `boost::any_collection` we use
[perf_for_each_incrementable]
The baseline container is tested with three different setups:
* Directly as initialized by the process described for the
[link poly_collection.performance.insertion_tests insertion tests].
The sequence of types is complex enough that CPU's branch prediction mechanisms
are not able to fully anticipate it
[footnote This has been verified empirically: simpler cycles did indeed
yield better execution times.]. As elements are ordered according to their
construction time, certain degree of memory contiguity is expected.
* With an extra post-insertion stage by which elements are sorted
according to their `typeid()`. This increases branch prediction
efficiency at the expense of having worse cache friendliness.
* With an extra post-insertion stage that randomly
shuffles all the elements in the container. This is the worst possible
scenario both in terms of caching and branch prediction.
As for the polymorphic collection, three variations are measured:
* With `std::for_each` (the same as the baseline container).
* Using `boost::poly_collection::for_each`.
* Using `boost::poly_collection::for_each` with
[link poly_collection.tutorial.algorithms.type_restitution /type restitution/]
of `T1`, `T2` and `T3`.
The figures show resulting times in nanoseconds/element.
The horizontal axis is logarithmic.
[section Results for `boost::base_collection`]
[perf_fig Processing.._vs2015_x86_..for_each_base_vs2015_x86]
[perf_fig Processing.._vs2015_x64_..for_each_base_vs2015_x64]
[perf_fig Processing.._gcc63_x64_..for_each_base_gcc63_x64]
[perf_fig Processing.._clang40_x64_..for_each_base_clang40_x64]
[endsect]
[section Results for `boost::function_collection`]
[perf_fig Processing.._vs2015_x86_..for_each_function_vs2015_x86]
[perf_fig Processing.._vs2015_x64_..for_each_function_vs2015_x64]
[perf_fig Processing.._gcc63_x64_..for_each_function_gcc63_x64]
[perf_fig Processing.._clang40_x64_..for_each_function_clang40_x64]
[endsect]
[section Results for `boost::any_collection`]
[perf_fig Processing.._vs2015_x86_..for_each_any_vs2015_x86]
[perf_fig Processing.._vs2015_x64_..for_each_any_vs2015_x64]
[perf_fig Processing.._gcc63_x64_..for_each_any_gcc63_x64]
[perf_fig Processing.._clang40_x64_..for_each_any_clang40_x64]
[endsect]
[endsect]
[endsect]
[include reference.qbk]
[section Future work]
A number of features asked by reviewers and users of Boost.PolyCollection are
considered for inclusion into future versions of the library.
[section Alternative RTTI systems]
Boost.PolyCollection can be extended to use _Boost.TypeIndex_ in RTTI-challenged
scenarios. Taking this idea further, it is not unusual that some environments
(game engines, for instance) provide their own RTTI framework: an even more
ambitious extension to Boost.PolyCollection would then be to make it configurable
for user-provided RTTI through some sort of traits class specifying replacements for
[@http://en.cppreference.com/w/cpp/types/type_info `std::type_info`]
and [@http://en.cppreference.com/w/cpp/language/typeid `typeid`].
[endsect]
[section Copy traits]
`boost::base_collection` requires that stored objects be _MoveConstructible_ and
_MoveAssignable_; unfortunately, it is customary to restrict copying in OOP
hierarchies to avoid slicing, which would force users to revisit their class
definitions in order to use Boost.PolyCollection. This can be alleviated by
offering a configurable traits class where copy and assignment can be defined
externally
``
template<typename T>
struct copy_traits
{
void construct(void*,T&&);
void assign(T&,T&&);
};
``
with default implementations resorting to regular placement `new` and
`T::operator=`.
[endsect]
[section Parallel algorithms]
C++17 introduces
[@http://en.cppreference.com/w/cpp/experimental/parallelism parallel algorithms],
like for instance a parallel version of _std::for_each_; it is only
natural then to provide the corresponding
[link poly_collection.tutorial.algorithms Boost.PolyCollection-specific algorithms].
The segmented nature of polymorphic collections makes them particularly amenable to
parallel processing.
[endsect]
[section `variant_collection`]
/Closed polymorphism/ is a kind of dynamic polymorphism where the set of implementation
types is fixed at definition time: the prime example of this paradigm in C++ is
[@http://en.cppreference.com/w/cpp/utility/variant `std::variant`]. Although
`boost::any_collection<boost::mpl::vector<>>` can act as a sort of replacement for
`std::vector<std::variant<T1,...,TN>>`, this is in fact more similar to a
`std::vector<`[@http://en.cppreference.com/w/cpp/utility/any `std::any`]`>`,
and a collection class `boost::variant_collection<T1,...,TN>` could be designed to
better model closed polymorphism and take further advantage of the fact that
implementation types are fixed (for instance, internal virtual calls can be
completely eliminated). From a conceptual point of view, this would require introducing
a new [* `ClosedPolymorphicCollection`] notion and renaming the current
[link poly_collection.reference.polymorphic_containers.polymorphic_collections [* `PolymorphicCollection`]]
model to [* `OpenPolymorphicCollection`].
[endsect]
[section Ordered polymorphic collections]
Users have expressed interest in polymorphic collections where elements are kept
ordered within their segment and optionally duplicates are excluded, much like
[@http://boost.org/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx `boost::flat_set`/`boost::flat_multiset`]
do over their internal data vector. The open question remains of whether these
collections should also guarantee some order between segments (current ones don't)
to allow for the definition of container-level `operator<` and related operators.
[endsect]
[endsect]
[section Release notes]
[section Boost 1.72]
* Maintenance work.
[endsect]
[section Boost 1.71]
* Maintenance work.
[endsect]
[section Boost 1.70]
* Improved handling of stateful allocators and allocator propagation traits,
after an error reported by Billy O'Neal ([github_pr poly_collection 9]).
* Fixed a potentially serious bug with an internal cache structure.
[endsect]
[section Boost 1.69]
* Added Boost.PolyCollection-specific versions of algorithms
`std::for_each_n` and `std::sample`.
[endsect]
[section Boost 1.67]
* Maintenance fixes.
[endsect]
[section Boost 1.66]
* Boost.PolyCollection has been backported to GCC 4.8 to 4.9 and Clang 3.3 to 3.6.
The version of libstdc++-v3 shipped with GCC 4.8 (which can also be used by Clang)
has deficiencies that result in the following limitations when using
Boost.PolyCollection:
* Stateful allocators are not properly supported.
* Allocator-extended move construction decays to allocator-extended copy construction.
* Copy construction crashes if an exception is thrown during element copying.
* Maintenance fixes.
[endsect]
[section Boost 1.65]
* Initial release.
[endsect]
[endsect]
[section Acknowledgments]
The library uses [@http://www.pdimov.com/ Peter Dimov]'s
[@http://www.pdimov.com/cpp2/simple_cxx11_metaprogramming.html implementation
of `std::make_index_sequence`].
[@http://manu343726.github.io/ Manu Sánchez] aided immensely with CI setup and
performance testing. [@http://stackoverflow.com/users/85371/sehe Sehe]
contributed performance results for GCC 5.2, and
[@https://www.linkedin.com/in/francisco-jose-tapia-iba%C3%B1ez-4239a07a Francisco
José Tapia] for Clang 3.9.
The Boost acceptance review took place between the 3rd and 17th of May, 2017. Thank you
to Ion Gaztañaga for smoothly managing the process. The following people participated
with full reviews or valuable comments:
Pete Bartlett, Hans Dembinski, Dominique Devienne, Edward Diener, Vinnie Falco,
Ion Gaztañaga, Andrzej Krzemienski, Brook Milligan, Thorsten Ottosen, Steven Watanabe,
Adam Wulkiewicz. Many thanks to all of them. Steven Watanabe gave crucial help in
solving some hard problems related to the usage of _Boost.TypeErasure_.
Boost.PolyCollection was designed and written between rainy
[@https://es.wikipedia.org/wiki/Viav%C3%A9lez Viavélez], noisy Madrid and beautiful
[@https://en.wikipedia.org/wiki/C%C3%A1ceres,_Spain Cáceres], August-November, 2016.
Most of the after-review work in preparation for the official Boost release was done
in the quiet town of [@https://es.wikipedia.org/wiki/Oropesa_(Toledo) Oropesa] during
the spring of 2017.
In memory of Joaquín López Borrella (1939-2015), in memory of Héctor (2004-2017):
may your ghosts keep us company.
[endsect]