8263fb21a0
[SVN r46368]
173 lines
8.0 KiB
Plaintext
173 lines
8.0 KiB
Plaintext
[/
|
|
/ Copyright (c) 2008 Eric Niebler
|
|
/
|
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
/]
|
|
|
|
[section Cycle collection with [^tracking_ptr<>]]
|
|
|
|
In xpressive, regex objects can refer to each other and themselves by value or by reference.
|
|
In addition, they ref-count their referenced regexes to keep them alive. This creates the possibility
|
|
for cyclic reference counts, and raises the possibility of memory leaks. xpressive avoids leaks
|
|
by using a type called `tracking_ptr<>`. This doc describes at a high level how `tracking_ptr<>`
|
|
works.
|
|
|
|
[h2 Constraints]
|
|
|
|
Our solution must meet the following design constraints:
|
|
|
|
* No dangling references: All objects referred to directly or indirectly must be kept alive
|
|
as long as the references are needed.
|
|
* No leaks: all objects must be freed eventually.
|
|
* No user intervention: The solution must not require users to explicitly invoke some cycle
|
|
collection routine.
|
|
* Clean-up is no-throw: The collection phase will likely be called from a destructor, so it
|
|
must never throw an exception under any circumstance.
|
|
|
|
[h2 Handle-Body Idiom]
|
|
|
|
To use `tracking_ptr<>`, you must separate your type into a handle and a body. In the case of
|
|
xpressive, the handle type is called `basic_regex<>` and the body is called `regex_impl<>`. The
|
|
handle will store a `tracking_ptr<>` to the body.
|
|
|
|
The body type must inherit from `enable_reference_tracking<>`. This gives the body the bookkeeping
|
|
data structures that `tracking_ptr<>` will use. In particular, it gives the body:
|
|
|
|
# `std::set<shared_ptr<body> > refs_` : collection of bodies to which this body refers, and
|
|
# `std::set<weak_ptr<body> > deps_` : collection of bodies which refer to this body.
|
|
|
|
[h2 References and Dependencies]
|
|
|
|
We refer to (1) above as the "references" and (2) as the "dependencies". It is crucial to the
|
|
understanding of `tracking_ptr<>` to recognize that the set of references includes both those
|
|
objects that are referred to directly as well as those that are referred to indirectly (that
|
|
is, through another reference). The same is true for the set of dependencies. In other words,
|
|
each body holds a ref-count directly to every other body that it needs.
|
|
|
|
Why is this important? Because it means that when a body no longer has a handle referring
|
|
to it, all its references can be released immediately without fear of creating dangling references.
|
|
|
|
References and dependencies cross-pollinate. Here's how it works:
|
|
|
|
# When one object acquires another as a reference, the second object acquires the first as
|
|
a dependency.
|
|
# In addition, the first object acquires all of the second object's references, and the second
|
|
object acquires all of the first object's dependencies.
|
|
# When an object picks up a new reference, the reference is also added to all dependent objects.
|
|
# When an object picks up a new dependency, the dependency is also added to all referenced
|
|
objects.
|
|
# An object is never allowed to have itself as a dependency. Objects may have themselves as
|
|
references, and often do.
|
|
|
|
Consider the following code:
|
|
|
|
sregex expr;
|
|
{
|
|
sregex group = '(' >> by_ref(expr) >> ')'; // (1)
|
|
sregex fact = +_d | group; // (2)
|
|
sregex term = fact >> *(('*' >> fact) | ('/' >> fact)); // (3)
|
|
expr = term >> *(('+' >> term) | ('-' >> term)); // (4)
|
|
} // (5)
|
|
|
|
Here is how the references and dependencies propagate, line by line:
|
|
|
|
[table
|
|
[[Expression][Effects]]
|
|
[[1) `sregex group = '(' >> by_ref(expr) >> ')';`]
|
|
[[^group: cnt=1 refs={expr} deps={}\n
|
|
expr: cnt=2 refs={} deps={group}]]]
|
|
|
|
[[2) `sregex fact = +_d | group;`]
|
|
[[^group: cnt=2 refs={expr} deps={fact}\n
|
|
expr: cnt=3 refs={} deps={group,fact}\n
|
|
fact: cnt=1 refs={expr,group} deps={}]]]
|
|
|
|
[[3) `sregex term = fact >> *(('*' >> fact) | ('/' >> fact));`]
|
|
[[^group: cnt=3 refs={expr} deps={fact,term}\n
|
|
expr: cnt=4 refs={} deps={group,fact,term}\n
|
|
fact: cnt=2 refs={expr,group} deps={term}\n
|
|
term: cnt=1 refs={expr,group,fact} deps={}]]]
|
|
|
|
[[4) `expr = term >> *(('+' >> term) | ('-' >> term));`]
|
|
[[^group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}\n
|
|
expr: cnt=5 refs={expr,group,fact,term} deps={group,fact,term}\n
|
|
fact: cnt=5 refs={expr,group,fact,term} deps={expr,group,term}\n
|
|
term: cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}]]]
|
|
|
|
[[5) `}`]
|
|
[[^expr: cnt=2 refs={expr,group,fact,term} deps={group,fact,term}]]]
|
|
]
|
|
|
|
This shows how references and dependencies propagate when creating cycles of objects. After
|
|
line (4), which closes the cycle, every object has a ref-count on every other object, even
|
|
to itself. So how does this not leak? Read on.
|
|
|
|
[h2 Cycle Breaking]
|
|
|
|
Now that the bodies have their sets of references and dependencies, the hard part is done.
|
|
All that remains is to decide when and where to break the cycle. That is the job of `tracking_ptr<>`,
|
|
which is part of the handle. The `tracking_ptr<>` holds 2 `shared_ptr`s. The first, obviously,
|
|
is the `shared_ptr<body>` -- the reference to the body to which this handle refers. The other
|
|
`shared_ptr` is used to break the cycle. It ensures that when all the handles to a body go out
|
|
of scope, the body's set of references is cleared.
|
|
|
|
This suggests that more than one handle can refer to a body. In fact, `tracking_ptr<>` gives
|
|
you copy-on-write semantics -- when you copy a handle, the body is shared. That makes copies
|
|
very efficient. Eventually, all the handles to a particular body go out of scope. When that
|
|
happens, the ref count to the body might still be greater than 0 because some other body (or
|
|
this body itself!) might be holding a reference to it. However, we are certain that the cycle-breaker's
|
|
ref-count goes to 0 because the cycle-breaker only lives in handles. No more handles, no more
|
|
cycle-breakers.
|
|
|
|
What does the cycle-breaker do? Recall that the body has a set of references of type
|
|
`std::set<shared_ptr<body> >`. Let's call this type "references_type". The cycle-breaker is a
|
|
`shared_ptr<references_type>`. It uses a custom deleter, which is defined as follows:
|
|
|
|
template<typename DerivedT>
|
|
struct reference_deleter
|
|
{
|
|
void operator ()(std::set<shared_ptr<DerivedT> > *refs) const
|
|
{
|
|
refs->clear();
|
|
}
|
|
};
|
|
|
|
The job of to the cycle breaker is to ensure that when the last handle to a body goes away,
|
|
the body's set of references is cleared. That's it.
|
|
|
|
We can clearly see how this guarantees that all bodies are cleaned up eventually. Once every
|
|
handle has gone out of scope, all the bodies' sets of references will be cleared, leaving none
|
|
with a non-zero ref-count. No leaks, guaranteed.
|
|
|
|
It's a bit harder to see how this guarantees no dangling references. Imagine that there are 3
|
|
bodies: A, B and C. A refers to B which refers to C. Now all the handles to B go out of scope,
|
|
so B's set of references is cleared. Doesn't this mean that C gets deleted, even though it
|
|
is being used (indirectly) by A? It doesn't. This situation can never occur because we propagated
|
|
the references and dependencies above such that A will be holding a reference directly to C
|
|
in addition to B. When B's set of references is cleared, no bodies get deleted, because they
|
|
are all still in use by A.
|
|
|
|
[h2 Future Work]
|
|
|
|
All these `std::set`s and `shared_ptr`s and `weak_ptr`s! Very inefficient. I used them because
|
|
they were handy. I could probably do better.
|
|
|
|
Also, some objects stick around longer than they need to. Consider:
|
|
|
|
sregex b;
|
|
{
|
|
sregex a = _;
|
|
b = by_ref(a);
|
|
b = _;
|
|
}
|
|
// a is still alive here!
|
|
|
|
Due to the way references and dependencies are propagated, the `std::set` of references can only
|
|
grow. It never shrinks, even when some references are no longer needed. For xpressive this
|
|
isn't an issue. The graphs of referential objects generally stay small and isolated. If someone
|
|
were to try to use `tracking_ptr<>` as a general ref-count-cycle-collection mechanism, this problem
|
|
would have to be addressed.
|
|
|
|
[endsect]
|