948 lines
38 KiB
ReStructuredText
948 lines
38 KiB
ReStructuredText
+++++++++++++++++++++++++++++++++++++++++++
|
|
Building Hybrid Systems with Boost.Python
|
|
+++++++++++++++++++++++++++++++++++++++++++
|
|
|
|
:Author: David Abrahams
|
|
:Contact: dave@boost-consulting.com
|
|
:organization: `Boost Consulting`_
|
|
:date: 2003-05-14
|
|
|
|
:Author: Ralf W. Grosse-Kunstleve
|
|
|
|
:copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
|
|
|
|
.. contents:: Table of Contents
|
|
|
|
.. _`Boost Consulting`: http://www.boost-consulting.com
|
|
|
|
==========
|
|
Abstract
|
|
==========
|
|
|
|
Boost.Python is an open source C++ library which provides a concise
|
|
IDL-like interface for binding C++ classes and functions to
|
|
Python. Leveraging the full power of C++ compile-time introspection
|
|
and of recently developed metaprogramming techniques, this is achieved
|
|
entirely in pure C++, without introducing a new syntax.
|
|
Boost.Python's rich set of features and high-level interface make it
|
|
possible to engineer packages from the ground up as hybrid systems,
|
|
giving programmers easy and coherent access to both the efficient
|
|
compile-time polymorphism of C++ and the extremely convenient run-time
|
|
polymorphism of Python.
|
|
|
|
==============
|
|
Introduction
|
|
==============
|
|
|
|
Python and C++ are in many ways as different as two languages could
|
|
be: while C++ is usually compiled to machine-code, Python is
|
|
interpreted. Python's dynamic type system is often cited as the
|
|
foundation of its flexibility, while in C++ static typing is the
|
|
cornerstone of its efficiency. C++ has an intricate and difficult
|
|
compile-time meta-language, while in Python, practically everything
|
|
happens at runtime.
|
|
|
|
Yet for many programmers, these very differences mean that Python and
|
|
C++ complement one another perfectly. Performance bottlenecks in
|
|
Python programs can be rewritten in C++ for maximal speed, and
|
|
authors of powerful C++ libraries choose Python as a middleware
|
|
language for its flexible system integration capabilities.
|
|
Furthermore, the surface differences mask some strong similarities:
|
|
|
|
* 'C'-family control structures (if, while, for...)
|
|
|
|
* Support for object-orientation, functional programming, and generic
|
|
programming (these are both *multi-paradigm* programming languages.)
|
|
|
|
* Comprehensive operator overloading facilities, recognizing the
|
|
importance of syntactic variability for readability and
|
|
expressivity.
|
|
|
|
* High-level concepts such as collections and iterators.
|
|
|
|
* High-level encapsulation facilities (C++: namespaces, Python: modules)
|
|
to support the design of re-usable libraries.
|
|
|
|
* Exception-handling for effective management of error conditions.
|
|
|
|
* C++ idioms in common use, such as handle/body classes and
|
|
reference-counted smart pointers mirror Python reference semantics.
|
|
|
|
Given Python's rich 'C' interoperability API, it should in principle
|
|
be possible to expose C++ type and function interfaces to Python with
|
|
an analogous interface to their C++ counterparts. However, the
|
|
facilities provided by Python alone for integration with C++ are
|
|
relatively meager. Compared to C++ and Python, 'C' has only very
|
|
rudimentary abstraction facilities, and support for exception-handling
|
|
is completely missing. 'C' extension module writers are required to
|
|
manually manage Python reference counts, which is both annoyingly
|
|
tedious and extremely error-prone. Traditional extension modules also
|
|
tend to contain a great deal of boilerplate code repetition which
|
|
makes them difficult to maintain, especially when wrapping an evolving
|
|
API.
|
|
|
|
These limitations have lead to the development of a variety of wrapping
|
|
systems. SWIG_ is probably the most popular package for the
|
|
integration of C/C++ and Python. A more recent development is SIP_,
|
|
which was specifically designed for interfacing Python with the Qt_
|
|
graphical user interface library. Both SWIG and SIP introduce their
|
|
own specialized languages for customizing inter-language bindings.
|
|
This has certain advantages, but having to deal with three different
|
|
languages (Python, C/C++ and the interface language) also introduces
|
|
practical and mental difficulties. The CXX_ package demonstrates an
|
|
interesting alternative. It shows that at least some parts of
|
|
Python's 'C' API can be wrapped and presented through a much more
|
|
user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
|
|
not include support for wrapping C++ classes as new Python types.
|
|
|
|
The features and goals of Boost.Python_ overlap significantly with
|
|
many of these other systems. That said, Boost.Python attempts to
|
|
maximize convenience and flexibility without introducing a separate
|
|
wrapping language. Instead, it presents the user with a high-level
|
|
C++ interface for wrapping C++ classes and functions, managing much of
|
|
the complexity behind-the-scenes with static metaprogramming.
|
|
Boost.Python also goes beyond the scope of earlier systems by
|
|
providing:
|
|
|
|
* Support for C++ virtual functions that can be overridden in Python.
|
|
|
|
* Comprehensive lifetime management facilities for low-level C++
|
|
pointers and references.
|
|
|
|
* Support for organizing extensions as Python packages,
|
|
with a central registry for inter-language type conversions.
|
|
|
|
* A safe and convenient mechanism for tying into Python's powerful
|
|
serialization engine (pickle).
|
|
|
|
* Coherence with the rules for handling C++ lvalues and rvalues that
|
|
can only come from a deep understanding of both the Python and C++
|
|
type systems.
|
|
|
|
The key insight that sparked the development of Boost.Python is that
|
|
much of the boilerplate code in traditional extension modules could be
|
|
eliminated using C++ compile-time introspection. Each argument of a
|
|
wrapped C++ function must be extracted from a Python object using a
|
|
procedure that depends on the argument type. Similarly the function's
|
|
return type determines how the return value will be converted from C++
|
|
to Python. Of course argument and return types are part of each
|
|
function's type, and this is exactly the source from which
|
|
Boost.Python deduces most of the information required.
|
|
|
|
This approach leads to *user guided wrapping*: as much information is
|
|
extracted directly from the source code to be wrapped as is possible
|
|
within the framework of pure C++, and some additional information is
|
|
supplied explicitly by the user. Mostly the guidance is mechanical
|
|
and little real intervention is required. Because the interface
|
|
specification is written in the same full-featured language as the
|
|
code being exposed, the user has unprecedented power available when
|
|
she does need to take control.
|
|
|
|
.. _Python: http://www.python.org/
|
|
.. _SWIG: http://www.swig.org/
|
|
.. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php
|
|
.. _Qt: http://www.trolltech.com/
|
|
.. _CXX: http://cxx.sourceforge.net/
|
|
.. _Boost.Python: http://www.boost.org/libs/python/doc
|
|
|
|
===========================
|
|
Boost.Python Design Goals
|
|
===========================
|
|
|
|
The primary goal of Boost.Python is to allow users to expose C++
|
|
classes and functions to Python using nothing more than a C++
|
|
compiler. In broad strokes, the user experience should be one of
|
|
directly manipulating C++ objects from Python.
|
|
|
|
However, it's also important not to translate all interfaces *too*
|
|
literally: the idioms of each language must be respected. For
|
|
example, though C++ and Python both have an iterator concept, they are
|
|
expressed very differently. Boost.Python has to be able to bridge the
|
|
interface gap.
|
|
|
|
It must be possible to insulate Python users from crashes resulting
|
|
from trivial misuses of C++ interfaces, such as accessing
|
|
already-deleted objects. By the same token the library should
|
|
insulate C++ users from low-level Python 'C' API, replacing
|
|
error-prone 'C' interfaces like manual reference-count management and
|
|
raw ``PyObject`` pointers with more-robust alternatives.
|
|
|
|
Support for component-based development is crucial, so that C++ types
|
|
exposed in one extension module can be passed to functions exposed in
|
|
another without loss of crucial information like C++ inheritance
|
|
relationships.
|
|
|
|
Finally, all wrapping must be *non-intrusive*, without modifying or
|
|
even seeing the original C++ source code. Existing C++ libraries have
|
|
to be wrappable by third parties who only have access to header files
|
|
and binaries.
|
|
|
|
==========================
|
|
Hello Boost.Python World
|
|
==========================
|
|
|
|
And now for a preview of Boost.Python, and how it improves on the raw
|
|
facilities offered by Python. Here's a function we might want to
|
|
expose::
|
|
|
|
char const* greet(unsigned x)
|
|
{
|
|
static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
|
|
|
|
if (x > 2)
|
|
throw std::range_error("greet: index out of range");
|
|
|
|
return msgs[x];
|
|
}
|
|
|
|
To wrap this function in standard C++ using the Python 'C' API, we'd
|
|
need something like this::
|
|
|
|
extern "C" // all Python interactions use 'C' linkage and calling convention
|
|
{
|
|
// Wrapper to handle argument/result conversion and checking
|
|
PyObject* greet_wrap(PyObject* args, PyObject * keywords)
|
|
{
|
|
int x;
|
|
if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments
|
|
{
|
|
char const* result = greet(x); // invoke wrapped function
|
|
return PyString_FromString(result); // convert result to Python
|
|
}
|
|
return 0; // error occurred
|
|
}
|
|
|
|
// Table of wrapped functions to be exposed by the module
|
|
static PyMethodDef methods[] = {
|
|
{ "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
|
|
, { NULL, NULL, 0, NULL } // sentinel
|
|
};
|
|
|
|
// module initialization function
|
|
DL_EXPORT init_hello()
|
|
{
|
|
(void) Py_InitModule("hello", methods); // add the methods to the module
|
|
}
|
|
}
|
|
|
|
Now here's the wrapping code we'd use to expose it with Boost.Python::
|
|
|
|
#include <boost/python.hpp>
|
|
using namespace boost::python;
|
|
BOOST_PYTHON_MODULE(hello)
|
|
{
|
|
def("greet", greet, "return one of 3 parts of a greeting");
|
|
}
|
|
|
|
and here it is in action::
|
|
|
|
>>> import hello
|
|
>>> for x in range(3):
|
|
... print hello.greet(x)
|
|
...
|
|
hello
|
|
Boost.Python
|
|
world!
|
|
|
|
Aside from the fact that the 'C' API version is much more verbose,
|
|
it's worth noting a few things that it doesn't handle correctly:
|
|
|
|
* The original function accepts an unsigned integer, and the Python
|
|
'C' API only gives us a way of extracting signed integers. The
|
|
Boost.Python version will raise a Python exception if we try to pass
|
|
a negative number to ``hello.greet``, but the other one will proceed
|
|
to do whatever the C++ implementation does when converting an
|
|
negative integer to unsigned (usually wrapping to some very large
|
|
number), and pass the incorrect translation on to the wrapped
|
|
function.
|
|
|
|
* That brings us to the second problem: if the C++ ``greet()``
|
|
function is called with a number greater than 2, it will throw an
|
|
exception. Typically, if a C++ exception propagates across the
|
|
boundary with code generated by a 'C' compiler, it will cause a
|
|
crash. As you can see in the first version, there's no C++
|
|
scaffolding there to prevent this from happening. Functions wrapped
|
|
by Boost.Python automatically include an exception-handling layer
|
|
which protects Python users by translating unhandled C++ exceptions
|
|
into a corresponding Python exception.
|
|
|
|
* A slightly more-subtle limitation is that the argument conversion
|
|
used in the Python 'C' API case can only get that integer ``x`` in
|
|
*one way*. PyArg_ParseTuple can't convert Python ``long`` objects
|
|
(arbitrary-precision integers) which happen to fit in an ``unsigned
|
|
int`` but not in a ``signed long``, nor will it ever handle a
|
|
wrapped C++ class with a user-defined implicit ``operator unsigned
|
|
int()`` conversion. Boost.Python's dynamic type conversion
|
|
registry allows users to add arbitrary conversion methods.
|
|
|
|
==================
|
|
Library Overview
|
|
==================
|
|
|
|
This section outlines some of the library's major features. Except as
|
|
neccessary to avoid confusion, details of library implementation are
|
|
omitted.
|
|
|
|
------------------
|
|
Exposing Classes
|
|
------------------
|
|
|
|
C++ classes and structs are exposed with a similarly-terse interface.
|
|
Given::
|
|
|
|
struct World
|
|
{
|
|
void set(std::string msg) { this->msg = msg; }
|
|
std::string greet() { return msg; }
|
|
std::string msg;
|
|
};
|
|
|
|
The following code will expose it in our extension module::
|
|
|
|
#include <boost/python.hpp>
|
|
BOOST_PYTHON_MODULE(hello)
|
|
{
|
|
class_<World>("World")
|
|
.def("greet", &World::greet)
|
|
.def("set", &World::set)
|
|
;
|
|
}
|
|
|
|
Although this code has a certain pythonic familiarity, people
|
|
sometimes find the syntax bit confusing because it doesn't look like
|
|
most of the C++ code they're used to. All the same, this is just
|
|
standard C++. Because of their flexible syntax and operator
|
|
overloading, C++ and Python are great for defining domain-specific
|
|
(sub)languages
|
|
(DSLs), and that's what we've done in Boost.Python. To break it down::
|
|
|
|
class_<World>("World")
|
|
|
|
constructs an unnamed object of type ``class_<World>`` and passes
|
|
``"World"`` to its constructor. This creates a new-style Python class
|
|
called ``World`` in the extension module, and associates it with the
|
|
C++ type ``World`` in the Boost.Python type conversion registry. We
|
|
might have also written::
|
|
|
|
class_<World> w("World");
|
|
|
|
but that would've been more verbose, since we'd have to name ``w``
|
|
again to invoke its ``def()`` member function::
|
|
|
|
w.def("greet", &World::greet)
|
|
|
|
There's nothing special about the location of the dot for member
|
|
access in the original example: C++ allows any amount of whitespace on
|
|
either side of a token, and placing the dot at the beginning of each
|
|
line allows us to chain as many successive calls to member functions
|
|
as we like with a uniform syntax. The other key fact that allows
|
|
chaining is that ``class_<>`` member functions all return a reference
|
|
to ``*this``.
|
|
|
|
So the example is equivalent to::
|
|
|
|
class_<World> w("World");
|
|
w.def("greet", &World::greet);
|
|
w.def("set", &World::set);
|
|
|
|
It's occasionally useful to be able to break down the components of a
|
|
Boost.Python class wrapper in this way, but the rest of this article
|
|
will stick to the terse syntax.
|
|
|
|
For completeness, here's the wrapped class in use: ::
|
|
|
|
>>> import hello
|
|
>>> planet = hello.World()
|
|
>>> planet.set('howdy')
|
|
>>> planet.greet()
|
|
'howdy'
|
|
|
|
Constructors
|
|
============
|
|
|
|
Since our ``World`` class is just a plain ``struct``, it has an
|
|
implicit no-argument (nullary) constructor. Boost.Python exposes the
|
|
nullary constructor by default, which is why we were able to write: ::
|
|
|
|
>>> planet = hello.World()
|
|
|
|
However, well-designed classes in any language may require constructor
|
|
arguments in order to establish their invariants. Unlike Python,
|
|
where ``__init__`` is just a specially-named method, In C++
|
|
constructors cannot be handled like ordinary member functions. In
|
|
particular, we can't take their address: ``&World::World`` is an
|
|
error. The library provides a different interface for specifying
|
|
constructors. Given::
|
|
|
|
struct World
|
|
{
|
|
World(std::string msg); // added constructor
|
|
...
|
|
|
|
we can modify our wrapping code as follows::
|
|
|
|
class_<World>("World", init<std::string>())
|
|
...
|
|
|
|
of course, a C++ class may have additional constructors, and we can
|
|
expose those as well by passing more instances of ``init<...>`` to
|
|
``def()``::
|
|
|
|
class_<World>("World", init<std::string>())
|
|
.def(init<double, double>())
|
|
...
|
|
|
|
Boost.Python allows wrapped functions, member functions, and
|
|
constructors to be overloaded to mirror C++ overloading.
|
|
|
|
Data Members and Properties
|
|
===========================
|
|
|
|
Any publicly-accessible data members in a C++ class can be easily
|
|
exposed as either ``readonly`` or ``readwrite`` attributes::
|
|
|
|
class_<World>("World", init<std::string>())
|
|
.def_readonly("msg", &World::msg)
|
|
...
|
|
|
|
and can be used directly in Python: ::
|
|
|
|
>>> planet = hello.World('howdy')
|
|
>>> planet.msg
|
|
'howdy'
|
|
|
|
This does *not* result in adding attributes to the ``World`` instance
|
|
``__dict__``, which can result in substantial memory savings when
|
|
wrapping large data structures. In fact, no instance ``__dict__``
|
|
will be created at all unless attributes are explicitly added from
|
|
Python. Boost.Python owes this capability to the new Python 2.2 type
|
|
system, in particular the descriptor interface and ``property`` type.
|
|
|
|
In C++, publicly-accessible data members are considered a sign of poor
|
|
design because they break encapsulation, and style guides usually
|
|
dictate the use of "getter" and "setter" functions instead. In
|
|
Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
|
|
``property`` mean that attribute access is just one more
|
|
well-encapsulated syntactic tool at the programmer's disposal.
|
|
Boost.Python bridges this idiomatic gap by making Python ``property``
|
|
creation directly available to users. If ``msg`` were private, we
|
|
could still expose it as attribute in Python as follows::
|
|
|
|
class_<World>("World", init<std::string>())
|
|
.add_property("msg", &World::greet, &World::set)
|
|
...
|
|
|
|
The example above mirrors the familiar usage of properties in Python
|
|
2.2+: ::
|
|
|
|
>>> class World(object):
|
|
... __init__(self, msg):
|
|
... self.__msg = msg
|
|
... def greet(self):
|
|
... return self.__msg
|
|
... def set(self, msg):
|
|
... self.__msg = msg
|
|
... msg = property(greet, set)
|
|
|
|
Operator Overloading
|
|
====================
|
|
|
|
The ability to write arithmetic operators for user-defined types has
|
|
been a major factor in the success of both languages for numerical
|
|
computation, and the success of packages like NumPy_ attests to the
|
|
power of exposing operators in extension modules. Boost.Python
|
|
provides a concise mechanism for wrapping operator overloads. The
|
|
example below shows a fragment from a wrapper for the Boost rational
|
|
number library::
|
|
|
|
class_<rational<int> >("rational_int")
|
|
.def(init<int, int>()) // constructor, e.g. rational_int(3,4)
|
|
.def("numerator", &rational<int>::numerator)
|
|
.def("denominator", &rational<int>::denominator)
|
|
.def(-self) // __neg__ (unary minus)
|
|
.def(self + self) // __add__ (homogeneous)
|
|
.def(self * self) // __mul__
|
|
.def(self + int()) // __add__ (heterogenous)
|
|
.def(int() + self) // __radd__
|
|
...
|
|
|
|
The magic is performed using a simplified application of "expression
|
|
templates" [VELD1995]_, a technique originally developed for
|
|
optimization of high-performance matrix algebra expressions. The
|
|
essence is that instead of performing the computation immediately,
|
|
operators are overloaded to construct a type *representing* the
|
|
computation. In matrix algebra, dramatic optimizations are often
|
|
available when the structure of an entire expression can be taken into
|
|
account, rather than evaluating each operation "greedily".
|
|
Boost.Python uses the same technique to build an appropriate Python
|
|
method object based on expressions involving ``self``.
|
|
|
|
.. _NumPy: http://www.pfdubois.com/numpy/
|
|
|
|
Inheritance
|
|
===========
|
|
|
|
C++ inheritance relationships can be represented to Boost.Python by adding
|
|
an optional ``bases<...>`` argument to the ``class_<...>`` template
|
|
parameter list as follows::
|
|
|
|
class_<Derived, bases<Base1,Base2> >("Derived")
|
|
...
|
|
|
|
This has two effects:
|
|
|
|
1. When the ``class_<...>`` is created, Python type objects
|
|
corresponding to ``Base1`` and ``Base2`` are looked up in
|
|
Boost.Python's registry, and are used as bases for the new Python
|
|
``Derived`` type object, so methods exposed for the Python ``Base1``
|
|
and ``Base2`` types are automatically members of the ``Derived``
|
|
type. Because the registry is global, this works correctly even if
|
|
``Derived`` is exposed in a different module from either of its
|
|
bases.
|
|
|
|
2. C++ conversions from ``Derived`` to its bases are added to the
|
|
Boost.Python registry. Thus wrapped C++ methods expecting (a
|
|
pointer or reference to) an object of either base type can be
|
|
called with an object wrapping a ``Derived`` instance. Wrapped
|
|
member functions of class ``T`` are treated as though they have an
|
|
implicit first argument of ``T&``, so these conversions are
|
|
neccessary to allow the base class methods to be called for derived
|
|
objects.
|
|
|
|
Of course it's possible to derive new Python classes from wrapped C++
|
|
class instances. Because Boost.Python uses the new-style class
|
|
system, that works very much as for the Python built-in types. There
|
|
is one significant detail in which it differs: the built-in types
|
|
generally establish their invariants in their ``__new__`` function, so
|
|
that derived classes do not need to call ``__init__`` on the base
|
|
class before invoking its methods : ::
|
|
|
|
>>> class L(list):
|
|
... def __init__(self):
|
|
... pass
|
|
...
|
|
>>> L().reverse()
|
|
>>>
|
|
|
|
Because C++ object construction is a one-step operation, C++ instance
|
|
data cannot be constructed until the arguments are available, in the
|
|
``__init__`` function: ::
|
|
|
|
>>> class D(SomeBoostPythonClass):
|
|
... def __init__(self):
|
|
... pass
|
|
...
|
|
>>> D().some_boost_python_method()
|
|
Traceback (most recent call last):
|
|
File "<stdin>", line 1, in ?
|
|
TypeError: bad argument type for built-in operation
|
|
|
|
This happened because Boost.Python couldn't find instance data of type
|
|
``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__``
|
|
function masked construction of the base class. It could be corrected
|
|
by either removing ``D``'s ``__init__`` function or having it call
|
|
``SomeBoostPythonClass.__init__(...)`` explicitly.
|
|
|
|
Virtual Functions
|
|
=================
|
|
|
|
Deriving new types in Python from extension classes is not very
|
|
interesting unless they can be used polymorphically from C++. In
|
|
other words, Python method implementations should appear to override
|
|
the implementation of C++ virtual functions when called *through base
|
|
class pointers/references from C++*. Since the only way to alter the
|
|
behavior of a virtual function is to override it in a derived class,
|
|
the user must build a special derived class to dispatch a polymorphic
|
|
class' virtual functions::
|
|
|
|
//
|
|
// interface to wrap:
|
|
//
|
|
class Base
|
|
{
|
|
public:
|
|
virtual int f(std::string x) { return 42; }
|
|
virtual ~Base();
|
|
};
|
|
|
|
int calls_f(Base const& b, std::string x) { return b.f(x); }
|
|
|
|
//
|
|
// Wrapping Code
|
|
//
|
|
|
|
// Dispatcher class
|
|
struct BaseWrap : Base
|
|
{
|
|
// Store a pointer to the Python object
|
|
BaseWrap(PyObject* self_) : self(self_) {}
|
|
PyObject* self;
|
|
|
|
// Default implementation, for when f is not overridden
|
|
int f_default(std::string x) { return this->Base::f(x); }
|
|
// Dispatch implementation
|
|
int f(std::string x) { return call_method<int>(self, "f", x); }
|
|
};
|
|
|
|
...
|
|
def("calls_f", calls_f);
|
|
class_<Base, BaseWrap>("Base")
|
|
.def("f", &Base::f, &BaseWrap::f_default)
|
|
;
|
|
|
|
Now here's some Python code which demonstrates: ::
|
|
|
|
>>> class Derived(Base):
|
|
... def f(self, s):
|
|
... return len(s)
|
|
...
|
|
>>> calls_f(Base(), 'foo')
|
|
42
|
|
>>> calls_f(Derived(), 'forty-two')
|
|
9
|
|
|
|
Things to notice about the dispatcher class:
|
|
|
|
* The key element which allows overriding in Python is the
|
|
``call_method`` invocation, which uses the same global type
|
|
conversion registry as the C++ function wrapping does to convert its
|
|
arguments from C++ to Python and its return type from Python to C++.
|
|
|
|
* Any constructor signatures you wish to wrap must be replicated with
|
|
an initial ``PyObject*`` argument
|
|
|
|
* The dispatcher must store this argument so that it can be used to
|
|
invoke ``call_method``
|
|
|
|
* The ``f_default`` member function is needed when the function being
|
|
exposed is not pure virtual; there's no other way ``Base::f`` can be
|
|
called on an object of type ``BaseWrap``, since it overrides ``f``.
|
|
|
|
Deeper Reflection on the Horizon?
|
|
=================================
|
|
|
|
Admittedly, this formula is tedious to repeat, especially on a project
|
|
with many polymorphic classes. That it is neccessary reflects some
|
|
limitations in C++'s compile-time introspection capabilities: there's
|
|
no way to enumerate the members of a class and find out which are
|
|
virtual functions. At least one very promising project has been
|
|
started to write a front-end which can generate these dispatchers (and
|
|
other wrapping code) automatically from C++ headers.
|
|
|
|
Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on
|
|
GCC_XML_, which generates an XML version of GCC's internal program
|
|
representation. Since GCC is a highly-conformant C++ compiler, this
|
|
ensures correct handling of the most-sophisticated template code and
|
|
full access to the underlying type system. In keeping with the
|
|
Boost.Python philosophy, a Pyste interface description is neither
|
|
intrusive on the code being wrapped, nor expressed in some unfamiliar
|
|
language: instead it is a 100% pure Python script. If Pyste is
|
|
successful it will mark a move away from wrapping everything directly
|
|
in C++ for many of our users. It will also allow us the choice to
|
|
shift some of the metaprogram code from C++ to Python. We expect that
|
|
soon, not only our users but the Boost.Python developers themselves
|
|
will be "thinking hybrid" about their own code.
|
|
|
|
.. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html
|
|
.. _`Pyste`: http://www.boost.org/libs/python/pyste
|
|
|
|
---------------
|
|
Serialization
|
|
---------------
|
|
|
|
*Serialization* is the process of converting objects in memory to a
|
|
form that can be stored on disk or sent over a network connection. The
|
|
serialized object (most often a plain string) can be retrieved and
|
|
converted back to the original object. A good serialization system will
|
|
automatically convert entire object hierarchies. Python's standard
|
|
``pickle`` module is just such a system. It leverages the language's strong
|
|
runtime introspection facilities for serializing practically arbitrary
|
|
user-defined objects. With a few simple and unintrusive provisions this
|
|
powerful machinery can be extended to also work for wrapped C++ objects.
|
|
Here is an example::
|
|
|
|
#include <string>
|
|
|
|
struct World
|
|
{
|
|
World(std::string a_msg) : msg(a_msg) {}
|
|
std::string greet() const { return msg; }
|
|
std::string msg;
|
|
};
|
|
|
|
#include <boost/python.hpp>
|
|
using namespace boost::python;
|
|
|
|
struct World_picklers : pickle_suite
|
|
{
|
|
static tuple
|
|
getinitargs(World const& w) { return make_tuple(w.greet()); }
|
|
};
|
|
|
|
BOOST_PYTHON_MODULE(hello)
|
|
{
|
|
class_<World>("World", init<std::string>())
|
|
.def("greet", &World::greet)
|
|
.def_pickle(World_picklers())
|
|
;
|
|
}
|
|
|
|
Now let's create a ``World`` object and put it to rest on disk::
|
|
|
|
>>> import hello
|
|
>>> import pickle
|
|
>>> a_world = hello.World("howdy")
|
|
>>> pickle.dump(a_world, open("my_world", "w"))
|
|
|
|
In a potentially *different script* on a potentially *different
|
|
computer* with a potentially *different operating system*::
|
|
|
|
>>> import pickle
|
|
>>> resurrected_world = pickle.load(open("my_world", "r"))
|
|
>>> resurrected_world.greet()
|
|
'howdy'
|
|
|
|
Of course the ``cPickle`` module can also be used for faster
|
|
processing.
|
|
|
|
Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol
|
|
defined in the standard Python documentation. Like a __getinitargs__
|
|
function in Python, the pickle_suite's getinitargs() is responsible for
|
|
creating the argument tuple that will be use to reconstruct the pickled
|
|
object. The other elements of the Python pickling protocol,
|
|
__getstate__ and __setstate__ can be optionally provided via C++
|
|
getstate and setstate functions. C++'s static type system allows the
|
|
library to ensure at compile-time that nonsensical combinations of
|
|
functions (e.g. getstate without setstate) are not used.
|
|
|
|
Enabling serialization of more complex C++ objects requires a little
|
|
more work than is shown in the example above. Fortunately the
|
|
``object`` interface (see next section) greatly helps in keeping the
|
|
code manageable.
|
|
|
|
------------------
|
|
Object interface
|
|
------------------
|
|
|
|
Experienced 'C' language extension module authors will be familiar
|
|
with the ubiquitous ``PyObject*``, manual reference-counting, and the
|
|
need to remember which API calls return "new" (owned) references or
|
|
"borrowed" (raw) references. These constraints are not just
|
|
cumbersome but also a major source of errors, especially in the
|
|
presence of exceptions.
|
|
|
|
Boost.Python provides a class ``object`` which automates reference
|
|
counting and provides conversion to Python from C++ objects of
|
|
arbitrary type. This significantly reduces the learning effort for
|
|
prospective extension module writers.
|
|
|
|
Creating an ``object`` from any other type is extremely simple::
|
|
|
|
object s("hello, world"); // s manages a Python string
|
|
|
|
``object`` has templated interactions with all other types, with
|
|
automatic to-python conversions. It happens so naturally that it's
|
|
easily overlooked::
|
|
|
|
object ten_Os = 10 * s[4]; // -> "oooooooooo"
|
|
|
|
In the example above, ``4`` and ``10`` are converted to Python objects
|
|
before the indexing and multiplication operations are invoked.
|
|
|
|
The ``extract<T>`` class template can be used to convert Python objects
|
|
to C++ types::
|
|
|
|
double x = extract<double>(o);
|
|
|
|
If a conversion in either direction cannot be performed, an
|
|
appropriate exception is thrown at runtime.
|
|
|
|
The ``object`` type is accompanied by a set of derived types
|
|
that mirror the Python built-in types such as ``list``, ``dict``,
|
|
``tuple``, etc. as much as possible. This enables convenient
|
|
manipulation of these high-level types from C++::
|
|
|
|
dict d;
|
|
d["some"] = "thing";
|
|
d["lucky_number"] = 13;
|
|
list l = d.keys();
|
|
|
|
This almost looks and works like regular Python code, but it is pure
|
|
C++. Of course we can wrap C++ functions which accept or return
|
|
``object`` instances.
|
|
|
|
=================
|
|
Thinking hybrid
|
|
=================
|
|
|
|
Because of the practical and mental difficulties of combining
|
|
programming languages, it is common to settle a single language at the
|
|
outset of any development effort. For many applications, performance
|
|
considerations dictate the use of a compiled language for the core
|
|
algorithms. Unfortunately, due to the complexity of the static type
|
|
system, the price we pay for runtime performance is often a
|
|
significant increase in development time. Experience shows that
|
|
writing maintainable C++ code usually takes longer and requires *far*
|
|
more hard-earned working experience than developing comparable Python
|
|
code. Even when developers are comfortable working exclusively in
|
|
compiled languages, they often augment their systems by some type of
|
|
ad hoc scripting layer for the benefit of their users without ever
|
|
availing themselves of the same advantages.
|
|
|
|
Boost.Python enables us to *think hybrid*. Python can be used for
|
|
rapidly prototyping a new application; its ease of use and the large
|
|
pool of standard libraries give us a head start on the way to a
|
|
working system. If necessary, the working code can be used to
|
|
discover rate-limiting hotspots. To maximize performance these can
|
|
be reimplemented in C++, together with the Boost.Python bindings
|
|
needed to tie them back into the existing higher-level procedure.
|
|
|
|
Of course, this *top-down* approach is less attractive if it is clear
|
|
from the start that many algorithms will eventually have to be
|
|
implemented in C++. Fortunately Boost.Python also enables us to
|
|
pursue a *bottom-up* approach. We have used this approach very
|
|
successfully in the development of a toolbox for scientific
|
|
applications. The toolbox started out mainly as a library of C++
|
|
classes with Boost.Python bindings, and for a while the growth was
|
|
mainly concentrated on the C++ parts. However, as the toolbox is
|
|
becoming more complete, more and more newly added functionality can be
|
|
implemented in Python.
|
|
|
|
.. image:: images/python_cpp_mix.png
|
|
|
|
This figure shows the estimated ratio of newly added C++ and Python
|
|
code over time as new algorithms are implemented. We expect this
|
|
ratio to level out near 70% Python. Being able to solve new problems
|
|
mostly in Python rather than a more difficult statically typed
|
|
language is the return on our investment in Boost.Python. The ability
|
|
to access all of our code from Python allows a broader group of
|
|
developers to use it in the rapid development of new applications.
|
|
|
|
=====================
|
|
Development history
|
|
=====================
|
|
|
|
The first version of Boost.Python was developed in 2000 by Dave
|
|
Abrahams at Dragon Systems, where he was privileged to have Tim Peters
|
|
as a guide to "The Zen of Python". One of Dave's jobs was to develop
|
|
a Python-based natural language processing system. Since it was
|
|
eventually going to be targeting embedded hardware, it was always
|
|
assumed that the compute-intensive core would be rewritten in C++ to
|
|
optimize speed and memory footprint [#proto]_. The project also wanted to
|
|
test all of its C++ code using Python test scripts [#test]_. The only
|
|
tool we knew of for binding C++ and Python was SWIG_, and at the time
|
|
its handling of C++ was weak. It would be false to claim any deep
|
|
insight into the possible advantages of Boost.Python's approach at
|
|
this point. Dave's interest and expertise in fancy C++ template
|
|
tricks had just reached the point where he could do some real damage,
|
|
and Boost.Python emerged as it did because it filled a need and
|
|
because it seemed like a cool thing to try.
|
|
|
|
This early version was aimed at many of the same basic goals we've
|
|
described in this paper, differing most-noticeably by having a
|
|
slightly more cumbersome syntax and by lack of special support for
|
|
operator overloading, pickling, and component-based development.
|
|
These last three features were quickly added by Ullrich Koethe and
|
|
Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived
|
|
on the scene to contribute enhancements like support for nested
|
|
modules and static member functions.
|
|
|
|
By early 2001 development had stabilized and few new features were
|
|
being added, however a disturbing new fact came to light: Ralf had
|
|
begun testing Boost.Python on pre-release versions of a compiler using
|
|
the EDG_ front-end, and the mechanism at the core of Boost.Python
|
|
responsible for handling conversions between Python and C++ types was
|
|
failing to compile. As it turned out, we had been exploiting a very
|
|
common bug in the implementation of all the C++ compilers we had
|
|
tested. We knew that as C++ compilers rapidly became more
|
|
standards-compliant, the library would begin failing on more
|
|
platforms. Unfortunately, because the mechanism was so central to the
|
|
functioning of the library, fixing the problem looked very difficult.
|
|
|
|
Fortunately, later that year Lawrence Berkeley and later Lawrence
|
|
Livermore National labs contracted with `Boost Consulting`_ for support
|
|
and development of Boost.Python, and there was a new opportunity to
|
|
address fundamental issues and ensure a future for the library. A
|
|
redesign effort began with the low level type conversion architecture,
|
|
building in standards-compliance and support for component-based
|
|
development (in contrast to version 1 where conversions had to be
|
|
explicitly imported and exported across module boundaries). A new
|
|
analysis of the relationship between the Python and C++ objects was
|
|
done, resulting in more intuitive handling for C++ lvalues and
|
|
rvalues.
|
|
|
|
The emergence of a powerful new type system in Python 2.2 made the
|
|
choice of whether to maintain compatibility with Python 1.5.2 easy:
|
|
the opportunity to throw away a great deal of elaborate code for
|
|
emulating classic Python classes alone was too good to pass up. In
|
|
addition, Python iterators and descriptors provided crucial and
|
|
elegant tools for representing similar C++ constructs. The
|
|
development of the generalized ``object`` interface allowed us to
|
|
further shield C++ programmers from the dangers and syntactic burdens
|
|
of the Python 'C' API. A great number of other features including C++
|
|
exception translation, improved support for overloaded functions, and
|
|
most significantly, CallPolicies for handling pointers and
|
|
references, were added during this period.
|
|
|
|
In October 2002, version 2 of Boost.Python was released. Development
|
|
since then has concentrated on improved support for C++ runtime
|
|
polymorphism and smart pointers. Peter Dimov's ingenious
|
|
``boost::shared_ptr`` design in particular has allowed us to give the
|
|
hybrid developer a consistent interface for moving objects back and
|
|
forth across the language barrier without loss of information. At
|
|
first, we were concerned that the sophistication and complexity of the
|
|
Boost.Python v2 implementation might discourage contributors, but the
|
|
emergence of Pyste_ and several other significant feature
|
|
contributions have laid those fears to rest. Daily questions on the
|
|
Python C++-sig and a backlog of desired improvements show that the
|
|
library is getting used. To us, the future looks bright.
|
|
|
|
.. _`EDG`: http://www.edg.com
|
|
|
|
=============
|
|
Conclusions
|
|
=============
|
|
|
|
Boost.Python achieves seamless interoperability between two rich and
|
|
complimentary language environments. Because it leverages template
|
|
metaprogramming to introspect about types and functions, the user
|
|
never has to learn a third syntax: the interface definitions are
|
|
written in concise and maintainable C++. Also, the wrapping system
|
|
doesn't have to parse C++ headers or represent the type system: the
|
|
compiler does that work for us.
|
|
|
|
Computationally intensive tasks play to the strengths of C++ and are
|
|
often impossible to implement efficiently in pure Python, while jobs
|
|
like serialization that are trivial in Python can be very difficult in
|
|
pure C++. Given the luxury of building a hybrid software system from
|
|
the ground up, we can approach design with new confidence and power.
|
|
|
|
===========
|
|
Citations
|
|
===========
|
|
|
|
.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
|
|
Vol. 7 No. 5 June 1995, pp. 26-31.
|
|
http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
|
|
|
|
===========
|
|
Footnotes
|
|
===========
|
|
|
|
.. [#proto] In retrospect, it seems that "thinking hybrid" from the
|
|
ground up might have been better for the NLP system: the
|
|
natural component boundaries defined by the pure python
|
|
prototype turned out to be inappropriate for getting the
|
|
desired performance and memory footprint out of the C++ core,
|
|
which eventually caused some redesign overhead on the Python
|
|
side when the core was moved to C++.
|
|
|
|
.. [#test] We also have some reservations about driving all C++
|
|
testing through a Python interface, unless that's the only way
|
|
it will be ultimately used. Any transition across language
|
|
boundaries with such different object models can inevitably
|
|
mask bugs.
|
|
|
|
.. [#feature] These features were expressed very differently in v1 of
|
|
Boost.Python
|