<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<!--
|
|
(C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com .
|
|
Use, modification and distribution is subject to the Boost Software
|
|
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt)
|
|
-->
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
|
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
|
<link rel="stylesheet" type="text/css" href="style.css">
|
|
<title>Serialization - Special Considerations</title>
|
|
</head>
|
|
<body link="#0000ff" vlink="#800080">
|
|
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary="header">
|
|
<tr>
|
|
<td valign="top" width="300">
|
|
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
|
|
</td>
|
|
<td valign="top">
|
|
<h1 align="center">Serialization</h1>
|
|
<h2 align="center">Special Considerations</h2>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
<hr>
|
|
<dl class="page-index">
|
|
<dt><a href="#objecttracking">Object Tracking</a>
|
|
<dt><a href="#export">Exporting Class Serialization</a>
|
|
<dt><a href="#classinfo">Class Information</a>
|
|
<dt><a href="#portability">Archive Portability</a>
|
|
<dl class="page-index">
|
|
<dt><a href="#numerics">Numerics</a>
|
|
<dt><a href="#traits">Traits</a>
|
|
</dl>
|
|
<dt><a href="#binary_archives">Binary Archives</a>
|
|
<dt><a href="#xml_archives">XML Archives</a>
|
|
<dt><a href="#dlls">DLLS - Serialization and Runtime Linking</a>
|
|
<dt><a href="#multi_threading">Multi-Threading</a>
|
|
<dt><a href="#optimizations">Optimizations</a>
|
|
<dt><a href="exceptions.html">Archive Exceptions</a>
|
|
<dt><a href="exception_safety.html">Exception Safety</a>
|
|
</dl>
|
|
|
|
<h3><a name="objecttracking">Object Tracking</a></h3>
|
|
Depending on how the class is used and other factors, serialized objects
|
|
may be tracked by memory address. This prevents the same object from being
|
|
written to or read from an archive multiple times. These stored addresses
|
|
can also be used to delete objects created during a loading process
|
|
that has been interrupted by throwing of an exception.
|
|
<p>
|
|
This could cause problems in
programs where copies of different objects are saved from the same address.
|
|
<pre><code>
|
|
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        A x = a[i];
        ar << x;
    }
}
|
|
</code></pre>
|
|
In this case, the data to be saved exists on the stack. Each iteration
|
|
of the loop updates the value on the stack. So although the data changes
|
|
each iteration, the address of the data doesn't. If <code>a</code> is an array of
objects being tracked by memory address, the library will skip storing
objects after the first, because objects at the same address are assumed
to be the same object.
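<p>
The effect can be reproduced without the library. The following is a minimal sketch, assuming a hypothetical <code>toy_tracking_archive</code> that models only the "skip an address already seen" rule; it is not the library's implementation. The copy is declared once outside the loop so that address reuse is guaranteed rather than merely typical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for an output archive that tracks objects by address.
// It only models the "skip an address already seen" rule.
class toy_tracking_archive {
public:
    // Stores the object only if its address has not been seen before.
    template<class T>
    void save(const T & object) {
        if (seen_.insert(static_cast<const void *>(&object)).second)
            ++stored_;
    }
    std::size_t stored() const { return stored_; }
private:
    std::set<const void *> seen_;
    std::size_t stored_ = 0;
};

struct A { int value; };

// Copies each element into one reused variable before saving it.
std::size_t save_through_copy(const std::vector<A> & a) {
    toy_tracking_archive ar;
    A x{0};                        // declared once, so its address never changes
    for (std::size_t i = 0; i < a.size(); ++i) {
        x = a[i];
        ar.save(x);                // same address on every iteration
    }
    return ar.stored();
}

// Saves each element in place.
std::size_t save_in_place(const std::vector<A> & a) {
    toy_tracking_archive ar;
    for (std::size_t i = 0; i < a.size(); ++i)
        ar.save(a[i]);             // a distinct address per element
    return ar.stored();
}
```

<p>
For a ten-element vector, <code>save_through_copy</code> stores one object while <code>save_in_place</code> stores all ten.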
|
|
<p>
|
|
To help detect such cases, output archive operators expect to be passed
|
|
<code style="white-space: normal">const</code> reference arguments.
|
|
<p>
|
|
Given this, the above code will invoke a compile time assertion.
|
|
The obvious fix in this example is to use
|
|
<pre><code>
|
|
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        ar << a[i];
    }
}
|
|
</code></pre>
|
|
which will compile and run without problem.
|
|
The use of <code style="white-space: normal">const</code> by the output archive operators
also ensures that the process of serialization doesn't
change the state of the objects being serialized. Code that did so
would attach a non-obvious side effect to the act of saving state.
This would almost surely be a mistake
and a likely source of very subtle bugs.
|
|
<p>
|
|
Unfortunately, implementation issues currently prevent the detection of this kind of
|
|
error when the data item is wrapped as a name-value pair.
|
|
<p>
|
|
A similar problem can occur when different objects are loaded to an address
which is different from their final location:
|
|
<pre><code>
|
|
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        m_set.insert(x); // m_set is a std::set<A> data member
    }
}
|
|
</code></pre>
|
|
In this case, the address of <code>x</code> is the one that is tracked rather than
the address of the new item added to the set. Left unaddressed,
this will break the features that depend on tracking, such as loading objects through a pointer.
Subtle bugs will be introduced into the program. This can be
addressed by altering the above code as follows:
|
|
|
|
<pre><code>
|
|
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        std::pair<std::set<A>::const_iterator, bool> result;
        result = m_set.insert(x);
        // tell the archive that the object now lives inside the set
        ar.reset_object_address(& (*result.first), &x);
    }
}
|
|
</code></pre>
|
|
This will adjust the tracking information to reflect the final resting place of
|
|
the moved variable and thereby rectify the above problem.
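<p>
The bookkeeping involved can be sketched with the standard library alone. In the following, <code>toy_load_tracker</code> is a hypothetical stand-in for the archive's tracking table, and the assignment to <code>x</code> stands in for <code>ar >> x</code>; only the argument order of <code>reset_object_address</code> (new address first, old address second) is taken from the example above.

```cpp
#include <map>
#include <set>
#include <utility>

// Hypothetical tracking table: maps an object's address to the id it was
// given when loaded. A toy model, not the library's data structure.
class toy_load_tracker {
public:
    int track(const void * address) {
        ids_[address] = next_id_;
        return next_id_++;
    }
    // Moves the tracking entry from the old address to the new one.
    void reset_object_address(const void * new_address, const void * old_address) {
        std::map<const void *, int>::iterator it = ids_.find(old_address);
        if (it == ids_.end())
            return;
        const int id = it->second;
        ids_.erase(it);
        ids_[new_address] = id;
    }
    bool is_tracked(const void * address) const {
        return ids_.count(address) != 0;
    }
private:
    std::map<const void *, int> ids_;
    int next_id_ = 0;
};

// "Loads" three values through a temporary and moves each tracking entry
// to the element stored in the set. Returns true if every element of the
// set ends up tracked at its final address.
bool load_into_set(std::set<int> & out) {
    toy_load_tracker tracker;
    for (int i = 0; i < 3; ++i) {
        int x = i * 10;                      // stands in for "ar >> x"
        tracker.track(&x);
        std::pair<std::set<int>::const_iterator, bool> result = out.insert(x);
        tracker.reset_object_address(&(*result.first), &x);
    }
    for (std::set<int>::const_iterator it = out.begin(); it != out.end(); ++it)
        if (!tracker.is_tracked(&*it))
            return false;
    return true;
}
```

<p>
Without the call to <code>reset_object_address</code>, the table would only ever hold the address of the temporary, and none of the set's elements would be found in it.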
|
|
<p>
|
|
If it is known a priori that no pointer
|
|
values are duplicated, overhead associated with object tracking can
|
|
be eliminated by setting the object tracking class serialization trait
|
|
appropriately.
|
|
<p>
|
|
By default, data types designated primitive by the
|
|
<a target="detail" href="traits.html#level">Implementation Level</a>
|
|
class serialization trait are never tracked. If it is desired to
|
|
track a shared primitive object through a pointer (e.g. a
|
|
<code style="white-space: normal">long</code> used as a reference count), it should be wrapped
|
|
in a class/struct so that it is an identifiable type.
|
|
The alternative of changing the implementation level of a <code style="white-space: normal">long</code>
|
|
would affect all <code style="white-space: normal">long</code>s serialized in the whole
|
|
program - probably not what one would intend.
|
|
<p>
|
|
It is possible that we may want to track addresses even though
|
|
the object is never serialized through a pointer. For example,
|
|
a virtual base class need be saved/loaded only once. By setting
|
|
this serialization trait to <code style="white-space: normal">track_always</code>, we can suppress
|
|
redundant save/load operations.
|
|
<pre><code>
|
|
BOOST_CLASS_TRACKING(my_virtual_base_class, boost::serialization::track_always)
|
|
</code></pre>
|
|
|
|
<h3><a name="export">Exporting Class Serialization</a></h3>
|
|
<a target="detail" href="traits.html#export">Elsewhere</a> in this manual, we have described
|
|
<code style="white-space: normal">BOOST_CLASS_EXPORT</code>. This is used to make the serialization library aware
|
|
that code should be instantiated for serialization of a given class even though the
|
|
class hasn't been otherwise referred to by the program. This functionality
|
|
is necessary to implement serialization of derived class objects through a
base class pointer, that is, through a polymorphic pointer.
|
|
<p>
|
|
This macro specifies a "Globally Unique IDentifier" which corresponds to the
|
|
external representation of the class name. Generally this text representation
|
|
of the class name is sufficient for this purpose, but in certain
|
|
cases it may be necessary to specify a different string by using
|
|
<code style="white-space: normal">BOOST_CLASS_EXPORT_GUID</code>
|
|
rather than a simple
|
|
<code style="white-space: normal">BOOST_CLASS_EXPORT</code>.
|
|
|
|
<p>
|
|
<code style="white-space: normal">BOOST_CLASS_EXPORT</code> would usually
|
|
be specified in the same header file as the class declaration to which it
|
|
corresponds. That is, <code style="white-space: normal">BOOST_CLASS_EXPORT(T)</code>
|
|
is a "trait" of the class T. So a program using this class will look
|
|
something like:
|
|
|
|
<pre><code>
|
|
#include <boost/archive/xml_oarchive.hpp>
|
|
.... // any other archive classes
|
|
#include "my_class.hpp" // which contains BOOST_CLASS_EXPORT(my_class)
|
|
</code></pre>
|
|
|
|
These headers can be in any order. (In boost versions 1.34
|
|
and earlier, the archive headers had to go before any headers which
|
|
contain <code style="white-space: normal">BOOST_CLASS_EXPORT</code>.)
|
|
Any code required to serialize types specified
|
|
by <code style="white-space: normal">BOOST_CLASS_EXPORT</code> will be
|
|
instantiated for each archive whose header is included. (Note that the code
is instantiated regardless of whether or not it is actually invoked.)
If no archive headers are included, no code is instantiated.
This permits <code style="white-space: normal">BOOST_CLASS_EXPORT</code>
to be a permanent part of <code style="white-space: normal">my_class.hpp</code>.
|
|
|
|
<h3><a name="classinfo">Class Information</a></h3>
|
|
By default, for each class serialized, class information is written to the archive.
|
|
This information includes version number, implementation level and tracking
|
|
behavior. This is necessary so that the archive can be correctly
|
|
deserialized even if a subsequent version of the program changes
|
|
some of the current trait values for a class. The space overhead for
|
|
this data is minimal. There is a little bit of runtime overhead
|
|
since each class has to be checked to see if it has already had its
|
|
class information included in the archive. In some cases, even this
|
|
might be considered too much. This extra overhead can be eliminated
|
|
by setting the
|
|
<a target="detail" href="traits.html#level">implementation level</a>
|
|
class trait to: <code style="white-space: normal">boost::serialization::object_serializable</code>.
|
|
<p>
|
|
<i>Turning off tracking and class information serialization will result
|
|
in pure template inline code that in principle could be optimised down
|
|
to a simple stream write/read.</i> Elimination of all serialization overhead
|
|
in this manner comes at a cost. Once archives are released to users, the
|
|
class serialization traits cannot be changed without invalidating the old
|
|
archives. Including the class information in the archive assures us
|
|
that they will be readable in the future even if the class definition
|
|
is revised. A lightweight structure such as a display pixel might be
|
|
declared in a header like this:
|
|
|
|
<pre><code>
|
|
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/level.hpp>
#include <boost/serialization/tracking.hpp>

// a pixel is a lightweight struct which is used in great numbers.
struct pixel
{
    unsigned char red, green, blue;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */){
        // operator& works for both saving and loading
        ar & red & green & blue;
    }
};

// eliminate serialization overhead at the cost of
// never being able to increase the version.
BOOST_CLASS_IMPLEMENTATION(pixel, boost::serialization::object_serializable)

// eliminate object tracking (even if serialized through a pointer)
// at the risk of a programming error creating duplicate objects.
BOOST_CLASS_TRACKING(pixel, boost::serialization::track_never)
|
|
</code></pre>
|
|
|
|
<h3><a name="portability">Archive Portability</a></h3>
|
|
Several archive classes create their data in the form of text or a portable binary format.
It should be possible to save an archive of such a class on one platform and load it on another.
This is subject to a couple of conditions.
|
|
<h4><a name="numerics">Numerics</a></h4>
|
|
The architecture of the machine reading the archive must be able to hold the data
saved. For example, the gcc compiler reserves 4 bytes to store a variable of type
<code style="white-space: normal">wchar_t</code> while other compilers reserve only 2 bytes.
So it's possible that a value could be written that couldn't be represented by the loading program. This is a
fairly obvious situation and easily handled by using the numeric types in
<a target="cstding" href="../../../boost/cstdint.hpp"><boost/cstdint.hpp></a>.
|
|
<P>
|
|
A special integral type is <code>std::size_t</code>, which is a typedef
of an integral type guaranteed to be large enough
|
|
to hold the size of any collection, but its actual size can differ depending
|
|
on the platform. The
|
|
<a href="wrappers.html#collection_size_type"><code>collection_size_type</code></a>
|
|
wrapper exists to enable a portable serialization of collection sizes by an archive.
|
|
Recommended choices for a portable serialization of collection sizes are to
|
|
use either 64-bit or variable length integer representation.
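<p>
The 64-bit option can be sketched as follows. This is a minimal illustration using only the standard library; the function names <code>encode_size</code> and <code>decode_size</code> and the little-endian byte order are assumptions made for the example and say nothing about how <code>collection_size_type</code> is handled by any particular archive.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Encodes a collection size as exactly 8 bytes, least significant byte
// first, regardless of how wide std::size_t is on the writing platform.
std::vector<unsigned char> encode_size(std::size_t count) {
    const std::uint64_t wide = static_cast<std::uint64_t>(count);
    std::vector<unsigned char> bytes;
    for (int i = 0; i < 8; ++i)
        bytes.push_back(static_cast<unsigned char>((wide >> (8 * i)) & 0xFFu));
    return bytes;
}

// Decodes the 8-byte form. Returns false if the input is malformed or the
// stored value cannot be represented by std::size_t on the reading platform.
bool decode_size(const std::vector<unsigned char> & bytes, std::size_t & count) {
    if (bytes.size() != 8)
        return false;
    std::uint64_t wide = 0;
    for (int i = 0; i < 8; ++i)
        wide |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    if (wide > std::numeric_limits<std::size_t>::max())
        return false;
    count = static_cast<std::size_t>(wide);
    return true;
}
```

<p>
The range check on loading is the point of the exercise: a platform with a narrower <code>std::size_t</code> can reject a size it cannot hold instead of silently truncating it.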
|
|
|
|
|
|
<h4><a name="traits">Traits</a></h4>
|
|
Another potential problem is illustrated by the following example:
|
|
<pre><code>
|
|
template<class T>
|
|
struct my_wrapper {
|
|
template<class Archive>
|
|
Archive & serialize ...
|
|
};
|
|
|
|
...
|
|
|
|
class my_class {
    wchar_t a;
    short unsigned b;
    template<class Archive>
    Archive & serialize(Archive & ar, unsigned int version){
        ar & my_wrapper(a);
        ar & my_wrapper(b);
        return ar;
    }
};
|
|
</code></pre>
|
|
If <code style="white-space: normal">my_wrapper</code> uses default serialization
|
|
traits there could be a problem. With the default traits, each time a new type is
|
|
added to the archive, bookkeeping information is added. So in this example, the
|
|
archive would include such bookkeeping information for
|
|
<code style="white-space: normal">my_wrapper<wchar_t></code> and for
|
|
<code style="white-space: normal">my_wrapper<unsigned short></code>.
|
|
Or would it? What about compilers that treat
|
|
<code style="white-space: normal">wchar_t</code> as a
|
|
synonym for <code style="white-space: normal">unsigned short</code>?
|
|
In this case there is only one distinct type - not two. If archives are passed between
|
|
programs with compilers that differ in their treatment
|
|
of <code style="white-space: normal">wchar_t</code> the load operation will fail
|
|
in a catastrophic way.
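<p>
Whether a given compiler sees one type or two can be checked at compile time. The sketch below uses a hypothetical empty template, <code>my_wrapper_tag</code>, in place of the wrapper; it assumes a C++11 compiler.

```cpp
#include <type_traits>

// Hypothetical empty stand-in for the wrapper template discussed above.
template<class T>
struct my_wrapper_tag {};

// Number of distinct types among the two wrapper instantiations:
// 2 where wchar_t is a distinct built-in type, 1 where it is a
// synonym for unsigned short.
constexpr int distinct_wrapper_types() {
    return std::is_same<my_wrapper_tag<wchar_t>,
                        my_wrapper_tag<unsigned short> >::value ? 1 : 2;
}
```

<p>
Standard C++ makes <code>wchar_t</code> a distinct type, so conforming compilers yield 2; a compiler mode that defines it as a typedef yields 1, and archives written under one setting carry different bookkeeping from those written under the other.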
|
|
<p>
|
|
One remedy for this is to assign serialization traits to the template
|
|
<code style="white-space: normal">my_wrapper</code> such that class
|
|
information for instantiations of this template is never serialized. This
|
|
process is described <a target="detail" href="traits.html#templates">above</a> and
|
|
has been used for <a target="detail" href="wrappers.html#nvp"><strong>Name-Value Pairs</strong></a>.
|
|
Wrappers would typically be assigned such traits.
|
|
<p>
|
|
Another way to avoid this problem is to assign serialization traits
|
|
to all specializations of the template <code style="white-space: normal">my_wrapper</code>
|
|
for all primitive types so that class information is never saved. This is what has
|
|
been done for our implementation of serialization for STL collections.
|
|
|
|
<h3><a name="binary_archives">Binary Archives</a></h3>
|
|
Standard stream i/o on some systems will expand linefeed characters to carriage-return/linefeed
|
|
on output. This creates a problem for binary archives. The easiest way to handle this is to
|
|
open streams for binary archives in "binary mode" by using the flag
|
|
<code style="white-space: normal">ios::binary</code>. If this is not done, the archive generated
|
|
will be unreadable.
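<p>
A minimal sketch of opening both streams in binary mode, using only the standard library. The function name <code>round_trip_binary</code> and the file path are illustrative assumptions; with a real binary archive the streams would be handed to the archive constructors rather than written to directly.

```cpp
#include <cstdio>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Writes the bytes to a file and reads them back, opening both streams
// with std::ios::binary so that no newline translation takes place.
std::string round_trip_binary(const std::string & bytes, const char * path) {
    {
        std::ofstream os(path, std::ios::binary | std::ios::trunc);
        os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // the output stream is flushed and closed here
    std::string result;
    {
        std::ifstream is(path, std::ios::binary);
        result.assign(std::istreambuf_iterator<char>(is),
                      std::istreambuf_iterator<char>());
    }
    std::remove(path);  // clean up the scratch file
    return result;
}
```

<p>
On systems that translate line endings in text mode, dropping <code>std::ios::binary</code> from either stream would change any byte that happens to equal a linefeed, which is exactly the corruption described above.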
|
|
<p>
|
|
Unfortunately, no way has been found to detect this error before loading the archive. Debug builds
|
|
will assert when this is detected so that may be helpful in catching this error.
|
|
|
|
<h3><a name="xml_archives">XML Archives</a></h3>
|
|
XML archives present a somewhat special case.
|
|
XML format has a nested structure that maps well to the "recursive class member visitor" pattern
|
|
used by the serialization system. However, XML differs from other formats in that it
|
|
requires a name for each data member. Our goal is to add this information to the
class serialization specification while still permitting the serialization code to be
used with any archive. This is achieved by requiring that all data serialized to an XML archive
be serialized as a <a target="detail" href="wrappers.html#nvp">name-value pair</a>.
The first member is the name to be used as the XML tag for the
data item while the second is a reference to the data item itself. Any attempt to serialize data
not wrapped in a <a target="detail" href="wrappers.html#nvp">name-value pair</a> will
be trapped at compile time. The system is implemented in such a way that for other archive classes,
just the value portion of the data is serialized. The name portion is discarded during compilation.
So by always using <a target="detail" href="wrappers.html#nvp">name-value pairs</a>, it is
guaranteed that all data can be serialized to all archive classes with maximum efficiency.
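<p>
The design can be illustrated with a toy model. Everything in the sketch below (<code>toy_nvp</code>, <code>make_toy_nvp</code>, the two archive classes and <code>point</code>) is hypothetical and far simpler than the library's own name-value pair and archives; it only shows how one piece of serialization code can feed an archive that uses the name and another that ignores it.

```cpp
#include <sstream>
#include <string>

// Hypothetical minimal name-value pair: a name plus a reference to the data.
template<class T>
struct toy_nvp {
    const char * name;
    const T & value;
};

template<class T>
toy_nvp<T> make_toy_nvp(const char * name, const T & value) {
    return toy_nvp<T>{name, value};
}

// Toy archive that uses the name as a tag around the value.
class toy_xml_oarchive {
public:
    template<class T>
    toy_xml_oarchive & operator<<(const toy_nvp<T> & pair) {
        os_ << '<' << pair.name << '>' << pair.value << "</" << pair.name << '>';
        return *this;
    }
    std::string str() const { return os_.str(); }
private:
    std::ostringstream os_;
};

// Toy archive that writes only the value; the name is never touched.
class toy_text_oarchive {
public:
    template<class T>
    toy_text_oarchive & operator<<(const toy_nvp<T> & pair) {
        os_ << pair.value << ' ';
        return *this;
    }
    std::string str() const { return os_.str(); }
private:
    std::ostringstream os_;
};

// One save function serves both archives.
struct point {
    int x, y;
    template<class Archive>
    void save(Archive & ar) const {
        ar << make_toy_nvp("x", x) << make_toy_nvp("y", y);
    }
};
```

<p>
Saving <code>point{3, 4}</code> to the toy XML archive produces tagged output, while the toy text archive receives only the two values.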
|
|
|
|
<h3><a name="dlls">DLLS - Serialization and Runtime Linking</a></h3>
|
|
Serialization code can be placed in libraries to be linked at runtime. That is,
code can be placed in DLLs (Windows) or shared libraries (*nix).
Along with the "export" facility, this
permits a program to be written without knowledge of the actual types to be serialized.
|
|
This package doesn't include an example of this technique - but coding would be
|
|
very similar to the example
|
|
<a href = "../example/demo_pimpl.cpp" target="demo_pimpl">
|
|
<code style="white-space: normal">demo_pimpl.cpp</code>
|
|
</a>,
|
|
<a href = "../example/demo_pimpl_A.cpp" target="demo_pimpl">
|
|
<code style="white-space: normal">demo_pimpl_A.cpp</code>
|
|
</a>
|
|
and
|
|
<a href = "../example/demo_pimpl_A.hpp" target="demo_pimpl">
|
|
<code style="white-space: normal">demo_pimpl_A.hpp</code>
|
|
</a>
|
|
where implementation of serialization is completely separate
|
|
from the main program.
|
|
|
|
<h3><a name="multi_threading">Multi-Threading</a></h3>
|
|
The nature of serialization conflicts with multiple threads concurrently
writing to or reading from a single open archive. Since each archive is
independent from every other one, there should be no problem
in having multiple open archives from one or more threads.
|
|
<p>
|
|
Well, not quite.
|
|
<p>
|
|
There are a couple of global data structures for holding
|
|
information of serializable types. These structures are
|
|
used to dispatch to correct code to handle each pair
|
|
of serializable types and archive types. Since this
|
|
information is shared among all archives, there is
|
|
potential for problems. This has been addressed by
carefully implementing the library so that these
|
|
structures are all initialized before
|
|
<code style="white-space: normal">main(...)</code>
|
|
is called. From then on they are never altered. So
|
|
there SHOULD be no problem having multiple archives
|
|
open simultaneously - be it from the same or different
|
|
threads.
|
|
<p>
|
|
Well, almost.
|
|
<p>
|
|
With dynamically loaded code - DLLS or Shared Libraries,
|
|
these global data structures can be altered when a library
|
|
is loaded or unloaded. That is, in this case, these
|
|
global data structures can be altered after
|
|
<code style="white-space: normal">main(...)</code>
|
|
is called. So if a thread is dynamically loading/unloading
|
|
modules which contain serialization code while an
|
|
archive is open there could be problems. Also, if
|
|
such loading/unloading is happening concurrently
|
|
in different threads, there could also be problems.
|
|
<p>
|
|
It might not be easy to control this. It is possible that
some systems may not actually load modules until they
are actually needed. So even though we think that
there is no dynamic loading/unloading of such code,
it could be occurring as "help" to manage resources.
On such systems, access to archive code would have
to be synchronized with some multi-threading construct
in order to be functional.
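<p>
One possible shape for such synchronization is a single process-wide mutex taken around every use of archive code. The sketch below assumes C++11; <code>archive_mutex</code> and <code>save_record</code> are hypothetical names, and the string stream stands in for opening an archive, saving to it and closing it.

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical process-wide lock. Every operation that touches archive
// code, including loading or unloading a module that contains
// serialization code, would take this same mutex.
std::mutex & archive_mutex() {
    static std::mutex m;   // initialized once, thread-safely, since C++11
    return m;
}

// Stands in for "open an archive, save a value, close it".
std::string save_record(int value) {
    std::lock_guard<std::mutex> lock(archive_mutex());
    std::ostringstream os;
    os << "record " << value;
    return os.str();
}   // the lock is released when `lock` goes out of scope
```

<p>
A single coarse lock gives up concurrency between archives, which is acceptable only where the dynamic loading described above cannot otherwise be ruled out.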
|
|
|
|
<h3><a name="optimizations">Optimizations</a></h3>
|
|
In performance critical applications that serialize large sets of contiguous data of homogeneous
|
|
types one wants to avoid the overhead of serializing each element individually, which is
|
|
the motivation for the <a href="wrappers.html#arrays"><code>array</code></a>
|
|
wrapper.
|
|
|
|
Serialization functions for data types containing contiguous arrays of homogeneous
|
|
types, such as for <code>std::vector</code>, <code>std::valarray</code> or
|
|
<code>boost::multi_array</code> should serialize them using an
|
|
<a href="wrappers.html#arrays"><code>array</code></a> wrapper to make use of
|
|
these optimizations.
|
|
|
|
Archive types that can provide optimized serialization for contiguous arrays of
|
|
homogeneous types should implement these by overloading the serialization of
|
|
the <a href="wrappers.html#arrays"><code>array</code></a> wrapper, as is done
|
|
for the binary archives.
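<p>
The saving that the wrapper makes possible can be seen with plain streams. This is a minimal sketch using only the standard library; <code>write_each</code> and <code>write_block</code> are hypothetical names and do not show how the <code>array</code> wrapper or the binary archives are implemented.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Writes the elements one at a time: one stream call per element.
std::string write_each(const std::vector<double> & v) {
    std::ostringstream os(std::ios::binary);
    for (std::size_t i = 0; i < v.size(); ++i)
        os.write(reinterpret_cast<const char *>(&v[i]), sizeof(double));
    return os.str();
}

// Writes the whole contiguous block with a single stream call.
std::string write_block(const std::vector<double> & v) {
    std::ostringstream os(std::ios::binary);
    if (!v.empty())
        os.write(reinterpret_cast<const char *>(v.data()),
                 static_cast<std::streamsize>(v.size() * sizeof(double)));
    return os.str();
}
```

<p>
Both functions produce the same bytes; the second does so without per-element overhead, which is what an archive gains by handling a contiguous array as a unit.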
|
|
|
|
|
|
<h3><a href="exceptions.html">Archive Exceptions</a></h3>
|
|
<h3><a href="exception_safety.html">Exception Safety</a></h3>
|
|
|
|
<hr>
|
|
<p><i>© Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
|
|
Distributed under the Boost Software License, Version 1.0. (See
|
|
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
</i></p>
|
|
</body>
|
|
</html>
|