[library Boost.MPI
    [quickbook 1.6]
    [authors [Gregor, Douglas], [Troyer, Matthias] ]
    [copyright 2005 2006 2007 Douglas Gregor, Matthias Troyer, Trustees of Indiana University]
    [id mpi]
    [license
        Distributed under the Boost Software License, Version 1.0.
        (See accompanying file LICENSE_1_0.txt or copy at
        <ulink url="http://www.boost.org/LICENSE_1_0.txt">
            http://www.boost.org/LICENSE_1_0.txt
        </ulink>)
    ]
]

[/ Links ]
[def _MPI_ [@http://www-unix.mcs.anl.gov/mpi/ MPI]]
[def _MPI_implementations_
 [@http://www-unix.mcs.anl.gov/mpi/implementations.html
 MPI implementations]]
[def _Serialization_ [@boost:/libs/serialization/doc
 Boost.Serialization]]
[def _BoostPython_ [@http://www.boost.org/libs/python/doc
 Boost.Python]]
[def _Python_ [@http://www.python.org Python]]
[def _MPICH_ [@http://www-unix.mcs.anl.gov/mpi/mpich/ MPICH2]]
[def _OpenMPI_ [@http://www.open-mpi.org OpenMPI]]
[def _IntelMPI_ [@https://software.intel.com/en-us/intel-mpi-library Intel MPI]]
[def _accumulate_ [@http://www.sgi.com/tech/stl/accumulate.html
 `accumulate`]]

[include introduction.qbk]
[include getting_started.qbk]
[include tutorial.qbk]
[include c_mapping.qbk]

[xinclude mpi_autodoc.xml]

[include python.qbk]

[section:design Design Philosophy]

The design philosophy of Boost.MPI is very simple: be both convenient
and efficient. MPI is a library built for high-performance
applications, but its FORTRAN-centric, performance-minded design makes
it rather inflexible from the C++ point of view: passing a string from
one process to another is inconvenient, requiring several messages and
explicit buffering; passing a container of strings from one process to
another requires an extra level of manual bookkeeping; and passing a
map from strings to containers of strings is positively infuriating.
Boost.MPI allows all of these data types to be passed using the same
simple `send()` and `recv()` primitives. Likewise, collective
operations such as [funcref boost::mpi::reduce `reduce()`] allow
arbitrary data types and function objects, much as the C++ Standard
Library does.
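
For example, the following sketch (an illustration only, not taken from
the library's example programs) transmits a map from strings to vectors
of strings with the same two calls one would use for a single integer:

    #include <boost/mpi.hpp>
    #include <map>
    #include <string>
    #include <vector>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);
      mpi::communicator world;

      // A nested container that would be painful to ship with raw MPI.
      std::map<std::string, std::vector<std::string> > directory;

      if (world.rank() == 0) {
        directory["fruit"].push_back("apple");
        directory["fruit"].push_back("pear");
        world.send(1, 0, directory);   // same call used for an int
      } else if (world.rank() == 1) {
        world.recv(0, 0, directory);   // Boost.MPI handles the marshalling
      }
      return 0;
    }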

The higher-level abstractions provided for convenience must not have an
impact on the performance of the application. For instance, sending an
integer via `send` must be as efficient as a call to `MPI_Send`, which
means that it must be implemented by a simple call to `MPI_Send`;
likewise, an integer [funcref boost::mpi::reduce `reduce()`] using
`std::plus<int>` must be implemented with a call to `MPI_Reduce` on
integers using the `MPI_SUM` operation: anything less will impact
performance. In essence, this is the "don't pay for what you don't use"
principle: users who are not transmitting strings should not pay the
overhead associated with strings.
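
As a concrete illustration (a minimal sketch, not a benchmark), the
reduction below is specified with `std::plus<int>` and is expected to
map onto `MPI_Reduce` over `MPI_INT` with the `MPI_SUM` operation:

    #include <boost/mpi.hpp>
    #include <functional>
    #include <iostream>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);
      mpi::communicator world;

      int value = world.rank();
      int sum = 0;

      // int + std::plus<int> is recognized as a built-in MPI datatype
      // and operation, so no serialization or user-defined op is needed.
      mpi::reduce(world, value, sum, std::plus<int>(), /*root=*/0);

      if (world.rank() == 0)
        std::cout << "sum of ranks: " << sum << std::endl;
      return 0;
    }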

Sometimes, achieving maximal performance means forgoing convenient
abstractions and implementing certain functionality using lower-level
primitives. For this reason, it is always possible to extract enough
information from the abstractions in Boost.MPI to minimize the amount
of effort required to interface between Boost.MPI and the C MPI
library.
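
For instance, a [classref boost::mpi::communicator communicator]
converts to the underlying `MPI_Comm`, so C MPI routines can be mixed
freely with Boost.MPI calls. A brief sketch, for illustration only:

    #include <boost/mpi.hpp>
    #include <mpi.h>
    namespace mpi = boost::mpi;

    int main(int argc, char* argv[])
    {
      mpi::environment env(argc, argv);
      mpi::communicator world;

      // Extract the raw handle and hand it to a C MPI routine.
      MPI_Comm raw = static_cast<MPI_Comm>(world);
      int size = 0;
      MPI_Comm_size(raw, &size);
      return 0;
    }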

[endsect]

[section:performance Performance Evaluation]

Message-passing performance is crucial in high-performance distributed
computing. To evaluate the performance of Boost.MPI, we modified the
standard [@http://www.scl.ameslab.gov/netpipe/ NetPIPE] benchmark
(version 3.6.2) to use Boost.MPI and compared its performance against
raw MPI. We ran five different variants of the NetPIPE benchmark:

# MPI: The unmodified NetPIPE benchmark.

# Boost.MPI: NetPIPE modified to use Boost.MPI calls for
  communication.

# MPI (Datatypes): NetPIPE modified to use a derived datatype (which
  itself contains a single `MPI_BYTE`) rather than a fundamental
  datatype.

# Boost.MPI (Datatypes): NetPIPE modified to use a user-defined type
  `Char` in place of the fundamental `char` type. The `Char` type
  contains a single `char`, a `serialize()` method to make it
  serializable, and specializes [classref
  boost::mpi::is_mpi_datatype is_mpi_datatype] to force Boost.MPI to
  build a derived MPI data type for it. (A sketch of such a type
  follows this list.)

# Boost.MPI (Serialized): NetPIPE modified to use a user-defined type
  `Char` in place of the fundamental `char` type. This `Char` type
  contains a single `char` and is serializable. Unlike the Datatypes
  case, [classref boost::mpi::is_mpi_datatype is_mpi_datatype] is
  *not* specialized, forcing Boost.MPI to perform many, many
  serialization calls.
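
The following sketch shows the shape of such a `Char` type (names and
details are illustrative, not the benchmark's actual source). The
`serialize()` member makes it serializable; the trait specialization is
what distinguishes the Datatypes variant from the Serialized one:

    #include <boost/mpi/datatype.hpp>
    #include <boost/mpl/bool.hpp>
    #include <boost/serialization/serialization.hpp>

    struct Char
    {
      char value;

      // Makes Char serializable (used directly in the Serialized variant).
      template<typename Archive>
      void serialize(Archive& ar, const unsigned int /*version*/)
      {
        ar & value;
      }
    };

    // Only in the Datatypes variant: tell Boost.MPI that Char maps to a
    // derived MPI datatype instead of going through serialization.
    namespace boost { namespace mpi {
      template<>
      struct is_mpi_datatype<Char> : boost::mpl::true_ { };
    } }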

The actual tests were performed on the Odin cluster in the
[@http://www.cs.indiana.edu/ Department of Computer Science] at
[@http://www.iub.edu Indiana University], which contains 128 nodes
connected via InfiniBand. Each node contains 4 GB of memory and two AMD
Opteron processors. The NetPIPE benchmarks were compiled with Intel's
C++ Compiler, version 9.0, Boost 1.35.0 (prerelease), and
[@http://www.open-mpi.org/ Open MPI] version 1.1. The NetPIPE results
follow:

[$../../libs/mpi/doc/netpipe.png]

There are some observations we can make about these NetPIPE results.
First of all, the top two plots show that Boost.MPI performs on par
with MPI for fundamental types. The next two plots show that Boost.MPI
performs on par with MPI for derived data types, even though Boost.MPI
provides a much more abstract, completely transparent approach to
building derived data types than raw MPI. Overall performance for
derived data types is significantly worse than for fundamental data
types, but the bottleneck is in the underlying MPI implementation
itself. Finally, when Boost.MPI is forced to serialize characters
individually, performance suffers greatly. This particular instance is
the worst possible case for Boost.MPI, because we are serializing
millions of individual characters. Overall, the additional abstraction
provided by Boost.MPI does not impair its performance.

[endsect]

[section:history Revision History]

* *Boost 1.36.0*:
  * Support for non-blocking operations in Python, from Andreas Klöckner

* *Boost 1.35.0*: Initial release, containing the following post-review changes
  * Support for arrays in all collective operations
  * Support for default-construction of [classref boost::mpi::environment environment]

* *2006-09-21*: Boost.MPI accepted into Boost.

[endsect:history]

[section:acknowledge Acknowledgments]

Boost.MPI was developed with support from Zurcher Kantonalbank. Daniel
Egloff and Michael Gauckler contributed many ideas to Boost.MPI's
design, particularly in the design of its abstractions for MPI data
types and the novel skeleton/content mechanism for large data
structures. Prabhanjan (Anju) Kambadur developed the predecessor to
Boost.MPI that proved the usefulness of the Serialization library in an
MPI setting and the performance benefits of specialization in a C++
abstraction layer for MPI. Jeremy Siek managed the formal review of
Boost.MPI.

[endsect:acknowledge]