[/============================================================================
  Boost.odeint

  Copyright 2013 Karsten Ahnert
  Copyright 2013 Pascal Germroth
  Copyright 2013 Mario Mulansky

  Use, modification and distribution is subject to the Boost Software License,
  Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt)
=============================================================================/]

[section Parallel computation with OpenMP and MPI]

Parallelization is a key feature of modern numerical libraries, owing to the
vast availability of many cores nowadays, even in laptops.
odeint currently supports parallelization with OpenMP and MPI, as described in
the following sections.
However, it should be made clear from the beginning that the difficulty of
efficiently distributing ODE integration across many cores/machines lies in the
parallelization of the system function, which remains the user's
responsibility.
Simply using a parallel odeint backend without parallelizing the system function
will bring you almost no performance gains.

[section OpenMP]

[import ../examples/openmp/phase_chain.cpp]

odeint's OpenMP support is implemented as an external backend, which needs to be
included manually. Depending on the compiler, some additional flags may be
needed, e.g. [^-fopenmp] for GCC.
[phase_chain_openmp_header]

In the easiest parallelization approach with OpenMP we use a standard `vector`
as the state type:
[phase_chain_vector_state]

We initialize the state with some random data:
[phase_chain_init]

Now we have to configure the stepper to use the OpenMP backend.
This is done by explicitly providing the `openmp_range_algebra` as a template
parameter to the stepper.
This algebra requires the state type to be a model of Random Access Range and
will access it from multiple threads.
[phase_chain_stepper]

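For orientation, such a stepper definition can look roughly as follows. This is
a sketch only, assuming the `state_type` from above; the imported snippet is
the authoritative version:
```
typedef boost::numeric::odeint::runge_kutta4<
            state_type , double ,
            state_type , double ,
            boost::numeric::odeint::openmp_range_algebra
        > stepper_type;
```
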
In addition to providing the stepper with OpenMP parallelization, we also need
a parallelized system function to exploit the available cores.
Here this is shown for a simple one-dimensional chain of phase oscillators with
nearest-neighbour coupling:
[phase_chain_rhs]

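A sketch of such a parallelized right-hand side is shown below. The coupling
function `coup` is an assumption made for illustration; the imported snippet
above is the actual implementation:
```
struct phase_chain
{
    void operator()( const state_type &x , state_type &dxdt ,
                     double /* t */ ) const
    {
        const std::size_t N = x.size();
        // distribute the loop over the chain among the OpenMP threads
        #pragma omp parallel for schedule(runtime)
        for( std::size_t i = 1 ; i < N - 1 ; ++i )
            dxdt[i] = coup( x[i-1] - x[i] ) + coup( x[i+1] - x[i] );
        // the two chain ends have only one neighbour each
        dxdt[0]   = coup( x[1] - x[0] );
        dxdt[N-1] = coup( x[N-2] - x[N-1] );
    }
    // assumed coupling, requires <cmath>
    static double coup( double x ) { return std::sin( x ); }
};
```
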
[note In the OpenMP backends the system function will always be called
sequentially from the thread used to start the integration.]

Finally, we perform the integration by using one of the integrate functions from
odeint.
As you can see, the parallelization is completely hidden in the stepper and the
system function.
OpenMP will take care of distributing the work among the threads and will join
them automatically.
[phase_chain_integrate]

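For example, a fixed-step integration over the chain could read as follows
(step size and step count are arbitrary here, and `stepper_type` is the one
sketched above):
```
integrate_n_steps( stepper_type() , phase_chain() ,
                   x , 0.0 , 0.01 , 100 );
```
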
After integrating, the data can be accessed immediately and processed further.
Note that you can specify the OpenMP scheduling by calling `omp_set_schedule`
at the beginning of your program:
[phase_chain_scheduling]

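As a minimal sketch, selecting a static schedule for all `schedule(runtime)`
loops:
```
// requires <omp.h>; chunk size 0 lets the runtime pick a default
omp_set_schedule( omp_sched_static , 0 );
```
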
See [github_link examples/openmp/phase_chain.cpp openmp/phase_chain.cpp]
for the complete example.

[heading Split state]

[import ../examples/openmp/phase_chain_omp_state.cpp]

For advanced cases odeint offers another approach to using OpenMP that allows
more precise control of the parallelization.
For example, for odd-sized data where OpenMP's thread boundaries don't match
cache lines and hurt performance, it might be advisable to copy the data from
the contiguous `vector<T>` into separate, individually aligned vectors.
For this, odeint provides the `openmp_state<T>` type, essentially an alias for
`vector<vector<T>>`.

Here, the initialization is done with a `vector<double>`, but then we use
odeint's `split` function to fill an `openmp_state`.
The splitting is done such that the sizes of the individual regions differ by
at most 1, to make the computation as uniform as possible.
[phase_chain_state_init]

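A sketch of this initialization, assuming one chunk per OpenMP thread (`N` is a
placeholder for the problem size):
```
const std::size_t N = 1024;          // problem size (placeholder)
std::vector<double> x( N );          // contiguous input data, filled elsewhere
// requires <omp.h> for omp_get_max_threads()
openmp_state<double> x_split( omp_get_max_threads() );
split( x , x_split );                // chunk sizes differ by at most 1
```
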
Of course, the system function has to be changed to deal with the
`openmp_state`.
Note that each sub-region of the state is computed in a single task, but at the
borders read access to the neighbouring regions is required.
[phase_chain_state_rhs]

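Structurally, such a system function might look as follows; the actual
computation and the border handling (reading the outermost elements of the
neighbouring chunks) are elided in this sketch:
```
void operator()( const openmp_state<double> &x ,
                 openmp_state<double> &dxdt , double /* t */ ) const
{
    // one task per chunk; borders need read access to neighbouring chunks
    #pragma omp parallel for schedule(runtime)
    for( std::size_t i = 0 ; i < x.size() ; ++i )
    {
        const std::vector<double> &chunk = x[i];
        std::vector<double> &dchunk = dxdt[i];
        // ... compute dchunk from chunk and the borders of x[i-1], x[i+1] ...
    }
}
```
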
Using the `openmp_state<T>` state type automatically selects `openmp_algebra`,
which executes odeint's internal computations on the individual regions in
parallel.
Hence, no manual configuration of the stepper is necessary.
At the end of the integration, we use `unsplit` to concatenate the sub-regions
back together into a single vector.
[phase_chain_state_integrate]

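Sketched, the final steps could read (assuming a `phase_chain` functor adapted
to `openmp_state` as above):
```
integrate_n_steps( runge_kutta4< openmp_state<double> >() ,
                   phase_chain() , x_split , 0.0 , 0.01 , 100 );
unsplit( x_split , x );   // x keeps its original size; unsplit does not resize
```
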
[note You don't actually need to use `openmp_state<T>` for advanced use cases:
`openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`,
which supports any model of Random Access Range as the outer, parallel state
type and uses the given algebra on its elements.]

See [github_link examples/openmp/phase_chain_omp_state.cpp
openmp/phase_chain_omp_state.cpp] for the complete example.

[endsect]

[section MPI]

[import ../examples/mpi/phase_chain.cpp]

To expand the parallel computation across multiple machines we can use MPI.

The system function implementation is similar to the OpenMP variant with split
data. The main difference lies in the execution model: while OpenMP uses a
fork/join model where everything not explicitly parallelized is only executed
in the main thread, in MPI's model each node enters the `main()` function
independently, diverging based on its rank and synchronizing through
message-passing and explicit barriers.

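Concretely, every node runs the same program; a minimal skeleton using
__boost_mpi looks like this:
```
#include <boost/mpi.hpp>

int main( int argc , char **argv )
{
    boost::mpi::environment env( argc , argv );  // every node enters main()
    boost::mpi::communicator world;
    // world.rank() identifies this node, world.size() is the cluster size;
    // all divergence between the nodes is based on the rank.
}
```
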
odeint's MPI support is implemented as an external backend, too.
Depending on the MPI implementation, the code might need to be compiled with a
wrapper such as [^mpic++].
[phase_chain_mpi_header]

Instead of reading another thread's data, we asynchronously send and receive the
relevant data from neighbouring nodes, performing some computation in the
interim to hide the latency.
[phase_chain_mpi_rhs]

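Reduced to a single border, the communication pattern looks roughly like this.
This is an illustrative sketch, not the example's actual code; `x` stands for
the node's local chunk:
```
double left = 0.0;                    // border value of the left neighbour
std::vector<boost::mpi::request> reqs;
if( world.rank() > 0 )
{
    reqs.push_back( world.isend( world.rank() - 1 , 0 , x.front() ) );
    reqs.push_back( world.irecv( world.rank() - 1 , 0 , left ) );
}
// ... compute the interior of the chunk while the messages are in flight ...
boost::mpi::wait_all( reqs.begin() , reqs.end() );
// ... now compute the chunk border using `left` ...
```
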
Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
inner algebra (since our inner state is a `vector`, the inner algebra will be
`range_algebra` as in the OpenMP example).
[phase_chain_state]

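That is, the state type declaration reduces to something like:
```
typedef mpi_state< std::vector<double> > state_type;
```
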
In the main program we construct a `communicator` which tells us the `size` of
the cluster and the current node's `rank` within it.
We generate the input data on the master node only, avoiding unnecessary work on
the other nodes.
Instead of simply copying chunks, `split` acts as an MPI collective function
here and sends/receives regions from the master to each slave.
The input argument is ignored on the slaves, but the master node receives
a region in its output and will participate in the computation.
[phase_chain_mpi_init]

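A sketch of this initialization (`N` and the data generation are placeholders):
```
boost::mpi::communicator world;

std::vector<double> x;
if( world.rank() == 0 )
{
    x.resize( N );
    // ... generate the input data on the master only ...
}
state_type x_split( world );
split( x , x_split );    // collective: scatters chunks from the master
```
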
Now that `x_split` contains (only) the local chunk for each node, we start the
integration.

To print the result on the master node, we send the processed data back using
`unsplit`.
[phase_chain_mpi_integrate]

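Sketched, again with arbitrary step parameters:
```
integrate_n_steps( runge_kutta4<state_type>() , phase_chain() ,
                   x_split , 0.0 , 0.01 , 100 );
unsplit( x_split , x );          // collective: gathers chunks on the master
if( world.rank() == 0 )
{
    // ... x now holds the complete result; print it ...
}
```
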
[note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs; it
simply calls the inner algebra on the local chunk. The system function is not
guarded by any barriers either, so if you don't place any manually (for example
in parameter studies where the elements are completely independent), you might
see the nodes diverge and return from this call at different times.]

See [github_link examples/mpi/phase_chain.cpp mpi/phase_chain.cpp]
for the complete example.

[endsect]

[section Concepts]

[section MPI State]

As used by `mpi_nested_algebra`.

[heading Notation]

[variablelist
  [[`InnerState`] [The inner state type]]
  [[`State`] [The MPI-state type]]
  [[`state`] [Object of type `State`]]
  [[`world`] [Object of type `boost::mpi::communicator`]]
]

[heading Valid Expressions]

[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Construct a state with a communicator]
    [`State(world)`] [`State`] [Constructs the State.]]
  [[Construct a state with the default communicator]
    [`State()`] [`State`] [Constructs the State.]]
  [[Get the current node's inner state]
    [`state()`] [`InnerState`] [Returns a (const) reference.]]
  [[Get the communicator]
    [`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
]

[heading Models]

* `mpi_state<InnerState>`

[endsect]

[section OpenMP Split State]

As used by `openmp_nested_algebra`; essentially a Random Access Container with
`ValueType = InnerState`.

[heading Notation]

[variablelist
  [[`InnerState`] [The inner state type]]
  [[`State`] [The split state type]]
  [[`state`] [Object of type `State`]]
]

[heading Valid Expressions]

[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Construct a state for `n` chunks]
    [`State(n)`] [`State`] [Constructs the underlying `vector`.]]
  [[Get a chunk]
    [`state[i]`] [`InnerState`] [Accesses the underlying `vector`.]]
  [[Get the number of chunks]
    [`state.size()`] [`size_type`] [Returns the size of the underlying `vector`.]]
]

[heading Models]

* `openmp_state<ValueType>` with `InnerState = vector<ValueType>`

[endsect]

[section Splitter]

[heading Notation]

[variablelist
  [[`Container1`] [The contiguous-data container type]]
  [[`x`] [Object of type `Container1`]]
  [[`Container2`] [The chunked-data container type]]
  [[`y`] [Object of type `Container2`]]
]

[heading Valid Expressions]

[table
  [[Name] [Expression] [Type] [Semantics]]
  [[Copy chunks of input to output elements]
    [`split(x, y)`] [`void`]
    [Calls `split_impl<Container1, Container2>::split(x, y)`, which splits `x`
    into `y.size()` chunks.]]
  [[Join chunks of input elements to output]
    [`unsplit(y, x)`] [`void`]
    [Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, which assumes
    `x` is of the correct size ['__sigma `y[i].size()`] and does not resize `x`.]]
]

[heading Models]

* defined for `Container1` = __boost_range and `Container2` = `openmp_state`
* and for `Container2` = `mpi_state`.

To implement splitters for containers incompatible with __boost_range,
specialize the `split_impl` and `unsplit_impl` types:
```
template< class Container1 , class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2 , class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};
```

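For illustration, a sketch of a specialization for a hypothetical
`my_container` that exposes only `size()` and `operator[]`, splitting into
contiguous chunks whose sizes differ by at most 1:
```
template< class T >
struct split_impl< my_container<T> , openmp_state<T> >
{
    static void split( const my_container<T> &from , openmp_state<T> &to )
    {
        const std::size_t n = from.size() , parts = to.size();
        std::size_t begin = 0;
        for( std::size_t i = 0 ; i < parts ; ++i )
        {
            // distribute the remainder n % parts over the first chunks
            const std::size_t len = n / parts + ( i < n % parts ? 1 : 0 );
            to[i].resize( len );
            for( std::size_t j = 0 ; j < len ; ++j )
                to[i][j] = from[begin + j];
            begin += len;
        }
    }
};
```
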
[endsect]

[endsect]

[endsect]