209 lines
11 KiB
Plaintext
209 lines
11 KiB
Plaintext
[/
|
|
Copyright 2018 Nick Thompson
|
|
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|
|
|
|
[section:signal_statistics Signal Statistics]
|
|
|
|
[heading Synopsis]
|
|
|
|
``
|
|
#include <boost/math/statistics/signal_statistics.hpp>
|
|
|
|
namespace boost::math::statistics {
|
|
|
|
template<class Container>
|
|
auto absolute_gini_coefficient(Container & c);
|
|
|
|
template<class ForwardIterator>
|
|
auto absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
|
|
|
|
template<class Container>
|
|
auto sample_absolute_gini_coefficient(Container & c);
|
|
|
|
template<class ForwardIterator>
|
|
auto sample_absolute_gini_coefficient(ForwardIterator first, ForwardIterator last);
|
|
|
|
template<class Container>
|
|
auto hoyer_sparsity(Container const & c);
|
|
|
|
template<class ForwardIterator>
|
|
auto hoyer_sparsity(ForwardIterator first, ForwardIterator last);
|
|
|
|
template<class Container>
|
|
auto oracle_snr(Container const & signal, Container const & noisy_signal);
|
|
|
|
template<class Container>
|
|
auto oracle_snr_db(Container const & signal, Container const & noisy_signal);
|
|
|
|
template<class ForwardIterator>
|
|
auto m2m4_snr_estimator(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
|
|
|
|
template<class Container>
|
|
auto m2m4_snr_estimator(Container const & noisy_signal, typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
|
|
|
|
template<class ForwardIterator>
|
|
auto m2m4_snr_estimator_db(ForwardIterator first, ForwardIterator last, decltype(*first) estimated_signal_kurtosis=1, decltype(*first) estimated_noise_kurtosis=3);
|
|
|
|
template<class Container>
|
|
auto m2m4_snr_estimator_db(Container const & noisy_signal,typename Container::value_type estimated_signal_kurtosis=1, typename Container::value_type estimate_noise_kurtosis=3);
|
|
|
|
}
|
|
``
|
|
|
|
[heading Description]
|
|
|
|
The file `boost/math/statistics/signal_statistics.hpp` is a set of facilities for computing quantities commonly used in signal analysis.
|
|
|
|
Our examples use `std::vector<double>` to hold the data, but this not required.
|
|
In general, you can store your data in an Eigen array, and Armadillo vector, `std::array`, and for many of the routines, a `std::forward_list`.
|
|
These routines are usable in float, double, long double, and Boost.Multiprecision precision, as well as their complex extensions whenever the computation is well-defined.
|
|
|
|
|
|
[heading Absolute Gini Coefficient]
|
|
|
|
The Gini coefficient, first used to measure wealth inequality, is also one of the best measures of the sparsity of an expansion in a basis.
|
|
A sparse expansion has most of its norm concentrated in just a few coefficients, making the connection with wealth inequality obvious.
|
|
See [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard] for details.
|
|
However, for measuring sparsity, the phase of the numbers is irrelevant, so we provide the `absolute_gini_coefficient`:
|
|
|
|
using boost::math::statistics::sample_absolute_gini_coefficient;
|
|
using boost::math::statistics::absolute_gini_coefficient;
|
|
std::vector<std::complex<double>> v{{0,1}, {0,0}, {0,0}, {0,0}};
|
|
double abs_gini = sample_absolute_gini_coefficient(v);
|
|
// now abs_gini = 1; maximally unequal
|
|
|
|
std::vector<std::complex<double>> w{{0,1}, {1,0}, {0,-1}, {-1,0}};
|
|
abs_gini = absolute_gini_coefficient(w);
|
|
// now abs_gini = 0; every element of the vector has equal magnitude
|
|
|
|
std::vector<double> u{-1, 1, -1};
|
|
abs_gini = absolute_gini_coefficient(u);
|
|
// now abs_gini = 0
|
|
// Alternative call useful for computing over subset of the input:
|
|
abs_gini = absolute_gini_coefficient(u.begin(), u.begin() + 1);
|
|
|
|
|
|
The sample Gini coefficient returns unity for a vector which has only one nonzero coefficient.
|
|
The population Gini coefficient of a vector with one non-zero element is dependent on the length of the input.
|
|
|
|
The sample Gini coefficient lacks one desirable property of the population Gini coefficient,
|
|
namely that "cloning" a vector has the same Gini coefficient; though cloning holds to very high accuracy with the sample Gini coefficient and can easily be recovered by a rescaling.
|
|
|
|
If sorting the input data is too much expense for a sparsity measure (is it going to be perfect anyway?),
|
|
consider calculating the Hoyer sparsity instead.
|
|
|
|
[heading Hoyer Sparsity]
|
|
|
|
The Hoyer sparsity measures a normalized ratio of the \u2113[super 1] and \u2113[super 2] norms.
|
|
As the name suggests, it is used to measure the sparsity of an expansion in some basis.
|
|
|
|
The Hoyer sparsity computes ([radic]/N/ - \u2113[super 1](v)/\u2113[super 2](v))/([radic]N -1).
|
|
For details, see [@http://www.jmlr.org/papers/volume5/hoyer04a/hoyer04a.pdf Hoyer] as well as [@https://arxiv.org/pdf/0811.4706.pdf Hurley and Rickard].
|
|
|
|
A few special cases will serve to clarify the intended use:
|
|
If /v/ has only one nonzero coefficient, the Hoyer sparsity attains its maxima of 1.
|
|
If the coefficients of /v/ all have the same magnitude, then the Hoyer sparsity attains its minima of zero.
|
|
If the elements of /v/ are uniformly distributed on an interval [0, /b/], then the Hoyer sparsity is approximately 0.133.
|
|
|
|
|
|
Usage:
|
|
|
|
std::vector<Real> v{1,0,0};
|
|
Real hs = boost::math::statistics::hoyer_sparsity(v);
|
|
// hs = 1
|
|
std::vector<Real> v{1,-1,1};
|
|
Real hs = boost::math::statistics::hoyer_sparsity(v.begin(), v.end());
|
|
// hs = 0
|
|
|
|
The container must be forward iterable and the contents are not modified.
|
|
Accepts real, complex, and integer inputs.
|
|
If the input is an integral type, the output is a double precision float.
|
|
|
|
|
|
[heading Oracle Signal-to-noise ratio]
|
|
|
|
The function `oracle_snr` computes the ratio \u2016 /s/ \u2016[sub 2][super 2] / \u2016 /s/ - /x/ \u2016[sub 2][super 2], where /s/ is signal and /x/ is a noisy signal.
|
|
The function `oracle_snr_db` computes 10`log`[sub 10](\u2016 /s/ \u2016[super 2] / \u2016 /s/ - /x/ \u2016[super 2]).
|
|
The functions are so named because in general, one does not know how to decompose a real signal /x/ into /s/ + /w/ and as such /s/ is regarded as oracle information.
|
|
Hence this function is mainly useful for unit testing other SNR estimators.
|
|
|
|
Usage:
|
|
|
|
std::vector<double> signal(500, 3.2);
|
|
std::vector<double> noisy_signal(500);
|
|
// fill 'noisy_signal' signal + noise
|
|
double snr_db = boost::math::statistics::oracle_snr_db(signal, noisy_signal);
|
|
double snr = boost::math::statistics::oracle_snr(signal, noisy_signal);
|
|
|
|
The input can be real, complex, or integral.
|
|
Integral inputs produce double precision floating point outputs.
|
|
The input data is not modified and must satisfy the requirements of a `RandomAccessContainer`.
|
|
|
|
[heading /M/[sub 2]/M/[sub 4] SNR Estimation]
|
|
|
|
Estimates the SNR of a noisy signal via the /M/[sub 2]/M/[sub 4] method.
|
|
See [@https://doi.org/10.1109/26.871393 Pauluzzi and N.C. Beaulieu] and [@https://doi.org/10.1109/ISIT.1994.394869 Matzner and Englberger] for details.
|
|
|
|
std::vector<double> noisy_signal(512);
|
|
// fill noisy_signal with data contaminated by Gaussian white noise:
|
|
double est_snr_db = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal);
|
|
|
|
The /M/[sub 2]/M/[sub 4] SNR estimator is an "in-service" estimator, meaning that the estimate is made using the noisy, data-bearing signal, and does not require a background estimate.
|
|
This estimator has been found to be work best between roughly -3 and 15db, tending to overestimate the noise below -3db, and underestimate the noise above 15db.
|
|
See [@https://www.mdpi.com/2078-2489/8/3/75/pdf Xue et al] for details.
|
|
|
|
The /M/[sub 2]/M/[sub 4] SNR estimator, by default, assumes that the kurtosis of the signal is 1 and the kurtosis of the noise is 3, the latter corresponding to Gaussian noise.
|
|
These parameters, however, can be overridden:
|
|
|
|
std::vector<double> noisy_signal(512);
|
|
// fill noisy_signal with the data:
|
|
double signal_kurtosis = 1.5;
|
|
// Noise is assumed to follow Laplace distribution, which has kurtosis of 6:
|
|
double noise_kurtosis = 6;
|
|
double est_snr = boost::math::statistics::m2m4_snr_estimator_db(noisy_signal, signal_kurtosis, noise_kurtosis);
|
|
|
|
Now, technically the method is a "blind SNR estimator", meaning that the no /a-priori/ information about the signal is required to use the method.
|
|
However, the performance of the method is /vastly/ better if you can come up with a better estimate of the signal and noise kurtosis.
|
|
How can we do this? Suppose we know that the SNR is much greater than 1.
|
|
Then we can estimate the signal kurtosis simply by using the noisy signal kurtosis.
|
|
If the SNR is much less than one, this method breaks down as the noisy signal kurtosis will tend to the noise kurtosis-though in this limit we have an excellent estimator of the noise kurtosis!
|
|
In addition, if you have a model of what your signal should look like, you can precompute the signal kurtosis.
|
|
For example, sinusoids have a kurtosis of 1.5.
|
|
See [@http://www.jcomputers.us/vol8/jcp0808-21.pdf here] for a study which uses estimates of this sort to improve the performance of the /M/[sub 2]/M/[sub 4] estimator.
|
|
|
|
|
|
/Nota bene/: The traditional definition of SNR is /not/ mean invariant.
|
|
By this we mean that if a constant is added to every sample of a signal, the SNR is changed.
|
|
For example, adding DC bias to a signal changes its SNR.
|
|
For most use cases, this is really not what you intend; for example a signal consisting of zeros plus Gaussian noise has an SNR of zero,
|
|
whereas a signal with a constant DC bias and random Gaussian noise might have a very large SNR.
|
|
|
|
The /M/[sub 2]/M/[sub 4] SNR estimator is computed from mean-invariant quantities,
|
|
and hence it should really be compared to the mean-invariant SNR.
|
|
|
|
/Nota bene/: This computation requires the solution of a system of quadratic equations involving the noise kurtosis, the signal kurtosis, and the second and fourth moments of the data.
|
|
There is no guarantee that a solution of this system exists for all value of these parameters, in fact nonexistence can easily be demonstrated for certain data.
|
|
If there is no solution to the system, then failure is communicated by returning NaNs.
|
|
This happens distressingly often; if a user is aware of any blind SNR estimators which do not suffer from this drawback, please open a github ticket and let us know.
|
|
|
|
The author has not managed to fully characterize the conditions under which a real solution with /S > 0/ and /N >0/ exists.
|
|
However, a very intuitive example demonstrates why nonexistence can occur.
|
|
Suppose the signal and noise kurtosis are equal.
|
|
Then the method has no way to distinguish between the signal and the noise, and the solution is non-unique.
|
|
|
|
|
|
[heading References]
|
|
|
|
* Mallat, Stephane. ['A wavelet tour of signal processing: the sparse way.] Academic press, 2008.
|
|
* Hurley, Niall, and Scott Rickard. ['Comparing measures of sparsity.] IEEE Transactions on Information Theory 55.10 (2009): 4723-4741.
|
|
* Jensen, Arne, and Anders la Cour-Harbo. ['Ripples in mathematics: the discrete wavelet transform.] Springer Science & Business Media, 2001.
|
|
* D. R. Pauluzzi and N. C. Beaulieu, ['A comparison of SNR estimation techniques for the AWGN channel,] IEEE Trans. Communications, Vol. 48, No. 10, pp. 1681-1691, 2000.
|
|
* Hoyer, Patrik O. ['Non-negative matrix factorization with sparseness constraints.], Journal of machine learning research 5.Nov (2004): 1457-1469.
|
|
|
|
[endsect]
|
|
[/section:signal_statistics Signal Statistics]
|