178 lines
9.7 KiB
Plaintext
178 lines
9.7 KiB
Plaintext
[/
|
|
Copyright (c) 2019 Nick Thompson
|
|
Use, modification and distribution are subject to the
|
|
Boost Software License, Version 1.0. (See accompanying file
|
|
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
|
]
|
|
|
|
[section:runs_test Runs tests]
|
|
|
|
[heading Synopsis]
|
|
|
|
```
|
|
#include <boost/math/statistics/runs_test.hpp>
|
|
|
|
namespace boost::math::statistics {
|
|
|
|
template<typename RandomAccessContainer>
|
|
std::pair<Real, Real> runs_above_and_below_threshold(RandomAccessContainer const & v, typename RandomAccessContainer::value_type threshold);
|
|
|
|
template<typename RandomAccessContainer>
|
|
std::pair<Real, Real> runs_above_and_below_median(RandomAccessContainer const & v);
|
|
|
|
}}}
|
|
```
|
|
|
|
[heading Background]
|
|
|
|
The "runs above and below median test" tries to determine if a sequence is random by looking at the number of consecutive values which exceed (or are below) the median of the sequence.
|
|
For example, if we are given data {5, 2, 0, 4, 7, 9, 10, 6, 1, 8, 3}, first we find the median (5), and transform the vector into a list of + and -'s, depending on whether the value is greater or less than the median.
|
|
(Values equal to the median we simply ignore-this is a convention we have inherited from the wonderful `randtests` package in R.)
|
|
Hence our data vector is transformed into
|
|
|
|
{-,-,-,+,+,+,-,+,-}
|
|
|
|
which is 5 runs, with /n/[sub a] = 5 values above and /n/[sub b] = 5 values below the median.
|
|
[@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm NIST] tells us the expected number of runs and their variance:
|
|
|
|
[$../graphs/expected_runs_above_threshold.svg]
|
|
|
|
from which we derive the test statistic
|
|
|
|
[$../graphs/runs_test_statistic.svg]
|
|
|
|
whose distribution we approximate as normal to extract a /p/-value.
|
|
|
|
|
|
|
|
An example usage is as follows:
|
|
|
|
```
|
|
#include <vector>
|
|
#include <random>
|
|
#include <boost/math/statistics/runs_test.hpp>
|
|
|
|
std::random_device rd;
|
|
std::vector<double> v{5, 2, 0, 4, 7, 9, 10, 6, 1, 8, 3};
|
|
auto [t, p] = boost::math::statistics::runs_above_and_below_median(v);
|
|
// t = -0.670820393249936919; p = 0.502334954360502017;
|
|
```
|
|
We see that we have about a 50% chance of seeing this number of runs if the null hypothesis of randomness is true, and hence the assumption of randomness seems reasonable.
|
|
As always, the test statistic is the first element of the pair, and the /p/-value is the second element.
|
|
|
|
|
|
[heading Performance]
|
|
|
|
There are two cases: Where the threshold (typically the median) has already been computed, and the case where the mean and sample variance must be computed on the fly.
|
|
Computing the median is fairly expensive (requiring a call to `boost::math::statistics::median`), and since the order of the original data must be preserved, it must allocate.
|
|
If you believe your data to come from a distribution where the means and median coincide, or if you've already computed the median in the course of some other analysis, then you can get away with a call to `runs_above_and_below_threshold` via
|
|
|
|
```
|
|
Real threshold = 5;
|
|
auto [t, p] = boost::math::statistics::runs_above_and_below_threshold(v, threshold);
|
|
```
|
|
|
|
The performance differences between these two cases are obvious:
|
|
|
|
```
|
|
---------------------------------------------------------------------------------------
|
|
Benchmark Time Bytes/second
|
|
---------------------------------------------------------------------------------------
|
|
BMRunsAboveAndBelowMedian<float>/8 260 ns 118.421M/s
|
|
BMRunsAboveAndBelowMedian<float>/16 318 ns 192.797M/s
|
|
BMRunsAboveAndBelowMedian<float>/32 417 ns 303.509M/s
|
|
BMRunsAboveAndBelowMedian<float>/64 625 ns 390.578M/s
|
|
BMRunsAboveAndBelowMedian<float>/128 743 ns 657.827M/s
|
|
BMRunsAboveAndBelowMedian<float>/256 1308 ns 767.181M/s
|
|
BMRunsAboveAndBelowMedian<float>/512 1896 ns 1034.31M/s
|
|
BMRunsAboveAndBelowMedian<float>/1024 6582 ns 594.458M/s
|
|
BMRunsAboveAndBelowMedian<float>/2048 26067 ns 300.001M/s
|
|
BMRunsAboveAndBelowMedian<float>/4096 62023 ns 252.125M/s
|
|
BMRunsAboveAndBelowMedian<float>/8192 124976 ns 250.256M/s
|
|
BMRunsAboveAndBelowMedian<float>/16384 242171 ns 258.29M/s
|
|
BMRunsAboveAndBelowMedian<float>/32768 528683 ns 236.714M/s
|
|
BMRunsAboveAndBelowMedian<float>/65536 965354 ns 259.185M/s
|
|
BMRunsAboveAndBelowMedian<float>/131072 2514693 ns 199.068M/s
|
|
BMRunsAboveAndBelowMedian<float>/262144 4223084 ns 237.058M/s
|
|
BMRunsAboveAndBelowMedian<float>/524288 8638963 ns 231.755M/s
|
|
BMRunsAboveAndBelowMedian<float>/1048576 16215682 ns 246.995M/s
|
|
BMRunsAboveAndBelowMedian<float>/2097152 39180496 ns 204.443M/s
|
|
BMRunsAboveAndBelowMedian<float>/4194304 82495779 ns 194.307M/s
|
|
BMRunsAboveAndBelowMedian<float>/8388608 142675936 ns 224.547M/s
|
|
BMRunsAboveAndBelowMedian<float>/16777216 287638068 ns 223.088M/s
|
|
BMRunsAboveAndBelowMedian<float>_BigO 17.25 N
|
|
BMRunsAboveAndBelowMedian<double>/8 191 ns 320.129M/s
|
|
BMRunsAboveAndBelowMedian<double>/16 233 ns 523.526M/s
|
|
BMRunsAboveAndBelowMedian<double>/32 334 ns 730.8M/s
|
|
BMRunsAboveAndBelowMedian<double>/64 456 ns 1070.93M/s
|
|
BMRunsAboveAndBelowMedian<double>/128 688 ns 1.38789G/s
|
|
BMRunsAboveAndBelowMedian<double>/256 1257 ns 1.51807G/s
|
|
BMRunsAboveAndBelowMedian<double>/512 2663 ns 1.43406G/s
|
|
BMRunsAboveAndBelowMedian<double>/1024 4100 ns 1.86266G/s
|
|
BMRunsAboveAndBelowMedian<double>/2048 23493 ns 665.851M/s
|
|
BMRunsAboveAndBelowMedian<double>/4096 57968 ns 539.551M/s
|
|
BMRunsAboveAndBelowMedian<double>/8192 142272 ns 439.746M/s
|
|
BMRunsAboveAndBelowMedian<double>/16384 260948 ns 479.639M/s
|
|
BMRunsAboveAndBelowMedian<double>/32768 551577 ns 453.623M/s
|
|
BMRunsAboveAndBelowMedian<double>/65536 1056583 ns 473.654M/s
|
|
BMRunsAboveAndBelowMedian<double>/131072 2123956 ns 471.35M/s
|
|
BMRunsAboveAndBelowMedian<double>/262144 5028745 ns 398.111M/s
|
|
BMRunsAboveAndBelowMedian<double>/524288 10399212 ns 384.981M/s
|
|
BMRunsAboveAndBelowMedian<double>/1048576 23089767 ns 348.496M/s
|
|
BMRunsAboveAndBelowMedian<double>/2097152 37626884 ns 425.962M/s
|
|
BMRunsAboveAndBelowMedian<double>/4194304 79281747 ns 404.088M/s
|
|
BMRunsAboveAndBelowMedian<double>/8388608 172055781 ns 373.391M/s
|
|
BMRunsAboveAndBelowMedian<double>/16777216 391377449 ns 332.01M/s
|
|
BMRunsAboveAndBelowMedian<double>_BigO 22.52 N
|
|
BMRunsAboveAndBelowThreshold<float>/8 41.6 ns 739.55M/s
|
|
BMRunsAboveAndBelowThreshold<float>/16 58.4 ns 1050.48M/s
|
|
BMRunsAboveAndBelowThreshold<float>/32 66.5 ns 1.79606G/s
|
|
BMRunsAboveAndBelowThreshold<float>/64 115 ns 2.0762G/s
|
|
BMRunsAboveAndBelowThreshold<float>/128 198 ns 2.41515G/s
|
|
BMRunsAboveAndBelowThreshold<float>/256 365 ns 2.61328G/s
|
|
BMRunsAboveAndBelowThreshold<float>/512 720 ns 2.65053G/s
|
|
BMRunsAboveAndBelowThreshold<float>/1024 1424 ns 2.68123G/s
|
|
BMRunsAboveAndBelowThreshold<float>/2048 3009 ns 2.5379G/s
|
|
BMRunsAboveAndBelowThreshold<float>/4096 16748 ns 933.699M/s
|
|
BMRunsAboveAndBelowThreshold<float>/8192 40190 ns 778.105M/s
|
|
BMRunsAboveAndBelowThreshold<float>/16384 86500 ns 723.067M/s
|
|
BMRunsAboveAndBelowThreshold<float>/32768 176692 ns 708.108M/s
|
|
BMRunsAboveAndBelowThreshold<float>/65536 356863 ns 701.198M/s
|
|
BMRunsAboveAndBelowThreshold<float>/131072 714807 ns 700.08M/s
|
|
BMRunsAboveAndBelowThreshold<float>/262144 1429078 ns 700.415M/s
|
|
BMRunsAboveAndBelowThreshold<float>/524288 2877227 ns 695.785M/s
|
|
BMRunsAboveAndBelowThreshold<float>/1048576 5795662 ns 691.222M/s
|
|
BMRunsAboveAndBelowThreshold<float>/2097152 11562715 ns 692.427M/s
|
|
BMRunsAboveAndBelowThreshold<float>/4194304 23364846 ns 686.464M/s
|
|
BMRunsAboveAndBelowThreshold<float>/8388608 46442540 ns 689.871M/s
|
|
BMRunsAboveAndBelowThreshold<float>/16777216 92284501 ns 694.006M/s
|
|
BMRunsAboveAndBelowThreshold<float>_BigO 5.51 N
|
|
BMRunsAboveAndBelowThreshold<double>/8 45.1 ns 1.32169G/s
|
|
BMRunsAboveAndBelowThreshold<double>/16 53.6 ns 2.22712G/s
|
|
BMRunsAboveAndBelowThreshold<double>/32 71.4 ns 3.34079G/s
|
|
BMRunsAboveAndBelowThreshold<double>/64 112 ns 4.24946G/s
|
|
BMRunsAboveAndBelowThreshold<double>/128 196 ns 4.87317G/s
|
|
BMRunsAboveAndBelowThreshold<double>/256 378 ns 5.04476G/s
|
|
BMRunsAboveAndBelowThreshold<double>/512 702 ns 5.44134G/s
|
|
BMRunsAboveAndBelowThreshold<double>/1024 1417 ns 5.3865G/s
|
|
BMRunsAboveAndBelowThreshold<double>/2048 3031 ns 5.03872G/s
|
|
BMRunsAboveAndBelowThreshold<double>/4096 16813 ns 1.81669G/s
|
|
BMRunsAboveAndBelowThreshold<double>/8192 41182 ns 1.48565G/s
|
|
BMRunsAboveAndBelowThreshold<double>/16384 86939 ns 1.40536G/s
|
|
BMRunsAboveAndBelowThreshold<double>/32768 177255 ns 1.37892G/s
|
|
BMRunsAboveAndBelowThreshold<double>/65536 356391 ns 1.3713G/s
|
|
BMRunsAboveAndBelowThreshold<double>/131072 718417 ns 1.36057G/s
|
|
BMRunsAboveAndBelowThreshold<double>/262144 1442288 ns 1.35583G/s
|
|
BMRunsAboveAndBelowThreshold<double>/524288 2942259 ns 1.33217G/s
|
|
BMRunsAboveAndBelowThreshold<double>/1048576 5870235 ns 1.33244G/s
|
|
BMRunsAboveAndBelowThreshold<double>/2097152 11743081 ns 1.33192G/s
|
|
BMRunsAboveAndBelowThreshold<double>/4194304 23521002 ns 1.32976G/s
|
|
BMRunsAboveAndBelowThreshold<double>/8388608 46917407 ns 1.33339G/s
|
|
BMRunsAboveAndBelowThreshold<double>/16777216 93823876 ns 1.33305G/s
|
|
BMRunsAboveAndBelowThreshold<double>_BigO 5.59 N 5.59 N
|
|
```
|
|
|
|
|
|
[endsect]
|
|
[/section:runs_test]
|