math/doc/statistics/runs_test.qbk

[/
Copyright (c) 2019 Nick Thompson
Use, modification and distribution are subject to the
Boost Software License, Version 1.0. (See accompanying file
LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
]

[section:runs_test Runs tests]

[heading Synopsis]

```
#include <boost/math/statistics/runs_test.hpp>

namespace boost::math::statistics {

template<typename RandomAccessContainer>
std::pair<Real, Real> runs_above_and_below_threshold(RandomAccessContainer const & v, typename RandomAccessContainer::value_type threshold);

template<typename RandomAccessContainer>
std::pair<Real, Real> runs_above_and_below_median(RandomAccessContainer const & v);

}}}
```

[heading Background]

The "runs above and below median test" tries to determine if a sequence is random by looking at the number of consecutive values which exceed (or are below) the median of the sequence.
For example, if we are given data {5, 2, 0, 4, 7, 9, 10, 6, 1, 8, 3}, first we find the median (5), and transform the vector into a list of + and -'s, depending on whether the value is greater or less than the median.
(Values equal to the median we simply ignore-this is a convention we have inherited from the wonderful `randtests` package in R.)
Hence our data vector is transformed into

{-,-,-,+,+,+,-,+,-}

which is 5 runs, with /n/[sub a] = 5 values above and /n/[sub b] = 5 values below the median.
[@https://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm NIST] tells us the expected number of runs and their variance:

[$../graphs/expected_runs_above_threshold.svg]

from which we derive the test statistic

[$../graphs/runs_test_statistic.svg]

whose distribution we approximate as normal to extract a /p/-value.


An example usage is as follows:

```
#include <vector>
#include <random>
#include <boost/math/statistics/runs_test.hpp>

std::random_device rd;
std::vector<double> v{5, 2, 0, 4, 7, 9, 10, 6, 1, 8, 3};
auto [t, p] = boost::math::statistics::runs_above_and_below_median(v);
// t =  -0.670820393249936919; p = 0.502334954360502017;
```
We see that we have about a 50% chance of seeing this number of runs if the null hypothesis of randomness is true, and hence the assumption of randomness seems reasonable.
As always, the test statistic is the first element of the pair, and the /p/-value is the second element.


[heading Performance]

There are two cases: Where the threshold (typically the median) has already been computed, and the case where the mean and sample variance must be computed on the fly.
Computing the median is fairly expensive (requiring a call to `boost::math::statistics::median`), and since the order of the original data must be preserved, it must allocate.
If you believe your data to come from a distribution where the means and median coincide, or if you've already computed the median in the course of some other analysis, then you can get away with a call to `runs_above_and_below_threshold` via

```
Real threshold = 5;
auto [t, p] = boost::math::statistics::runs_above_and_below_threshold(v, threshold);
```

The performance differences between these two cases are obvious:

```
---------------------------------------------------------------------------------------
Benchmark                                            Time   Bytes/second
---------------------------------------------------------------------------------------
BMRunsAboveAndBelowMedian<float>/8                   260 ns 118.421M/s
BMRunsAboveAndBelowMedian<float>/16                  318 ns 192.797M/s
BMRunsAboveAndBelowMedian<float>/32                  417 ns 303.509M/s
BMRunsAboveAndBelowMedian<float>/64                  625 ns 390.578M/s
BMRunsAboveAndBelowMedian<float>/128                 743 ns 657.827M/s
BMRunsAboveAndBelowMedian<float>/256                1308 ns 767.181M/s
BMRunsAboveAndBelowMedian<float>/512                1896 ns 1034.31M/s
BMRunsAboveAndBelowMedian<float>/1024               6582 ns 594.458M/s
BMRunsAboveAndBelowMedian<float>/2048              26067 ns 300.001M/s
BMRunsAboveAndBelowMedian<float>/4096              62023 ns 252.125M/s
BMRunsAboveAndBelowMedian<float>/8192             124976 ns 250.256M/s
BMRunsAboveAndBelowMedian<float>/16384            242171 ns 258.29M/s
BMRunsAboveAndBelowMedian<float>/32768            528683 ns 236.714M/s
BMRunsAboveAndBelowMedian<float>/65536            965354 ns 259.185M/s
BMRunsAboveAndBelowMedian<float>/131072          2514693 ns 199.068M/s
BMRunsAboveAndBelowMedian<float>/262144          4223084 ns 237.058M/s
BMRunsAboveAndBelowMedian<float>/524288          8638963 ns 231.755M/s
BMRunsAboveAndBelowMedian<float>/1048576        16215682 ns 246.995M/s
BMRunsAboveAndBelowMedian<float>/2097152        39180496 ns 204.443M/s
BMRunsAboveAndBelowMedian<float>/4194304        82495779 ns 194.307M/s
BMRunsAboveAndBelowMedian<float>/8388608       142675936 ns 224.547M/s
BMRunsAboveAndBelowMedian<float>/16777216      287638068 ns 223.088M/s
BMRunsAboveAndBelowMedian<float>_BigO              17.25 N
BMRunsAboveAndBelowMedian<double>/8                  191 ns 320.129M/s
BMRunsAboveAndBelowMedian<double>/16                 233 ns 523.526M/s
BMRunsAboveAndBelowMedian<double>/32                 334 ns 730.8M/s
BMRunsAboveAndBelowMedian<double>/64                 456 ns 1070.93M/s
BMRunsAboveAndBelowMedian<double>/128                688 ns 1.38789G/s
BMRunsAboveAndBelowMedian<double>/256               1257 ns 1.51807G/s
BMRunsAboveAndBelowMedian<double>/512               2663 ns 1.43406G/s
BMRunsAboveAndBelowMedian<double>/1024              4100 ns 1.86266G/s
BMRunsAboveAndBelowMedian<double>/2048             23493 ns 665.851M/s
BMRunsAboveAndBelowMedian<double>/4096             57968 ns 539.551M/s
BMRunsAboveAndBelowMedian<double>/8192            142272 ns 439.746M/s
BMRunsAboveAndBelowMedian<double>/16384           260948 ns 479.639M/s
BMRunsAboveAndBelowMedian<double>/32768           551577 ns 453.623M/s
BMRunsAboveAndBelowMedian<double>/65536          1056583 ns 473.654M/s
BMRunsAboveAndBelowMedian<double>/131072         2123956 ns 471.35M/s
BMRunsAboveAndBelowMedian<double>/262144         5028745 ns 398.111M/s
BMRunsAboveAndBelowMedian<double>/524288        10399212 ns 384.981M/s
BMRunsAboveAndBelowMedian<double>/1048576       23089767 ns 348.496M/s
BMRunsAboveAndBelowMedian<double>/2097152       37626884 ns 425.962M/s
BMRunsAboveAndBelowMedian<double>/4194304       79281747 ns 404.088M/s
BMRunsAboveAndBelowMedian<double>/8388608      172055781 ns 373.391M/s
BMRunsAboveAndBelowMedian<double>/16777216     391377449 ns 332.01M/s
BMRunsAboveAndBelowMedian<double>_BigO             22.52 N
BMRunsAboveAndBelowThreshold<float>/8               41.6 ns 739.55M/s
BMRunsAboveAndBelowThreshold<float>/16              58.4 ns 1050.48M/s
BMRunsAboveAndBelowThreshold<float>/32              66.5 ns 1.79606G/s
BMRunsAboveAndBelowThreshold<float>/64               115 ns 2.0762G/s
BMRunsAboveAndBelowThreshold<float>/128              198 ns 2.41515G/s
BMRunsAboveAndBelowThreshold<float>/256              365 ns 2.61328G/s
BMRunsAboveAndBelowThreshold<float>/512              720 ns 2.65053G/s
BMRunsAboveAndBelowThreshold<float>/1024            1424 ns 2.68123G/s
BMRunsAboveAndBelowThreshold<float>/2048            3009 ns 2.5379G/s
BMRunsAboveAndBelowThreshold<float>/4096           16748 ns 933.699M/s
BMRunsAboveAndBelowThreshold<float>/8192           40190 ns 778.105M/s
BMRunsAboveAndBelowThreshold<float>/16384          86500 ns 723.067M/s
BMRunsAboveAndBelowThreshold<float>/32768         176692 ns 708.108M/s
BMRunsAboveAndBelowThreshold<float>/65536         356863 ns 701.198M/s
BMRunsAboveAndBelowThreshold<float>/131072        714807 ns 700.08M/s
BMRunsAboveAndBelowThreshold<float>/262144       1429078 ns 700.415M/s
BMRunsAboveAndBelowThreshold<float>/524288       2877227 ns 695.785M/s
BMRunsAboveAndBelowThreshold<float>/1048576      5795662 ns 691.222M/s
BMRunsAboveAndBelowThreshold<float>/2097152     11562715 ns 692.427M/s
BMRunsAboveAndBelowThreshold<float>/4194304     23364846 ns 686.464M/s
BMRunsAboveAndBelowThreshold<float>/8388608     46442540 ns 689.871M/s
BMRunsAboveAndBelowThreshold<float>/16777216    92284501 ns 694.006M/s
BMRunsAboveAndBelowThreshold<float>_BigO            5.51 N
BMRunsAboveAndBelowThreshold<double>/8              45.1 ns 1.32169G/s
BMRunsAboveAndBelowThreshold<double>/16             53.6 ns 2.22712G/s
BMRunsAboveAndBelowThreshold<double>/32             71.4 ns 3.34079G/s
BMRunsAboveAndBelowThreshold<double>/64              112 ns 4.24946G/s
BMRunsAboveAndBelowThreshold<double>/128             196 ns 4.87317G/s
BMRunsAboveAndBelowThreshold<double>/256             378 ns 5.04476G/s
BMRunsAboveAndBelowThreshold<double>/512             702 ns 5.44134G/s
BMRunsAboveAndBelowThreshold<double>/1024           1417 ns 5.3865G/s
BMRunsAboveAndBelowThreshold<double>/2048           3031 ns 5.03872G/s
BMRunsAboveAndBelowThreshold<double>/4096          16813 ns 1.81669G/s
BMRunsAboveAndBelowThreshold<double>/8192          41182 ns 1.48565G/s
BMRunsAboveAndBelowThreshold<double>/16384         86939 ns 1.40536G/s
BMRunsAboveAndBelowThreshold<double>/32768        177255 ns 1.37892G/s
BMRunsAboveAndBelowThreshold<double>/65536        356391 ns 1.3713G/s
BMRunsAboveAndBelowThreshold<double>/131072       718417 ns 1.36057G/s
BMRunsAboveAndBelowThreshold<double>/262144      1442288 ns 1.35583G/s
BMRunsAboveAndBelowThreshold<double>/524288      2942259 ns 1.33217G/s
BMRunsAboveAndBelowThreshold<double>/1048576     5870235 ns 1.33244G/s
BMRunsAboveAndBelowThreshold<double>/2097152    11743081 ns 1.33192G/s
BMRunsAboveAndBelowThreshold<double>/4194304    23521002 ns 1.32976G/s
BMRunsAboveAndBelowThreshold<double>/8388608    46917407 ns 1.33339G/s
BMRunsAboveAndBelowThreshold<double>/16777216   93823876 ns 1.33305G/s
BMRunsAboveAndBelowThreshold<double>_BigO           5.59 N          5.59 N
```


[endsect]
[/section:runs_test]