401 lines
14 KiB
Plaintext
401 lines
14 KiB
Plaintext
[section:binomial_dist Binomial Distribution]
|
|
|
|
``#include <boost/math/distributions/binomial.hpp>``
|
|
|
|
namespace boost{ namespace math{
|
|
|
|
template <class RealType = double,
|
|
class ``__Policy`` = ``__policy_class`` >
|
|
class binomial_distribution;
|
|
|
|
typedef binomial_distribution<> binomial;
|
|
|
|
template <class RealType, class ``__Policy``>
|
|
class binomial_distribution
|
|
{
|
|
public:
|
|
typedef RealType value_type;
|
|
typedef Policy policy_type;
|
|
|
|
static const ``['unspecified-type]`` clopper_pearson_exact_interval;
|
|
static const ``['unspecified-type]`` jeffreys_prior_interval;
|
|
|
|
// construct:
|
|
binomial_distribution(RealType n, RealType p);
|
|
|
|
// parameter access::
|
|
RealType success_fraction() const;
|
|
RealType trials() const;
|
|
|
|
// Bounds on success fraction:
|
|
static RealType find_lower_bound_on_p(
|
|
RealType trials,
|
|
RealType successes,
|
|
RealType probability,
|
|
``['unspecified-type]`` method = clopper_pearson_exact_interval);
|
|
static RealType find_upper_bound_on_p(
|
|
RealType trials,
|
|
RealType successes,
|
|
RealType probability,
|
|
``['unspecified-type]`` method = clopper_pearson_exact_interval);
|
|
|
|
// estimate min/max number of trials:
|
|
static RealType find_minimum_number_of_trials(
|
|
RealType k, // number of events
|
|
RealType p, // success fraction
|
|
RealType alpha); // risk level
|
|
|
|
static RealType find_maximum_number_of_trials(
|
|
RealType k, // number of events
|
|
RealType p, // success fraction
|
|
RealType alpha); // risk level
|
|
};
|
|
|
|
}} // namespaces
|
|
|
|
The class type `binomial_distribution` represents a
|
|
[@http://mathworld.wolfram.com/BinomialDistribution.html binomial distribution]:
|
|
it is used when there are exactly two mutually
|
|
exclusive outcomes of a trial. These outcomes are labelled
|
|
"success" and "failure". The
|
|
__binomial_distrib is used to obtain
|
|
the probability of observing k successes in N trials, with the
|
|
probability of success on a single trial denoted by p. The
|
|
binomial distribution assumes that p is fixed for all trials.
|
|
|
|
[note The random variable for the binomial distribution is the number of successes,
|
|
(the number of trials is a fixed property of the distribution)
|
|
whereas for the negative binomial,
|
|
the random variable is the number of trials, for a fixed number of successes.]
|
|
|
|
The PDF for the binomial distribution is given by:
|
|
|
|
[equation binomial_ref2]
|
|
|
|
The following two graphs illustrate how the PDF changes depending
|
|
upon the distributions parameters, first we'll keep the success
|
|
fraction /p/ fixed at 0.5, and vary the sample size:
|
|
|
|
[graph binomial_pdf_1]
|
|
|
|
Alternatively, we can keep the sample size fixed at N=20 and
|
|
vary the success fraction /p/:
|
|
|
|
[graph binomial_pdf_2]
|
|
|
|
[discrete_quantile_warning Binomial]
|
|
|
|
[h4 Member Functions]
|
|
|
|
[h5 Construct]
|
|
|
|
binomial_distribution(RealType n, RealType p);
|
|
|
|
Constructor: /n/ is the total number of trials, /p/ is the
|
|
probability of success of a single trial.
|
|
|
|
Requires `0 <= p <= 1`, and `n >= 0`, otherwise calls __domain_error.
|
|
|
|
[h5 Accessors]
|
|
|
|
RealType success_fraction() const;
|
|
|
|
Returns the parameter /p/ from which this distribution was constructed.
|
|
|
|
RealType trials() const;
|
|
|
|
Returns the parameter /n/ from which this distribution was constructed.
|
|
|
|
[h5 Lower Bound on the Success Fraction]
|
|
|
|
static RealType find_lower_bound_on_p(
|
|
RealType trials,
|
|
RealType successes,
|
|
RealType alpha,
|
|
``['unspecified-type]`` method = clopper_pearson_exact_interval);
|
|
|
|
Returns a lower bound on the success fraction:
|
|
|
|
[variablelist
|
|
[[trials][The total number of trials conducted.]]
|
|
[[successes][The number of successes that occurred.]]
|
|
[[alpha][The largest acceptable probability that the true value of
|
|
the success fraction is [*less than] the value returned.]]
|
|
[[method][An optional parameter that specifies the method to be used
|
|
to compute the interval (See below).]]
|
|
]
|
|
|
|
For example, if you observe /k/ successes from /n/ trials the
|
|
best estimate for the success fraction is simply ['k/n], but if you
|
|
want to be 95% sure that the true value is [*greater than] some value,
|
|
['p[sub min]], then:
|
|
|
|
p``[sub min]`` = binomial_distribution<RealType>::find_lower_bound_on_p(n, k, 0.05);
|
|
|
|
[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
|
|
|
|
There are currently two possible values available for the /method/
|
|
optional parameter: /clopper_pearson_exact_interval/
|
|
or /jeffreys_prior_interval/. These constants are both members of
|
|
class template `binomial_distribution`, so usage is for example:
|
|
|
|
p = binomial_distribution<RealType>::find_lower_bound_on_p(
|
|
n, k, 0.05, binomial_distribution<RealType>::jeffreys_prior_interval);
|
|
|
|
The default method if this parameter is not specified is the Clopper Pearson
|
|
"exact" interval. This produces an interval that guarantees at least
|
|
`100(1-alpha)%` coverage, but which is known to be overly conservative,
|
|
sometimes producing intervals with much greater than the requested coverage.
|
|
|
|
The alternative calculation method produces a non-informative
|
|
Jeffreys Prior interval. It produces `100(1-alpha)%` coverage only
|
|
['in the average case], though is typically very close to the requested
|
|
coverage level. It is one of the main methods of calculation recommended
|
|
in the review by Brown, Cai and DasGupta.
|
|
|
|
Please note that the "textbook" calculation method using
|
|
a normal approximation (the Wald interval) is deliberately
|
|
not provided: it is known to produce consistently poor results,
|
|
even when the sample size is surprisingly large.
|
|
Refer to Brown, Cai and DasGupta for a full explanation. Many other methods
|
|
of calculation are available, and may be more appropriate for specific
|
|
situations. Unfortunately there appears to be no consensus amongst
|
|
statisticians as to which is "best": refer to the discussion at the end of
|
|
Brown, Cai and DasGupta for examples.
|
|
|
|
The two methods provided here were chosen principally because they
|
|
can be used for both one and two sided intervals.
|
|
See also:
|
|
|
|
Lawrence D. Brown, T. Tony Cai and Anirban DasGupta (2001),
|
|
Interval Estimation for a Binomial Proportion,
|
|
Statistical Science, Vol. 16, No. 2, 101-133.
|
|
|
|
T. Tony Cai (2005),
|
|
One-sided confidence intervals in discrete distributions,
|
|
Journal of Statistical Planning and Inference 131, 63-88.
|
|
|
|
Agresti, A. and Coull, B. A. (1998). Approximate is better than
|
|
"exact" for interval estimation of binomial proportions. Amer.
|
|
Statist. 52 119-126.
|
|
|
|
Clopper, C. J. and Pearson, E. S. (1934). The use of confidence
|
|
or fiducial limits illustrated in the case of the binomial.
|
|
Biometrika 26 404-413.
|
|
|
|
[h5 Upper Bound on the Success Fraction]
|
|
|
|
static RealType find_upper_bound_on_p(
|
|
RealType trials,
|
|
RealType successes,
|
|
RealType alpha,
|
|
``['unspecified-type]`` method = clopper_pearson_exact_interval);
|
|
|
|
Returns an upper bound on the success fraction:
|
|
|
|
[variablelist
|
|
[[trials][The total number of trials conducted.]]
|
|
[[successes][The number of successes that occurred.]]
|
|
[[alpha][The largest acceptable probability that the true value of
|
|
the success fraction is [*greater than] the value returned.]]
|
|
[[method][An optional parameter that specifies the method to be used
|
|
to compute the interval. Refer to the documentation for
|
|
`find_upper_bound_on_p` above for the meaning of the
|
|
method options.]]
|
|
]
|
|
|
|
For example, if you observe /k/ successes from /n/ trials the
|
|
best estimate for the success fraction is simply ['k/n], but if you
|
|
want to be 95% sure that the true value is [*less than] some value,
|
|
['p[sub max]], then:
|
|
|
|
p``[sub max]`` = binomial_distribution<RealType>::find_upper_bound_on_p(n, k, 0.05);
|
|
|
|
[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
|
|
|
|
[note
|
|
In order to obtain a two sided bound on the success fraction, you
|
|
call both `find_lower_bound_on_p` *and* `find_upper_bound_on_p`
|
|
each with the same arguments.
|
|
|
|
If the desired risk level
|
|
that the true success fraction lies outside the bounds is [alpha],
|
|
then you pass [alpha]/2 to these functions.
|
|
|
|
So for example a two sided 95% confidence interval would be obtained
|
|
by passing [alpha] = 0.025 to each of the functions.
|
|
|
|
[link math_toolkit.stat_tut.weg.binom_eg.binom_conf See worked example.]
|
|
]
|
|
|
|
|
|
[h5 Estimating the Number of Trials Required for a Certain Number of Successes]
|
|
|
|
static RealType find_minimum_number_of_trials(
|
|
RealType k, // number of events
|
|
RealType p, // success fraction
|
|
RealType alpha); // probability threshold
|
|
|
|
This function estimates the minimum number of trials required to ensure that
|
|
more than k events is observed with a level of risk /alpha/ that k or
|
|
fewer events occur.
|
|
|
|
[variablelist
|
|
[[k][The number of success observed.]]
|
|
[[p][The probability of success for each trial.]]
|
|
[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
|
|
]
|
|
|
|
For example:
|
|
|
|
binomial_distribution<RealType>::find_number_of_trials(10, 0.5, 0.05);
|
|
|
|
Returns the smallest number of trials we must conduct to be 95% sure
|
|
of seeing 10 events that occur with frequency one half.
|
|
|
|
[h5 Estimating the Maximum Number of Trials to Ensure no more than a Certain Number of Successes]
|
|
|
|
static RealType find_maximum_number_of_trials(
|
|
RealType k, // number of events
|
|
RealType p, // success fraction
|
|
RealType alpha); // probability threshold
|
|
|
|
This function estimates the maximum number of trials we can conduct
|
|
to ensure that k successes or fewer are observed, with a risk /alpha/
|
|
that more than k occur.
|
|
|
|
[variablelist
|
|
[[k][The number of success observed.]]
|
|
[[p][The probability of success for each trial.]]
|
|
[[alpha][The maximum acceptable probability that more than k events will be observed.]]
|
|
]
|
|
|
|
For example:
|
|
|
|
binomial_distribution<RealType>::find_maximum_number_of_trials(0, 1e-6, 0.05);
|
|
|
|
Returns the largest number of trials we can conduct and still be 95% certain
|
|
of not observing any events that occur with one in a million frequency.
|
|
This is typically used in failure analysis.
|
|
|
|
[link math_toolkit.stat_tut.weg.binom_eg.binom_size_eg See Worked Example.]
|
|
|
|
[h4 Non-member Accessors]
|
|
|
|
All the [link math_toolkit.dist_ref.nmp usual non-member accessor functions]
|
|
that are generic to all distributions are supported: __usual_accessors.
|
|
|
|
The domain for the random variable /k/ is `0 <= k <= N`, otherwise a
|
|
__domain_error is returned.
|
|
|
|
It's worth taking a moment to define what these accessors actually mean in
|
|
the context of this distribution:
|
|
|
|
[table Meaning of the non-member accessors
|
|
[[Function][Meaning]]
|
|
[[__pdf]
|
|
[The probability of obtaining [*exactly k successes] from n trials
|
|
with success fraction p. For example:
|
|
|
|
`pdf(binomial(n, p), k)`]]
|
|
[[__cdf]
|
|
[The probability of obtaining [*k successes or fewer] from n trials
|
|
with success fraction p. For example:
|
|
|
|
`cdf(binomial(n, p), k)`]]
|
|
[[__ccdf]
|
|
[The probability of obtaining [*more than k successes] from n trials
|
|
with success fraction p. For example:
|
|
|
|
`cdf(complement(binomial(n, p), k))`]]
|
|
[[__quantile]
|
|
[Given a binomial distribution with ['n] trials, success fraction ['p] and probability ['P],
|
|
finds the largest number of successes ['k] whose CDF is less than ['P].
|
|
It is strongly recommended that you read the tutorial
|
|
[link math_toolkit.pol_tutorial.understand_dis_quant Understanding Quantiles of Discrete Distributions] before
|
|
using the quantile function.]]
|
|
[[__quantile_c]
|
|
[Given a binomial distribution with ['n] trials, success fraction ['p] and probability ['Q],
|
|
finds the smallest number of successes ['k] whose CDF is greater than ['1-Q].
|
|
It is strongly recommended that you read the tutorial
|
|
[link math_toolkit.pol_tutorial.understand_dis_quant Understanding Quantiles of Discrete Distributions] before
|
|
using the quantile function.]]
|
|
]
|
|
|
|
[h4 Examples]
|
|
|
|
Various [link math_toolkit.stat_tut.weg.binom_eg worked examples]
|
|
are available illustrating the use of the binomial distribution.
|
|
|
|
[h4 Accuracy]
|
|
|
|
This distribution is implemented using the
|
|
incomplete beta functions __ibeta and __ibetac,
|
|
please refer to these functions for information on accuracy.
|
|
|
|
[h4 Implementation]
|
|
|
|
In the following table /p/ is the probability that one trial will
|
|
be successful (the success fraction), /n/ is the number of trials,
|
|
/k/ is the number of successes, /p/ is the probability and /q = 1-p/.
|
|
|
|
[table
|
|
[[Function][Implementation Notes]]
|
|
[[pdf][Implementation is in terms of __ibeta_derivative: if [sub n]C[sub k ] is the binomial
|
|
coefficient of a and b, then we have:
|
|
|
|
[equation binomial_ref1]
|
|
|
|
Which can be evaluated as `ibeta_derivative(k+1, n-k+1, p) / (n+1)`
|
|
|
|
The function __ibeta_derivative is used here, since it has already
|
|
been optimised for the lowest possible error - indeed this is really
|
|
just a thin wrapper around part of the internals of the incomplete
|
|
beta function.
|
|
|
|
There are also various special cases: refer to the code for details.
|
|
]]
|
|
[[cdf][Using the relation:
|
|
|
|
``
|
|
p = I[sub 1-p](n - k, k + 1)
|
|
= 1 - I[sub p](k + 1, n - k)
|
|
= __ibetac(k + 1, n - k, p)``
|
|
|
|
There are also various special cases: refer to the code for details.
|
|
]]
|
|
[[cdf complement][Using the relation: q = __ibeta(k + 1, n - k, p)
|
|
|
|
There are also various special cases: refer to the code for details. ]]
|
|
[[quantile][Since the cdf is non-linear in variate /k/ none of the inverse
|
|
incomplete beta functions can be used here. Instead the quantile
|
|
is found numerically using a derivative free method
|
|
(__root_finding_TOMS748).]]
|
|
[[quantile from the complement][Found numerically as above.]]
|
|
[[mean][ `p * n` ]]
|
|
[[variance][ `p * n * (1-p)` ]]
|
|
[[mode][`floor(p * (n + 1))`]]
|
|
[[skewness][`(1 - 2 * p) / sqrt(n * p * (1 - p))`]]
|
|
[[kurtosis][`3 - (6 / n) + (1 / (n * p * (1 - p)))`]]
|
|
[[kurtosis excess][`(1 - 6 * p * q) / (n * p * q)`]]
|
|
[[parameter estimation][The member functions `find_upper_bound_on_p`
|
|
`find_lower_bound_on_p` and `find_number_of_trials` are
|
|
implemented in terms of the inverse incomplete beta functions
|
|
__ibetac_inv, __ibeta_inv, and __ibetac_invb respectively]]
|
|
]
|
|
|
|
[h4 References]
|
|
|
|
* [@http://mathworld.wolfram.com/BinomialDistribution.html Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource].
|
|
* [@http://en.wikipedia.org/wiki/Beta_distribution Wikipedia binomial distribution].
|
|
* [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm NIST Explorary Data Analysis].
|
|
|
|
[endsect] [/section:binomial_dist Binomial]
|
|
|
|
[/ binomial.qbk
|
|
Copyright 2006 John Maddock and Paul A. Bristow.
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|