opda.utils module

Utilities.

opda.utils.sort_by_first(*args)

Return the arrays sorted by the first array.

Parameters:
*args : arrays, required

The arrays to sort.

Returns:
arrays

The arrays sorted by the first array. Thus, the first array will be sorted and the other arrays will have their elements permuted the same way as the elements from the first array.
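The permutation behavior described above can be illustrated with a small pure-Python sketch. The name sort_by_first_sketch is hypothetical, it works on plain lists rather than arrays, and its stable handling of ties is an assumption; it is not opda's implementation.

```python
def sort_by_first_sketch(*args):
    """Sort every sequence by the order that sorts the first one.

    Hypothetical stand-in for illustration only, not opda's code.
    """
    if not args:
        return ()
    first = args[0]
    # Indices that put the first sequence in ascending order (stable sort).
    order = sorted(range(len(first)), key=lambda i: first[i])
    # Apply the same permutation to every sequence, including the first.
    return tuple([seq[i] for i in order] for seq in args)


xs, ys = sort_by_first_sketch([3, 1, 2], ["c", "a", "b"])
print(xs, ys)  # [1, 2, 3] ['a', 'b', 'c']
```

Each element of the second list stays paired with the element of the first list it started next to.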

opda.utils.dkw_epsilon(n, confidence)

Return epsilon from the Dvoretzky-Kiefer-Wolfowitz inequality.

The Dvoretzky-Kiefer-Wolfowitz inequality states that a confidence band for the CDF is given by the empirical CDF plus or minus:

\[\epsilon = \sqrt{\frac{\log \frac{2}{\alpha}}{2n}}\]

where \(1 - \alpha\) is the coverage, \(n\) is the number of samples, and \(\log\) is the natural logarithm.

Parameters:
n : positive int, required

The number of samples.

confidence : float from 0 to 1 inclusive, required

The desired confidence or coverage.

Returns:
non-negative float

The epsilon for the Dvoretzky-Kiefer-Wolfowitz inequality.
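The formula above is simple enough to evaluate directly. The following is a minimal sketch using only the standard library; the name dkw_epsilon_sketch is hypothetical, and returning infinity when confidence is exactly 1 is an assumption about the limiting case rather than documented behavior.

```python
import math


def dkw_epsilon_sketch(n, confidence):
    """Evaluate sqrt(log(2 / alpha) / (2 n)) with alpha = 1 - confidence.

    Hypothetical stand-in for illustration only, not opda's code.
    """
    alpha = 1.0 - confidence
    if alpha == 0.0:
        # Assumption: full coverage needs an infinitely wide band.
        return math.inf
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


eps = dkw_epsilon_sketch(100, 0.95)
print(round(eps, 4))  # about 0.1358
```

Because n appears under the square root, quadrupling the number of samples halves the width of the band.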

opda.utils.beta_equal_tailed_interval(a, b, coverage)

Return an interval containing coverage of the probability.

For the beta distribution with parameters a and b, return the equal-tailed interval that contains coverage of the probability mass.

Parameters:
a : finite positive float or array of floats, required

The alpha parameter for the beta distribution.

b : finite positive float or array of floats, required

The beta parameter for the beta distribution.

coverage : float or array of floats from 0 to 1 inclusive, required

The desired coverage for the returned intervals.

Returns:
pair of floats or arrays of floats from 0 to 1 inclusive

A pair of floats or arrays of floats with the shape determined by broadcasting a, b, and coverage together. The first returned value gives the lower bound and the second the upper bound for the equal-tailed intervals.
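An equal-tailed interval leaves (1 - coverage) / 2 of the probability mass in each tail, so its endpoints are two quantiles of the beta distribution. The sketch below shows this with the standard library only. It assumes positive integer a and b (so the CDF reduces to a binomial sum) and scalar inputs, and all function names are hypothetical; it is not opda's implementation.

```python
import math


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid only for positive INTEGER a and b."""
    n = a + b - 1
    # Identity: P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_ppf_int(q, a, b, tol=1e-12):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def equal_tailed_interval_sketch(a, b, coverage):
    """Return the quantiles that cut off equal mass in both tails."""
    tail = (1 - coverage) / 2
    return beta_ppf_int(tail, a, b), beta_ppf_int(1 - tail, a, b)


lower, upper = equal_tailed_interval_sketch(2, 3, 0.9)
print(lower, upper)
```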

opda.utils.beta_highest_density_interval(a, b, coverage, *, atol=1e-10)

Return an interval containing coverage of the probability.

For the beta distribution with parameters a and b, return the shortest interval that contains coverage of the probability mass. Note that the highest density interval only exists if at least one of a or b is greater than 1.

Parameters:
a : finite positive float or array of floats, required

The alpha parameter for the beta distribution.

b : finite positive float or array of floats, required

The beta parameter for the beta distribution.

coverage : float or array of floats from 0 to 1 inclusive, required

The desired coverage for the returned intervals.

atol : non-negative float, optional

The absolute tolerance to use for stopping the iteration.

Returns:
pair of floats or arrays of floats from 0 to 1 inclusive

A pair of floats or arrays of floats with the shape determined by broadcasting a, b, and coverage together. The first returned value gives the lower bound and the second the upper bound for the intervals.
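To make "shortest interval" concrete, the sketch below searches over how much mass to leave in the lower tail and keeps the narrowest interval with the requested coverage. It is a brute-force illustration under several assumptions: positive integer a and b, scalar inputs, hypothetical function names, and a fixed grid in place of the tolerance-driven iteration that atol controls. This reference does not describe opda's actual algorithm.

```python
import math


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid only for positive INTEGER a and b."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_ppf_int(q, a, b, tol=1e-12):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def highest_density_interval_sketch(a, b, coverage, steps=2000):
    """Grid-search the lower-tail mass p that minimizes the interval width."""
    slack = 1 - coverage  # total mass left outside the interval
    best = (0.0, 1.0)
    for i in range(steps + 1):
        p = slack * i / steps
        lo = beta_ppf_int(p, a, b)
        hi = beta_ppf_int(p + coverage, a, b)
        if hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


print(highest_density_interval_sketch(2, 2, 0.9))
```

For a symmetric distribution such as Beta(2, 2) the result coincides with the equal-tailed interval; for skewed distributions it shifts toward the mode and is never wider than the equal-tailed interval.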

opda.utils.beta_equal_tailed_coverage(a, b, x)

Return the coverage of the smallest interval containing x.

For the beta distribution with parameters a and b, return the coverage of the smallest equal-tailed interval containing x. See the related function: beta_equal_tailed_interval().

Parameters:
a : finite positive float or array of floats, required

The alpha parameter for the beta distribution.

b : finite positive float or array of floats, required

The beta parameter for the beta distribution.

x : float or array of floats from 0 to 1 inclusive, required

The points defining the minimal equal-tailed intervals whose coverage to return.

Returns:
float or array of floats from 0 to 1 inclusive

A float or array of floats with shape determined by broadcasting a, b, and x together. The values represent the coverage of the minimal equal-tailed interval containing the corresponding value from x.
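The coverage of the minimal equal-tailed interval follows from the CDF. An equal-tailed interval with coverage c spans the quantiles (1 - c) / 2 and (1 + c) / 2, so it contains x exactly when c is at least |2 F(x) - 1|, where F is the beta CDF. The sketch below applies that identity, assuming positive integer a and b and scalar x; the function names are hypothetical and this is not opda's implementation.

```python
import math


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid only for positive INTEGER a and b."""
    n = a + b - 1
    # Identity: P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def equal_tailed_coverage_sketch(a, b, x):
    """Coverage of the smallest equal-tailed interval that contains x."""
    return abs(2 * beta_cdf_int(x, a, b) - 1)


# The median needs no coverage at all; points far in a tail need a lot.
print(equal_tailed_coverage_sketch(2, 2, 0.5))
print(equal_tailed_coverage_sketch(2, 2, 0.9))
```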

opda.utils.beta_highest_density_coverage(a, b, x, *, atol=1e-10)

Return the coverage of the smallest interval containing x.

For the beta distribution with parameters a and b, return the coverage of the smallest highest density interval containing x. Note that the highest density interval only exists if at least one of a or b is greater than 1. See the related function: beta_highest_density_interval().

Parameters:
a : finite positive float or array of floats, required

The alpha parameter for the beta distribution.

b : finite positive float or array of floats, required

The beta parameter for the beta distribution.

x : float or array of floats from 0 to 1 inclusive, required

The points defining the minimal intervals whose coverage to return.

atol : non-negative float, optional

The absolute tolerance to use for stopping the iteration.

Returns:
float or array of floats from 0 to 1 inclusive

A float or array of floats with shape determined by broadcasting a, b, and x together. The values represent the coverage of the minimal highest density interval containing the corresponding value from x.

opda.utils.binomial_confidence_interval(n_successes, n_total, confidence)

Return a confidence interval for the binomial distribution.

Given n_successes out of n_total, return an equal-tailed Clopper-Pearson confidence interval with coverage confidence.

Parameters:
n_successes : non-negative int or array of ints, required

An int or array of ints with each entry denoting the number of successes in a sample. Must be broadcastable with n_total.

n_total : positive int or array of ints, required

An int or array of ints with each entry denoting the total number of observations in a sample. Must be broadcastable with n_successes.

confidence : float or array of floats from 0 to 1 inclusive, required

A float or array of floats between zero and one denoting the desired confidence for each confidence interval. Must be broadcastable with the result of broadcasting n_successes and n_total together.

Returns:
pair of floats or arrays of floats from 0 to 1 inclusive

A possibly scalar array of floats representing the lower confidence bounds and a possibly scalar array of floats representing the upper confidence bounds.

Notes

The Clopper-Pearson interval [1] does not account for the binomial distribution’s discreteness. This lack of correction causes Clopper-Pearson intervals to be conservative. In addition, this function implements an equal-tailed version of the Clopper-Pearson interval, which can be very conservative when the number of successes is zero or equal to the total number of observations.
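The equal-tailed Clopper-Pearson bounds can be sketched straight from their definition: the lower bound is the success probability at which observing n_successes or more has probability alpha / 2, and the upper bound is the one at which observing n_successes or fewer has probability alpha / 2, with alpha = 1 - confidence. The code below solves both equations by bisection with the standard library. It handles scalars only, its names are hypothetical, and it is not opda's implementation.

```python
import math


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); non-decreasing in p."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def bisect_increasing(f, target, tol=1e-12):
    """Solve f(p) = target on [0, 1] for a non-decreasing f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_sketch(n_successes, n_total, confidence):
    """Equal-tailed Clopper-Pearson interval for scalar inputs."""
    k, n = n_successes, n_total
    alpha = 1 - confidence
    # Lower bound: P(X >= k) = alpha / 2, or 0 when there are no successes.
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: binom_tail_ge(k, n, p), alpha / 2)
    # Upper bound: P(X <= k) = alpha / 2, i.e. P(X >= k + 1) = 1 - alpha / 2,
    # or 1 when every observation is a success.
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


print(clopper_pearson_sketch(0, 10, 0.95))
print(clopper_pearson_sketch(5, 10, 0.95))
```

With zero successes the lower bound is pinned at 0, yet the upper bound still spends only alpha / 2 on its tail. This is the boundary conservativeness described in the note above.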

References

[1] Clopper, C. and Pearson, E. S., “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial” (1934). Biometrika. 26 (4): 404-413. doi:10.1093/biomet/26.4.404.

opda.utils.normal_pdf(xs)

Evaluate the PDF of the standard normal distribution.

Parameters:
xs : float or array of floats, required

The points at which to evaluate the standard normal distribution’s probability density function.

Returns:
non-negative float or array of floats

The standard normal distribution’s probability density function evaluated at xs.

opda.utils.normal_cdf(xs)

Evaluate the CDF of the standard normal distribution.

Parameters:
xs : float or array of floats, required

The points at which to evaluate the standard normal distribution’s cumulative distribution function.

Returns:
float or array of floats from 0 to 1 inclusive

The standard normal distribution’s cumulative distribution function evaluated at xs.
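For reference, the quantities computed by normal_pdf() and normal_cdf() have closed forms that can be written with the standard library alone. The scalar functions below are hypothetical stand-ins for illustration; they do not accept arrays and are not opda's implementation.

```python
import math


def normal_pdf_sketch(x):
    """Standard normal density: exp(-x^2 / 2) / sqrt(2 pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_cdf_sketch(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


print(normal_pdf_sketch(0.0))  # the density's peak, 1 / sqrt(2 pi)
print(normal_cdf_sketch(0.0))  # 0.5
```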

opda.utils.normal_ppf(qs)

Evaluate the PPF of the standard normal distribution.

Parameters:
qs : float or array of floats from 0 to 1 inclusive, required

The points at which to evaluate the standard normal distribution’s quantile function.

Returns:
float or array of floats

The standard normal distribution’s quantile function evaluated at qs.
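The quantile function is the inverse of the CDF. The scalar sketch below uses the standard library's statistics.NormalDist to show this. The name normal_ppf_sketch is hypothetical, and mapping the endpoints 0 and 1 to negative and positive infinity is an assumption about the limiting values, since this reference does not state what opda returns there.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def normal_ppf_sketch(q):
    """Standard normal quantile function for a scalar q in [0, 1]."""
    # NormalDist.inv_cdf only accepts 0 < q < 1, so handle the endpoints here.
    if q == 0.0:
        return -math.inf  # assumption: limiting value at q = 0
    if q == 1.0:
        return math.inf  # assumption: limiting value at q = 1
    return STANDARD_NORMAL.inv_cdf(q)


z = normal_ppf_sketch(0.975)
print(z)  # roughly 1.96
print(STANDARD_NORMAL.cdf(z))  # back to roughly 0.975
```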