yat  0.11.3pre
theplu::yat::statistics::ROC Class Reference

Receiver Operating Characteristic.

#include <yat/statistics/ROC.h>

Public Member Functions

 ROC (void)
 Default constructor.
 
void add (double value, bool target, double weight=1.0)
 Add a data value.
 
double area (void) const
 Area Under Curve, AUC.
 
unsigned int & minimum_size (void)
 threshold for p_value calculation
 
const unsigned int & minimum_size (void) const
 threshold for p_value calculation
 
double n (void) const
 number of samples
 
double n_neg (void) const
 number of negative samples
 
double n_pos (void) const
 number of positive samples
 
double p_left (void) const
 
double p_right (void) const
 One-sided P-value.
 
double p_value_one_sided (void) const
 
double p_value (void) const
 Two-sided p-value.
 
void remove (double value, bool target, double weight=1.0)
 remove a data value
 
void reset (void)
 Set everything to zero.
 

Detailed Description

Receiver Operating Characteristic.

As the area under an ROC curve is equivalent to the Mann-Whitney U statistic, this class can be used to perform a Mann-Whitney U test (also known as the Wilcoxon rank-sum test).
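The equivalence can be illustrated with a minimal, self-contained sketch that does not use yat. The helper name auc_from_pairs is hypothetical, and counting a tied positive/negative pair as half a win is an assumption; see AUC for how yat itself calculates the area.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of yat: the area is U / (n_pos * n_neg),
// where U counts the (positive, negative) pairs in which the positive
// value is larger. Assumption: a tied pair contributes 0.5.
double auc_from_pairs(const std::vector<double>& pos,
                      const std::vector<double>& neg)
{
  double u = 0.0;
  for (std::size_t i = 0; i < pos.size(); ++i) {
    for (std::size_t j = 0; j < neg.size(); ++j) {
      if (pos[i] > neg[j])
        u += 1.0;
      else if (pos[i] == neg[j])
        u += 0.5;
    }
  }
  // Assumes both groups are non-empty.
  return u / (static_cast<double>(pos.size()) * neg.size());
}
```

The double loop is quadratic in the number of samples; it is meant to show the definition, not an efficient implementation.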

See Also
AUC

Member Function Documentation

void theplu::yat::statistics::ROC::add ( double  value,
bool  target,
double  weight = 1.0 
)

Add a data value.

Parameters
value   data value
target  true if value belongs to the positive class
weight  indicates how important the data point is. A zero weight implies the data point is ignored. A negative weight should be understood as removing a data point and thus typically only makes sense if there is a previously added data point with the same value and target.
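The weight semantics can be sketched with a small stand-in that only tracks the weight sums reported by n(), n_neg() and n_pos(). WeightTally is a hypothetical type for illustration; it is not yat's ROC class and it ignores the data values entirely.

```cpp
// Hypothetical stand-in for the bookkeeping only; not yat's ROC class.
struct WeightTally {
  double n_pos = 0.0;  // sum of weights with positive target
  double n_neg = 0.0;  // sum of weights with negative target

  // A zero weight leaves the sums unchanged; a negative weight undoes
  // an earlier addition with the same target.
  void add(bool target, double weight = 1.0)
  {
    if (target)
      n_pos += weight;
    else
      n_neg += weight;
  }

  // Sum of all weights.
  double n() const { return n_pos + n_neg; }
};
```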
double theplu::yat::statistics::ROC::area ( void  ) const

Area Under Curve, AUC.

See Also
AUC for how the area is calculated
Returns
Area under curve.
unsigned int& theplu::yat::statistics::ROC::minimum_size ( void  )

threshold for p_value calculation

This function can be used to change minimum_size.

Returns
reference to threshold minimum size
const unsigned int& theplu::yat::statistics::ROC::minimum_size ( void  ) const

threshold for p_value calculation

Threshold deciding whether the p-value is computed using the exact method or a Gaussian approximation. If both the number of positive samples, n_pos(void), and the number of negative samples, n_neg(void), are smaller than minimum_size, the exact method is used.
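The rule above can be written as a one-line predicate. The function name use_exact_method is hypothetical, and reading "smaller than" as a strict comparison is an assumption.

```cpp
// Hypothetical helper mirroring the documented rule: the exact method
// is chosen only when BOTH group sizes are below the threshold.
// Assumption: "smaller than" means strictly less than.
bool use_exact_method(double n_pos, double n_neg, unsigned int minimum_size)
{
  return n_pos < minimum_size && n_neg < minimum_size;
}
```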

See Also
p_value
Returns
const reference to minimum_size
double theplu::yat::statistics::ROC::n ( void  ) const

number of samples

Returns
sum of weights
double theplu::yat::statistics::ROC::n_neg ( void  ) const

number of negative samples

Returns
sum of weights with negative target
double theplu::yat::statistics::ROC::n_pos ( void  ) const

number of positive samples

Returns
sum of weights with positive target
double theplu::yat::statistics::ROC::p_left ( void  ) const

Calculates the probability to get this area (or less).

See Also
p_right for more details
double theplu::yat::statistics::ROC::p_right ( void  ) const

One-sided P-value.

Calculates the one-sided p-value, i.e., probability to get this area (or greater) given that there is no difference between the two classes.

Exact method: The function goes through all permutations and counts the fraction for which the area is greater than or equal to the area in the original permutation. If the non-zero weights are not all equal, iterating through the permutations is not sufficient, so the algorithm goes through all combinations instead, the number of which quickly becomes large (N!).

Large-sample Approximation: When many data points are available, see minimum_size(), a Gaussian approximation is used and the p-value is calculated as

\[ P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z \exp{\left(-\frac{t^2}{2}\right)} dt \]

where

\[ z = \frac{\textrm{area} - 0.5 - 0.5/(n^+ \cdot n^-)}{s} \]

and

\[ s^2 = \frac{n+1+\sum \left(n_x \cdot (n_x^2-1)\right)} {12\cdot n^+\cdot n^-} \]

where the sum runs over the distinct data values (ties) and $ n_x $ is the number of data points with that value. The sum is a correction term for ties and is zero if there are no ties.
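The three formulas above can be evaluated with the standard library alone. The sketch below transcribes them literally as printed; the function names are hypothetical, taking $ n = n^+ + n^- $ is an assumption, and no claim is made about how yat maps the integral onto the returned tail probability.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal CDF, i.e. the integral printed for P.
double normal_cdf(double z)
{
  return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// s^2 as printed. tie_sizes holds n_x for each distinct data value;
// values that occur once (n_x = 1) contribute zero to the sum.
// Assumption: n = n_pos + n_neg.
double s_squared(double n_pos, double n_neg,
                 const std::vector<double>& tie_sizes)
{
  double tie_sum = 0.0;
  for (std::size_t i = 0; i < tie_sizes.size(); ++i)
    tie_sum += tie_sizes[i] * (tie_sizes[i] * tie_sizes[i] - 1.0);
  const double n = n_pos + n_neg;
  return (n + 1.0 + tie_sum) / (12.0 * n_pos * n_neg);
}

// z as printed, including the 0.5 / (n_pos * n_neg) term.
double z_score(double area, double n_pos, double n_neg,
               const std::vector<double>& tie_sizes)
{
  const double s = std::sqrt(s_squared(n_pos, n_neg, tie_sizes));
  return (area - 0.5 - 0.5 / (n_pos * n_neg)) / s;
}
```

For ten positive and ten negative samples without ties, s_squared gives 21/1200 = 0.0175.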

The number of samples in a group, for example $ n^+ $, is calculated as $ n^+ = (\sum w)^2 / \sum w^2 $, where the sums run over the weights in that group.

Returns
$ P(a \ge \textrm{area}) $
double theplu::yat::statistics::ROC::p_value ( void  ) const

Two-sided p-value.

Calculates the probability of getting an area, a, equal to or more extreme than area:

\[ P(a \ge \textrm{max}(\textrm{area},1-\textrm{area})) + P(a \le \textrm{min}(\textrm{area}, 1-\textrm{area})) \]

If there are no ties, distribution of a is symmetric, so if area is greater than 0.5, this boils down to $ P = 2*P(a \ge \textrm{area}) = 2*P_\textrm{one-sided}$.

Returns
two-sided p-value
See Also
p_right
double theplu::yat::statistics::ROC::p_value_one_sided ( void  ) const
Deprecated:
Provided for backward compatibility with 0.10 API. Use p_right() instead.
void theplu::yat::statistics::ROC::remove ( double  value,
bool  target,
double  weight = 1.0 
)

remove a data value

A data point with identical value, target, and weight must have been added prior to calling this function; otherwise an exception is thrown.

Since
New in yat 0.9

Generated on Sat May 24 2014 03:33:06 for yat by  doxygen 1.8.2