yat  0.10.4pre
theplu::yat::statistics::ROC Class Reference

Receiver Operating Characteristic.

#include </scratch/bob/jari/tmp/pristine/yat-0.10.x/yat/statistics/ROC.h>


Classes

struct  Weights

Public Member Functions

 ROC (void)
 Default constructor.
void add (double value, bool target, double weight=1.0)
 Add a data value.
double area (void)
 Area Under Curve, AUC.
unsigned int & minimum_size (void)
 threshold for p_value calculation
const unsigned int & minimum_size (void) const
 threshold for p_value calculation
double n (void) const
 number of samples
double n_neg (void) const
 number of negative samples
double n_pos (void) const
 number of positive samples
double p_value_one_sided (void) const
 One-sided P-value.
double p_value (void) const
 Two-sided p-value.
void remove (double value, bool target, double weight=1.0)
 remove a data value
void reset (void)
 Set everything to zero.

Detailed Description

Receiver Operating Characteristic.

As the area under an ROC curve is equivalent to the Mann-Whitney U statistic, this class can be used to perform a Mann-Whitney U test (also known as the Wilcoxon rank-sum test).

See also:
AUC
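
The equivalence between the area and the Mann-Whitney U statistic can be illustrated with a minimal sketch for unweighted data. The helper auc_from_pairs below is hypothetical and not part of yat; how yat itself computes the area is described under AUC. The sketch assumes each (positive, negative) pair contributes 1 when the positive value is larger and 0.5 on a tie.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of yat: area under the ROC curve computed
// as the normalised Mann-Whitney U statistic for unweighted data.
double auc_from_pairs(const std::vector<double>& pos,
                      const std::vector<double>& neg)
{
  assert(!pos.empty() && !neg.empty());
  double u = 0.0;
  for (double p : pos) {
    for (double n : neg) {
      if (p > n)
        u += 1.0;   // positive ranked above negative
      else if (p == n)
        u += 0.5;   // assumed convention: a tie counts as half
    }
  }
  // U lies between 0 and n_pos * n_neg, so the ratio lies in [0, 1].
  return u / (static_cast<double>(pos.size()) * static_cast<double>(neg.size()));
}
```

An area of 1 means every positive value exceeds every negative value, and 0.5 corresponds to no separation between the classes.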

Member Function Documentation

void theplu::yat::statistics::ROC::add ( double  value,
bool  target,
double  weight = 1.0 
)

Add a data value.

Parameters:
value   data value
target  true if value belongs to the positive class
weight  importance of the data point. A zero weight means the data point is ignored. A negative weight should be understood as removing a data point and thus typically only makes sense if a data point with the same value and target was added previously.
double theplu::yat::statistics::ROC::area ( void  )

Area Under Curve, AUC.

See also:
AUC for how the area is calculated
Returns:
Area under curve.
unsigned int& theplu::yat::statistics::ROC::minimum_size ( void  )

threshold for p_value calculation

This function can be used to change the minimum_size.

Returns:
reference to threshold minimum size
const unsigned int& theplu::yat::statistics::ROC::minimum_size ( void  ) const

threshold for p_value calculation

Threshold deciding whether the p-value is computed using the exact method or a Gaussian approximation. If both the number of positive samples, n_pos(void), and the number of negative samples, n_neg(void), are smaller than minimum_size, the exact method is used.

See also:
p_value
Returns:
const reference to minimum_size
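
The selection rule above can be written as a one-line predicate. This is only a sketch of the rule as documented; uses_exact_method is a hypothetical name and not part of the yat API.

```cpp
// Hypothetical helper, not part of yat: true when the documented rule
// selects the exact method, i.e. both group sizes are strictly smaller
// than minimum_size.
bool uses_exact_method(double n_pos, double n_neg, unsigned int minimum_size)
{
  return n_pos < minimum_size && n_neg < minimum_size;
}
```

Note that a group size equal to minimum_size already selects the Gaussian approximation under this reading of "smaller than".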
double theplu::yat::statistics::ROC::n ( void  ) const

number of samples

Returns:
sum of weights
double theplu::yat::statistics::ROC::n_neg ( void  ) const

number of negative samples

Returns:
sum of weights with negative target
double theplu::yat::statistics::ROC::n_pos ( void  ) const

number of positive samples

Returns:
sum of weights with positive target
double theplu::yat::statistics::ROC::p_value ( void  ) const

Two-sided p-value.

Calculates the probability to get an area, a, equal to or more extreme than area

\[ P(a \ge \textrm{max}(\textrm{area},1-\textrm{area})) + P(a \le \textrm{min}(\textrm{area}, 1-\textrm{area})) \]

If there are no ties, the distribution of a is symmetric, so if area is greater than 0.5, this boils down to $ 2 \cdot P(a \ge \textrm{area}) $.

Returns:
two-sided p-value
See also:
p_value_one_sided
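
As an illustration of the two-sided formula, the following sketch evaluates it against an explicit list of equally likely null areas. The function two_sided_p and its inputs are hypothetical and not part of yat; capping the result at 1 (relevant when area is exactly 0.5 and the two tails overlap) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of yat: two-sided p-value from a list of
// equally likely areas under the null hypothesis.
double two_sided_p(const std::vector<double>& null_areas, double area)
{
  assert(!null_areas.empty());
  const double hi = std::max(area, 1.0 - area);
  const double lo = std::min(area, 1.0 - area);
  std::size_t upper = 0;  // null areas with a >= max(area, 1 - area)
  std::size_t lower = 0;  // null areas with a <= min(area, 1 - area)
  for (double a : null_areas) {
    if (a >= hi)
      ++upper;
    if (a <= lo)
      ++lower;
  }
  const double p = static_cast<double>(upper + lower)
                   / static_cast<double>(null_areas.size());
  // Assumption: cap at 1, since the tails overlap when area == 0.5.
  return std::min(1.0, p);
}
```

For two positive and two negative samples without ties, the six possible labelings give the null areas 0, 0.25, 0.5, 0.5, 0.75 and 1, so an observed area of 1 yields 1/6 + 1/6 = 1/3.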
double theplu::yat::statistics::ROC::p_value_one_sided ( void  ) const

One-sided P-value.

Calculates the one-sided p-value, i.e., the probability to get this area (or greater) given that there is no difference between the two classes.

Exact method: In the exact method the function goes through all permutations and counts the fraction for which the area is greater than (or equal to) the area in the original permutation. If the non-zero weights are not all equal, iterating through all permutations is not sufficient, so the algorithm goes through all combinations instead, which quickly becomes a large number (N!).
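
The idea behind the exact method can be sketched for the simplest case, where all weights are equal. The code below is not yat's implementation: the names auc and exact_one_sided_p are hypothetical, labels are stored as int (1 for positive, 0 for negative), and the sketch enumerates each distinct assignment of labels to values once via std::next_permutation, which yields the same fraction as visiting every permutation when weights are equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: unweighted area for values with 0/1 labels,
// counting ties between a positive and a negative value as one half.
double auc(const std::vector<double>& value, const std::vector<int>& target)
{
  assert(value.size() == target.size());
  double u = 0.0, n_pos = 0.0, n_neg = 0.0;
  for (std::size_t i = 0; i < value.size(); ++i) {
    if (target[i]) ++n_pos; else ++n_neg;
    if (!target[i]) continue;
    for (std::size_t j = 0; j < value.size(); ++j) {
      if (target[j]) continue;
      if (value[i] > value[j]) u += 1.0;
      else if (value[i] == value[j]) u += 0.5;
    }
  }
  assert(n_pos > 0.0 && n_neg > 0.0);
  return u / (n_pos * n_neg);
}

// Hypothetical helper: fraction of label assignments whose area is
// greater than or equal to the observed area.
double exact_one_sided_p(const std::vector<double>& value,
                         std::vector<int> target)
{
  const double observed = auc(value, target);
  // Start from the lexicographically smallest arrangement of labels.
  std::sort(target.begin(), target.end());
  std::size_t total = 0, extreme = 0;
  do {
    ++total;
    if (auc(value, target) >= observed)
      ++extreme;
  } while (std::next_permutation(target.begin(), target.end()));
  return static_cast<double>(extreme) / static_cast<double>(total);
}
```

With four distinct values and two positives there are six label assignments, so perfect separation gives a one-sided p-value of 1/6. The number of assignments grows rapidly with sample size, which is why a threshold such as minimum_size is needed.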

Large-sample approximation: When many data points are available, see minimum_size(), a Gaussian approximation is used and the p-value is calculated as

\[ P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z \exp{\left(-\frac{t^2}{2}\right)} dt \]

where

\[ z = \frac{\textrm{area} - 0.5 - 0.5/(n^+ \cdot n^-)}{s} \]

and

\[ s^2 = \frac{n+1+\sum \left(n_x \cdot (n_x^2-1)\right)} {12\cdot n^+\cdot n^-} \]

where the sum runs over the different data values (of ties) and $ n_x $ is the number of data points with that value. The sum is a correction term for ties and is zero if there are no ties.

The number of samples in a group, $ n $, is calculated as

$ n = (\sum w)^2 / \sum w^2 $

Returns:
one-sided p-value
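
The formulas of the large-sample approximation can be transcribed literally as a small sketch. None of the names below (effective_n, z_score, standard_normal_cdf) belong to yat, and the code is not its implementation. The sketch takes n, n_pos and n_neg as separate inputs and leaves their relation to the caller; tie_sum stands for the sum over tied values and defaults to zero.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: effective number of samples for a set of weights,
// n = (sum w)^2 / sum w^2.
double effective_n(const std::vector<double>& w)
{
  double sum = 0.0, sum2 = 0.0;
  for (double x : w) {
    sum += x;
    sum2 += x * x;
  }
  assert(sum2 > 0.0);
  return sum * sum / sum2;
}

// Hypothetical helper: z as written in the formulas above.
// tie_sum is the sum of n_x * (n_x^2 - 1) over tied values.
double z_score(double area, double n, double n_pos, double n_neg,
               double tie_sum = 0.0)
{
  assert(n_pos > 0.0 && n_neg > 0.0);
  const double s2 = (n + 1.0 + tie_sum) / (12.0 * n_pos * n_neg);
  return (area - 0.5 - 0.5 / (n_pos * n_neg)) / std::sqrt(s2);
}

// Standard normal integral from minus infinity to z, expressed through
// the complementary error function.
double standard_normal_cdf(double z)
{
  return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

For equal weights effective_n reduces to the plain count, for example four unit weights give 4, while unequal weights give a smaller value.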
void theplu::yat::statistics::ROC::remove ( double  value,
bool  target,
double  weight = 1.0 
)

remove a data value

A data point with identical value, target, and weight must have been added prior to calling this function; otherwise an exception is thrown.

Since:
New in yat 0.9


Generated on Mon Nov 11 2013 09:41:45 for yat by  doxygen 1.8.1