Receiver Operating Characteristic.

#include </scratch/bob/jari/tmp/pristine/yat-0.10.x/yat/statistics/ROC.h>

Classes
struct	Weights

Public Member Functions
	ROC (void)
	Default constructor.
void	add (double value, bool target, double weight=1.0)
	Add a data value.
double	area (void)
	Area Under Curve, AUC.
unsigned int &	minimum_size (void)
	threshold for p_value calculation
const unsigned int &	minimum_size (void) const
	threshold for p_value calculation
double	n (void) const
	number of samples
double	n_neg (void) const
	number of negative samples
double	n_pos (void) const
	number of positive samples
double	p_value_one_sided (void) const
	One-sided P-value.
double	p_value (void) const
	Two-sided p-value.
void	remove (double value, bool target, double weight=1.0)
	remove a data value
void	reset (void)
	Set everything to zero.

Detailed Description

Receiver Operating Characteristic.

As the area under an ROC curve is equivalent to the Mann-Whitney U statistic, this class can be used to perform a Mann-Whitney U-test (also known as the Wilcoxon rank-sum test).
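The equivalence above can be illustrated with a small self-contained sketch. The function `auc_from_u` is a hypothetical helper, not part of yat; it assumes unit weights and counts a tied pair as one half, which may differ from the convention yat uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of yat: AUC from pairwise comparisons.
// Each (positive, negative) pair scores 1 if the positive value is larger,
// 0.5 on a tie (an assumption), and 0 otherwise. The sum of the scores is
// the Mann-Whitney U statistic.
double auc_from_u(const std::vector<double>& pos,
                  const std::vector<double>& neg)
{
  assert(!pos.empty() && !neg.empty());
  double u = 0.0;
  for (double p : pos) {
    for (double n : neg) {
      if (p > n)
        u += 1.0;
      else if (p == n)
        u += 0.5;
    }
  }
  // Dividing U by the number of pairs gives the area under the ROC curve.
  return u / (static_cast<double>(pos.size()) *
              static_cast<double>(neg.size()));
}
```

With perfectly separated classes every pair scores 1, so the area is 1; with fully tied values it is 0.5.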

See also:: AUC

Member Function Documentation

void theplu::yat::statistics::ROC::add	(	double	value,
		bool	target,
		double	weight = `1.0`
	)

Add a data value.

Parameters:

value	data value
target	`true` if value belongs to class positive
weight	indicating how important the data point is. A zero weight implies the data point is ignored. A negative weight should be understood as removing a data point and thus typically only makes sense if there is a previously added data point with same value and target.
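The weight semantics can be sketched with a minimal stand-in. `WeightTally` is hypothetical and only mirrors the bookkeeping described on this page (sample counts as sums of weights); it is not how yat's ROC is implemented.

```cpp
#include <cassert>

// Hypothetical stand-in for the bookkeeping described above; not yat's ROC.
struct WeightTally {
  double n_pos = 0.0;  // sum of weights added with target == true
  double n_neg = 0.0;  // sum of weights added with target == false

  // A zero weight leaves the tallies unchanged; a negative weight cancels
  // a previously added point with the same target.
  void add(bool target, double weight = 1.0)
  {
    (target ? n_pos : n_neg) += weight;
  }

  // Total number of samples as the sum of all weights.
  double n() const { return n_pos + n_neg; }
};
```

In this sketch, adding a point with weight 2.0 and then adding it again with weight -2.0 leaves the tallies as they were before.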

double theplu::yat::statistics::ROC::area ( void )

Area Under Curve, AUC.

See also:: AUC for how the area is calculated

Returns:: Area under curve.

unsigned int& theplu::yat::statistics::ROC::minimum_size ( void )

threshold for p_value calculation

This function can be used to change minimum_size.

Returns:: reference to threshold minimum size

const unsigned int& theplu::yat::statistics::ROC::minimum_size ( void ) const

threshold for p_value calculation

Threshold deciding whether the p-value is computed using the exact method or a Gaussian approximation. If both the number of positive samples, n_pos(void), and the number of negative samples, n_neg(void), are smaller than minimum_size, the exact method is used.

See also:: p_value

Returns:: const reference to minimum_size
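The threshold rule can be written as a one-line predicate. `uses_exact_method` is a hypothetical helper that only restates the condition above; it is not a yat function.

```cpp
#include <cassert>

// Hypothetical helper restating the rule above; not a yat function.
// The exact method applies only when BOTH group sizes are strictly
// smaller than minimum_size.
bool uses_exact_method(double n_pos, double n_neg, unsigned int minimum_size)
{
  assert(n_pos >= 0.0 && n_neg >= 0.0);
  return n_pos < minimum_size && n_neg < minimum_size;
}
```

Note that a single large group is enough to switch to the Gaussian approximation under this reading.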

double theplu::yat::statistics::ROC::n ( void ) const

number of samples

Returns:: sum of weights

double theplu::yat::statistics::ROC::n_neg ( void ) const

number of negative samples

Returns:: sum of weights with negative target

double theplu::yat::statistics::ROC::n_pos ( void ) const

number of positive samples

Returns:: sum of weights with positive target

double theplu::yat::statistics::ROC::p_value ( void ) const

Two-sided p-value.

Calculates the probability of getting an area, a, equal to or more extreme than area:

$P(a \ge \textrm{max}(\textrm{area},1-\textrm{area})) + P(a \le \textrm{min}(\textrm{area}, 1-\textrm{area}))$

If there are no ties, the distribution of a is symmetric, so if area is greater than 0.5, this boils down to $P(a \ge \textrm{area}) + P(a \le 1-\textrm{area}) = 2 \cdot P(a \ge \textrm{area})$.

Returns:: two-sided p-value

See also:: p_value_one_sided
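As a sketch of the two-sided formula, the function below evaluates it against an explicit null distribution of areas. `two_sided_p` is a hypothetical helper, not part of yat, and it assumes every entry of `null_areas` is an equally likely outcome.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of yat: two-sided p-value from an explicit
// null distribution of areas (one entry per equally likely outcome).
double two_sided_p(const std::vector<double>& null_areas, double area)
{
  assert(!null_areas.empty());
  const double hi = std::max(area, 1.0 - area);
  const double lo = std::min(area, 1.0 - area);
  double upper = 0.0;  // outcomes with a >= max(area, 1 - area)
  double lower = 0.0;  // outcomes with a <= min(area, 1 - area)
  for (double a : null_areas) {
    if (a >= hi) upper += 1.0;
    if (a <= lo) lower += 1.0;
  }
  // Sum of the two tail probabilities, as in the formula above.
  return (upper + lower) / static_cast<double>(null_areas.size());
}
```

For two positive and two negative samples without ties, the six equally likely areas are 0, 0.25, 0.5, 0.5, 0.75 and 1, so an observed area of 1 gives 2/6.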

double theplu::yat::statistics::ROC::p_value_one_sided ( void ) const

One-sided P-value.

Calculates the one-sided p-value, i.e., the probability of getting this area (or greater) given that there is no difference between the two classes.

Exact method: The function goes through all permutations and counts the fraction for which the area is greater than or equal to the area in the original permutation. If the non-zero weights are not all equal, iterating through all permutations is not sufficient, so the algorithm goes through all combinations instead, which quickly becomes a large number (N!).
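The exact method for the equal-weight case can be sketched as follows. The helpers `auc` and `exact_one_sided_p` are hypothetical and not part of yat; the sketch assumes unit weights, represents labels as 1 (positive) and 0 (negative), and enumerates distinct label arrangements with `std::next_permutation`, which for equal weights gives the same fraction as visiting every permutation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of yat. Assumes unit weights.

// AUC of `values` under a labelling (1 = positive class, 0 = negative).
double auc(const std::vector<double>& values, const std::vector<int>& target)
{
  double u = 0.0, n_pos = 0.0, n_neg = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (target[i]) n_pos += 1.0; else n_neg += 1.0;
    if (!target[i]) continue;
    for (std::size_t j = 0; j < values.size(); ++j) {
      if (target[j]) continue;
      if (values[i] > values[j]) u += 1.0;
      else if (values[i] == values[j]) u += 0.5;  // tie counts one half
    }
  }
  assert(n_pos > 0.0 && n_neg > 0.0);
  return u / (n_pos * n_neg);
}

// Fraction of distinct label arrangements whose area is >= the observed one.
double exact_one_sided_p(const std::vector<double>& values,
                         const std::vector<int>& target)
{
  assert(values.size() == target.size());
  const double observed = auc(values, target);
  std::vector<int> labels(target);
  std::sort(labels.begin(), labels.end());  // first arrangement in order
  double hits = 0.0, total = 0.0;
  do {
    total += 1.0;
    if (auc(values, labels) >= observed) hits += 1.0;
  } while (std::next_permutation(labels.begin(), labels.end()));
  return hits / total;
}
```

The number of arrangements grows combinatorially with the sample size, which is why a threshold such as minimum_size is needed to switch to an approximation.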

Large-sample approximation: When many data points are available, see minimum_size(), a Gaussian approximation is used and the p-value is calculated as

$P = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z \exp{\left(-\frac{t^2}{2}\right)} dt$

  where

$z = \frac{\textrm{area} - 0.5 - 0.5/(n^+ \cdot n^-)}{s}$

and

$s^2 = \frac{n+1+\sum \left(n_x \cdot (n_x^2-1)\right)} {12\cdot n^+\cdot n^-}$

where the sum runs over the different data values (of ties) and $n_x$ is the number of data points with that value. The sum is a correction term for ties and is zero if there are no ties.
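The expressions for $s^2$, $z$ and the integral can be transcribed literally as below. All three functions are hypothetical helpers, not yat code. The sketch assumes that $n$ in the variance is the sum $n^+ + n^-$, and it takes the tie sum as a precomputed argument. It evaluates the formulas as written and makes no claim about how the library itself maps the integral to a tail probability.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not part of yat.

// s^2 as written above. `tie_sum` is the sum over tied values of
// n_x * (n_x^2 - 1), which is zero when there are no ties.
// Assumption: n is taken to be n_pos + n_neg.
double variance_s2(double n_pos, double n_neg, double tie_sum)
{
  assert(n_pos > 0.0 && n_neg > 0.0);
  const double n = n_pos + n_neg;
  return (n + 1.0 + tie_sum) / (12.0 * n_pos * n_neg);
}

// z as written above, including the 0.5 / (n_pos * n_neg) term.
double z_score(double area, double n_pos, double n_neg, double tie_sum)
{
  const double s = std::sqrt(variance_s2(n_pos, n_neg, tie_sum));
  return (area - 0.5 - 0.5 / (n_pos * n_neg)) / s;
}

// The integral above: standard normal cumulative distribution up to z,
// expressed through the complementary error function.
double standard_normal_cdf(double z)
{
  return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

Using `std::erfc` avoids numerical integration and stays accurate in the tails.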

The number of samples in a group, $n$, is calculated as

$n = (\sum w)^2 / \sum w^2$

Returns:: one-sided p-value, $P(a \ge \textrm{area})$
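The group-size formula $n = (\sum w)^2 / \sum w^2$ can be checked with a short sketch. `effective_n` is a hypothetical helper, not part of yat.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of yat: n = (sum w)^2 / sum w^2.
double effective_n(const std::vector<double>& weights)
{
  double sum = 0.0;
  double sum_sq = 0.0;
  for (double w : weights) {
    sum += w;
    sum_sq += w * w;
  }
  assert(sum_sq > 0.0);  // at least one non-zero weight is required
  return sum * sum / sum_sq;
}
```

With equal weights the result is simply the number of points, regardless of the common weight value; unequal weights give a smaller value.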

void theplu::yat::statistics::ROC::remove	(	double	value,
		bool	target,
		double	weight = `1.0`
	)

remove a data value

A data point with identical value, target, and weight must have been added prior to calling this function; otherwise an exception is thrown.

Since:: New in yat 0.9

The documentation for this class was generated from the following file:

/scratch/bob/jari/tmp/pristine/yat-0.10.x/yat/statistics/ROC.h
