// $Id$
//
// Copyright (C) 2005 Peter Johansson
// Copyright (C) 2006 Jari Häkkinen, Peter Johansson, Markus Ringnér
// Copyright (C) 2007 Peter Johansson
// Copyright (C) 2008 Jari Häkkinen, Peter Johansson
// Copyright (C) 2012 Peter Johansson
//
// This file is part of the yat library, http://dev.thep.lu.se/yat
//
// The yat library is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License as
// published by the Free Software Foundation; either version 3 of the
// License, or (at your option) any later version.
//
// The yat library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with yat. If not, see <http://www.gnu.org/licenses/>.


/**
\page weighted_statistics Weighted Statistics

\section Introduction
There are several different reasons why a statistical analysis needs
to adjust for weighting. In the literature, the reasons are mainly
divided into two groups.

The first group is when some of the measurements are known to be more
precise than others. The more precise a measurement is, the larger
weight it is given. The simplest case is when the weights are given
before the measurements and can be treated as deterministic. It
becomes more complicated when the weights cannot be determined until
afterwards, and even more complicated if the weights depend on the
values of the observable.

The second group of situations is when calculating averages over one
distribution while sampling from another distribution. To compensate
for this discrepancy, weights are introduced into the analysis. A
simple example is when we are interviewing people, but for economic
reasons choose to interview more people from the city than from the
countryside. When summarizing the statistics, the answers from the
city are given a smaller weight. In this example we choose the
proportions of people from the countryside and from the city being
interviewed; hence, we can determine the weights beforehand and
consider them deterministic. In other situations the proportions are
not deterministic, but rather a result of the sampling, so the weights
must be treated as stochastic, and only in rare situations can the
weights be treated as independent of the observable.

Since a weight occurring in a statistical analysis can have various
origins, there are various ways to treat the weights, and in general
the analysis should be tailored to treat the weights correctly. We
have not chosen one single situation for our implementations, so see
the specific function documentation for what assumptions are made.
However, the following is common to all implementations:

 - Setting all weights to unity yields the same result as the
   non-weighted version.
 - Rescaling the weights does not change any function.
 - Setting a weight to zero is equivalent to removing the data point.

The last point implies that a data point with zero weight is ignored
even when its value is NaN.
An important case is when weights are binary (either 1 or 0). Then the
weighted version gives the same result as applying the non-weighted
version to the data points with non-zero weight. Hence, using binary
weights and the weighted version, missing values can be treated in a
proper way.

\section AveragerWeighted


\subsection Mean

In every situation, the weight is designed so that the weighted mean
is calculated as \f$ m=\frac{\sum w_ix_i}{\sum w_i} \f$, which obviously
fulfills the conditions above.

In the case of varying measurement errors, it can be motivated that
the weight should be \f$ w_i = 1/\sigma_i^2 \f$. We assume the
measurement errors to be Gaussian, so the likelihood of our
measurements is
\f$ L(m)=\prod
(2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}} \f$. We
maximize the likelihood by taking the derivative with respect to \f$ m \f$ of
the logarithm of the likelihood, \f$ \frac{d\ln L(m)}{dm}=\sum
\frac{x_i-m}{\sigma_i^2} \f$. Hence, the Maximum Likelihood method yields
the estimator \f$ m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2} \f$.
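
As an illustration, here is a minimal C++ sketch of the weighted mean
(weighted_mean is a hypothetical helper, not the yat API; later
sketches on this page reuse these includes and helpers). Zero-weight
points are skipped explicitly, so a NaN value with zero weight is
ignored, as required above.

\code
#include <cstddef>
#include <vector>

// weighted mean m = sum(w_i x_i) / sum(w_i)
double weighted_mean(const std::vector<double>& x,
                     const std::vector<double>& w)
{
  double sum_wx = 0;
  double sum_w = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // skip explicitly: 0 * NaN would otherwise be NaN
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  return sum_wx / sum_w;
}
\endcode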
|
\subsection Variance
In the case of varying variance, there is no point in estimating a
single variance, since it is different for each data point.

Instead we look at the case when we want to estimate the variance over
\f$f\f$ but are sampling from \f$ f' \f$. For the mean of an observable \f$ O \f$ we
have \f$ \widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum
w_i} \f$. Hence, an estimator of the variance of \f$ X \f$ is

\f$
s^2 = <X^2>-<X>^2=
\f$

\f$
= \frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
\f$

\f$
= \frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
\f$

\f$
= \frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
\f$

\f$
= \frac{\sum w_i(x_i-m)^2}{\sum w_i}
\f$

This estimator is invariant under rescaling, and
having a weight equal to zero is equivalent to removing the data
point. Having all weights equal to unity results in \f$ s^2=\frac{\sum
(x_i-m)^2}{N} \f$, which is the same as returned from Averager. Hence,
this estimator is slightly biased, but still very efficient.
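
A corresponding sketch of this variance estimator (again a
hypothetical helper, not the yat API), reusing weighted_mean from the
sketch above:

\code
// s^2 = sum w_i (x_i - m)^2 / sum w_i, with m the weighted mean
double weighted_variance(const std::vector<double>& x,
                         const std::vector<double>& w)
{
  const double m = weighted_mean(x, w);
  double sum_wd2 = 0;
  double sum_w = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // a zero-weight point is ignored entirely
    const double d = x[i] - m;
    sum_wd2 += w[i] * d * d;
    sum_w += w[i];
  }
  return sum_wd2 / sum_w;
}
\endcode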
|
\subsection standard_error Standard Error
The standard error squared is equal to the expected squared error of
the estimate of \f$m\f$. The squared error consists of two parts, the
variance of the estimator and the squared bias:

\f$
<(m-\mu)^2>=<(m-<m>+<m>-\mu)^2>=
\f$
\f$
<(m-<m>)^2>+(<m>-\mu)^2
\f$.

In the case when weights are included in the analysis due to varying
measurement errors and the weights can be treated as deterministic, we
have

\f$
Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
\f$
\f$
\frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
\f$
\f$
\frac{\sigma_0^2}{\sum w_i},
\f$

where we need to estimate \f$ \sigma_0^2 \f$. Again we have the likelihood

\f$
L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}
\f$
and taking the derivative with respect to
\f$\sigma_0^2\f$,

\f$
\frac{d\ln L}{d\sigma_0^2}=
\f$
\f$
\sum -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}
\f$

which
yields the estimator \f$ \sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2 \f$. This
estimator does not ignore data points with zero weight: such a point
contributes nothing to the sum but still counts in \f$ N \f$, which
biases the estimate. Therefore, we modify
the expression as follows \f$\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
w_i\right)^2}\sum w_i(x-m)^2\f$ and we get the following estimator of
the variance of the mean \f$ Var(m)=\frac{\sum w_i^2}{\left(\sum
w_i\right)^3}\sum w_i(x-m)^2\f$. This estimator fulfills the conditions
above: adding a data point with zero weight does not change it,
rescaling the weights does not change it, and setting all weights to
unity yields the same expression as in the non-weighted case.
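
A sketch of this estimator of the variance of the mean (hypothetical
helper, not the yat API):

\code
// Var(m) = sum(w_i^2) / (sum w_i)^3 * sum w_i (x_i - m)^2;
// its square root is the standard error of the weighted mean
double weighted_mean_variance(const std::vector<double>& x,
                              const std::vector<double>& w)
{
  const double m = weighted_mean(x, w);
  double sum_w = 0, sum_w2 = 0, sum_wd2 = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // a zero-weight point is ignored entirely
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    const double d = x[i] - m;
    sum_wd2 += w[i] * d * d;
  }
  return sum_w2 / (sum_w * sum_w * sum_w) * sum_wd2;
}
\endcode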
|
In a case when it is not a good approximation to treat the weights as
deterministic, there are two ways to get a better estimate. The
first is to linearize the expression \f$\left<\frac{\sum
w_ix_i}{\sum w_i}\right>\f$. The second method, used when the situation
is more complicated, is to estimate the standard error using a
bootstrapping method.
|
\section AveragerPairWeighted
Here data points come in pairs (x,y). We are sampling from \f$f'_{XY}\f$
but want to measure from \f$f_{XY}\f$. To compensate for this discrepancy,
averages of \f$g(x,y)\f$ are taken as \f$\sum \frac{f}{f'}g(x,y)\f$. Even
though \f$X\f$ and \f$Y\f$ are not independent \f$(f_{XY}\neq f_Xf_Y)\f$, we
assume that we can factorize the ratio and get \f$\frac{\sum
w_xw_yg(x,y)}{\sum w_xw_y}\f$.
\subsection Covariance
Following the variance calculations for AveragerWeighted we have
\f$Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}\f$ where
\f$m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}\f$.
|
\subsection Correlation

As the mean is estimated as
\f$
m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}
\f$,
the variance is estimated as
\f$
\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}
\f$.
As in the non-weighted case we define the correlation to be the ratio
between the covariance and the geometric mean of the variances

\f$
\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
w_xw_y(y-m_y)^2}}
\f$.

This expression fulfills the following
 - Having all weights equal, the expression reduces to the non-weighted
   expression.
 - Adding a pair of data in which one weight is zero is equivalent
   to ignoring the data pair.
 - The correlation is equal to unity if and only if \f$x\f$ is equal to
   \f$y\f$. Otherwise the correlation is between -1 and 1.
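
To illustrate, a C++ sketch computing this weighted correlation with
combined pair weights \f$ w_i=w_{x,i}w_{y,i} \f$ (a hypothetical
helper, not the yat API). Note that the \f$ 1/\sum w_xw_y \f$
normalizations cancel in the ratio:

\code
#include <cmath>

double weighted_correlation(const std::vector<double>& x,
                            const std::vector<double>& wx,
                            const std::vector<double>& y,
                            const std::vector<double>& wy)
{
  // weighted means m_x and m_y with combined weights w = wx*wy
  double sum_w = 0, sum_wx = 0, sum_wy = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    if (w == 0)
      continue;      // the pair is ignored
    sum_w += w;
    sum_wx += w * x[i];
    sum_wy += w * y[i];
  }
  const double mx = sum_wx / sum_w;
  const double my = sum_wy / sum_w;
  // covariance and variances; their common 1/sum_w factors cancel
  double cov = 0, var_x = 0, var_y = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    cov += w * (x[i] - mx) * (y[i] - my);
    var_x += w * (x[i] - mx) * (x[i] - mx);
    var_y += w * (y[i] - my) * (y[i] - my);
  }
  return cov / std::sqrt(var_x * var_y);
}
\endcode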
|
\section Score

\subsection Pearson

\f$\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}\f$.

See AveragerPairWeighted correlation.
|
\subsection ROC

An interpretation of the ROC curve area is the probability that, if we
take one sample from class \f$+\f$ and one sample from class \f$-\f$,
the sample from class \f$+\f$ has the greater value. The ROC curve area
calculates the fraction of pairs fulfilling this condition:

\f$
\frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
\f$

A geometrical interpretation is to have a number of squares where
each square corresponds to a pair of samples. The ROC curve follows the
border between pairs in which the sample from class \f$+\f$ has the
greater value and pairs in which this is not fulfilled. The ROC curve
area is the area of the former squares, and a natural extension is to
weight each pair with its two weights, so the weighted ROC curve
area becomes

\f$
\frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
\f$

This expression is invariant under a rescaling of the weights. Adding a
data value with zero weight adds nothing to the expression, and having
all weights equal to unity yields the non-weighted ROC curve area.
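
A direct \f$ O(n_+n_-) \f$ C++ sketch of the weighted ROC curve area
(hypothetical helper, not the yat API; a tie is counted as not
fulfilling the condition):

\code
// area = sum over pairs with neg_i < pos_j of w-_i w+_j,
//        divided by the total weight over all pairs
double weighted_roc_area(const std::vector<double>& neg,
                         const std::vector<double>& w_neg,
                         const std::vector<double>& pos,
                         const std::vector<double>& w_pos)
{
  double numerator = 0;
  double denominator = 0;
  for (std::size_t i = 0; i < neg.size(); ++i)
    for (std::size_t j = 0; j < pos.size(); ++j) {
      const double w = w_neg[i] * w_pos[j];
      denominator += w;
      if (neg[i] < pos[j])
        numerator += w;
    }
  return numerator / denominator;
}
\endcode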
|
\subsection tScore

Assume that \f$x\f$ and \f$y\f$ originate from the same distribution
\f$N(\mu,\sigma_i^2)\f$ where \f$\sigma_i^2=\frac{\sigma_0^2}{w_i}\f$. We then
estimate \f$\sigma_0^2\f$ as
\f$
\frac{\sum w_x(x-m_x)^2+\sum w_y(y-m_y)^2}
{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
\f$
The variance of the difference of the means becomes
\f$
Var(m_x)+Var(m_y)=\frac{\sum w_x^2Var(x)}{\left(\sum
w_x\right)^2}+\frac{\sum w_y^2Var(y)}{\left(\sum w_y\right)^2}=
\frac{\sigma_0^2}{\sum w_x}+\frac{\sigma_0^2}{\sum w_y},
\f$
and consequently the squared denominator of the t-score becomes
\f$
\frac{\sum w_x(x-m_x)^2+\sum w_y(y-m_y)^2}
{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
\left(\frac{1}{\sum w_x}+\frac{1}{\sum w_y}\right).
\f$

For \f$ w_i=w \f$ this expression condenses to
\f$
\frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
{n_x+n_y-2}
\left(\frac{1}{wn_x}+\frac{1}{wn_y}\right),
\f$
in other words, the familiar expression from the non-weighted case.
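
Putting the pieces together, a C++ sketch of this weighted t-score
(hypothetical helper, not the yat API), reusing weighted_mean from
above:

\code
// t = (m_x - m_y) / sqrt(s0^2 (1/sum wx + 1/sum wy)), with s0^2
// estimated using the effective degrees of freedom derived above
double weighted_t_score(const std::vector<double>& x,
                        const std::vector<double>& wx,
                        const std::vector<double>& y,
                        const std::vector<double>& wy)
{
  const double mx = weighted_mean(x, wx);
  const double my = weighted_mean(y, wy);
  double swx = 0, swx2 = 0, ssx = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    swx += wx[i];
    swx2 += wx[i] * wx[i];
    ssx += wx[i] * (x[i] - mx) * (x[i] - mx);
  }
  double swy = 0, swy2 = 0, ssy = 0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    swy += wy[i];
    swy2 += wy[i] * wy[i];
    ssy += wy[i] * (y[i] - my) * (y[i] - my);
  }
  const double df = swx * swx / swx2 + swy * swy / swy2 - 2;
  const double s0_2 = (ssx + ssy) / df;
  return (mx - my) / std::sqrt(s0_2 * (1 / swx + 1 / swy));
}
\endcode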
|
\subsection FoldChange
Fold-Change is simply the difference between the weighted means of the
two groups: \f$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}\f$

\subsection WilcoxonFoldChange
Take all pairs of samples (one from class \f$+\f$ and one from class
\f$-\f$) and calculate the weighted median of the differences.
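
A sketch of a weighted median (hypothetical helper; yat's definition
may differ in details such as interpolation): sort the values and
return the first value at which the cumulative weight reaches half of
the total weight. For WilcoxonFoldChange, the values are the pairwise
differences \f$ x^+_j-x^-_i \f$ with weights \f$ w^+_jw^-_i \f$.

\code
#include <algorithm>
#include <utility>

// weighted median of (value, weight) pairs
double weighted_median(std::vector<std::pair<double, double> > vw)
{
  std::sort(vw.begin(), vw.end());   // sort by value
  double total = 0;
  for (std::size_t i = 0; i < vw.size(); ++i)
    total += vw[i].second;
  double cumulated = 0;
  for (std::size_t i = 0; i < vw.size(); ++i) {
    cumulated += vw[i].second;
    if (cumulated >= total / 2)
      return vw[i].first;
  }
  return vw.back().first;            // only zero weights remained
}
\endcode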
|
\section Distance

A \ref concept_distance measures how far apart two ranges are. A Distance should
preferably meet some criteria:

 - It is symmetric, \f$ d(x,y) = d(y,x) \f$, that is the distance from \f$
   x \f$ to \f$ y \f$ equals the distance from \f$ y \f$ to \f$ x \f$.
 - Zero self-distance: \f$ d(x,x) = 0 \f$
 - Triangle inequality: \f$ d(x,z) \le d(x,y) + d(y,z) \f$

\subsection weighted_distance Weighted Distance

Weighted Distance is an extension of the usual unweighted distances, in
which each data point is accompanied by a weight. A weighted
distance should meet some criteria:

 - Having all weights equal to unity yields the unweighted case.
 - Rescaling the weights, \f$ w_i = Cw_i \f$, does not change the distance.
 - Having a \f$ w_x = 0 \f$, the distance should ignore the corresponding
   \f$ x \f$, \f$ y \f$, and \f$ w_y \f$.
 - A zero weight should not result in a very different distance than a
   small weight; in other words, modifying a weight should change the
   distance in a continuous manner.
 - The duplicate property. If data come in duplicates such that
   \f$ x_{2i}=x_{2i+1} \f$, then the case \f$ w_{2i}=w_{2i+1} \f$
   should give the same distance as setting \f$ w_{2i}=0 \f$.

The last condition, the duplicate property, implies that setting a
weight to zero is not equivalent to removing the data point. This
behavior is sensible, because otherwise we would have a bias towards
ranges with small weights being close to other ranges. For a weighted
distance meeting these criteria, it might be difficult to show that
the triangle inequality is fulfilled. For most algorithms the triangle
inequality is not essential for the distance to work properly, so if
you need to choose between fulfilling the triangle inequality and these
latter criteria, it is preferable to meet the latter criteria.
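
To see how a normalized weighted distance can meet the duplicate
property, consider this sketch of a weighted Euclidean distance (an
illustration, not the yat implementation): duplicating every point
with equal weights scales the numerator and the denominator by the
same factor, leaving the distance unchanged.

\code
// d = sqrt( sum wx_i wy_i (x_i - y_i)^2 / sum wx_i wy_i )
double weighted_euclidean(const std::vector<double>& x,
                          const std::vector<double>& wx,
                          const std::vector<double>& y,
                          const std::vector<double>& wy)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    num += w * (x[i] - y[i]) * (x[i] - y[i]);
    den += w;
  }
  return std::sqrt(num / den);
}
\endcode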
|
In test/distance_test.cc there are tests for these properties.
|
\section Kernel
\subsection polynomial_kernel Polynomial Kernel
The polynomial kernel of degree \f$N\f$ is defined as \f$(1+<x,y>)^N\f$, where
\f$<x,y>\f$ is the linear kernel (the usual scalar product). For the
weighted case we define the linear kernel to be
\f$<x,y>=\frac{\sum {w_xw_yxy}}{\sum{w_xw_y}}\f$ and the
polynomial kernel can be calculated as before,
\f$(1+<x,y>)^N\f$.

\subsection gaussian_kernel Gaussian Kernel
We define the weighted Gaussian kernel as \f$\exp\left(-N\frac{\sum
w_xw_y(x-y)^2}{\sum w_xw_y}\right)\f$.
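
Sketches of these weighted kernels (hypothetical helpers, not the yat
API; the degree N is passed in):

\code
// weighted linear kernel <x,y> = sum wx wy x y / sum wx wy
double linear_kernel(const std::vector<double>& x,
                     const std::vector<double>& wx,
                     const std::vector<double>& y,
                     const std::vector<double>& wy)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    num += wx[i] * wy[i] * x[i] * y[i];
    den += wx[i] * wy[i];
  }
  return num / den;
}

// polynomial kernel (1 + <x,y>)^N
double polynomial_kernel(const std::vector<double>& x,
                         const std::vector<double>& wx,
                         const std::vector<double>& y,
                         const std::vector<double>& wy, int N)
{
  return std::pow(1 + linear_kernel(x, wx, y, wy), N);
}

// Gaussian kernel exp(-N sum wx wy (x-y)^2 / sum wx wy)
double gaussian_kernel(const std::vector<double>& x,
                       const std::vector<double>& wx,
                       const std::vector<double>& y,
                       const std::vector<double>& wy, int N)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double d = x[i] - y[i];
    num += wx[i] * wy[i] * d * d;
    den += wx[i] * wy[i];
  }
  return std::exp(-N * num / den);
}
\endcode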
|
\section Regression
\subsection Naive
\subsection Linear
We have the model

\f$
y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
\f$

where \f$\epsilon_i\f$ is the noise. The variance of the noise is
inversely proportional to the weight,
\f$Var(\epsilon_i)=\frac{\sigma^2}{w_i}\f$. In order to determine the
model parameters, we minimize the weighted sum of squared errors,

\f$
Q_0 = \sum w_i\epsilon_i^2.
\f$

Taking the derivative with respect to \f$\alpha\f$ and \f$\beta\f$ yields two conditions

\f$
\frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
\beta (x_i-m_x))=0
\f$

and

\f$ \frac{\partial Q_0}{\partial \beta} = -2 \sum
w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
\f$

or equivalently

\f$
\alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
\f$

and

\f$ \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
\f$

Note that by having all weights equal, we get back the unweighted
case. Furthermore, we calculate the variance of the estimators of
\f$\alpha\f$ and \f$\beta\f$.

\f$
\textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
\frac{\sigma^2}{\sum w_i}
\f$

and
\f$
\textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
{(\sum w_i(x_i-m_x)^2)^2}=
\frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
\f$

Finally, we estimate the level of noise, \f$\sigma^2\f$. Inspired by the
unweighted estimator

\f$
s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
\f$

we suggest the following estimator

\f$ s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
w_i-2\frac{\sum w_i^2}{\sum w_i}} \f$
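
A sketch assembling these estimators (a hypothetical helper, not the
yat API):

\code
// weighted linear fit: computes m_x, alpha (= m_y),
// beta = Cov(x,y)/Var(x), and the noise estimate s2
void weighted_linear_fit(const std::vector<double>& x,
                         const std::vector<double>& y,
                         const std::vector<double>& w,
                         double& mx, double& alpha,
                         double& beta, double& s2)
{
  double sum_w = 0, sum_w2 = 0, sum_wx = 0, sum_wy = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    sum_wx += w[i] * x[i];
    sum_wy += w[i] * y[i];
  }
  mx = sum_wx / sum_w;
  alpha = sum_wy / sum_w;        // alpha equals the weighted mean of y
  double sxy = 0, sxx = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sxy += w[i] * (x[i] - mx) * (y[i] - alpha);
    sxx += w[i] * (x[i] - mx) * (x[i] - mx);
  }
  beta = sxy / sxx;
  double q = 0;                  // weighted residual sum of squares
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double r = y[i] - alpha - beta * (x[i] - mx);
    q += w[i] * r * r;
  }
  s2 = q / (sum_w - 2 * sum_w2 / sum_w);
}
\endcode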
|
*/