// $Id$
//
// Copyright (C) 2005 Peter Johansson
// Copyright (C) 2006 Jari Häkkinen, Peter Johansson, Markus Ringnér
// Copyright (C) 2007 Peter Johansson
// Copyright (C) 2008 Jari Häkkinen, Peter Johansson
// Copyright (C) 2012 Peter Johansson
//
// This file is part of the yat library, http://dev.thep.lu.se/yat
//
// The yat library is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License as
// published by the Free Software Foundation; either version 3 of the
// License, or (at your option) any later version.
//
// The yat library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with yat. If not, see <http://www.gnu.org/licenses/>.


/**
\page weighted_statistics Weighted Statistics

\section Introduction
There are several different reasons why a statistical analysis needs
to adjust for weighting. In the literature, the reasons are mainly
divided into two groups.

The first group is when some of the measurements are known to be more
precise than others. The more precise a measurement is, the larger
weight it is given. The simplest case is when the weights are given
before the measurements and can be treated as deterministic. It
becomes more complicated when the weights cannot be determined until
afterwards, and even more complicated if the weights depend on the
values of the observable.

The second group of situations is when calculating averages over one
distribution while sampling from another distribution. To compensate
for this discrepancy, weights are introduced into the analysis. A
simple example is when we are interviewing people, but for economic
reasons choose to interview more people from the city than from the
countryside. When summarizing the statistics, the answers from the
city are given a smaller weight. In this example we choose the
proportions of people from the countryside and from the city being
interviewed; hence, we can determine the weights beforehand and
consider them deterministic. In other situations the proportions are
not deterministic, but rather a result of the sampling, so the weights
must be treated as stochastic, and only in rare situations can the
weights be treated as independent of the observable.

Since a weight occurring in a statistical analysis can have various
origins, there are various ways to treat the weights, and in general
the analysis should be tailored to treat the weights correctly. We
have not chosen one single situation for our implementations, so see
the specific function documentation for what assumptions are made.
However, the following is common to all implementations:

 - Setting all weights to unity yields the same result as the
   non-weighted version.
 - Rescaling the weights does not change any function.
 - Setting a weight to zero is equivalent to removing the data point.

The last point implies that a data point with zero weight is ignored
even when its value is NaN.
An important case is when weights are binary (either 1 or 0). Then the
weighted version gives the same result as applying the non-weighted
version to the data points with non-zero weight. Hence, using binary
weights and the weighted version, missing values can be treated in a
proper way.

\section AveragerWeighted


\subsection Mean

In every situation, the weight is designed so that the weighted mean
is calculated as \f$ m=\frac{\sum w_ix_i}{\sum w_i} \f$, which obviously
fulfills the conditions above.

In the case of varying measurement errors, it can be motivated that
the weight should be \f$ w_i = 1/\sigma_i^2 \f$. We assume the
measurement errors to be Gaussian, so the likelihood of our
measurements is
\f$ L(m)=\prod
(2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}} \f$. We
maximize the likelihood by taking the derivative with respect to \f$ m \f$ of
the logarithm of the likelihood, \f$ \frac{d\ln L(m)}{dm}=\sum
\frac{x_i-m}{\sigma_i^2} \f$. Hence, the Maximum Likelihood method yields
the estimator \f$ m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2} \f$.
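
As an illustration, here is a minimal C++ sketch of the weighted mean
(weighted_mean is a hypothetical helper, not the yat API; later
sketches on this page reuse these includes and helpers). Zero-weight
points are skipped explicitly, so a NaN value with zero weight is
ignored, as required above.

\code
#include <cstddef>
#include <vector>

// weighted mean m = sum(w_i x_i) / sum(w_i)
double weighted_mean(const std::vector<double>& x,
                     const std::vector<double>& w)
{
  double sum_wx = 0;
  double sum_w = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // skip explicitly: 0 * NaN would otherwise be NaN
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  return sum_wx / sum_w;
}
\endcode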
|
\subsection Variance
In the case of varying variance, there is no point in estimating a
single variance, since it is different for each data point.

Instead we look at the case when we want to estimate the variance over
\f$f\f$ but are sampling from \f$ f' \f$. For the mean of an observable \f$ O \f$ we
have \f$ \widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum
w_i} \f$. Hence, an estimator of the variance of \f$ X \f$ is

\f$
s^2 = <X^2>-<X>^2=
\f$

\f$
= \frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
\f$

\f$
= \frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
\f$

\f$
= \frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
\f$

\f$
= \frac{\sum w_i(x_i-m)^2}{\sum w_i}
\f$

This estimator is invariant under rescaling, and
having a weight equal to zero is equivalent to removing the data
point. Having all weights equal to unity results in \f$ s^2=\frac{\sum
(x_i-m)^2}{N} \f$, which is the same as returned from Averager. Hence,
this estimator is slightly biased, but still very efficient.
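
A corresponding sketch of this variance estimator (again a
hypothetical helper, not the yat API), reusing weighted_mean from the
sketch above:

\code
// s^2 = sum w_i (x_i - m)^2 / sum w_i, with m the weighted mean
double weighted_variance(const std::vector<double>& x,
                         const std::vector<double>& w)
{
  const double m = weighted_mean(x, w);
  double sum_wd2 = 0;
  double sum_w = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // a zero-weight point is ignored entirely
    const double d = x[i] - m;
    sum_wd2 += w[i] * d * d;
    sum_w += w[i];
  }
  return sum_wd2 / sum_w;
}
\endcode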
|
\subsection standard_error Standard Error
The standard error squared is equal to the expected squared error of
the estimate of \f$m\f$. The squared error consists of two parts, the
variance of the estimator and the squared bias:

\f$
<(m-\mu)^2>=<(m-<m>+<m>-\mu)^2>=
\f$
\f$
<(m-<m>)^2>+(<m>-\mu)^2
\f$.

In the case when weights are included in the analysis due to varying
measurement errors and the weights can be treated as deterministic, we
have

\f$
Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
\f$
\f$
\frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
\f$
\f$
\frac{\sigma_0^2}{\sum w_i},
\f$

where we need to estimate \f$ \sigma_0^2 \f$. Again we have the likelihood

\f$
L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}
\f$
and taking the derivative with respect to
\f$\sigma_0^2\f$,

\f$
\frac{d\ln L}{d\sigma_0^2}=
\f$
\f$
\sum -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}
\f$

which
yields the estimator \f$ \sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2 \f$. This
estimator does not ignore data points with zero weight: such a point
contributes nothing to the sum but still counts in \f$ N \f$, which
biases the estimate. Therefore, we modify
the expression as follows \f$\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
w_i\right)^2}\sum w_i(x-m)^2\f$ and we get the following estimator of
the variance of the mean \f$ Var(m)=\frac{\sum w_i^2}{\left(\sum
w_i\right)^3}\sum w_i(x-m)^2\f$. This estimator fulfills the conditions
above: adding a data point with zero weight does not change it,
rescaling the weights does not change it, and setting all weights to
unity yields the same expression as in the non-weighted case.
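
A sketch of this estimator of the variance of the mean (hypothetical
helper, not the yat API):

\code
// Var(m) = sum(w_i^2) / (sum w_i)^3 * sum w_i (x_i - m)^2;
// its square root is the standard error of the weighted mean
double weighted_mean_variance(const std::vector<double>& x,
                              const std::vector<double>& w)
{
  const double m = weighted_mean(x, w);
  double sum_w = 0, sum_w2 = 0, sum_wd2 = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0)
      continue;      // a zero-weight point is ignored entirely
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    const double d = x[i] - m;
    sum_wd2 += w[i] * d * d;
  }
  return sum_w2 / (sum_w * sum_w * sum_w) * sum_wd2;
}
\endcode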
|
In a case when it is not a good approximation to treat the weights as
deterministic, there are two ways to get a better estimate. The
first is to linearize the expression \f$\left<\frac{\sum
w_ix_i}{\sum w_i}\right>\f$. The second method, used when the situation
is more complicated, is to estimate the standard error using a
bootstrapping method.
|
\section AveragerPairWeighted
Here data points come in pairs (x,y). We are sampling from \f$f'_{XY}\f$
but want to measure from \f$f_{XY}\f$. To compensate for this discrepancy,
averages of \f$g(x,y)\f$ are taken as \f$\sum \frac{f}{f'}g(x,y)\f$. Even
though \f$X\f$ and \f$Y\f$ are not independent \f$(f_{XY}\neq f_Xf_Y)\f$, we
assume that we can factorize the ratio and get \f$\frac{\sum
w_xw_yg(x,y)}{\sum w_xw_y}\f$.
\subsection Covariance
Following the variance calculations for AveragerWeighted we have
\f$Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}\f$ where
\f$m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}\f$.
|
\subsection Correlation

As the mean is estimated as
\f$
m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}
\f$,
the variance is estimated as
\f$
\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}
\f$.
As in the non-weighted case we define the correlation to be the ratio
between the covariance and the geometric mean of the variances

\f$
\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
w_xw_y(y-m_y)^2}}
\f$.

This expression fulfills the following
 - Having all weights equal, the expression reduces to the non-weighted
   expression.
 - Adding a pair of data in which one weight is zero is equivalent
   to ignoring the data pair.
 - The correlation is equal to unity if and only if \f$x\f$ is equal to
   \f$y\f$. Otherwise the correlation is between -1 and 1.
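
To illustrate, a C++ sketch computing this weighted correlation with
combined pair weights \f$ w_i=w_{x,i}w_{y,i} \f$ (a hypothetical
helper, not the yat API). Note that the \f$ 1/\sum w_xw_y \f$
normalizations cancel in the ratio:

\code
#include <cmath>

double weighted_correlation(const std::vector<double>& x,
                            const std::vector<double>& wx,
                            const std::vector<double>& y,
                            const std::vector<double>& wy)
{
  // weighted means m_x and m_y with combined weights w = wx*wy
  double sum_w = 0, sum_wx = 0, sum_wy = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    if (w == 0)
      continue;      // the pair is ignored
    sum_w += w;
    sum_wx += w * x[i];
    sum_wy += w * y[i];
  }
  const double mx = sum_wx / sum_w;
  const double my = sum_wy / sum_w;
  // covariance and variances; their common 1/sum_w factors cancel
  double cov = 0, var_x = 0, var_y = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    cov += w * (x[i] - mx) * (y[i] - my);
    var_x += w * (x[i] - mx) * (x[i] - mx);
    var_y += w * (y[i] - my) * (y[i] - my);
  }
  return cov / std::sqrt(var_x * var_y);
}
\endcode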
|
\section Score

\subsection Pearson

\f$\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}\f$.

See AveragerPairWeighted correlation.
|
\subsection ROC

An interpretation of the ROC curve area is the probability that, if we
take one sample from class \f$+\f$ and one sample from class \f$-\f$,
the sample from class \f$+\f$ has the greater value. The ROC curve area
calculates the fraction of pairs fulfilling this condition:

\f$
\frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
\f$

A geometrical interpretation is to have a number of squares where
each square corresponds to a pair of samples. The ROC curve follows the
border between pairs in which the sample from class \f$+\f$ has the
greater value and pairs in which this is not fulfilled. The ROC curve
area is the area of the former squares, and a natural extension is to
weight each pair with its two weights, so the weighted ROC curve
area becomes

\f$
\frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
\f$

This expression is invariant under a rescaling of the weights. Adding a
data value with zero weight adds nothing to the expression, and having
all weights equal to unity yields the non-weighted ROC curve area.
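
A direct \f$ O(n_+n_-) \f$ C++ sketch of the weighted ROC curve area
(hypothetical helper, not the yat API; a tie is counted as not
fulfilling the condition):

\code
// area = sum over pairs with neg_i < pos_j of w-_i w+_j,
//        divided by the total weight over all pairs
double weighted_roc_area(const std::vector<double>& neg,
                         const std::vector<double>& w_neg,
                         const std::vector<double>& pos,
                         const std::vector<double>& w_pos)
{
  double numerator = 0;
  double denominator = 0;
  for (std::size_t i = 0; i < neg.size(); ++i)
    for (std::size_t j = 0; j < pos.size(); ++j) {
      const double w = w_neg[i] * w_pos[j];
      denominator += w;
      if (neg[i] < pos[j])
        numerator += w;
    }
  return numerator / denominator;
}
\endcode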
|
\subsection tScore

Assume that \f$x\f$ and \f$y\f$ originate from the same distribution
\f$N(\mu,\sigma_i^2)\f$ where \f$\sigma_i^2=\frac{\sigma_0^2}{w_i}\f$. We then
estimate \f$\sigma_0^2\f$ as
\f$
\frac{\sum w_x(x-m_x)^2+\sum w_y(y-m_y)^2}
{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
\f$
The variance of the difference of the means becomes
\f$
Var(m_x)+Var(m_y)=\frac{\sum w_x^2Var(x)}{\left(\sum
w_x\right)^2}+\frac{\sum w_y^2Var(y)}{\left(\sum w_y\right)^2}=
\frac{\sigma_0^2}{\sum w_x}+\frac{\sigma_0^2}{\sum w_y},
\f$
and consequently the squared denominator of the t-score becomes
\f$
\frac{\sum w_x(x-m_x)^2+\sum w_y(y-m_y)^2}
{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
\left(\frac{1}{\sum w_x}+\frac{1}{\sum w_y}\right).
\f$

For \f$ w_i=w \f$ this expression condenses to
\f$
\frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
{n_x+n_y-2}
\left(\frac{1}{wn_x}+\frac{1}{wn_y}\right),
\f$
in other words, the familiar expression from the non-weighted case.
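
Putting the pieces together, a C++ sketch of this weighted t-score
(hypothetical helper, not the yat API), reusing weighted_mean from
above:

\code
// t = (m_x - m_y) / sqrt(s0^2 (1/sum wx + 1/sum wy)), with s0^2
// estimated using the effective degrees of freedom derived above
double weighted_t_score(const std::vector<double>& x,
                        const std::vector<double>& wx,
                        const std::vector<double>& y,
                        const std::vector<double>& wy)
{
  const double mx = weighted_mean(x, wx);
  const double my = weighted_mean(y, wy);
  double swx = 0, swx2 = 0, ssx = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    swx += wx[i];
    swx2 += wx[i] * wx[i];
    ssx += wx[i] * (x[i] - mx) * (x[i] - mx);
  }
  double swy = 0, swy2 = 0, ssy = 0;
  for (std::size_t i = 0; i < y.size(); ++i) {
    swy += wy[i];
    swy2 += wy[i] * wy[i];
    ssy += wy[i] * (y[i] - my) * (y[i] - my);
  }
  const double df = swx * swx / swx2 + swy * swy / swy2 - 2;
  const double s0_2 = (ssx + ssy) / df;
  return (mx - my) / std::sqrt(s0_2 * (1 / swx + 1 / swy));
}
\endcode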
|
\subsection FoldChange
Fold-Change is simply the difference between the weighted means of the
two groups: \f$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}\f$

\subsection WilcoxonFoldChange
Take all pairs of samples (one from class \f$+\f$ and one from class
\f$-\f$) and calculate the weighted median of the differences.
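
A sketch of a weighted median (hypothetical helper; yat's definition
may differ in details such as interpolation): sort the values and
return the first value at which the cumulative weight reaches half of
the total weight. For WilcoxonFoldChange, the values are the pairwise
differences \f$ x^+_j-x^-_i \f$ with weights \f$ w^+_jw^-_i \f$.

\code
#include <algorithm>
#include <utility>

// weighted median of (value, weight) pairs
double weighted_median(std::vector<std::pair<double, double> > vw)
{
  std::sort(vw.begin(), vw.end());   // sort by value
  double total = 0;
  for (std::size_t i = 0; i < vw.size(); ++i)
    total += vw[i].second;
  double cumulated = 0;
  for (std::size_t i = 0; i < vw.size(); ++i) {
    cumulated += vw[i].second;
    if (cumulated >= total / 2)
      return vw[i].first;
  }
  return vw.back().first;            // only zero weights remained
}
\endcode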
|
\section Distance

A \ref concept_distance measures how far apart two ranges are. A Distance should
preferably meet some criteria:

 - It is symmetric, \f$ d(x,y) = d(y,x) \f$, that is the distance from \f$
   x \f$ to \f$ y \f$ equals the distance from \f$ y \f$ to \f$ x \f$.
 - Zero self-distance: \f$ d(x,x) = 0 \f$
 - Triangle inequality: \f$ d(x,z) \le d(x,y) + d(y,z) \f$

\subsection weighted_distance Weighted Distance

Weighted Distance is an extension of the usual unweighted distances, in
which each data point is accompanied by a weight. A weighted
distance should meet some criteria:

 - Having all weights equal to unity yields the unweighted case.
 - Rescaling the weights, \f$ w_i = Cw_i \f$, does not change the distance.
 - Having a \f$ w_x = 0 \f$, the distance should ignore the corresponding
   \f$ x \f$, \f$ y \f$, and \f$ w_y \f$.
 - A zero weight should not result in a very different distance than a
   small weight; in other words, modifying a weight should change the
   distance in a continuous manner.
 - The duplicate property. If data come in duplicates such that
   \f$ x_{2i}=x_{2i+1} \f$, then the case \f$ w_{2i}=w_{2i+1} \f$
   should give the same distance as setting \f$ w_{2i}=0 \f$.

The last condition, the duplicate property, implies that setting a
weight to zero is not equivalent to removing the data point. This
behavior is sensible, because otherwise we would have a bias towards
ranges with small weights being close to other ranges. For a weighted
distance meeting these criteria, it might be difficult to show that
the triangle inequality is fulfilled. For most algorithms the triangle
inequality is not essential for the distance to work properly, so if
you need to choose between fulfilling the triangle inequality and these
latter criteria, it is preferable to meet the latter criteria.
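
To see how a normalized weighted distance can meet the duplicate
property, consider this sketch of a weighted Euclidean distance (an
illustration, not the yat implementation): duplicating every point
with equal weights scales the numerator and the denominator by the
same factor, leaving the distance unchanged.

\code
// d = sqrt( sum wx_i wy_i (x_i - y_i)^2 / sum wx_i wy_i )
double weighted_euclidean(const std::vector<double>& x,
                          const std::vector<double>& wx,
                          const std::vector<double>& y,
                          const std::vector<double>& wy)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    num += w * (x[i] - y[i]) * (x[i] - y[i]);
    den += w;
  }
  return std::sqrt(num / den);
}
\endcode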
|
In test/distance_test.cc there are tests for these properties.
|
\section Kernel
\subsection polynomial_kernel Polynomial Kernel
The polynomial kernel of degree \f$N\f$ is defined as \f$(1+<x,y>)^N\f$, where
\f$<x,y>\f$ is the linear kernel (the usual scalar product). For the
weighted case we define the linear kernel to be
\f$<x,y>=\frac{\sum {w_xw_yxy}}{\sum{w_xw_y}}\f$ and the
polynomial kernel can be calculated as before,
\f$(1+<x,y>)^N\f$.

\subsection gaussian_kernel Gaussian Kernel
We define the weighted Gaussian kernel as \f$\exp\left(-N\frac{\sum
w_xw_y(x-y)^2}{\sum w_xw_y}\right)\f$.
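
Sketches of these weighted kernels (hypothetical helpers, not the yat
API; the degree N is passed in):

\code
// weighted linear kernel <x,y> = sum wx wy x y / sum wx wy
double linear_kernel(const std::vector<double>& x,
                     const std::vector<double>& wx,
                     const std::vector<double>& y,
                     const std::vector<double>& wy)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    num += wx[i] * wy[i] * x[i] * y[i];
    den += wx[i] * wy[i];
  }
  return num / den;
}

// polynomial kernel (1 + <x,y>)^N
double polynomial_kernel(const std::vector<double>& x,
                         const std::vector<double>& wx,
                         const std::vector<double>& y,
                         const std::vector<double>& wy, int N)
{
  return std::pow(1 + linear_kernel(x, wx, y, wy), N);
}

// Gaussian kernel exp(-N sum wx wy (x-y)^2 / sum wx wy)
double gaussian_kernel(const std::vector<double>& x,
                       const std::vector<double>& wx,
                       const std::vector<double>& y,
                       const std::vector<double>& wy, int N)
{
  double num = 0, den = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double d = x[i] - y[i];
    num += wx[i] * wy[i] * d * d;
    den += wx[i] * wy[i];
  }
  return std::exp(-N * num / den);
}
\endcode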
|
\section Regression
\subsection Naive
\subsection Linear
We have the model

\f$
y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
\f$

where \f$\epsilon_i\f$ is the noise. The variance of the noise is
inversely proportional to the weight,
\f$Var(\epsilon_i)=\frac{\sigma^2}{w_i}\f$. In order to determine the
model parameters, we minimize the weighted sum of squared errors,

\f$
Q_0 = \sum w_i\epsilon_i^2.
\f$

Taking the derivative with respect to \f$\alpha\f$ and \f$\beta\f$ yields two conditions

\f$
\frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
\beta (x_i-m_x))=0
\f$

and

\f$ \frac{\partial Q_0}{\partial \beta} = -2 \sum
w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
\f$

or equivalently

\f$
\alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
\f$

and

\f$ \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
\f$

Note that by having all weights equal, we get back the unweighted
case. Furthermore, we calculate the variance of the estimators of
\f$\alpha\f$ and \f$\beta\f$.

\f$
\textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
\frac{\sigma^2}{\sum w_i}
\f$

and
\f$
\textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
{(\sum w_i(x_i-m_x)^2)^2}=
\frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
\f$

Finally, we estimate the level of noise, \f$\sigma^2\f$. Inspired by the
unweighted estimator

\f$
s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
\f$

we suggest the following estimator

\f$ s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
w_i-2\frac{\sum w_i^2}{\sum w_i}} \f$
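
A sketch assembling these estimators (a hypothetical helper, not the
yat API):

\code
// weighted linear fit: computes m_x, alpha (= m_y),
// beta = Cov(x,y)/Var(x), and the noise estimate s2
void weighted_linear_fit(const std::vector<double>& x,
                         const std::vector<double>& y,
                         const std::vector<double>& w,
                         double& mx, double& alpha,
                         double& beta, double& s2)
{
  double sum_w = 0, sum_w2 = 0, sum_wx = 0, sum_wy = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    sum_wx += w[i] * x[i];
    sum_wy += w[i] * y[i];
  }
  mx = sum_wx / sum_w;
  alpha = sum_wy / sum_w;        // alpha equals the weighted mean of y
  double sxy = 0, sxx = 0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sxy += w[i] * (x[i] - mx) * (y[i] - alpha);
    sxx += w[i] * (x[i] - mx) * (x[i] - mx);
  }
  beta = sxy / sxx;
  double q = 0;                  // weighted residual sum of squares
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double r = y[i] - alpha - beta * (x[i] - mx);
    q += w[i] * r * r;
  }
  s2 = q / (sum_w - 2 * sum_w2 / sum_w);
}
\endcode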
|
*/