theplu::yat::classifier::SubsetGenerator< Data > Class Template Reference

Class splitting Data into training and validation sets.

#include <yat/classifier/SubsetGenerator.h>

Public Types

typedef Data value_type

Public Member Functions

 SubsetGenerator (const Sampler &sampler, const Data &data)
 Create SubDataSets.
 SubsetGenerator (const Sampler &sampler, const Data &data, FeatureSelector &fs)
 Create SubDataSets with feature selection.
 ~SubsetGenerator ()
size_t size (void) const
const Target & target (void) const
const Data & training_data (size_t i) const
const utility::Index & training_features (size_t i) const
const utility::Index & training_index (size_t i) const
const Target & training_target (size_t i) const
const Data & validation_data (size_t i) const
const utility::Index & validation_index (size_t i) const
const Target & validation_target (size_t i) const


Detailed Description

template<typename Data>
class theplu::yat::classifier::SubsetGenerator< Data >

Class splitting Data into training and validation sets.

A SubsetGenerator splits a Data object into several pairs of training and validation data sets. A Sampler selects which samples go into each training set and each validation set. In addition, a FeatureSelector can be used to select features. See the constructors for details.

Note:
Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.
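
The access pattern implied by the member functions (one training/validation pair per index i, with i running from 0 to size()-1) can be sketched as below. This is a minimal, self-contained illustration and does not use yat itself: MockGenerator, Idx, and total_validation_samples are hypothetical names, and the mock only mimics the size(), training_index(i), and validation_index(i) members listed on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Idx = std::vector<std::size_t>;  // hypothetical stand-in for utility::Index

// Hypothetical stand-in for a SubsetGenerator-like object. It is NOT the yat
// class; it only exposes members with the same names as those listed above.
struct MockGenerator {
  std::vector<Idx> train;  // train[i]: sample indices of the ith training set
  std::vector<Idx> valid;  // valid[i]: sample indices of the ith validation set

  std::size_t size() const { return train.size(); }

  const Idx& training_index(std::size_t i) const {
    assert(i < size());
    return train[i];
  }

  const Idx& validation_index(std::size_t i) const {
    assert(i < valid.size());
    return valid[i];
  }
};

// Visit every subset once and count the validation samples seen in total.
template <typename Generator>
std::size_t total_validation_samples(const Generator& g) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < g.size(); ++i) {
    n += g.validation_index(i).size();
  }
  return n;
}
```

The loop body is where a classifier would normally be trained on the ith training data and evaluated on the ith validation data. Whether the same template works unchanged with the real class depends on utility::Index providing size(), which this page does not state.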

Member Typedef Documentation

template<typename Data >
typedef Data theplu::yat::classifier::SubsetGenerator< Data >::value_type

type of Data that is stored in SubsetGenerator


Constructor & Destructor Documentation

template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator ( const Sampler &  sampler,
const Data &  data 
) [inline]

Create SubDataSets.

Creates N training data sets and N validation data sets, where N equals the size of sampler. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data.

In case of a KernelLookup it is a bit different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. The validation kernel is typically not symmetric: each column corresponds to a validation sample and each row corresponds to a training sample. Consequently Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns.

Parameters:
sampler Sampler that is used to select samples.
data Data to split up in validation and training.
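
The row and column selection described for this constructor can be sketched with plain vectors. This is an illustration under stated assumptions, not the yat implementation: Matrix, Index, select, matrix_subset, training_kernel, and validation_kernel are hypothetical names, and a dense row-major matrix stands in for MatrixLookup and KernelLookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major: m[row][column]
using Index = std::vector<std::size_t>;           // hypothetical index list

// Copy the entries m[r][c] for every r in rows and c in cols.
Matrix select(const Matrix& m, const Index& rows, const Index& cols) {
  Matrix out;
  out.reserve(rows.size());
  for (std::size_t r : rows) {
    assert(r < m.size());
    std::vector<double> row;
    row.reserve(cols.size());
    for (std::size_t c : cols) {
      assert(c < m[r].size());
      row.push_back(m[r][c]);
    }
    out.push_back(row);
  }
  return out;
}

// Matrix case: columns are samples, so keep every row and pick sample columns.
Matrix matrix_subset(const Matrix& data, const Index& samples) {
  Index all_rows(data.size());
  for (std::size_t i = 0; i < all_rows.size(); ++i) all_rows[i] = i;
  return select(data, all_rows, samples);
}

// Kernel case, training: training indices select both rows and columns,
// so a symmetric kernel stays symmetric.
Matrix training_kernel(const Matrix& kernel, const Index& train) {
  return select(kernel, train, train);
}

// Kernel case, validation: rows are training samples, columns are
// validation samples, so the result is generally not square.
Matrix validation_kernel(const Matrix& kernel, const Index& train,
                         const Index& valid) {
  return select(kernel, train, valid);
}
```

With three samples, training indices {0, 2}, and validation index {1}, training_kernel yields a 2x2 symmetric matrix and validation_kernel yields a 2x1 matrix.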

template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator ( const Sampler &  sampler,
const Data &  data,
FeatureSelector &  fs 
) [inline]

Create SubDataSets with feature selection.

Creates N training data sets and N validation data sets, where N equals the size of sampler. The Sampler defines which samples are included in a subset. Likewise, a FeatureSelector, fs, is used to select features. The selection is based not on the entire dataset but solely on the training dataset. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data. The FeatureSelector is used to select features, i.e., to select rows to be included in the subsets.

In case of a KernelLookup it is a bit different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. However, the created KernelLookup is not simply the subkernel of data; each element is recalculated using the features selected by FeatureSelector fs. In the validation kernel each column corresponds to a validation sample and each row corresponds to a training sample. Consequently Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns. The same set of features is used to calculate the elements as for the training kernel, i.e., feature selection is based on training data.

Parameters:
sampler Sampler taking care of partitioning the dataset.
data Data to be split up in validation and training.
fs Object selecting features for each subset.
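
The point that features are chosen from the training samples only, and then reused for the validation set, can be sketched as follows. This is a self-contained illustration, not how FeatureSelector works internally: the variance score and the names Matrix, Index, training_variance, select_features, and subset are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // m[feature][sample]
using Index = std::vector<std::size_t>;           // hypothetical index list

// Hypothetical score: variance of one feature over the training samples only.
double training_variance(const std::vector<double>& feature,
                         const Index& train) {
  assert(!train.empty());
  double mean = 0.0;
  for (std::size_t s : train) mean += feature[s];
  mean /= static_cast<double>(train.size());
  double var = 0.0;
  for (std::size_t s : train) {
    var += (feature[s] - mean) * (feature[s] - mean);
  }
  return var / static_cast<double>(train.size());
}

// Rank features by the training-only score and keep the k best rows.
// Validation columns are never inspected here.
Index select_features(const Matrix& data, const Index& train, std::size_t k) {
  std::vector<double> score(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    score[i] = training_variance(data[i], train);
  }
  Index order(data.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&score](std::size_t a, std::size_t b) {
                     return score[a] > score[b];
                   });
  order.resize(std::min(k, order.size()));
  return order;
}

// Build a subset from the chosen feature rows and sample columns. The same
// feature list is passed for the training and the validation subset.
Matrix subset(const Matrix& data, const Index& features,
              const Index& samples) {
  Matrix out;
  out.reserve(features.size());
  for (std::size_t f : features) {
    std::vector<double> row;
    row.reserve(samples.size());
    for (std::size_t s : samples) row.push_back(data[f][s]);
    out.push_back(row);
  }
  return out;
}
```

Keeping validation samples out of the selection step means that a feature which only stands out in the validation columns cannot influence which rows are kept.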

template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::~SubsetGenerator (  )  [inline]

Destructor


Member Function Documentation

template<typename Data >
size_t theplu::yat::classifier::SubsetGenerator< Data >::size ( void   )  const [inline]

Returns:
number of subsets

template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::target ( void   )  const [inline]

Returns:
the target for the total set

template<typename Data >
const Data & theplu::yat::classifier::SubsetGenerator< Data >::training_data ( size_t  i  )  const [inline]

See constructors for details on how training data are generated.

Returns:
ith training data

template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_features ( size_t  i  )  const [inline]

Features that are used to create ith training data and validation data.

Returns:
training features

template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_index ( size_t  i  )  const [inline]

Returns:
Index of samples included in ith training data.

template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::training_target ( size_t  i  )  const [inline]

Returns:
Targets of ith set of training samples

template<typename Data >
const Data & theplu::yat::classifier::SubsetGenerator< Data >::validation_data ( size_t  i  )  const [inline]

See constructors for details on how validation data are generated.

Returns:
ith validation data

template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::validation_index ( size_t  i  )  const [inline]

Returns:
Index of samples included in ith validation data.

template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::validation_target ( size_t  i  )  const [inline]

Returns:
Targets of ith set of validation samples


The documentation for this class was generated from the following file:

Generated on Mon Nov 7 02:25:52 2011 for yat by  doxygen 1.5.9