yat  0.11.3pre
theplu::yat::classifier::SubsetGenerator< Data > Class Template Reference

Class splitting Data into training and validation set.

#include <yat/classifier/SubsetGenerator.h>

Public Types

typedef Data value_type
 

Public Member Functions

 SubsetGenerator (const Sampler &sampler, const Data &data)
 Create SubDataSets.
 
 SubsetGenerator (const Sampler &sampler, const Data &data, FeatureSelector &fs)
 Create SubDataSets with feature selection.
 
 ~SubsetGenerator ()
 
size_t size (void) const
 
const Target & target (void) const
 
const Data & training_data (size_t i) const
 
const utility::Index & training_features (size_t i) const
 
const utility::Index & training_index (size_t i) const
 
const Target & training_target (size_t i) const
 
const Data & validation_data (size_t i) const
 
const utility::Index & validation_index (size_t i) const
 
const Target & validation_target (size_t i) const
 

Detailed Description

template<typename Data>
class theplu::yat::classifier::SubsetGenerator< Data >

Class splitting Data into training and validation set.

A SubsetGenerator splits a Data object into several pairs of training and validation data sets. A Sampler selects the samples that go into each training set and the corresponding validation set. In addition, a FeatureSelector can be used to select features. For more details, see the constructors.

Note
Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.
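
The access pattern implied by the member functions is a loop over subset numbers 0 .. size()-1. The following is a minimal sketch of that pattern, not yat code: ToySubsets is a hypothetical stand-in that borrows only the accessor names size(), training_index(i) and validation_index(i) from this page, stores plain index vectors, and uses a round-robin partitioning chosen arbitrarily for the example. In the real class the partitions come from the Sampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration only; not part of yat.
// It mirrors the documented accessors size(), training_index(i) and
// validation_index(i) with plain std::vector<std::size_t> index lists.
class ToySubsets {
public:
  // Assumed partitioning scheme for this sketch: sample s is a validation
  // sample in subset s % k and a training sample in every other subset.
  ToySubsets(std::size_t n, std::size_t k) : train_(k), valid_(k) {
    assert(k > 0);
    for (std::size_t s = 0; s < n; ++s)
      for (std::size_t i = 0; i < k; ++i)
        (s % k == i ? valid_[i] : train_[i]).push_back(s);
  }

  // Number of subsets.
  std::size_t size() const { return train_.size(); }

  // Samples included in the ith training set.
  const std::vector<std::size_t>& training_index(std::size_t i) const {
    return train_.at(i);
  }

  // Samples included in the ith validation set.
  const std::vector<std::size_t>& validation_index(std::size_t i) const {
    return valid_.at(i);
  }

private:
  std::vector<std::vector<std::size_t> > train_;
  std::vector<std::vector<std::size_t> > valid_;
};

// Loop over all subsets and count how often each of the n samples
// appears in a validation set.
std::vector<std::size_t> validation_counts(const ToySubsets& subsets,
                                           std::size_t n) {
  std::vector<std::size_t> counts(n, 0);
  for (std::size_t i = 0; i < subsets.size(); ++i)
    for (std::size_t s : subsets.validation_index(i))
      ++counts.at(s);
  return counts;
}
```

With six samples and three subsets, each subset in this toy scheme has four training samples and two validation samples, and every sample is validated exactly once.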

Member Typedef Documentation

template<typename Data >
typedef Data theplu::yat::classifier::SubsetGenerator< Data >::value_type

type of Data that is stored in SubsetGenerator

Constructor & Destructor Documentation

template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator ( const Sampler &  sampler,
const Data &  data 
)

Create SubDataSets.

Creates N training data sets and N validation data sets, where N equals the size of sampler. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data.
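
The column selection described above can be sketched with plain containers. This is an illustration under stated assumptions, not the library's implementation: Matrix is a std::vector stand-in for a lookup, select_columns is a hypothetical helper, and the sketch copies values for simplicity, whereas this page does not say whether the library copies or references the underlying data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a matrix lookup: rows are features, columns are samples.
typedef std::vector<std::vector<double> > Matrix;

// Hypothetical helper: return a copy of `data` restricted to the given
// columns (samples). All rows (features) are kept.
Matrix select_columns(const Matrix& data,
                      const std::vector<std::size_t>& columns) {
  Matrix out(data.size());
  for (std::size_t r = 0; r < data.size(); ++r) {
    out[r].reserve(columns.size());
    for (std::size_t j = 0; j < columns.size(); ++j) {
      assert(columns[j] < data[r].size());
      out[r].push_back(data[r][columns[j]]);
    }
  }
  return out;
}
```

Calling select_columns once with the training indices and once with the validation indices of a partition gives the two matrices that play the roles of training_data and validation_data for that partition.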

In case of a KernelLookup it is a bit different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. The validation kernel is typically not symmetric: each column corresponds to a validation sample and each row corresponds to a training sample. Consequently, Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns.
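
The row and column roles for kernels can be sketched as follows. The names Matrix, select_block, training_kernel and validation_kernel are hypothetical helpers written for this illustration and operate on a precomputed kernel matrix held in a std::vector; they are not yat functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel lookup: element [i][j] is the kernel value
// between sample i and sample j of the full data set.
typedef std::vector<std::vector<double> > Matrix;

// Hypothetical helper: pick the sub-matrix with the given rows and columns.
Matrix select_block(const Matrix& kernel,
                    const std::vector<std::size_t>& rows,
                    const std::vector<std::size_t>& cols) {
  Matrix out(rows.size(), std::vector<double>(cols.size()));
  for (std::size_t r = 0; r < rows.size(); ++r)
    for (std::size_t c = 0; c < cols.size(); ++c) {
      assert(rows[r] < kernel.size() && cols[c] < kernel[rows[r]].size());
      out[r][c] = kernel[rows[r]][cols[c]];
    }
  return out;
}

// Training kernel: training samples select both rows and columns, so a
// symmetric input kernel gives a symmetric result.
Matrix training_kernel(const Matrix& kernel,
                       const std::vector<std::size_t>& train) {
  return select_block(kernel, train, train);
}

// Validation kernel: rows are training samples, columns are validation
// samples, so the result is in general rectangular.
Matrix validation_kernel(const Matrix& kernel,
                         const std::vector<std::size_t>& train,
                         const std::vector<std::size_t>& valid) {
  return select_block(kernel, train, valid);
}
```

For a data set of four samples with three training samples and one validation sample, the training kernel is 3 x 3 and the validation kernel is 3 x 1.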

Parameters
sampler  Sampler that is used to select samples.
data  Data to split up in validation and training.
template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::SubsetGenerator ( const Sampler &  sampler,
const Data &  data,
FeatureSelector &  fs 
)

Create SubDataSets with feature selection.

Creates N training data sets and N validation data sets, where N equals the size of sampler. The Sampler defines which samples are included in a subset. Likewise, a FeatureSelector, fs, is used to select features. The selection is not based on the entire dataset but solely on the training dataset. Data must be one of MatrixLookup, MatrixLookupWeighted, or KernelLookup.

In case of MatrixLookup or MatrixLookupWeighted, each column corresponds to a sample and the sampler is used to select columns. Sampler::training_index(size_t) is used to select columns for the corresponding training_data, and Sampler::validation_index(size_t) is used to select columns for the corresponding validation_data. The FeatureSelector is used to select features, i.e., to select rows to be included in the subsets.

In case of a KernelLookup it is a bit different. A symmetric training kernel is created using Sampler::training_index(size_t) to select both rows and columns. However, the created KernelLookup is not simply the subkernel of data; each element is recalculated using the features selected by FeatureSelector fs. In the validation kernel each column corresponds to a validation sample and each row corresponds to a training sample. Consequently, Sampler::training_index(size_t) is used to select rows, and Sampler::validation_index(size_t) is used to select columns. The same set of features is used to calculate the elements as for the training kernel, i.e., feature selection is based on training data.
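
The point that features are chosen from the training samples only, and then reused when the validation kernel is recalculated, can be sketched as below. Everything here is an assumption made for illustration: the scoring rule (range of a feature over the training columns), the linear kernel, and the names IndexList, select_features, linear_kernel and validation_kernel are hypothetical and are not taken from yat or from any particular FeatureSelector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // rows = features, columns = samples
typedef std::vector<std::size_t> IndexList;

// Assumed scoring rule: rank features by their range (max - min) over the
// training columns only and keep the n best, returned in ascending order.
IndexList select_features(const Matrix& data, const IndexList& train,
                          std::size_t n) {
  assert(!train.empty());
  std::vector<double> score(data.size(), 0.0);
  for (std::size_t r = 0; r < data.size(); ++r) {
    double lo = data[r][train[0]];
    double hi = lo;
    for (std::size_t j = 1; j < train.size(); ++j) {
      lo = std::min(lo, data[r][train[j]]);
      hi = std::max(hi, data[r][train[j]]);
    }
    score[r] = hi - lo;
  }
  IndexList order(data.size());
  for (std::size_t r = 0; r < order.size(); ++r) order[r] = r;
  std::stable_sort(order.begin(), order.end(),
                   [&score](std::size_t a, std::size_t b) {
                     return score[a] > score[b];
                   });
  order.resize(std::min(n, order.size()));
  std::sort(order.begin(), order.end());
  return order;
}

// Assumed kernel function: dot product of samples a and b restricted to
// the selected features.
double linear_kernel(const Matrix& data, const IndexList& features,
                     std::size_t a, std::size_t b) {
  double sum = 0.0;
  for (std::size_t k = 0; k < features.size(); ++k)
    sum += data[features[k]][a] * data[features[k]][b];
  return sum;
}

// Recalculated validation kernel: rows are training samples, columns are
// validation samples, elements use the features chosen from training data.
Matrix validation_kernel(const Matrix& data, const IndexList& features,
                         const IndexList& train, const IndexList& valid) {
  Matrix k(train.size(), std::vector<double>(valid.size()));
  for (std::size_t r = 0; r < train.size(); ++r)
    for (std::size_t c = 0; c < valid.size(); ++c)
      k[r][c] = linear_kernel(data, features, train[r], valid[c]);
  return k;
}
```

Because select_features never reads the validation columns, a feature that varies only in the validation samples is not selected, which keeps information from the validation set out of the training step.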

Parameters
sampler  Sampler taking care of partitioning the dataset.
data  Data to be split up in validation and training.
fs  Object selecting features for each subset.
template<typename Data >
theplu::yat::classifier::SubsetGenerator< Data >::~SubsetGenerator ( )

Destructor

Member Function Documentation

template<typename Data >
size_t theplu::yat::classifier::SubsetGenerator< Data >::size ( void  ) const
Returns
number of subsets
template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::target ( void  ) const
Returns
the target for the total set
template<typename Data >
const Data & theplu::yat::classifier::SubsetGenerator< Data >::training_data ( size_t  i) const

See constructors for details on how training data are generated.

Returns
ith training data
template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_features ( size_t  i) const

Features that are used to create ith training data and validation data.

Returns
training features
template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::training_index ( size_t  i) const
Returns
Index of samples included in ith training data.
template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::training_target ( size_t  i) const
Returns
Targets of ith set of training samples
template<typename Data >
const Data & theplu::yat::classifier::SubsetGenerator< Data >::validation_data ( size_t  i) const

See constructors for details on how validation data are generated.

Returns
ith validation data
template<typename Data >
const utility::Index & theplu::yat::classifier::SubsetGenerator< Data >::validation_index ( size_t  i) const
Returns
Index of samples included in ith validation data.
template<typename Data >
const Target & theplu::yat::classifier::SubsetGenerator< Data >::validation_target ( size_t  i) const
Returns
Targets of ith set of validation samples

The documentation for this class was generated from the following file:

Generated on Sat May 24 2014 03:33:05 for yat by  doxygen 1.8.2