Subset Selection

Introduction

The time and space complexity of any classifier or regressor depends on the number of inputs (d) and the size of the data sample (N). Subset selection is a feature selection method: we find the best k of the d input dimensions and discard the remaining (d - k). This "best" subset is the one with the fewest dimensions that contributes the most to accuracy; the rest is discarded.

Subset Selection can be used for both classification and regression problems. There are two possible approaches to this subset selection process:
  • Forward Approach
  • Backward Approach
In either approach, candidate feature sets should be evaluated on a validation set distinct from the training set.

Notation

F: the set of input features (dimensions) currently in use
E(F): the error on the validation set when only the inputs in F are used

Forward Approach

In this approach, we start with no features and keep adding the feature that results in the maximum decrease in error.

Sequential Forward Selection: 
  1. Start with a null feature set F
  2. In each iteration, for every unused feature x, train the model on the training set and calculate E(F U x) on the validation set
  3. Choose the x for which the error E(F U x) is minimal
  4. Add x to F if E(F U x) < E(F)
  5. Stop if there is no decrease in error, or the decrease is too small
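Below is a minimal Python sketch of this procedure, assuming a scikit-learn-style classifier. The error E(F) is taken as 1 - accuracy on a held-out validation set; the validation_error helper, the Wine dataset, and the min_decrease stopping threshold are illustrative assumptions, not part of the original notes.

```python
# Sketch of sequential forward selection (illustrative, not a reference
# implementation). Assumes scikit-learn is available.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def validation_error(X_tr, y_tr, X_val, y_val, features):
    """Train on the chosen feature columns and return 1 - accuracy
    on the validation set (our stand-in for E(F))."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr[:, features], y_tr)
    return 1.0 - model.score(X_val[:, features], y_val)

def forward_selection(X_tr, y_tr, X_val, y_val, min_decrease=1e-4):
    d = X_tr.shape[1]
    F = []              # start with the empty feature set
    best_err = 1.0      # treat E(empty set) as the worst possible error
    while len(F) < d:
        # compute E(F U x) for every feature x not yet in F
        errs = {j: validation_error(X_tr, y_tr, X_val, y_val, F + [j])
                for j in range(d) if j not in F}
        j_best = min(errs, key=errs.get)
        # stop if the best addition no longer decreases the error enough
        if best_err - errs[j_best] < min_decrease:
            break
        F.append(j_best)
        best_err = errs[j_best]
    return F, best_err

X, y = load_wine(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
F, err = forward_selection(X_tr, y_tr, X_val, y_val)
print("selected features:", F, "validation error:", round(err, 3))
```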

This process of testing features one by one is costly. To reduce the dimensionality from d to k, we must train and test the model d + (d-1) + … + (d-k+1) times, which is O(d²). For example, with d = 20 inputs and k = 5 selected features, that is 20 + 19 + 18 + 17 + 16 = 90 training runs.


In the “Floating Search Method”, the number of features added and removed can change from step to step: after adding a feature, the search may go back and remove previously chosen ones, allowing it to backtrack out of locally bad choices.
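To make the idea concrete, here is a simplified sketch of a floating variant: after each forward addition, backward removals are allowed as long as they strictly decrease the validation error. This is an illustrative simplification, not the exact floating search algorithm, and it reuses the hypothetical validation_error helper from the sketch above.

```python
# Simplified floating forward selection (illustrative assumption, not the
# published SFFS procedure). Reuses validation_error from the sketch above.
def floating_forward_selection(X_tr, y_tr, X_val, y_val, min_decrease=1e-4):
    d = X_tr.shape[1]
    F, best_err = [], 1.0
    while len(F) < d:
        # forward step: add the best remaining feature, as before
        errs = {j: validation_error(X_tr, y_tr, X_val, y_val, F + [j])
                for j in range(d) if j not in F}
        j_best = min(errs, key=errs.get)
        if best_err - errs[j_best] < min_decrease:
            break
        F.append(j_best)
        best_err = errs[j_best]
        # floating (backward) steps: drop features while that strictly helps
        while len(F) > 1:
            errs = {j: validation_error(X_tr, y_tr, X_val, y_val,
                                        [f for f in F if f != j])
                    for j in F}
            j_worst = min(errs, key=errs.get)
            if errs[j_worst] >= best_err:
                break
            F.remove(j_worst)
            best_err = errs[j_worst]
    return F, best_err
```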

Backward Approach


In this approach, we start with all the features and remove them one by one, at each step removing the feature whose removal causes the largest decrease in error (or the smallest increase).

Sequential Backward Selection: 
  1. Start with F containing all the features
  2. In each iteration, train the model with each single feature x removed and calculate E(F - x) on the validation set
  3. Choose the x for which E(F - x) is minimal, and remove it from F if E(F - x) < E(F)
  4. Stop if removing a feature does not decrease the error, or increases it more than we are willing to tolerate
This has the same quadratic complexity as sequential forward selection, although each training run is costlier because the models start with more features.
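A matching sketch for the backward direction, under the same assumptions as the forward example (it reuses the hypothetical validation_error helper defined there; the max_increase tolerance is an illustrative choice):

```python
# Sketch of sequential backward selection (illustrative assumptions as above).
def backward_selection(X_tr, y_tr, X_val, y_val, max_increase=1e-4):
    d = X_tr.shape[1]
    F = list(range(d))          # start with all the features
    best_err = validation_error(X_tr, y_tr, X_val, y_val, F)
    while len(F) > 1:
        # compute E(F - x) for every feature x currently in F
        errs = {j: validation_error(X_tr, y_tr, X_val, y_val,
                                    [f for f in F if f != j])
                for j in F}
        j_best = min(errs, key=errs.get)
        # stop once the best removal increases the error too much
        if errs[j_best] - best_err > max_increase:
            break
        F.remove(j_best)
        best_err = errs[j_best]
    return F, best_err
```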

Point to be Noted

Feature selection isn’t applicable in all domains. In face recognition, for instance, individual pixels don’t carry much information on their own; it is the combined representation of many pixels that is important.
