# Decision Trees: Overview

### Introduction

A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions.

### Terminologies

1. Root Node: Represents the entire population or sample, and this further gets divided into two or more homogeneous sets.

2. Leaf Node: That node which can’t get segregated into further nodes

3. Splitting: Dividing the root node/subtree into different parts based on some condition

4. Branch / Subtree: formed by splitting the tree-node

5. Pruning: Opposite of splitting. Removing unwanted branches from the tree

### Where to Split?

#### Gini Index

The measure of impurity or purity used in building a decision tree

#### Information Gain

A decrease in entropy after a dataset is split based on an attribute. Constructing a decision tree is about finding attributes that return the highest Information Gain

#### Reduction in Variance

The algorithm is used for continuous target variables. The split with lower variance is selected as the criteria to split the population

#### Chi-Square

The algorithm used to find out the statistical significance between the differences between sub-tree and parent nodes

#### Entropy

• It characterizes the impurity of an arbitrary collection of examples.

• Defines randomness in the data

• Given a collection S, containing some positive and negative examples of some target concept, the entropy of S relative to this boolean classification

• Entropy is 0 if all members belong to the same class

• Entropy is 1 when the collection contains an equal number of positive and negative examples.

### Appropriate Problems for decision-tree learning

• Instances are represented as attribute-value pairs, where attributes take a small number of disjoint possible values

• The target function has discrete output values, such as yes or no.

• A decision tree can be used when training data contains errors or when it contains missing attribute values.

### Issues in Decision Trees

A decision tree can be used when training data contains errors or when it contains missing attribute values.

### Avoiding Overfitting of Data

We say that a hypothesis overfits the training examples if a different hypothesis that fits the training examples with less accuracy performs better over the entire distribution of instances.
There are several steps to avoid this:

1. Approaches that stop the growth of the tree

2. Approaches that allow the tree to overfit post-prune it ( More successful in practice )

### Reduced Error Pruning Approach

In this method, each decision node in the tree is considered a candidate for pruning. The subtree rooted at the node is replaced by a leaf node assigning it the most common classification. Nodes are removed if the resulted trees have no worse performance than the original. Nodes are removed iteratively, always choosing the node whose removal most increases the accuracy and stops when further pruning is harmful.

### Including Costs

In some learning tasks, the instance attributes have associated costs. Iterative Dichotomiser 3 ( ID3 ) is modified to consider the costs by introducing a cost term into the attribute selection process.