Introduction
A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions.
Terminologies

Root Node: Represents the entire population or sample; it is further divided into two or more homogeneous sets.

Leaf Node: A node that cannot be split into any further nodes.

Splitting: Dividing a node (the root node or a sub-node) into two or more parts based on some condition.

Branch / Subtree: A section of the tree formed by splitting a node.

Pruning: The opposite of splitting; removing unwanted branches from the tree.
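As a concrete illustration of these terms, the following minimal sketch (assuming scikit-learn is available; the dataset and parameters are chosen only for illustration) trains a small tree and prints it, so the root node, the splits, and the leaf nodes are visible:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each indentation level is one split; lines ending in "class: ..." are leaf nodes.
print(export_text(tree, feature_names=list(iris.feature_names)))
```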
Where to Split?
Gini Index
A measure of impurity used when building a decision tree; a lower Gini index indicates a purer node.
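As a rough illustration (plain Python; the function name is illustrative, not from any library), the Gini index of a set of class labels is one minus the sum of the squared class proportions:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    0 means the node is pure; higher values mean more class mixing."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Example: a pure node versus an evenly mixed node
print(gini_impurity(["yes", "yes", "yes"]))       # 0.0
print(gini_impurity(["yes", "no", "yes", "no"]))  # 0.5
```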
Information Gain
The decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is about finding the attribute that returns the highest information gain.
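A minimal sketch of this idea (plain Python; the names are illustrative): information gain is the parent's entropy minus the weighted average entropy of the children produced by the split. The entropy measure itself is described in the Entropy section below.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the weighted entropy of the child nodes."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(child) / n * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Example: splitting 4 positives and 4 negatives into two pure children
parent = ["+"] * 4 + ["-"] * 4
children = [["+"] * 4, ["-"] * 4]
print(information_gain(parent, children))  # 1.0 (entropy drops from 1 to 0)
```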
Reduction in Variance
This criterion is used for continuous target variables. The split that produces the lowest weighted variance in the child nodes is selected for splitting the population.
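A minimal sketch of this criterion (plain Python; the names are illustrative): compute the variance of the continuous target before the split and subtract the weighted variance of the child nodes.

```python
def variance(values):
    """Population variance of a list of target values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent_values, child_value_groups):
    """Drop in target variance achieved by a candidate split; the split with
    the largest reduction (i.e. the lowest child variance) is preferred."""
    n = len(parent_values)
    weighted_child_variance = sum(
        len(child) / n * variance(child) for child in child_value_groups
    )
    return variance(parent_values) - weighted_child_variance

# Example: a split that separates small and large target values
parent = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
children = [[1.0, 1.2, 0.9], [5.0, 5.1, 4.8]]
print(variance_reduction(parent, children))  # most of the parent variance disappears
```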
Chi-Square
A criterion that measures the statistical significance of the differences between the sub-nodes and the parent node.
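A minimal sketch, assuming SciPy is available: the chi-square statistic compares the observed class counts in each child node with the counts expected if the split were independent of the class. The counts below are made up purely for illustration.

```python
from scipy.stats import chi2_contingency

# Rows: child nodes produced by a candidate split; columns: class counts.
observed = [
    [30, 10],  # left child:  30 positive, 10 negative
    [5, 25],   # right child:  5 positive, 25 negative
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # larger chi2 / smaller p-value => more significant split
```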
Entropy

It characterizes the impurity of an arbitrary collection of examples.

Defines randomness in the data

Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-), where p(+) and p(-) are the proportions of positive and negative examples in S.

Entropy is 0 if all members belong to the same class

Entropy is 1 when the collection contains an equal number of positive and negative examples.
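A short numeric check of these boundary cases (plain Python; the helper function is illustrative only):

```python
import math

def entropy_binary(p_pos):
    """Entropy of a boolean collection with a fraction p_pos of positive examples."""
    p_neg = 1.0 - p_pos
    return sum(-p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy_binary(1.0))   # 0.0   -> all members belong to the same class
print(entropy_binary(0.5))   # 1.0   -> equal numbers of positive and negative examples
print(entropy_binary(0.75))  # ~0.81 -> a partially mixed collection
```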
Appropriate Problems for Decision Tree Learning

Instances are represented as attribute-value pairs, where each attribute takes a small number of disjoint possible values.

The target function has discrete output values, such as yes or no.

A decision tree can be used when training data contains errors or when it contains missing attribute values.
Issues in Decision Trees
Practical issues in learning decision trees include deciding how deeply to grow the tree, handling training data with errors or missing attribute values, and handling attributes with differing costs.
Avoiding Overfitting of Data
We say that a hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances.
There are two broad classes of approaches to avoid this:

Approaches that stop growing the tree early, before it perfectly fits the training data

Approaches that allow the tree to overfit the data and then post-prune it (more successful in practice)
Reduced Error Pruning Approach
In this method, each decision node in the tree is considered a candidate for pruning. Pruning a node means removing the subtree rooted at it and replacing it with a leaf labelled with the most common classification of the training examples at that node. A node is removed only if the resulting pruned tree performs no worse than the original over a validation set. Nodes are pruned iteratively, always choosing the node whose removal most increases accuracy on the validation set, and pruning stops when further pruning is harmful.
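A minimal sketch of this idea, assuming a held-out validation set and a toy node structure (all names are illustrative, not from any library). For brevity it prunes in a single bottom-up pass rather than repeatedly picking the single best node to remove:

```python
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, prediction=None):
        self.feature = feature        # attribute tested at this node
        self.threshold = threshold    # x[feature] <= threshold goes left
        self.left, self.right = left, right
        self.prediction = prediction  # most common class of the training examples here

    def is_leaf(self):
        return self.left is None and self.right is None

def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

def accuracy(root, X_val, y_val):
    return sum(predict(root, x) == y for x, y in zip(X_val, y_val)) / len(y_val)

def reduced_error_prune(root, node, X_val, y_val):
    """Replace a subtree with a leaf whenever doing so does not reduce
    accuracy on the validation set."""
    if node.is_leaf():
        return
    reduced_error_prune(root, node.left, X_val, y_val)
    reduced_error_prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    left, right = node.left, node.right
    node.left = node.right = None            # tentatively turn the node into a leaf
    if accuracy(root, X_val, y_val) < before:
        node.left, node.right = left, right  # pruning hurt: undo it

# Tiny example: the right subtree does not help on the validation data,
# so it is collapsed into a single leaf.
tree = Node(feature=0, threshold=0.5, prediction="no",
            left=Node(prediction="no"),
            right=Node(feature=0, threshold=0.8, prediction="yes",
                       left=Node(prediction="yes"), right=Node(prediction="no")))
X_val, y_val = [[0.2], [0.7], [0.9]], ["no", "yes", "yes"]
reduced_error_prune(tree, tree, X_val, y_val)
print(accuracy(tree, X_val, y_val))  # 1.0 after pruning
```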
Including Costs
In some learning tasks, the instance attributes have associated measurement costs. Iterative Dichotomiser 3 (ID3) can be modified to take these costs into account by introducing a cost term into the attribute selection measure.
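As one hedged illustration of such a cost term: a published variant (attributed to Tan and Schlimmer) replaces information gain with the squared gain divided by the attribute's cost, so that cheap attributes are preferred when gains are similar. The function below is illustrative only.

```python
def cost_adjusted_gain(information_gain, cost):
    """One cost-sensitive selection measure: squared gain divided by attribute cost."""
    return information_gain ** 2 / cost

# Example: a cheaper attribute can win even with a slightly lower raw gain
print(cost_adjusted_gain(0.60, cost=1.0))  # 0.36
print(cost_adjusted_gain(0.75, cost=4.0))  # ~0.14
```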