
Decision Tree - Entropy and Information Gain Computation

1.  Compute the information gain for Refund.
2.  Compute the information gain for Marital Status.
3.  Between Refund and Marital Status, which attribute should be chosen
      to split the root node of the decision tree? (The definitions and
      worked values used to answer this are given below.)
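
For reference, the script below applies the standard definitions of entropy and information gain for a node S and a candidate split attribute A, with the usual convention that 0*log2(0) = 0:

\[
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i ,
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\]

where p_i is the fraction of records in S belonging to class i, and S_v is the subset of records whose value of attribute A equals v.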
%----------------------------------------------------------------------%
% Compute entropy and information gain for refund.
% Higher weighted child entropy means lower information gain.
% Refund Information Gain = 0.1916
%----------------------------------------------------------------------%

%-- Entropy of the class attribute Cheat (3 Yes, 7 No out of 10 records)
Cheat = -(3/10)*log2(3/10) - (7/10)*log2(7/10)    % = 0.8813

%-- Entropy for Refund = Yes (3 records, all Cheat = No: a pure node)
% RefundYes = -(3/3)*log2(3/3) - (0/3)*log2(0/3) evaluates to NaN in MATLAB,
% but with the convention 0*log2(0) = 0 the entropy of a pure node is 0.
RefundYes = 0

%-- Entropy for Refund = No (7 records: 3 Cheat = Yes, 4 Cheat = No)
RefundNo = -(4/7)*log2(4/7) - (3/7)*log2(3/7)    % = 0.9852

%-- Information gain for Refund (3/10 of the records have Refund = Yes, 7/10 have Refund = No)
Gain_Refund = Cheat - (3/10)*RefundYes - (7/10)*RefundNo    % = 0.1916
%----------------------------------------------------------------------%
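
Worked out by hand from the values above (Refund = Yes covers 3 of the 10 records with entropy 0, Refund = No covers 7 with entropy 0.9852):

\[
\mathrm{Gain}(\mathrm{Refund}) = 0.8813 - \frac{3}{10}\cdot 0 - \frac{7}{10}\cdot 0.9852 \approx 0.1916 .
\]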

%----------------------------------------------------------------------%
% Compute entropy and information gain for marital status.
% Marital Information Gain = 0.2813
%----------------------------------------------------------------------%

%-- Entropy of the class attribute Cheat (same value as Cheat above)
Class_Cheat = -(3/10)*log2(3/10) - (7/10)*log2(7/10)    % = 0.8813

%-- Entropy for Marital Status = Single (4 records: 2 Cheat = Yes, 2 Cheat = No)
Single = -(2/4)*log2(2/4) - (2/4)*log2(2/4)    % = 1.0

%-- Entropy for Marital Status = Married (4 records, all Cheat = No: a pure node)
% Married = -(0/4)*log2(0/4) - (4/4)*log2(4/4) evaluates to NaN in MATLAB,
% but with the convention 0*log2(0) = 0 a pure node again has entropy 0.
Married = 0

%-- Entropy for Marital Status = Divorced (2 records: 1 Cheat = Yes, 1 Cheat = No)
Divorced = -(1/2)*log2(1/2) - (1/2)*log2(1/2)    % = 1.0

%-- Information gain for Marital Status (4 Single, 4 Married, 2 Divorced out of 10 records)
Gain_Marital = Class_Cheat - (4/10)*Single - (4/10)*Married - (2/10)*Divorced    % = 0.2813
%----------------------------------------------------------------------%
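
Worked out by hand, with 4 Single, 4 Married, and 2 Divorced records out of 10:

\[
\mathrm{Gain}(\mathrm{Marital\ Status}) = 0.8813 - \frac{4}{10}\cdot 1 - \frac{4}{10}\cdot 0 - \frac{2}{10}\cdot 1 \approx 0.2813 .
\]

Since 0.2813 > 0.1916, Marital Status yields the larger information gain and should be chosen to split the root node (question 3).

The two pure-node cases above can also be handled without special-casing by computing entropy from a vector of class counts. The following is a minimal Octave/MATLAB sketch; the function name class_entropy and the example count vectors are illustrative, not part of the original exercise:

function H = class_entropy(counts)
  % Entropy of a node given a vector of class counts.
  % Zero counts are dropped, which implements the 0*log2(0) = 0 convention.
  p = counts(counts > 0) / sum(counts);
  H = -sum(p .* log2(p));
end

% Example usage (the counts mirror the splits used above):
%   class_entropy([3 7])   % class attribute Cheat  -> 0.8813
%   class_entropy([0 3])   % Refund = Yes (pure)    -> 0
%   class_entropy([3 4])   % Refund = No            -> 0.9852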