Decision Tree in Machine Learning – Simple Guide with Examples


The Decision Tree is one of the most powerful and easy-to-understand algorithms in Machine Learning, widely used for both classification and regression problems. What makes a Decision Tree special is that it mimics the human way of making decisions: breaking a problem into smaller choices and reaching a conclusion step by step.

In this complete guide, you will learn everything about the Decision Tree algorithm:

  • What Decision Trees are and how they work
  • Splitting criteria: Gini impurity, Entropy, Information Gain
  • Decision Trees for regression
  • Advantages and disadvantages
  • Perfect beginner-friendly examples
  • Python implementation using scikit-learn
  • Real-world use cases

1️⃣ What is a Decision Tree?

A Decision Tree is a supervised machine learning algorithm used for both classification (categorical output) and regression (continuous output).

It works by continuously splitting the dataset based on feature values, forming a tree-like structure. Each internal node represents a decision, each branch represents an outcome, and each leaf node represents the final prediction.

Simple Example:

Suppose you want to decide whether a customer will buy a product. The tree may ask questions like:

  • Is age > 30?
  • If yes → Is income > 50,000?
  • If no → Young customer segment

The final decision is based on answers to these questions.
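The question chain above is just nested if/else logic. Here is a minimal sketch of that rule as plain Python (the function name and the "buy = 1 / not buy = 0" labels are illustrative, not from a trained model):

```python
def will_buy(age, income):
    """Toy decision rule mirroring the question chain above."""
    if age > 30:            # first question: age
        if income > 50000:  # second question: income
            return 1        # predict: will buy
        return 0            # predict: won't buy
    return 0                # young customer segment: predict "won't buy"

print(will_buy(40, 60000))  # older, high-income customer → 1
print(will_buy(22, 15000))  # young customer → 0
```

A trained Decision Tree learns exactly this kind of rule structure automatically from data.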

2️⃣ How a Decision Tree Works

A Decision Tree follows a simple but powerful process:

  • Step 1: Start with the entire dataset at the root node.
  • Step 2: Choose the best feature to split the data.
  • Step 3: Split the data into subsets.
  • Step 4: Repeat the process recursively.
  • Step 5: Stop when no further meaningful split is possible.

Stopping Conditions:

  • Maximum depth reached
  • Minimum samples in a node
  • No improvement in splitting criteria
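In scikit-learn, these stopping conditions map directly onto constructor parameters of DecisionTreeClassifier. A small sketch (the parameter values here are chosen only for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Each parameter enforces one of the stopping conditions above:
model = DecisionTreeClassifier(
    max_depth=3,                 # stop when maximum depth is reached
    min_samples_split=4,         # don't split nodes with fewer samples
    min_impurity_decrease=0.01,  # stop when a split barely improves purity
)

# Tiny toy dataset: [age, income] -> will buy (1) / won't buy (0)
X = [[25, 50000], [40, 60000], [35, 30000], [20, 20000]]
y = [1, 1, 0, 0]
model.fit(X, y)

print(model.get_depth())  # the fitted depth never exceeds max_depth
```

Tuning these parameters is the main way to control overfitting in a single tree.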

3️⃣ Splitting Criteria Used in Decision Trees

📌 For Classification Problems:

1. Gini Impurity

Gini measures how “pure” or “impure” a node is: Gini = 1 − Σ (p_i)², where p_i is the proportion of class i in the node. If all samples belong to the same class, impurity = 0.

2. Entropy

Entropy measures randomness in the data: Entropy = −Σ p_i log₂(p_i). A pure node has entropy 0; a 50/50 node has entropy 1 bit. Splits that produce lower-entropy (purer) children are better.

3. Information Gain

Measures how much a split reduces entropy: the parent’s entropy minus the weighted average entropy of its children. Higher Information Gain = better split.
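All three classification criteria are easy to compute by hand. A minimal sketch in plain Python, using Gini = 1 − Σ p², Entropy = −Σ p log₂(p), and Information Gain = parent entropy minus the weighted child entropy:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy of a node in bits: -sum p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

labels = [1, 1, 0, 0]
print(gini(labels))     # 0.5 — the maximum for two classes (50/50 node)
print(entropy(labels))  # 1.0 bit
# A perfect split separates the classes completely, removing all entropy:
print(information_gain(labels, [[1, 1], [0, 0]]))  # 1.0
```

The tree builder evaluates candidate splits with one of these measures and keeps the best one at each node.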

📌 For Regression Problems:

  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)

A regression tree splits data to minimize prediction error for continuous values.
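The regression counterpart in scikit-learn is DecisionTreeRegressor, which minimizes squared error by default. A small sketch with made-up house-size/price data (values are illustrative only):

```python
from sklearn.tree import DecisionTreeRegressor

# Toy data: house size in sq ft -> price
X = [[600], [800], [1000], [1500], [2000]]
y = [150000, 180000, 220000, 300000, 400000]

model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)

# Each leaf predicts the mean target of the training samples it contains
print(model.predict([[900]]))
```

Because leaves predict averages of training targets, a regression tree can never predict outside the range of values it saw during training.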

4️⃣ Example: Classification Using Decision Tree

Let’s take a simple classification problem:

Goal: Predict whether a customer will buy a product.

Features:

  • Age
  • Income

The Decision Tree may create splits like:

  • Is age > 30?
  • If yes → Is income > 50,000?
  • If no → Younger customers = lower probability

This sequence of decisions leads to the final classification: Buy (1) or Not Buy (0).

5️⃣ Advantages of Decision Tree

  • Very easy to understand and explain
  • Visual representation is intuitive
  • Requires little data preprocessing
  • Works with both numerical & categorical features
  • Handles multi-output problems

6️⃣ Disadvantages of Decision Tree

  • Highly prone to overfitting
  • Sensitive to small changes in data
  • Deep trees become complex and less interpretable
  • Biased towards features with many categories

👉 The solution to most of these limitations is the Random Forest: an ensemble of many Decision Trees (ensemble learning).
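A Random Forest trains many trees on random subsets of the rows and features, then combines their votes, which sharply reduces the overfitting and instability of a single tree. A minimal sketch (parameter values are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

# Same toy [age, income] data as in the classifier example
X = [[25, 50000], [40, 60000], [35, 30000], [20, 20000]]
y = [1, 1, 0, 0]

# 100 trees, each fitted on a bootstrap sample of the rows
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print(forest.predict([[30, 40000]]))  # majority vote across all trees
```

Because the trees err in different ways, their averaged vote is usually more accurate than any single tree.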

7️⃣ Python Example – Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier

# Sample dataset: each row is [age, income]
X = [[25, 50000], [40, 60000], [35, 30000], [20, 20000]]
y = [1, 1, 0, 0]  # 1 = will buy, 0 = won't buy

# Fix the random seed so tie-breaking between equal splits is reproducible
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

print(model.predict([[30, 40000]]))  # predict for a new 30-year-old earning 40,000

This program trains a Decision Tree using two features (age, income) and predicts whether a new customer will buy the product.
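One of the Decision Tree's biggest advantages is interpretability, and scikit-learn's export_text prints the learned splits as readable if/else rules. A sketch continuing the same toy example:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 50000], [40, 60000], [35, 30000], [20, 20000]]
y = [1, 1, 0, 0]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Print the fitted tree as a nested list of threshold rules
print(export_text(model, feature_names=["age", "income"]))
```

Reading this printout is a quick way to sanity-check what the model actually learned.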

8️⃣ Real-World Use Cases of Decision Trees

  • Customer churn prediction
  • Loan approval & credit scoring
  • Fraud detection in banking
  • Medical diagnosis
  • Sales and marketing forecasting
  • Recommendation systems

9️⃣ Summary of the Decision Tree Algorithm

  • Supervised algorithm for classification & regression
  • Splits data based on best criteria
  • Uses Gini, Entropy, or Information Gain
  • Very easy to visualize & interpret
  • Prone to overfitting → solved by Random Forest


📈 Join the CodeMyFYP Community

At CodeMyFYP, we help students learn Machine Learning, AI, Full-Stack Development, and build Final Year Projects with step-by-step guidance.

🌐 Website: www.codemyfyp.com
📞 Contact: 9483808379
📍 Location: Bengaluru, Karnataka
💼 Industry: IT Services & Consulting

🚀 Learn ML one algorithm at a time with CodeMyFYP!

