# Difference between Classification and Clustering

Teachers and examiners (CBSESkillEduction) collaborated to create this guide to the difference between classification and clustering. All the important information is taken from the NCERT textbook Artificial Intelligence (417).

## Classification

We deal with classification problems almost every day. Here are a few examples that show how often classification comes up.

Before starting with classification, we first have to understand supervised learning.

Supervised learning is a way of building artificial intelligence (AI) systems in which a computer algorithm is trained on input data that has been labelled with the desired output.

Imagine receiving a basket full of several fruit varieties. The machine must now be trained by feeding it each different fruit one at a time, as shown below:

a. If the object is rounded with a depression at the top and red in colour, it is labelled Apple.
b. If the object is a long, curved cylinder and green in colour, it is labelled Banana.
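
The labelling step above can be sketched as a small table of labelled examples. This is only an illustration: the feature names `shape` and `colour` and the `predict` helper are assumptions made for this sketch, not something defined in the textbook.

```python
# Labelled training data: each example pairs input features with the correct label.
# The feature names ("shape", "colour") are assumptions made for this illustration.
training_data = [
    ({"shape": "rounded with a depression at top", "colour": "red"}, "Apple"),
    ({"shape": "long curving cylinder", "colour": "green"}, "Banana"),
]

def predict(features):
    """Return the label of the training example that shares the most feature values."""
    def overlap(example):
        example_features, _ = example
        return sum(features.get(k) == v for k, v in example_features.items())
    _, best_label = max(training_data, key=overlap)
    return best_label

print(predict({"shape": "long curving cylinder", "colour": "green"}))  # Banana
```

A real supervised learning algorithm would learn its rules from many labelled examples instead of matching two hand-written ones, but the idea of pairing inputs with known outputs is the same.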

Supervised learning is further classified into two categories of algorithms:

a. Classification: A classification problem is one where the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
b. Regression: A regression problem is one where the output variable is a real value, such as an amount in INR, a weight in kilograms, or a temperature in Fahrenheit.

### What is classification in Artificial Intelligence / Machine Learning (AI/ML)

Classification is the process of recognising things or concepts and sorting them into preset groups. In data management, classification makes it possible to separate and sort data according to predetermined requirements for various business or personal objectives.

For example, if you reside in a gated community, there are specific trash cans for different types of waste, such as food, paper, and plastic. Here, you are essentially labelling each category after categorising the waste into various groups.

In the picture below, we assign the labels ‘paper’, ‘metal’, ‘plastic’, and so on to different types of waste.

#### Types of Classification Algorithm

Classification is a type of supervised learning. It assigns a label to each example in the input data and is best used when the output has finite and discrete values.

Examples of classification problems include:
a. Given an email, classify if it is spam or not.
b. Given a handwritten character, classify it as one of the known characters.
c. Given recent user behavior, classify as churn or not.

There are two main types of classification tasks that you may encounter:

i) Binary Classification: Classification with only 2 distinct classes or with 2 possible outcomes

• Example: Male and Female
• Example: Classification of spam email and non-spam email
• Example: Results of an exam: pass/fail
• Example: Positive and Negative sentiment

ii) Multi-class Classification: Classification with more than two distinct classes.

• Example: classification of types of soil
• Example: classification of types of crops
• Example: classification of mood/feelings in songs/music

##### Binary Classification

Binary classification often involves two classes: one representing the normal state and the other the abnormal state. For example, the normal condition is “not spam,” while the abnormal state is “spam.” Another example is when a task involving a medical test has a normal condition of “cancer not identified” and an abnormal state of “cancer detected.”

The class for the normal state is assigned the class label 0 and the class with the abnormal state is assigned the class label 1.

Popular algorithms that can be used for binary classification include:
a. Logistic Regression
b. k-Nearest Neighbors
c. Decision Trees
d. Support Vector Machine
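
As a rough illustration of one algorithm from this list, here is a minimal k-Nearest Neighbors sketch written with only the Python standard library. The function name `knn_predict`, the two features, and the toy data are all made up for this example. Label 0 stands for the normal state ("not spam") and label 1 for the abnormal state ("spam").

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy data: (number of links, number of exclamation marks) per email.
# Label 0 = "not spam" (normal state), label 1 = "spam" (abnormal state).
points = [(0, 0), (1, 0), (0, 1), (5, 4), (6, 5), (4, 6)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (5, 5)))  # 1
print(knn_predict(points, labels, (1, 1)))  # 0
```

In practice a library implementation would normally be used; the sketch only shows the voting idea behind the algorithm.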

###### Logistic Regression

Logistic regression is one of the binomial (binary) classification algorithms used to sort observations into a finite set of classes. It can be used with binary data, where an event either occurs (1) or does not occur (0).

As a result, given a feature x, an attempt is made to determine whether an event y occurs or not. So, y can only be either 0 or 1. y is given the value 1 if the event occurs and the value 0 if it does not. For instance, if y stands for whether a sports team wins a game, then y will be 1 if they win or 0 if they lose.
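
The idea can be sketched in plain Python as follows. This is a minimal, assumption-laden illustration and not a reference implementation: it uses a single feature, batch gradient descent, and a made-up data set in which x is the number of goals a team scored and y is 1 for a win and 0 for a loss. The helper names (`sigmoid`, `train_logistic_regression`, `predict`) are chosen for this example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b for one feature with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict(x, w, b, threshold=0.5):
    """Return 1 if the estimated probability of the event reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Made-up toy data: x = goals scored by a team, y = 1 if the team won, 0 if it lost.
goals = [0, 1, 1, 2, 3, 4, 4, 5]
won = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(goals, won)
print(predict(0, w, b), predict(5, w, b))
```

The model outputs a probability between 0 and 1, and the threshold (0.5 here, an assumption) turns that probability into the class label 0 or 1.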

#### True positives, true negatives, false positives and false negatives

In machine learning and artificial intelligence, the effectiveness of a classification model’s predictions is evaluated using a confusion matrix, an N x N table where N is the number of target classes. The confusion matrix compares the classification model’s predicted values with the actual target values. This reveals how well the classification model performs and the types of errors it makes.

Let’s understand the matrix:

a. The target variable has two values: Positive or Negative
b. The columns represent the actual values of the target variable
c. The rows represent the predicted values of the target variable

But wait, what are TP, FP, FN and TN here? These four terms are the key to reading a confusion matrix. Let’s understand each one below.

True Positive (TP)

a. The predicted value matches the actual value
b. The actual value was positive and the classification model also predicts positive
c. There is no error

True Negative (TN)

a. The predicted value matches the actual value
b. The actual value was negative and the classification model also predicts negative
c. There is no error

False Positive (FP)

a. The predicted value doesn’t match the actual value
b. The actual value was negative but the model predicted a positive value
c. This is a Type I error

False Negative (FN)

a. The predicted value doesn’t match the actual value
b. The actual value was positive but the model predicted a negative value
c. This is a Type II error
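
The four terms above can be counted directly from a list of actual values and a list of predicted values. The sketch below is a minimal illustration. The function name `confusion_counts` and the sample labels are made up for this example, with 1 meaning positive and 0 meaning negative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        else:
            fn += 1  # predicted negative, actually positive (Type II error)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Made-up labels for illustration: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```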

#### False Positive or False Negative in Medical Science

False positives in medical testing, and more generally in binary classification, occur when a test result incorrectly indicates the presence of a condition, such as a disease (the result is positive), when in reality it is not present. False negatives, on the other hand, occur when a test result incorrectly indicates the absence of a disease, when in fact it is present. These two types of errors can occur in a binary test.

#### Practice exercise on simple binary classification models

A set of 1,000 test examples, of which 50% are negative, was used to evaluate a binary classifier. The classifier was found to have a sensitivity of 60% and an accuracy of 70%. Create the confusion matrix for this example.
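
One possible way to work through this exercise is sketched below. It relies on the standard definitions sensitivity = TP / (TP + FN) and accuracy = (TP + TN) / total, which are not spelled out in the text above. The variable names are chosen for this example.

```python
total = 1000
negatives = total // 2          # 50% of the examples are negative
positives = total - negatives   # the remaining examples are positive

# Percentages are kept as integers so the arithmetic stays exact
sensitivity_pct = 60   # sensitivity = TP / (TP + FN)
accuracy_pct = 70      # accuracy = (TP + TN) / total

tp = positives * sensitivity_pct // 100   # 60% of the 500 positives are caught
fn = positives - tp                       # the remaining positives are missed
tn = total * accuracy_pct // 100 - tp     # correct predictions that are not TP
fp = negatives - tn                       # the remaining negatives are misclassified

# Rows are predicted values, columns are actual values
print(f"{'':20}{'Actual positive':>18}{'Actual negative':>18}")
print(f"{'Predicted positive':20}{tp:>18}{fp:>18}")
print(f"{'Predicted negative':20}{fn:>18}{tn:>18}")
```

Under these definitions the matrix works out to TP = 300, FP = 100, FN = 200 and TN = 400, which adds up to the 1,000 test examples.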

The sinking of the Titanic is undoubtedly one of the most well-known shipwrecks in history. The RMS Titanic struck an iceberg and sank on April 15, 1912, during her first voyage. Unfortunately, there were not enough lifeboats to accommodate everyone, and 1502 out of 2224 passengers and staff perished.

Although survival required a certain amount of luck, it appears that some groups of people had a higher chance of surviving than others. You must use passenger data (i.e. name, age, gender, socioeconomic class, etc.) to create a predictive model that answers the question: “What kinds of people were more likely to survive?”