Computer Vision Class 10 Notes – The CBSE has changed the syllabus for Std. X. These notes are based on the new syllabus and the new CBSE textbook. All the important information is taken from the Artificial Intelligence Class X Textbook based on the CBSE Board pattern.

Computer Vision Class 10 Notes
Introduction to Computer Vision
Computer vision is a field of artificial intelligence (AI). AI enables computers to think, and computer vision enables AI to see, observe and make sense of visual data (such as images and videos). Computer vision enables computers and systems to extract useful information from digital photos, videos, and other visual inputs and to take actions or make recommendations based on that information.

Difference between computer vision and image processing
Computer Vision | Image Processing |
---|---|
Computer vision deals with extracting information from input images or videos, understanding it, and inferring something meaningful about the visual input. | Image processing is mainly focused on processing raw input images to enhance them or to prepare them for other tasks. |
Computer Vision is a superset of Image Processing. | Image Processing is a subset of Computer Vision. |
Examples – Object detection, handwriting recognition, etc. | Examples – Rescaling an image, correcting brightness, changing tones, etc. |
How does computer vision work?
Computer vision analyzes visual data using complex algorithms. A computer vision algorithm breaks an image down into pixels, processes them using a machine learning technique, and compares them with a dataset to find patterns or objects.
Applications of Computer Vision
Computer vision as a concept was first introduced in the 1970s, and its potential uses generated a lot of excitement. However, it is the considerable technological advances of recent years that have elevated computer vision to the top of many companies’ priority lists. Let’s examine a few applications:
- Facial Recognition – Security is among the most important applications of computer vision, where it is used for facial recognition. It can be used either for guest recognition or for maintaining a log of visitors.
- Face Filters – Modern-day apps like Instagram and Snapchat offer face filters. The app captures the face using the camera, and a computer vision algorithm identifies the facial dynamics of the person.
- Google’s Search by Image – Google has an interesting feature of getting search results through an image. Computer vision takes the input image from the users, compares it with the database of images, and gives us the search result.
- Computer Vision in Retail – Retailers can use Computer Vision techniques to track customers’ movements through stores, analyze navigational routes and detect walking patterns.
- Self-Driving Cars – Computer vision is the fundamental technology behind developing autonomous vehicles. Most leading car manufacturers in the world are investing money in artificial intelligence for developing on-road versions of hands-free technology.
- Medical Imaging – Computer vision supports physicians. Medical imaging applications read and convert 2D scan images into interactive 3D models that enable medical professionals to gain a detailed understanding of a patient’s health condition.
- Google Translate App – To read signs in a foreign language, point your phone’s camera at the words and let the Google Translate app tell you what they mean in your preferred language almost instantly.
Understanding Computer Vision Concepts
1. Computer Vision Tasks
The various applications of Computer Vision are built on a small number of tasks, each of which extracts certain information from the input image. That information can be used directly for prediction or can form the base for further analysis. The tasks used in a computer vision application are:

- Step 1: Classification – Image Classification problem is the task of assigning an input image one label from a fixed set of categories. This is one of the core problems in CV that, despite its simplicity, has a large variety of practical applications.
- Step 2: Classification + Localisation – This is the task which involves both processes of identifying what object is present in the image and at the same time identifying at what location that object is present in that image. It is used only for single objects.
- Step 3: Object Detection – Object detection is the process of finding instances of real-world objects such as faces, bicycles, and buildings in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. It is commonly used in applications such as image retrieval and automated vehicle parking systems.
- Step 4: Instance Segmentation – Instance Segmentation is the process of detecting instances of the objects, giving them a category and then giving each pixel a label on the basis of that. A segmentation algorithm takes an image as input and outputs a collection of regions (or segments).
2. Basics of Images-Pixel, Resolution, Pixel value, grayscale and RGB images
1. Basics of Pixels
The word “pixel” means a picture element. Every photograph, in digital form, is made up of pixels. A pixel is the smallest unit of information that makes up a picture. Usually round or square, pixels are typically arranged in a 2-dimensional grid.

In the image above, one portion has been magnified many times over so that you can see its individual composition in pixels.
2. Resolution
The number of pixels in an image is sometimes called the resolution. When the term is used to describe pixel count, one convention is to express resolution as the width by the height, for example a monitor resolution of 1280×1024. This means there are 1280 pixels from one side to the other, and 1024 from top to bottom.
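As a quick check of the arithmetic behind the width-by-height convention, the following minimal sketch computes the total pixel count for the 1280×1024 example above. The variable names are illustrative only.

```python
# Resolution expressed as width x height, using the monitor example from the text.
width, height = 1280, 1024

# The total pixel count is the width multiplied by the height.
total_pixels = width * height
print(total_pixels)  # 1310720
```

So a 1280×1024 display holds a little over 1.3 million pixels.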
3. Pixel value
Each of the pixels that represents an image stored inside a computer has a pixel value which describes how bright that pixel is, and/or what colour it should be. The most common pixel format is the byte image, where this number is stored as an 8-bit integer giving a range of possible values from 0 to 255. Typically, zero is to be taken as no colour or black and 255 is taken to be full colour or white.
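The 0 to 255 range follows directly from storing each value in 8 bits. The sketch below works this out; the helper `is_valid_pixel` is a hypothetical name introduced only for illustration.

```python
BITS_PER_PIXEL = 8

# An 8-bit integer can hold 2^8 distinct values.
num_levels = 2 ** BITS_PER_PIXEL   # 256 possible values
min_value = 0                      # no colour / black
max_value = num_levels - 1         # 255, full colour / white


def is_valid_pixel(value):
    """Return True if `value` fits in an 8-bit (byte) pixel."""
    return isinstance(value, int) and min_value <= value <= max_value


print(num_levels, min_value, max_value)  # 256 0 255
```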
4. Grayscale Images
A grayscale image is a digital image that contains only shades of black, gray, and white. In a grayscale image, the darkest shade is black, which has a pixel value of 0, and the lightest possible shade is white, which has a pixel value of 255.
Let us look at an image to understand grayscale images.

Here is an example of a grayscale image. As you can see, the pixel values are within the range of 0 to 255. Computers store the images we see in the form of these numbers.
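To make the idea of "an image as numbers" concrete, here is a small sketch that stores a grayscale image as a grid (list of rows) of pixel values. The 3×4 image and its values are made up for illustration and are not taken from the textbook figure.

```python
# A hypothetical 3x4 grayscale image: each number is one pixel's brightness.
gray_image = [
    [0,  64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]

height = len(gray_image)      # number of rows
width = len(gray_image[0])    # number of columns

# Flatten the grid so we can inspect all pixel values at once.
all_pixels = [p for row in gray_image for p in row]

darkest = min(all_pixels)     # closest to black
brightest = max(all_pixels)   # closest to white
in_range = all(0 <= p <= 255 for p in all_pixels)

print(width, height, darkest, brightest, in_range)  # 4 3 0 255 True
```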
5. RGB Images
All the images that we see around us are coloured images. These images are made up of three primary colours: Red, Green and Blue.
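One common way to represent this in code is to store three numbers per pixel, one each for red, green and blue. The sketch below assumes 8-bit values (0 to 255) per colour; the tiny 2×2 image and the helper `split_channels` are hypothetical and used only for illustration.

```python
# A hypothetical 2x2 RGB image: each pixel is a (red, green, blue) tuple.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],        # a red pixel, a green pixel
    [(0, 0, 255), (255, 255, 255)],    # a blue pixel, a white pixel
]


def split_channels(image):
    """Separate an RGB image into three single-colour grids."""
    red = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue = [[px[2] for px in row] for row in image]
    return red, green, blue


red, green, blue = split_channels(rgb_image)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

Each of the three grids looks like a grayscale image on its own; combining them gives the coloured picture.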

Understanding Convolution operator
The convolution operator is a mathematical operation that combines two functions or signals to create a third. This convolution operation is used in image processing and digital signal processing. Convolution provides a way of ‘multiplying together’ two arrays of numbers, generally of different sizes, but of the same dimensionality, to produce a third array of numbers of the same dimensionality.
An (image) convolution is simply an element-wise multiplication of an image array and another array called the kernel, followed by a sum.

As you can see here,
I = Image Array
K = Kernel Array
I * K = Resulting array after performing the convolution operator
Note: The Kernel is passed over the whole image to get the resulting array after convolution.
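The sliding operation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it follows the description in these notes (multiply element-wise and sum, without flipping the kernel), and it assumes the kernel is only placed where it fits entirely inside the image, so no padding is applied at the edges. The function name `convolve2d` and the sample arrays are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image`; at each position multiply element-wise and sum.

    Assumption: only positions where the kernel fits fully inside the image
    are computed (no edge padding), so the output is smaller than the input.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    output = []
    for top in range(img_h - k_h + 1):
        out_row = []
        for left in range(img_w - k_w + 1):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Made-up 4x4 image (I) and 2x2 kernel (K), for illustration only.
I = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
K = [
    [1, 0],
    [0, -1],
]

result = convolve2d(I, K)
print(result)  # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
```

A 2×2 kernel is used here only to keep the arithmetic short; the same function works with the 3×3 kernels that are typical in image processing.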
What is a Kernel?
A kernel is a small matrix that is slid across the image and multiplied with the input such that the output is enhanced in a certain desirable manner. Each kernel has different values depending on the kind of effect that we want to apply to an image.
- Convolution is a common tool used for image editing.
- It is an element-wise multiplication of an image and a kernel, followed by a sum, to get the desired output.
- In computer vision applications, it is used in Convolutional Neural Networks (CNN) to extract image features.
In image processing, we use the convolution operation to extract features from images, which can later be used for further processing, especially in a Convolutional Neural Network (CNN).
Let’s try
In this section we will try performing the convolution operator on paper to understand how it works.

Step 1: Let’s apply the kernel to the top-left 3×3 patch:

Step 2: Applying the Kernel
(−1×150) + (0×0) + (−1×255) + (0×100) + (−1×179) + (0×25) + (−1×155) + (0×146) + (−1×13)
= − 150 + 0 − 255 + 0 − 179 + 0 − 155 + 0 − 13
= − 752
The answer is -752 before min-max normalization
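The hand calculation above can be double-checked with a short script. The numbers are taken from the terms in Step 2; arranging them as 3×3 grids in row-by-row order is an assumption, since the original figure is not reproduced here.

```python
# Image patch values, in the order they appear in the calculation above.
patch = [
    [150, 0, 255],
    [100, 179, 25],
    [155, 146, 13],
]

# Kernel values, in the same order (assumed row-by-row layout).
kernel = [
    [-1, 0, -1],
    [0, -1, 0],
    [-1, 0, -1],
]

# Multiply matching positions and add everything up.
value = sum(
    patch[i][j] * kernel[i][j]
    for i in range(3)
    for j in range(3)
)
print(value)  # -752
```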
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other.
The process of deploying a CNN is as follows:

In the above diagram, we give an input image, which is processed through a CNN. The CNN then gives a prediction on the basis of the labels given in the particular dataset.
The different layers of a Convolutional Neural Network (CNN) are as follows:

A convolutional neural network consists of the following layers:
- Convolution Layer
- Rectified Linear Unit (ReLU)
- Pooling Layer
- Fully Connected Layer
Convolution Layer – A convolutional layer is the first layer and the main building block of a CNN; it extracts features from images. In the convolution layer, several kernels are used to produce several features. The output of this layer is called the feature map, which is also called the activation map. We can use these terms interchangeably.
There are several benefits we derive from the feature map:
- We reduce the image size so that it can be processed more efficiently.
- We only focus on the features of the image that can help us in processing the image further.

Rectified Linear Unit Function – After we get the feature map, it is passed on to the ReLU layer. This layer simply replaces all the negative numbers in the feature map with zero and lets the positive numbers stay as they are.

If we see the two graphs side by side, the one on the left is a linear graph. This graph when passed through the ReLU layer, gives the one on the right. The ReLU graph starts with a horizontal straight line and then increases linearly as it reaches a positive number.
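The ReLU rule is simple enough to write out directly. The sketch below applies it to a small, made-up feature map; the function names and sample values are illustrative only.

```python
def relu(x):
    """Return 0 for negative inputs and the input itself otherwise."""
    return max(0, x)


def relu_feature_map(feature_map):
    """Apply ReLU to every value in a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]


# Hypothetical feature map containing both negative and positive values.
feature_map = [
    [-752, 120],
    [35, -8],
]

activated = relu_feature_map(feature_map)
print(activated)  # [[0, 120], [35, 0]]
```

This matches the graph described above: the output stays flat at zero for negative inputs and rises linearly for positive ones.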
Why do we pass the feature map to the ReLU layer?

As shown in the convolved image above, there is a smooth grey gradient change from black to white. After applying the ReLU function, we can see a more abrupt change in colour, which makes the edges more obvious. These sharper edges act as a better feature for the further layers in a CNN, as ReLU enhances the activation layer.
Pooling Layer – Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature while still retaining the important features.
There are two types of pooling which can be performed on an image.
- Max Pooling: Max Pooling returns the maximum value from the portion of the image covered by the Kernel.
- Average Pooling: Average Pooling returns the average of all the values from the portion of the image covered by the Kernel.
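Both kinds of pooling can be sketched with one small helper: max pooling keeps the largest value in each window, while average pooling takes the mean of the window. The example assumes non-overlapping 2×2 windows and a feature map whose sides are divisible by the window size; the function names and numbers are hypothetical.

```python
def pool2d(feature_map, size, reduce_fn):
    """Split the feature map into non-overlapping size x size windows
    and reduce each window to a single number with `reduce_fn`.

    Assumption: the height and width are divisible by `size`.
    """
    pooled = []
    for top in range(0, len(feature_map), size):
        row = []
        for left in range(0, len(feature_map[0]), size):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def average(values):
    """Mean of a list of numbers."""
    return sum(values) / len(values)


# Made-up 4x4 feature map for illustration.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [4, 6, 3, 5],
]

max_pooled = pool2d(feature_map, 2, max)
avg_pooled = pool2d(feature_map, 2, average)
print(max_pooled)  # [[7, 8], [9, 5]]
print(avg_pooled)  # [[4.0, 5.0], [5.25, 2.25]]
```

In both cases the 4×4 input shrinks to 2×2, which is the size reduction the pooling layer is used for.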

The pooling layer is an important layer in the CNN as it performs a series of tasks which are as follows:
- Makes the image smaller and more manageable
- Makes the image more resistant to small transformations, distortions and translations in the input image.

Fully Connected Layer – The final layer in the CNN is the Fully Connected (FC) Layer. The objective of a fully connected layer is to take the results of the convolution/pooling process and use them to classify the image into a label (in a simple classification example). For example, if the image is of a cat, features representing things like whiskers or fur should have high probabilities for the label “cat”.
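A rough sketch of this idea is shown below. It is an illustration under stated assumptions rather than how any particular CNN is built: the feature values, weights and labels are invented, and turning scores into probabilities with a softmax function is a common choice that these notes do not specify. In a real network the weights are learned during training.

```python
import math


def fully_connected(features, weights, biases):
    """One score per label: a weighted sum of all features plus a bias."""
    return [
        sum(w * f for w, f in zip(label_weights, features)) + b
        for label_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1 (assumed choice)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["cat", "dog"]

# Hypothetical feature strengths: whiskers, fur, floppy ears.
features = [0.9, 0.8, 0.1]

# Hypothetical weights: one row per label, one weight per feature.
weights = [
    [2.0, 1.0, -1.0],   # "cat" responds strongly to whiskers
    [-1.0, 1.0, 2.0],   # "dog" responds strongly to floppy ears
]
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # cat
```

Because the whiskers and fur features are strong, the "cat" label receives the higher probability.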

No-Code AI Tools
Introduction to Lobe
Lobe.ai is an AutoML tool, which means that it is a no-code AI tool. It works with image classification: you provide a set of images with labels, and it automatically finds the most suitable model to classify the images.
Introduction to Teachable Machine
Teachable Machine is an AI, Machine Learning, and Deep Learning tool that was developed by Google in 2017. It runs on top of TensorFlow.js, which was also developed by Google. It is a web-based tool that allows training of a model based on different images, audio, or poses given as input through a webcam or pictures.
Orange Data Mining Tool
Orange is open-source machine learning software built around a no-code or low-code framework. With the help of Orange, you can carry out data visualization, predictive modeling, and analysis of data. The Orange tool is easy to use, has a drag-and-drop interface, and is used in education, research, business, etc.
https://orangedatamining.com/download
Computer Vision: Use Case Walkthrough using Orange Data tool
Computer Vision project to build a real-world Classification Model: Coral Bleaching
What is the first step of the AI project cycle?
Step-1 Problem scoping
- Coral bleaching happens when corals lose their vibrant colors and turn white.
- But there’s a lot more to it than that. The leading cause of coral bleaching is climate change.
- Coral bleaching matters because once these corals die, reefs rarely come back.
- With few corals surviving, they struggle to reproduce, and entire reef ecosystems, on which people and wildlife depend, deteriorate.
- Detecting bleaching of coral reefs at an early stage can help protect the world from such disasters.

Discussions
- Do you think such projects help you inculcate awareness about global problems and think about building solutions to overcome them?
- Which SDG does coral bleaching fall under? Give your comments.
What comes after Problem Scoping?
Step-2 Data Acquisition
- This dataset was created for the research and experimental purposes of a manuscript titled “Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection”.

Step 2: Upload Dataset

What is the next step after Data Acquisition?
Step 3: Explore Dataset


What is the next step after Data Exploration?
Step 4: Build Model


What is the next step after model building?
Step 5: Evaluate Model


Step 6: Prediction



Disclaimer: We have taken an effort to provide you with the accurate handout of “Computer Vision Class 10 Notes“. If you feel that there is any error or mistake, please contact me at anuraganand2017@gmail.com. The above CBSE study material present on our websites is for education purpose, not our copyrights. All the above content and Screenshot are taken from Artificial Intelligence Class 10 CBSE Textbook and Support Material which is present in CBSEACADEMIC website, This Textbook and Support Material are legally copyright by Central Board of Secondary Education. We are only providing a medium and helping the students to improve the performances in the examination.
Images shown above are the property of individual organizations and are used here for reference purposes only.
For more information, refer to the official CBSE textbooks available at cbseacademic.nic.in