These AI Class 10 Notes for Chapter 5, Computer Vision, simplify complex AI concepts for easy understanding.
Class 10 AI Computer Vision Notes
Applications of Computer Vision Class 10 Notes
The applications of computer vision are as follows
Facial Recognition
Face recognition using Artificial Intelligence (AI) is a computer vision technology. It is used to enable a system to detect, recognize and verify faces in digital images or videos. The technology has become increasingly popular in a wide variety of applications such as unlocking a smartphone, unlocking doors, passport authentication, security systems, medical applications and so on.
There are even models that can detect emotions from facial expressions.
Face Filters
Face filters are digital overlays that can be applied to photos or videos in real-time to alter or enhance the appearance of a person’s face. These filters can range from simple effects like adding glasses or hats to more complex transformations like changing facial features or applying makeup.
Face filters work by using computer vision algorithms to detect and track facial features such as eyes, nose, mouth, and contours.
Once these features are identified, the filter is applied to the appropriate areas of the face, adjusting in real-time as the person moves. Popular apps for face filters include Snapchat, Instagram, TikTok, and FaceApp. These apps offer a wide variety of filters for users to express their creativity and have fun with their selfies and videos.
Image Based Search
Image-based search is now possible with Google Images (previously Google Image Search), a search service owned by Google that allows users to search the World Wide Web for images. It was introduced on July 12, 2001. Unlike traditional image retrieval, this feature removes the need to type keywords and terms into the Google search box. Instead, users search by submitting an image as their query.
Results may include similar images, web results, pages with the image, and different resolutions of the image.
Users can upload an image from their device or provide a URL linking to an image, and Google will search for similar or related images across the web. This feature can be useful for identifying objects, locations, or people in images, finding higher-resolution versions of images, or discovering visually similar content.
Computer Vision in Retail
Computer vision in retail refers to the utilization of artificial intelligence and image processing techniques to extract insights and enhance various aspects of retail operations.
Computer vision is transforming retail by giving machines the ability to see and understand the physical store environment. Imagine cameras that track inventory, analyze customer behavior, and even power cashierless shopping.
For example, Amazon Go stores use computer vision to automatically detect items a customer takes from shelves and adds to their virtual cart, eliminating the need for checkout lines. This technology also helps retailers understand customer traffic patterns, optimize store layout, and even deliver targeted advertising based on what products a customer lingers near. In short, computer vision is creating a smarter and more personalized shopping experience.
Self-Driving Cars
Self-driving cars, also known as autonomous vehicles, rely on advanced computer vision. Equipped with a multitude of sensors, including cameras, LiDAR, radar, and ultrasonic sensors, autonomous vehicles capture and analyse data about the environment, such as road markings, traffic signs, other vehicles, pedestrians, and obstacles, in order to drive safely.
Medical Imaging
Computer vision plays a pivotal role in medical imaging. By analysing medical images, including X-rays, CT scans, MRIs, and ultrasound images, computer vision algorithms assist healthcare professionals in detecting abnormalities, quantifying disease progression, identifying anatomical structures, and guiding interventions.
Google Translate App
The Google Translate app utilises computer vision technology to enable real-time translation of text captured by the device’s camera. By analysing the visual content of images, such as signs, menus, or documents, Google Translate identifies and extracts text characters, which are then translated into the desired language.
Computer Vision Tasks Class 10 Notes
Although computer vision has been utilised in so many fields, there are a few common tasks for computer vision systems. These tasks are given below
Classification
Image classification enables computers to see an image and accurately classify it into the class in which it falls. It requires training a model to recognise patterns and features within images so that it can assign each image to the correct class. In other words, the system learns the classes and labels each image accordingly.
For example, distinguishing between different types of animals in an image.
Classification + Localisation
It involves both processes of identifying the object present in the image and at the same time identifying the location at which the object is present in that image. So, this combined task of classification and localisation means processing the input image to identify its category along with the localisation of the object in the image. For example, classifying the animal shown as a dog, and localisation would be showing its position.
Object Detection
Object detection is the task of locating and classifying multiple objects within an image i.e., identifying where the object is in the image. It goes beyond image classification by not only identifying objects but also drawing bounding boxes around them. For example, one dog, one cat and one duck can be easily detected and classified using the object detection technique.
Instance Segmentation
Segmentation identifies objects by dividing an image into different regions based on its pixels.
Segmentation also simplifies an image, for example by placing a shape or outline around an item to determine what it is.
In doing so, segmentation can also recognise whether there is more than one object in an image or frame.
For example, if there are two animals in an image, instance segmentation will separately identify and highlight each animal, treating them as distinct instances.
Basics of Images Class 10 Notes
Images are two-dimensional representations of visual data. Images are made up of different elements which are as follows
Pixels
Pixels are the smallest units of a digital image; the word pixel stands for “picture element”. They represent individual points of light and are typically arranged in a grid pattern. Each pixel is characterised by its position within the image and its colour or intensity value.
Resolution
Resolution refers to the number of pixels contained in an image, typically expressed as width $\times$ height in pixels. Higher resolutions result in greater detail and clarity, while lower resolutions may appear pixelated or blurry.
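As a quick worked example (the Full HD figures below are illustrative, not from the notes), the total pixel count of an image follows directly from its resolution:

```python
# Total pixel count of a Full HD (1920 x 1080) image - an illustrative calculation
width, height = 1920, 1080
total_pixels = width * height
print(total_pixels)         # 2073600 pixels
print(total_pixels / 1e6)   # roughly 2.07 megapixels
```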
Pixel Value
Pixel value refers to the numerical representation of the colour or intensity of a pixel. To determine the pixel value of an image, you would need to specify whether you are referring to a grayscale or color image and then examine the intensity values of the pixels at the desired location in the image.
Grayscale Images
Grayscale images consist of pixels that contain only shades of grey, ranging from black to white. Each pixel has a single intensity value representing its brightness level. Grayscale images are often used for simplicity or when colour information is not necessary.
RGB Images
RGB (Red, Green, Blue) images consist of pixels that contain three primary colour channels: red, green, and blue. Each channel specifies the intensity of its respective colour component, and the combination of these channels produces a wide range of colours.
In most RGB images, each channel is represented by an 8-bit integer, meaning each colour channel has values ranging from 0 to 255, with 0 meaning no light and 255 meaning maximum light.
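A minimal NumPy sketch (an illustrative example, not part of the syllabus code) showing how grayscale and RGB pixel values are stored and read:

```python
import numpy as np

# A 2x2 grayscale image: each pixel is a single intensity from 0 (black) to 255 (white)
gray = np.array([[0, 128],
                 [200, 255]], dtype=np.uint8)
print(gray[0, 1])   # intensity of the pixel at row 0, column 1 -> 128

# A 2x2 RGB image: each pixel holds three 8-bit channel values (red, green, blue)
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]   # a pure red pixel
print(rgb[0, 0])          # -> [255   0   0]
```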
Image Features Class 10 Notes
In computer vision, image features refer to distinctive patterns or regions within an image that are computationally extracted and used for various tasks such as object detection, recognition, matching, and analysis.
These features are essential for understanding the content of images and enabling machines to interpret visual data effectively. The features may vary from image to image.
OpenCV Class 10 AI Notes
OpenCV short for Open-Source Computer Vision Library, is an open-source library of programming functions mainly aimed at real-time computer vision tasks. It was developed by Intel. It provides a comprehensive set of tools and algorithms for various tasks such as image and video processing, object detection and recognition, feature extraction, camera calibration, and machine learning.
OpenCV supports multiple platforms, including Windows, Linux, macOS, Android, and iOS, making it suitable for a wide range of applications across different operating systems and hardware architectures.
Note The OpenCV library is available for a variety of languages such as C, C++, Python, etc.
Installing OpenCV
OpenCV can be directly downloaded and installed with the use of pip (package manager). To install OpenCV, just go to the command line and type the following command:
pip install opencv-python
After installation, verify that OpenCV has been successfully installed by importing it in a Python script or in a Python interpreter session and checking the version:
import cv2
print(cv2.__version__)
Finally, test the installation by running some basic OpenCV code to ensure that it works correctly.
Convolution Class 10 Notes
Convolution is a general purpose filter effect for images. It is a tool used for editing images. It involves applying a filter to an input image to extract features or perform operations such as blurring, sharpening, edge detection, etc.
It involves sliding the kernel over the input image and computing the element-wise multiplication between the kernel and the corresponding region of the image, followed by summing up the results to produce a single output value for each position.
Convolution Operator
The convolution operator involves three main components: the input image, the kernel (also known as a filter or mask), and the output feature map. Technically, convolution is defined as a simple mathematical operation that multiplies two numeric arrays, generally of different sizes but of the same dimensionality, to produce a third array of the same dimensionality.
At each position of the input image, the kernel is applied by performing element-wise multiplication between the kernel values and the pixel values of the image at that position. The resulting products are then summed up to obtain the output value for that position in the feature map.
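This element-wise multiply-and-sum step can be sketched in plain NumPy (the toy image values and the Laplacian-style kernel are illustrative choices, not from the notes):

```python
import numpy as np

# A 4x4 toy input image and a 3x3 kernel
image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [1, 0, 1, 3]])
kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]])   # a simple Laplacian-style kernel

# Slide the kernel over every 3x3 region: multiply element-wise, then sum
h = image.shape[0] - kernel.shape[0] + 1   # output height = 2
w = image.shape[1] - kernel.shape[1] + 1   # output width  = 2
output = np.zeros((h, w), dtype=int)
for i in range(h):
    for j in range(w):
        region = image[i:i+3, j:j+3]
        output[i, j] = np.sum(region * kernel)

print(output)   # a 2x2 feature map, e.g. output[0, 0] = 2+4-20+6+8 = 0
```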
KERNEL
Kernel refers to the learnable parameters that are convolved with the input image to extract features. Kernels are small matrices, typically with dimensions much smaller than the input image, and they are learned during the training process through backpropagation.
Each kernel specialises in detecting specific patterns or features within the input image, such as edges, textures, or shapes, depending on the task and the depth of the network.
In image processing, we use the convolution operation to extract features from images, which can later be used for further processing. In this process, we overlap the centre of the kernel with each pixel of the image to obtain the convolution output. The output image becomes smaller because the kernel cannot fully overlap the edge rows and columns of the image.
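As a rough illustration of a kernel detecting one specific feature, and of the output shrinking, here is a hypothetical vertical-edge example in NumPy (the Sobel-style kernel and the toy image are assumptions for demonstration):

```python
import numpy as np

# A 5x5 image with a sharp vertical edge: dark left half, bright right half
image = np.array([[0, 0, 255, 255, 255]] * 5, dtype=int)

# A vertical-edge kernel (Sobel-style): responds where intensity changes left to right
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# Valid convolution: the 5x5 input shrinks to a 3x3 output
h = image.shape[0] - 3 + 1
w = image.shape[1] - 3 + 1
out = np.zeros((h, w), dtype=int)
for i in range(h):
    for j in range(w):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)   # nonzero only near the vertical edge, zero in the uniform region
```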
Convolution Neural Network (CNN)
Neural networks are computational models inspired by the structure and functionality of the human brain. They consist of interconnected nodes, or neurons, organised into layers. Each neuron receives input, processes it using an activation function, and passes the result to the neurons in the next layer. They are widely used in machine learning for tasks such as classification, regression, pattern recognition, and optimization.
A Convolution Neural Network (CNN) is a type of artificial neural network specifically designed for processing and analysing visual data, such as images and videos. CNNs consist of multiple layers, including convolutional layers, pooling layers, and fully connected layers, which work together to automatically learn hierarchical representations of input images.
Layers of CNN
The different layers of a Convolutional Neural Network (CNN) are:
Convolutional Layer
- Convolutional layers are the building blocks of CNNs, responsible for feature extraction through convolution operations.
- Each convolutional layer consists of multiple filters (also called kernels), which are small matrices applied to overlapping regions of the input volume.
- The filters learn to detect different features such as edges, textures, or shapes within the input data by performing convolution operations.
- After convolution, activation functions such as ReLU are applied to introduce non-linearity to the network, helping it learn complex patterns.
Rectified Linear Unit (ReLU)
This is the second layer of a CNN. After the feature map is obtained, it is passed on to the ReLU layer. This layer simply replaces all the negative numbers in the feature map with zero and lets the positive numbers stay as they are.
The ReLU graph starts as a horizontal straight line at zero for negative inputs and then increases linearly as the input becomes positive.
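The ReLU operation can be sketched in a single line of NumPy (the feature-map values below are illustrative):

```python
import numpy as np

# A toy feature map containing negative values
feature_map = np.array([[-3, 5],
                        [ 2, -1]])

# ReLU: replace every negative value with 0, keep positive values unchanged
relu = np.maximum(0, feature_map)
print(relu)   # [[0 5]
              #  [2 0]]
```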
Pooling Layer
- Pooling layers are used to reduce the spatial dimensions of feature maps while preserving important information.
- Pooling layers help to make the representations more invariant to small translations or distortions in the input, while also reducing the computational cost and the risk of overfitting.
There are two types of pooling:
(i) Max pooling returns the maximum value from the portion of the image covered by the kernel.
(ii) Average pooling returns the average of all the values from the portion of the image covered by the kernel.
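Both pooling types can be sketched in NumPy (the 4x4 feature map, 2x2 window, and stride of 2 are illustrative choices):

```python
import numpy as np

# A 4x4 feature map, pooled with a 2x2 window and a stride of 2
fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 4, 3, 8]])

max_pool = np.zeros((2, 2), dtype=int)
avg_pool = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = fm[2*i:2*i+2, 2*j:2*j+2]
        max_pool[i, j] = window.max()    # max pooling: largest value in the window
        avg_pool[i, j] = window.mean()   # average pooling: mean of the window

print(max_pool)  # [[6 4]
                 #  [7 9]]
```

Either way, the 4x4 feature map is reduced to 2x2 while the important information in each window is preserved.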
Fully Connected Layer
The fully connected layer plays a critical role in the final stages of a CNN, where it is responsible for classifying images based on the features extracted in the previous layers.
The term fully connected means that each neuron in one layer is connected to each neuron in the subsequent layer. Each input from the previous layer connects to each activation unit in the fully connected layer, enabling the CNN to simultaneously consider all features when making a final classification decision.
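A minimal NumPy sketch of a fully connected layer (the feature vector, the random weights, and the choice of three output classes are illustrative assumptions; a softmax is added to turn the class scores into probabilities):

```python
import numpy as np

# Flattened features from the earlier layers (e.g. a pooled feature map)
features = np.array([0.5, 1.2, 0.0, 2.0])

# Fully connected: every input connects to every output neuron via a weight
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4))   # 3 output classes, 4 inputs
bias = np.zeros(3)

scores = weights @ features + bias  # one score per class

# Softmax converts the scores into class probabilities that sum to 1
probs = np.exp(scores) / np.exp(scores).sum()
print(probs, probs.sum())
```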
Each of these layers plays a crucial role in the overall architecture of a CNN, contributing to the network’s ability to learn hierarchical representations of input data and make accurate predictions for various tasks in computer vision.
Glossary:
- Object detection It involves identifying and locating specific objects within an image or video frame.
- Instance segmentation It is an extension of semantic segmentation that not only labels each pixel with a category but also distinguishes between individual object instances within the same category.
- Resolution It refers to the number of pixels contained in an image, typically expressed as width × height in pixels.
- OpenCV It is an open-source library of programming functions mainly aimed at real-time computer vision tasks.
- Convolution It involves applying a filter or kernel to an input image to extract features or perform operations such as blurring, sharpening, edge detection, etc.
- The kernel refers to the learnable parameters that are convolved with the input image to extract features.
- Convolutional Neural Network (CNN) It is specifically designed for processing and analysing visual data, such as images and videos.