Channel: Learn CBSE

Natural Language Processing Class 10 AI Notes

These AI Class 10 Notes Chapter 6 Natural Language Processing Class 10 Notes simplify complex AI concepts for easy understanding.

Class 10 AI Natural Language Processing Notes

Applications of NLP Class 10 Notes

Natural Language Processing has a wide range of applications across various industries and domains. Here are some detailed applications of NLP:

Sentiment Analysis

Sentiment analysis involves analysing text data to determine the sentiment or emotional tone expressed within it, such as positive, negative, or neutral.

Natural Language Processing Class 10 AI Notes 1

Customer feedback analysis examines customer reviews, social media posts, and survey responses to understand customer sentiment towards products or services.

Text Classification

Text classification involves categorising text documents into predefined categories or classes based on their content.

Natural Language Processing Class 10 AI Notes 2

For example, an application of text categorization is spam filtering in Email.
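As a sketch of the idea, spam filtering can be imagined as a rule-based classifier. The keyword list below is purely hypothetical for illustration; real spam filters are trained machine-learning models:

```python
# A minimal, hypothetical rule-based text classifier -- real spam
# filters are trained ML models, not keyword lists.
SPAM_KEYWORDS = {"win", "free", "prize", "urgent", "lottery"}

def classify_email(text):
    """Label an email 'spam' if it contains any flagged keyword."""
    words = set(text.lower().split())
    return "spam" if words & SPAM_KEYWORDS else "not spam"

print(classify_email("You WIN a FREE prize!"))   # spam
print(classify_email("Meeting at 5 pm today"))   # not spam
```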

Text Summarisation

Text summarisation involves condensing large volumes of text into shorter summaries while preserving the most important information and key points.

Natural Language Processing Class 10 AI Notes 3

Its main goal is to simplify the process of going through vast amounts of data, such as scientific papers, news content or legal documentation.

Virtual Assistant

A virtual assistant, also known as a digital assistant, is a software agent or platform that can perform tasks and services for an individual or organisation based on commands or questions.

A virtual assistant can be defined as a computer program that reproduces and analyses human dialogue (spoken or written), enabling users to communicate with electronic devices as if they were conversing with a live agent.

Natural Language Processing Class 10 AI Notes 4

Digital assistant works and interacts via three main methods:

  • Text: including online chat (especially in instant messaging applications), short message service (SMS) texts, e-mail, or other text-based communication channels.
  • Voice: as in some of the most famous virtual assistant examples such as Google Assistant on Google-enabled/Android mobile devices, Amazon Alexa on the Amazon Echo device, and Siri on iPhone.
  • Images: some assistants, such as Google Assistant (which includes Google Lens) and Bixby on the Samsung Galaxy series, have the added capability of performing image processing to recognise objects in images.


Revisiting AI Project Cycle Class 10 Notes

Let us try to understand how we can develop a project in Natural Language Processing with the help of an example.

Scenario

The world is competitive nowadays. People face competition in every field where they are expected to give their best at every point in time. When people are unable to meet these expectations, they get stressed and go through depression due to reasons like study pressure, family pressure, work pressure etc.

Natural Language Processing Class 10 AI Notes 5

So, to overcome this, Cognitive Behavioural Therapy (CBT) is considered. This therapy includes understanding the behaviour and mindset of a person in their normal life. It is easy to implement on people and also gives good results.

Problem Scoping

CBT is a technique to cure patients of stress and depression. But it has been observed that people do not wish to seek the help of a psychiatrist willingly. Thus, there is a need to bridge the gap between a person who needs help and the psychiatrist.

Let us look at the various factors creating this problem. By using the 4Ws canvas, we will be able to identify and solve this problem.

Who Canvas : Who is having the problem?

Who are the stakeholders? People who suffer from stress.
What do you know about them? People who are going through stress are reluctant to consult a psychiatrist.

What Canvas : What is the nature of their problem?

What is the problem? People who need help are reluctant to consult a psychiatrist.
How do you know that it is a problem? Studies around mental stress and depression.

Where Canvas : Where does the problem arise?

  • What is the situation in which the stakeholders experience this problem?
  • When they are going through a stressful period of time due to some unpleasant experiences.

Why Canvas: Why do you think it is a problem worth solving?

What would be of key values to the stakeholders? People get a platform where they can talk and vent out their feelings anonymously.
How would it improve their situation? People would be able to vent out their stress.
They would consider going to a psychiatrist whenever required.

Based on the 4 Ws canvas, the Problem Statement Template can be filled as follows

  • Our (Who?): People undergoing stress
  • Have a problem of (What?): Not being able to share their feelings
  • While (Where?): They need help in venting out their emotions
  • An ideal solution would be (Why?): To provide them a platform to share their thoughts anonymously and suggest help whenever required

This Problem Statement Template has given clarity to the various factors around the problem, and the goal can now be stated as :
“To create a chatbot which can interact with people, help to vent out their feelings and take them through primitive CBT.”


Data Acquisition

To understand the sentiments of people, we need to collect their conversational data so the machine can interpret the words that they use and understand their meaning. Such data can be collected through various means:

Natural Language Processing Class 10 AI Notes 6

Data Exploration

Once the data is collected, it needs to be processed and cleaned so that a simpler version can be sent to the machine. Thus, the text is normalised through various steps and its vocabulary is reduced to a minimum.

Modelling

Once the text has been normalised, it is then fed to an NLP-based AI model. In NLP, modelling requires the data to be pre-processed before it is fed to the machine. Depending upon the type of chatbot we are trying to make, there are a lot of AI models available which help us build the foundation of our project.


Evaluation

The model is now tested with the testing data. It is evaluated for the accuracy of the answers which the machine gives to the user’s response. The AI model is then evaluated and compared to see its efficiency.
When we see the results of model output, it can fall into the following three categories

Natural Language Processing Class 10 AI Notes 7

  • In the first diagram, the model’s output does not match the function at all. Such a model is said to be underfitting and its accuracy is lower.
  • In the second one, the model’s performance matches well with the true function which means that the model has optimum accuracy and it is called perfect fit.
  • In the third case, model performance is trying to cover all the data points even if they are out of sync with the true function.

This model is said to be overfitting and it has a lower accuracy as well.
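The accuracy mentioned above is simply the fraction of the model's answers that match the expected answers. A minimal sketch in Python (the sample answers are illustrative, not real chatbot output):

```python
def accuracy(predictions, actuals):
    """Fraction of model answers that match the expected answers --
    a simple evaluation sketch."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

# 2 of 3 answers match the expected responses
print(accuracy(["hi", "bye", "ok"], ["hi", "bye", "no"]))
```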

Chatbots Class 10 Notes

Chatbot is a computer program that simulates and processes human conversation (either written or spoken), allowing humans to interact with digital devices as if they were communicating with a real person. Chatbots can be as simple as rudimentary programs that answer a simple query with a single-line response, or as sophisticated as digital assistants that learn and evolve to deliver increasing levels of personalization as they gather and process information.

Mitsuku Bot

Mitsuku is a conversational chatbot developed by Steve Worswick in 2005. It is designed to engage in natural language conversations with users and has won the Loebner Prize Turing Test five times.

Natural Language Processing Class 10 AI Notes 8

URL: https://www.pandorabots.com/mitsuku/
Cleverbot: Cleverbot was developed by British AI scientist Rollo Carpenter and launched in October 2008. It uses an algorithm based on user interactions to generate responses. It learns from conversations with users and attempts to mimic human-like conversation patterns.

Natural Language Processing Class 10 AI Notes 9

URL: https://www.cleverbot.com/
Jabberwacky: Jabberwacky was developed by British programmer Rollo Carpenter in the 1980s and was launched on the web in 1997. It was the first chatbot that tried to incorporate voice interaction. Jabberwacky won the Loebner Prize two times, in 2005 and 2006.

Natural Language Processing Class 10 AI Notes 10

URL: http://www.jabberwacky.com/
Haptik: Haptik is an Indian enterprise conversational AI platform that provides virtual assistant services through chatbots.

Natural Language Processing Class 10 AI Notes 11

URL: https://www.haptik.ai/
Rose: Rose is an AI chatbot developed by Bruce Wilcox. It is designed to engage in natural language conversations with users and has won the Loebner Prize Turing Test multiple times. Rose uses pattern matching and scripted responses to simulate human-like conversation.

Natural Language Processing Class 10 AI Notes 12

URL: http://ec2-54-215-197-164.us-west-1.compute.amazonaws.com/speech.php
Ochatbot: Ochatbot is a conversational AI chatbot that can handle a very wide range of tasks such as customer support, enhanced lead generation, e-commerce, and more.

Natural Language Processing Class 10 AI Notes 13

URL: https://www.ochatbot.com/

Types of Chatbots

We can state that there are two types of chatbots around us which are as follows

Script Bot

  • Script bots follow predefined scripts and decision trees to provide responses to user inputs.
  • They are limited to the specific scenarios and interactions programmed into their scripts.
  • Script bots are generally less flexible and adaptable compared to smart bots.
  • Examples: customer support chatbots and FAQ bots.


Smart Bot

  • Smart bots utilize artificial intelligence (AI) and Natural Language Processing (NLP) to understand and respond to user inputs in a more dynamic and intelligent manner.
  • They can interpret natural language, understand context, and generate adaptive responses based on the conversation flow.
  • Smart bots are capable of learning and improving over time through machine learning algorithms and continuous feedback.
  • Examples : virtual assistants like Siri, Google Assistant and advanced customer service chatbots.

Human Language Versus Computer Language Class 10 Notes

Human language and computer language serve distinct purposes and operate on different principles. Here is a comparison between human language and computer language:

  • Purpose: Human language is primarily used for communication among people; it is rich in context and emotion, allowing for complex expression and interpretation. Computer language, also known as programming language, is used to instruct computers to perform specific tasks.
  • Syntax and Structure: Human languages have complex grammar rules, vocabulary, and syntax; they often involve ambiguity and can be interpreted differently based on context and cultural factors. Computer languages have strict syntax rules and follow a predefined structure; they consist of symbols, keywords, and operators that must be used in a precise manner.
  • Flexibility: Human languages are highly flexible and adaptable, allowing for creative expression and evolution over time; they can convey abstract concepts and emotions. Computer languages are less flexible and more rigid in their structure; they are designed to be unambiguous and deterministic.
  • Ambiguity: Human languages often contain ambiguity, multiple meanings, and interpretation challenges; context, tone, and non-verbal cues play a significant role in resolving them. Computer languages aim to minimise ambiguity to ensure precise execution of commands; ambiguous statements in programming languages can lead to errors in the program.

Difficulties a Machine would Face in Processing Natural Language Class 10 Notes

Arrangement of Words and Meanings

There are some rules that provide structure to human language. These languages include nouns, verbs, adverbs, and adjectives. The computer has to identify the different parts of speech. Also, it may be extremely difficult for a computer to understand the meaning behind the language we use.

Analogy with Programming Language

In any programming language, we come across statements where we have:

Different syntax, same semantics
print (10+20) # is same as print (20+10)

Same syntax, different semantics

print (4/2) # in Python (2.7)
print (4/2) # in Python (3.0)

Here the statements written have the same syntax but their meanings are different: in Python 2.7, this statement would result in 2 (integer division), while in Python 3, it would give an output of 2.0 (true division).


Perfect Syntax, No Meaning

Sometimes, a statement can have a perfectly correct syntax but it does not mean anything.
e.g. “Chickens feed extravagantly while the moon drinks tea.” This statement is grammatically correct but does not convey any sensible meaning.

Multiple Meanings of a Word

Words and phrases often have multiple meanings depending on context. For example, the word “bank” can refer to a financial institution or the side of a river. Resolving this ambiguity requires understanding the context in which the word is used.

Data Processing

Data processing in Natural Language Processing (NLP) involves transforming raw text data into a format that can be utilised by machine learning models or other algorithms to extract insights, perform analysis, or generate language-based outputs. The first step to it is text normalisation.

Text Normalisation It is a pre-processing step aimed at improving the quality of the text and making it suitable for machines to process. Text normalisation involves standardizing and organising text data to ensure uniformity and consistency across different documents or datasets.

Natural Language Processing Class 10 AI Notes 14

There are various steps in text normalisation as follows :

(i) Sentence Segmentation Sentence segmentation is the process of dividing a whole text into individual sentences. Sentence segmentation is crucial for various NLP tasks, including machine translation, text summarisation, and sentiment analysis, as it provides a foundational unit for analysis and processing.

Natural Language Processing Class 10 AI Notes 15
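Sentence segmentation can be sketched with a simple regular expression that splits at sentence-ending punctuation. This is a toy approach: real segmenters (for example NLTK's `sent_tokenize`) also handle abbreviations and other edge cases:

```python
import re

def segment_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace.
    A toy regex sketch; it would mis-split abbreviations like 'Dr.'."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

doc = "AI is fun. NLP helps machines read text! Does it work?"
print(segment_sentences(doc))
# ['AI is fun.', 'NLP helps machines read text!', 'Does it work?']
```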

(ii) Tokenization Tokenization is the process of breaking down a piece of text into smaller units, such as words, phrases, symbols, or other meaningful elements. Tokenisation is a fundamental step in various NLP tasks, enabling further analysis and processing of text data. It serves as the foundation for tasks like part-of-speech tagging, named entity recognition, and sentiment analysis.

Natural Language Processing Class 10 AI Notes 16
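Tokenization of a sentence into word tokens can be sketched as below; this simple version keeps only runs of letters and digits, while library tokenizers such as NLTK's `word_tokenize` handle punctuation and contractions more carefully:

```python
import re

def tokenize(sentence):
    """Break a sentence into lowercase word tokens (letters/digits
    only) -- a simple regex sketch of tokenization."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

print(tokenize("The quick brown fox!"))  # ['the', 'quick', 'brown', 'fox']
```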

(iii) Removing Stopwords, Special Characters and Numbers By performing these steps, we can clean and prepare text data for further analysis or processing in natural language processing tasks, ensuring that only relevant information is retained.
Stopwords Stopwords are common words that often do not carry significant meaning in a sentence.
Examples of stopwords: “the”, “is”, “and”, “to”, “in”, “a”, “of”, “that”, “on”.
Special Characters Special characters are symbols or punctuation marks that are not letters or numbers.
Examples of special characters: “.”, “,”, “!”, “?”, “(“, “)”, “-“, “$”, “%”.
Numbers Numbers are numerical digits or sequences of digits.
Examples of numbers: “123”, “3.14”, “2024”, “one”, “two”, “three”.
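These three removal steps can be sketched together; the stopword list below is the small illustrative one from above, not a complete list:

```python
# Small illustrative stopword list (not exhaustive).
STOPWORDS = {"the", "is", "and", "to", "in", "a", "of", "that", "on"}

def clean_tokens(tokens):
    """Keep only alphabetic, non-stopword tokens -- this drops
    stopwords, numbers and special characters in one pass."""
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]

tokens = ["the", "price", "is", "99", "%", "off", "on", "shoes", "!"]
print(clean_tokens(tokens))  # ['price', 'off', 'shoes']
```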


(iv) Converting Text to a Common Case Converting text to a common case typically involves transforming all letters to either uppercase or lowercase to ensure consistency. The choice between uppercase and lowercase depends on the specific requirements of the application or analysis, but we prefer lowercase.

(v) Stemming It is the process of reducing a word to its base or root form.
For example, the stem of the word “running” is “run,” and the stem of the word “swimming” is “swim.” Stemming is often used in natural language processing tasks to standardise the text and improve the performance of algorithms.
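A crude suffix-stripping stemmer can be sketched as follows. Real stemmers, such as NLTK's PorterStemmer, apply far more careful rules (this toy version would turn "running" into "runn", not "run"):

```python
def stem(word):
    """Crude suffix-stripping stemmer (sketch): removes a common
    ending if the remaining stem keeps at least 3 letters. No
    dictionary check, unlike real stemmers like PorterStemmer."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(stem("jumps"))    # jump
print(stem("quickly"))  # quick
print(stem("played"))   # play
```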

(vi) Lemmatization It is the conversion of a word to its base form or lemma. This differs from stemming, which takes a word down to its root form by removing its prefixes and suffixes.
For example, the words “was,” “is,” and “will be” can all be lemmatized to the word “be.” Similarly, the words “better” and “best” can be lemmatized to the word “good.”
Lemmatization reduces the text to its root, making it easier to find keywords.
The difference between stemming and lemmatization can be summarised by this example

Natural Language Processing Class 10 AI Notes 17
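Because lemmatization needs a dictionary, a toy sketch can only use a small hand-made lookup table. The entries below are illustrative; real lemmatizers such as NLTK's WordNetLemmatizer use a full lexicon plus part-of-speech tags:

```python
# Toy lemma lookup table (hypothetical, for illustration only).
LEMMAS = {"was": "be", "is": "be", "are": "be",
          "better": "good", "best": "good"}

def lemmatize(word):
    """Return the dictionary base form of a word, falling back to
    the word itself when it is not in the lookup table."""
    return LEMMAS.get(word, word)

print(lemmatize("better"))  # good
print(lemmatize("was"))     # be
print(lemmatize("fox"))     # fox (unknown word, returned unchanged)
```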

Bag of Words

The Bag of Words (BoW) model is a simple and widely used technique in Natural Language Processing (NLP) for representing text data. The Bag of Words model represents text data as a collection of unique words, ignoring grammar and word order.

The Bag of Words model provides a simple yet effective representation of text data, allowing for quantitative analysis and machine learning applications. However, it’s important to be aware of its limitations and consider enhancements or alternative models for text processing tasks.

Natural Language Processing Class 10 AI Notes 18

This image gives us a brief overview of how Bag of Words works. Let us assume that the text on the left in this image is the normalised corpus (a large set of text) which we have got after going through all the steps of text processing. Now, as we put this text into the Bag of Words algorithm, the algorithm returns the unique words in the corpus and their occurrence counts.

On the right, it shows a list of words appearing in the corpus, and the number corresponding to each word shows how many times that word has occurred in the text body. Thus, we can say that the Bag of Words gives us two things:
1. A vocabulary of words for the corpus.
2. The frequency of these words in the corpus.
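Both outputs, the vocabulary and the word frequencies, can be produced in one step with Python's `collections.Counter`:

```python
from collections import Counter

def bag_of_words(tokens):
    """Count how often each unique word occurs -- the two Bag of
    Words outputs: vocabulary (the keys) and frequencies (values)."""
    return Counter(tokens)

corpus = ["the", "lazy", "dog", "jumps", "over", "the", "lazy", "fox"]
bow = bag_of_words(corpus)
print(sorted(bow))        # vocabulary
print(bow["lazy"])        # frequency of 'lazy' -> 2
```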


The step-by-step approach to implement the Bag of Words algorithm is as follows:

  • Text Normalisation Collect the data and pre-process it
  • Create Dictionary Make a list of all the unique words occurring in the corpus
  • Create Document Vectors For each document in the corpus, find out how many times the word from the unique list of words has occurred.
  • Create document vectors for all the documents.

Example:
Step 1 Collect data and pre-process it

Let’s consider a simple corpus of two documents:

  • Document 1 “The quick brown fox”
  • Document 2 “The lazy dog jumps over the lazy fox”

After text normalisation, the text becomes

  • Document 1 [The, quick, brown, fox]
  • Document 2 [The, lazy, dog, jumps, over, the, lazy, fox]

Step 2 Create Dictionary

Go through all the steps and create a dictionary i.e., list down all the words which occur in all these documents.
Dictionary

Natural Language Processing Class 10 AI Notes 19

There is a vocabulary of 8 words from the corpus containing 12 words. This is because some words though repeated in the documents are written just once while creating the dictionary.

Step 3 Create Document Vectors

In this step, the vocabulary is written in the first column. For each document, if a vocabulary word matches a word in the document, put a 1 in front of it; if the word appears again, increase the previous value by 1. If the word does not occur in that document, put a 0 under it.

Word     Document 1   Document 2
The      1            2
quick    1            0
brown    1            0
fox      1            1
lazy     0            2
dog      0            1
jumps    0            1
over     0            1

Observe the position of 0s and 1s in the table. This gives the document vector table for our corpus. The tokens, however, have still not been converted to meaningful numbers, which leads to the final step of the algorithm: TF-IDF.
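The document vectors from Step 3 can be reproduced in a few lines of Python: each document becomes a list of counts against the shared vocabulary.

```python
def document_vector(doc_tokens, vocabulary):
    """Count occurrences of each vocabulary word in one document,
    giving that document's vector."""
    return [doc_tokens.count(word) for word in vocabulary]

vocab = ["the", "quick", "brown", "fox", "lazy", "dog", "jumps", "over"]
doc1 = ["the", "quick", "brown", "fox"]
doc2 = ["the", "lazy", "dog", "jumps", "over", "the", "lazy", "fox"]
print(document_vector(doc1, vocab))  # [1, 1, 1, 1, 0, 0, 0, 0]
print(document_vector(doc2, vocab))  # [2, 0, 0, 1, 2, 1, 1, 1]
```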


TF-IDF : Term Frequency and Inverse Document Frequency

Term Frequency-Inverse Document Frequency (TF-IDF) is a numerical statistic used in information retrieval and text mining to evaluate the importance of a word in a document relative to a collection of documents.

Term Frequency (TF)

Term Frequency measures the frequency of a term (word) in a document.
It is calculated as the ratio of the number of times a term appears in a document to the total number of terms in the document.

Inverse Document Frequency (IDF)

  • Inverse Document Frequency measures the importance of a term across a collection of documents.
  • It is calculated as the logarithm of the ratio of the total number of documents to the number of documents containing the term, adjusted by 1 to avoid division by zero for terms that appear in all documents.

Natural Language Processing Class 10 AI Notes 20

Example

  • Total number of documents in the collection: 10,000
  • Number of documents containing “brown”: 3
  • Inverse Document Frequency of “brown”: IDF = log(10,000/3)

Natural Language Processing Class 10 AI Notes 21
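Under the definitions above, TF and IDF can be sketched as below. This uses a base-10 logarithm without the +1 adjustment, which is one common variant; textbook formulas differ in the base and the adjustment:

```python
import math

def tf(term, doc_tokens):
    """Term Frequency: occurrences of the term divided by the
    total number of terms in the document."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, docs):
    """Inverse Document Frequency: log10 of (total documents /
    documents containing the term). One common variant; no +1
    adjustment here for simplicity."""
    containing = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / containing)

doc1 = ["the", "quick", "brown", "fox"]
docs = [doc1, ["the", "lazy", "dog"]]
print(tf("the", doc1))    # 0.25
print(idf("the", docs))   # 0.0 -- 'the' appears in every document
```

Matching the worked example above, log10(10,000/3) is about 3.52, so a rare word such as “brown” gets a much higher IDF than a word appearing in every document.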

Applications of TF-IDF are as follows

  • Document Classification Helps in classifying the type and genre of a document.
  • Topic Modelling It helps in predicting the topic for a corpus.
  • Information Retrieval System To extract the important information out of a corpus.
  • Stop word filtering Helps in removing the unnecessary words out of a text body

Natural Language Toolkit

Natural Language Toolkit (NLTK) is a platform used for building Python programs that work with human language data, for application in statistical natural language processing. NLTK is a comprehensive library for NLP tasks, including tokenisation, stemming, lemmatisation, part-of-speech tagging, parsing, and semantic reasoning.

NLTK is widely used in academia and industry for research, education, and development in the field of NLP.

NLTK offers a wealth of functionality and resources for NLP tasks, making it an indispensable tool for anyone working in the field of natural language processing.

Installation

You can install NLTK using pip:
pip install nltk

Additionally, you need to download NLTK data using:
import nltk
nltk.download()


Glossary :

  • Text classification It involves categorising text documents into predefined categories or classes based on their content.
  • Text summarisation It involves condensing large volumes of text into shorter summaries while preserving the most important information and key points.
  • Tokenisation It is the process of breaking down a piece of text into smaller units, such as words, phrases, symbols, or other meaningful elements.
  • Stemming It is the process of reducing words to their root or base form. It involves removing suffixes or prefixes from words to obtain the core meaning.

The post Natural Language Processing Class 10 AI Notes appeared first on Learn CBSE.

