Urdu Word and Sentence Similarity using SpaCy

Similarity is a common measure of how close two words or sentences are to each other. There are several ways to measure the similarity of two documents, and the one most commonly used in NLP is cosine similarity. Cosine similarity is computed on vectors (for example, word2vec-style word vectors) and tells you how close two vectors are in orientation, that is, how small the angle between them is.
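
To make the idea of "orientation" concrete, here is a minimal sketch of cosine similarity in plain Python. The function name and the toy vectors are my own assumptions for illustration and are not part of spaCy. Returning 0.0 for a zero vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector has no direction.
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical): same direction, perpendicular, opposite.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Vectors that point the same way score close to 1, perpendicular vectors score 0, and opposite vectors score -1, regardless of their lengths.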

The quality of the similarity scores depends mostly on the quality of the document vectors. If you want better results, build a better word2vec model. To use the similarity feature of spaCy, you need a language model (you can build one by following my article https://www.urdunlp.com/2019/08/how-to-build-urdu-language-model-in.html).

Here is how I calculated the cosine similarity of words and sentences.

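import spacy

# Load the Urdu model built in the article linked above.
# spacy.load() accepts an installed package name or a path to a model directory.
nlp = spacy.load('ur_model')

# Word-level similarity: each Doc holds a single word.
doc1 = nlp("عمران")
doc2 = nlp("عرفان")
print("Cosine Similarity of words.")

cosine_similarity = doc1.similarity(doc2)
print(cosine_similarity)

# Sentence-level similarity: each Doc holds a full sentence.
print("Cosine Similarity of sentences.")
doc3 = nlp("میں کھیلتا ہوں")
doc4 = nlp("میں کام کرتا ہوں")

cosine_similarity = doc3.similarity(doc4)
print(cosine_similarity)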
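
By default, spaCy represents a whole Doc by the average of its token vectors, so two sentences that share words can score high even when their content words differ. The sketch below illustrates that effect in plain Python. The 3-dimensional vectors are invented for illustration (they do not come from any trained model), and the helper names are my own assumptions, not spaCy functions.

```python
import math

# Hypothetical toy vectors, invented for this example only.
toy_vectors = {
    "میں": [0.9, 0.1, 0.0],
    "ہوں": [0.8, 0.2, 0.1],
    "کھیلتا": [0.1, 0.9, 0.2],
    "کام": [0.2, 0.1, 0.9],
    "کرتا": [0.3, 0.4, 0.6],
}


def average_vector(tokens, vectors):
    """Average the vectors of the given tokens, component by component."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for token in tokens:
        for i, value in enumerate(vectors[token]):
            total[i] += value
    return [value / len(tokens) for value in total]


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Whitespace splitting stands in for a real tokenizer here.
sent_a = "میں کھیلتا ہوں".split()
sent_b = "میں کام کرتا ہوں".split()

sentence_score = cosine(
    average_vector(sent_a, toy_vectors),
    average_vector(sent_b, toy_vectors),
)
word_score = cosine(toy_vectors["کھیلتا"], toy_vectors["کام"])

print("sentence similarity:", sentence_score)
print("content-word similarity:", word_score)
```

With these toy values the averaged sentence vectors score noticeably higher than the two content words on their own, because the shared words pull both averages in the same direction.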

If you have any questions, feel free to ask in the comments.

Comments

Umair Arshad, 25 April 2021 at 14:27
Salam, are there any tutorials available for URDU_NLP on YouTube?

Arslan Ahmad, 15 February 2022 at 05:38
OSError: [E050] Can't find model 'ur_model'. It doesn't seem to be a Python package or a valid path to a data directory.

Taha Muzammil, 29 November 2022 at 22:46
I think that with the spaCy model above, cosine similarity gives high scores even when two sentences are not very similar.

Muhammad Irfan, 8 December 2022 at 06:29
Yes, to get better results you need to train a large model with vectors.