UrduNLP

Posts

Showing posts from 2020

Sentiment Analysis of Products and Services in Pakistan

This blog post is about a dataset and sentiment analysis of products and services provided in Pakistan. It's a first step toward sentiment analysis of people's reviews related to the manufacturing industry. It took me some time to build this dataset with the help of a few students. We have used the following products and services provided by the company for analysis. Let's begin with the implementation of an SVM for sentiment analysis. Import the necessary packages.

    import re
    import pickle
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import precision_recall_fscore_support, accuracy_score
    from sklearn import svm

Read the dataset and convert it to a list for further processing.

    raw_data = pd.read_csv('../data/products_sentiment_urdu.csv')
    raw_data.head()
    # check the size of the data and its class d...
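The excerpt above stops before the model is trained, so here is a minimal sketch of how the imported pieces (TfidfVectorizer, svm, and the metric functions) typically fit together. It is not the post's own code: the toy reviews, the labels "pos" and "neg", the linear kernel, and the helper names train_svm and evaluate are all assumptions made for illustration.

```python
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical toy reviews; the post itself reads its reviews from a CSV file.
train_texts = [
    "یہ موبائل بہت اچھا ہے",
    "سروس بہت اچھی ہے",
    "کھانا بہترین تھا",
    "یہ موبائل بہت خراب ہے",
    "سروس بہت بری ہے",
    "کھانا بدترین تھا",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_texts = ["موبائل اچھا ہے", "سروس خراب ہے"]
test_labels = ["pos", "neg"]


def train_svm(texts, labels):
    """Fit a TF-IDF vectorizer and a linear-kernel SVM (kernel is an assumption)."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)
    classifier = svm.SVC(kernel="linear")
    classifier.fit(features, labels)
    return vectorizer, classifier


def evaluate(vectorizer, classifier, texts, labels):
    """Predict labels for unseen texts and compute macro-averaged scores."""
    predictions = classifier.predict(vectorizer.transform(texts))
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="macro", zero_division=0
    )
    scores = {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
    return predictions, scores


vectorizer, classifier = train_svm(train_texts, train_labels)
predictions, scores = evaluate(vectorizer, classifier, test_texts, test_labels)
print(list(predictions))
print(scores)
```

With a real dataset, the texts and labels would come from the CSV columns and be split with train_test_split before fitting, so the scores reflect held-out data.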

Word Cloud for your Name in Urdu

The wordcloud library is a nice way to generate fun plots with names. Here is an example I've used to generate a cloud with my name. You need to install the python-arabic-reshaper, wordcloud, matplotlib, python-bidi, and numpy libraries to generate the plots. You also need to install the font NotoNaskhArabic-Regular.ttf ( https://github.com/frappe/fonts/blob/master/usr_share_fonts/noto/NotoNaskhArabic-Regular.ttf ). This example was run on macOS.

    import numpy as np
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    from bidi.algorithm import get_display
    from arabic_reshaper import ArabicReshaper

    configuration = {"language": "Urdu"}
    reshaper = ArabicReshaper(configuration=configuration)
    text = reshaper.reshape("عرفان")
    text = get_display(text)
    x, y = np.ogrid[:300, :300]
    mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
    mask = 255 * mask.astype(int)
    wc = WordCloud(background_color="white",font_path='/Users/mirfan/Lib...
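The mask lines in the snippet above are compact, so here is a small sketch that isolates them. It rebuilds the same circular mask with numpy only. The function name circular_mask and its parameters are introduced here for illustration. As I understand the wordcloud documentation, mask entries equal to 255 (white) are treated as blocked and all other entries are free to draw on, which is why the area outside the circle is set to 255.

```python
import numpy as np


def circular_mask(size=300, radius=130):
    """Return a size x size integer array: 0 inside the circle, 255 outside."""
    center = size // 2
    # np.ogrid gives a column vector x and a row vector y that broadcast
    # into a full grid of squared distances from the center.
    x, y = np.ogrid[:size, :size]
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return 255 * outside.astype(int)


mask = circular_mask()
print(mask.shape)
print(mask[150, 150], mask[0, 0])
```

Changing radius shrinks or grows the drawable circle, and size should match the canvas you want the cloud rendered on.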

Urdu Baby Name Generation Using AI

Common Urdu Names.

Text generation is an advanced field of AI. It uses state-of-the-art techniques to generate text from a text corpus. You can generate books, poems, songs, and even research papers using this technique. How do you generate short text like names? Well, you are in the right place. You can create unique baby names in Urdu by following this tutorial. The first step is to get the baby names; I've written a tutorial on scraping the baby names from the website. Check it out: Baby Names. I've also created a Git repository, urdu-baby-names, for the baby names. Let's start. First, import the libraries we are going to use:

    import numpy as np
    import pandas as pd
    from keras.callbacks import LambdaCallback
    from keras.layers import LSTM, Dense
    from keras.models import Sequential

Read the names file and build dictionaries that map every character in the names to an index and back. I'm using boys_names.csv for this tutorial. You can u...
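The character dictionaries mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the post's code: the three sample names stand in for the contents of boys_names.csv, and the newline end-of-name marker and the helper names encode and decode are my own additions.

```python
# Hypothetical sample; the tutorial reads the names from boys_names.csv.
names = ["عرفان", "علی", "احمد"]

# Assumed end-of-name marker so a model can learn where a name stops.
END = "\n"

# Every distinct character in the corpus, plus the marker, in a stable order.
chars = sorted(set("".join(names)) | {END})

# Forward and reverse lookups between characters and integer indices.
char_to_index = {char: index for index, char in enumerate(chars)}
index_to_char = {index: char for char, index in char_to_index.items()}


def encode(name):
    """Turn a name into a list of indices, terminated by the END marker."""
    return [char_to_index[char] for char in name + END]


def decode(indices):
    """Turn a list of indices back into a name, dropping the END marker."""
    return "".join(index_to_char[index] for index in indices).rstrip(END)


encoded = encode(names[0])
print(encoded)
print(decode(encoded))
```

Index sequences like these are what a character-level LSTM consumes, usually after one-hot encoding each index.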

Building Conversational Chatbot for Urdu language.

There are no solutions available yet for building conversational chatbots for the Urdu language. Keeping that in mind, I've contributed to the spaCy library for this purpose. As you know, Rasa uses a spaCy pipeline for building a chatbot, so I've built a model, Urdu Model, that I will be using for this chatbot. I've made the model publicly available to be used for chatbots.

Install Rasa X: This is the version I've used for building the chatbot. Rasa is being updated rapidly, so you may find newer versions while reading this post.

    virtualenv -p python3.6 .rasa
    source .rasa/bin/activate
    pip3 install rasa-x==0.26.1 --extra-index-url https://pypi.rasa.com/simple

Install Urdu Model: You need to download and install the spaCy model I've built and shared here: Ur Model.

    pip3 install ur_model-0.0.0.tar.gz

Initialize the Rasa project: Run the following command to initialize the project.

    python -m rasa init

It will create the skeleton of the Rasa chatbot with...

Urdu Sentiment Classification

Sentiment analysis is a classic task in NLP. A lot of research has been done for different languages, but not for Urdu. Although there are some papers available on sentiment analysis for Urdu, no data or source code is provided to reproduce the results. This is my first attempt to run logistic regression on an Urdu dataset. Let's start coding for logistic regression. I'm using Urdu Corpus V1 for this tutorial. Here is what the data looks like.

    # load data and take a quick look
    import re
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer

    raw_data = pd.read_csv('../data/sentiment_urdu.csv')
    raw_data.head(5)

The original dataset has 3 classes, but one class has only 20 records. I've removed it because it would cause issues for classification. Here is what the bar chart looks like with three classes.

    # check the ...
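Here is a rough sketch of the two steps described above: dropping a class with too few records, then fitting logistic regression on TF-IDF features. It is not the post's code. The toy sentences, the column names "text" and "label", the class codes "P", "N", and "O", the min_count threshold, and the helper drop_rare_classes are all assumptions for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; column names and class codes are assumptions.
data = pd.DataFrame(
    {
        "text": [
            "فلم بہت اچھی تھی",
            "مجھے یہ گانا پسند ہے",
            "بہترین اداکاری",
            "فلم بہت بری تھی",
            "مجھے یہ گانا پسند نہیں",
            "بدترین اداکاری",
            "معلوم نہیں",
        ],
        "label": ["P", "P", "P", "N", "N", "N", "O"],
    }
)


def drop_rare_classes(frame, label_column, min_count):
    """Keep only rows whose class has at least min_count records."""
    counts = frame[label_column].value_counts()
    keep = counts[counts >= min_count].index
    return frame[frame[label_column].isin(keep)].reset_index(drop=True)


# The single "O" record plays the role of the under-represented class.
filtered = drop_rare_classes(data, "label", min_count=2)

# Fit TF-IDF features and a logistic regression classifier on what remains.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(filtered["text"])
model = LogisticRegression(max_iter=1000)
model.fit(features, filtered["label"])

predictions = model.predict(vectorizer.transform(["فلم اچھی تھی"]))
print(filtered["label"].value_counts().to_dict())
print(list(predictions))
```

On the real corpus the threshold would be chosen from the class counts in the bar chart, and the data would be split with train_test_split before fitting.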