Posts

Showing posts from February, 2020

Urdu Sentiment Classification

Sentiment analysis is a classic task in NLP. A lot of research has been done for other languages, but very little for Urdu. Although some papers on Urdu sentiment analysis are available, no data or source code is provided to reproduce their results. This is my first attempt to run logistic regression on an Urdu dataset.

Let's start coding for logistic regression. I'm using Urdu Corpus V1 for this tutorial. Here is what the data looks like.

    # load data and take a quick look
    import re
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer

    raw_data = pd.read_csv('../data/sentiment_urdu.csv')
    raw_data.head(5)

The original dataset has 3 classes, but one class has only 20 records. I've removed it because such a small class would cause problems for classification. Here is what the bar chart looks like with all three classes.

    # check the