
Building a Conversational Chatbot for the Urdu Language

There are no ready-made solutions yet for building conversational chatbots in Urdu. With that in mind,
I've contributed Urdu language support to the spaCy library. Rasa can use a spaCy pipeline for building a chatbot,
so I've built an Urdu spaCy model (Urdu Model) that I will be using for this chatbot. I've made the model publicly available so that others can use it
for their own chatbots.

Install Rasa X:
This is the version I used for building the chatbot. Rasa is updated rapidly, so you may find
newer versions by the time you read this post.
virtualenv -p python3.6 .rasa
source .rasa/bin/activate
pip3 install rasa-x==0.26.1 --extra-index-url https://pypi.rasa.com/simple
Install Urdu Model:
Download and install the spaCy model I've built and shared here: Ur Model.
pip3 install ur_model-0.0.0.tar.gz
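Before moving on, it can help to confirm that everything is importable from the active virtualenv. The sketch below uses only the standard library. It assumes the tarball installs a package named `ur_model` (inferred from the file name, so adjust it if yours differs).

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "ur_model" is an assumed package name taken from the tarball file name.
    for name in ("spacy", "rasa", "ur_model"):
        status = "found" if is_installed(name) else "MISSING"
        print("{:10s} {}".format(name, status))
```

If any line reports MISSING, check that the virtualenv is activated and rerun the matching pip3 install command.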
Initialize the Rasa project:
Run the following command to initialize the project.
python -m rasa init
It will create the skeleton of a Rasa chatbot with the necessary files.

Dataset Creation:
Now create the dataset according to the requirements of your chatbot. As an example, here is my data in the NLU and stories formats. These are the contents of the nlu.md file.
## intent:goodbye
- الّٰلہ حافظ
- پھر بات ہو گی
- خدا حافظ
- بعد میں بات ہو گی
- چلیں پھر بات ہو گی
- مجھے جانا ہے

## intent:query_knowledge_base
- کوئی اچھا سا [ریسٹورنٹ](restaurant) بتائیں؟
- کچھ [ریسٹورنٹس](restaurant) کے نام بتائیں؟
- کیا آپ اچھے [ریسٹورنٹس](restaurant) کے نام بتا سکتے ہیں؟
- کچھ اچھے [ریسٹورنٹس](restaurant) کے نام دکھائیں
- کچھ دیسی [ریسٹورنٹس](restaurant) بتائیں
- کوئی [چائنیز](cuisine) [ریسٹورنٹ](restaurant)؟
- کوئی [اٹالین](cuisine) [ریسٹورنٹ](restaurant)؟
- کوئی [اٹالین](cuisine)؟
- کوئی [چائنیز](cuisine)؟
- کچھ [ہوٹلز](hotel) کے نام بتائیں؟
- کوئی اچھا سا [ہوٹل](hotel) بتائیں؟
- کیا آپ اچھے [ہوٹلز](hotel) کے نام بتا سکتے ہیں؟
- کچھ اچھے [ہوٹلز](hotel) کے نام دکھائیں
- کوئی اچھا سا [ریسٹورنٹ](restaurant) بتائیں؟
- کچھ [ریسٹورنٹس](restaurant) کے نام بتائیں؟
- کیا آپ اچھے [ریسٹورنٹس](restaurant) کے نام بتا سکتے ہیں؟
- کچھ اچھے [ریسٹورنٹس](restaurant) کے نام دکھائیں
- کچھ دیسی [ریسٹورنٹس](restaurant) بتائیں
- کوئی [چائنیز](cuisine) [ریسٹورنٹ](restaurant)؟
- کوئی [اٹالین](cuisine) [ریسٹورنٹ](restaurant)؟
- کچھ [ہوٹلز](hotel) کے نام بتائیں؟
- کوئی اچھا سا [ہوٹل](hotel) بتائیں؟
- کیا آپ اچھے [ہوٹلز](hotel) کے نام بتا سکتے ہیں؟
- کچھ اچھے [ہوٹلز](hotel) کے نام دکھائیں

## intent:greet
- ہیلو
- گڈ مارننگ
- گڈ ایوننگ
- اسلام علیکم
- کوئی ہے
- سلام

## intent:bot_challenge
- کیا آپ BOT ہیں؟
- کیا آپ انسان(Human) ہیں؟
- کیا میں ایک BOT سے بات کر رہا ہوں؟
- کیا میں ایک انسان(Human) سے بات کر رہا ہوں؟

## lookup:restaurant
- سپائس بازار
- دی لکھنوی
- انداز ریسٹورنٹ
- اربن کچن
- لاہور چٹخارہ
- واسابی
- حویلی ریسٹورنٹ

## lookup:hotel
- پی سی ہوٹل
- دی نشاط ہوٹل
- فیملی ہوٹل
- سٹپ ان ہوٹل
- لاہور پیلس ہوٹل

## synonym:hotel
- ہوٹلز
- ہوٹل

## synonym:restaurant
- ریسٹورنٹ
- ریسٹورنٹس
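Training data like this is easy to get subtly wrong, for example by listing the same sentence under two intents, which gives the classifier contradictory labels. The following is a minimal sketch, not part of Rasa, that reads the markdown format with the standard library only. It extracts the `[text](entity)` annotations and reports sentences that appear under more than one intent. The helper names are hypothetical, the sample data is deliberately conflicting, and the `[text](entity:value)` synonym form is not handled.

```python
import re
from collections import defaultdict

# Matches Rasa's markdown entity annotation: [surface text](entity_name)
ENTITY = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_nlu(markdown):
    """Group example sentences by intent, ignoring lookup and synonym sections."""
    examples = defaultdict(list)
    intent = None
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("## "):
            kind, _, name = line[3:].partition(":")
            intent = name.strip() if kind.strip() == "intent" else None
        elif line.startswith("- ") and intent:
            examples[intent].append(line[2:].strip())
    return examples


def entities(example):
    """Return (surface text, entity name) pairs annotated in one example."""
    return ENTITY.findall(example)


def plain_text(example):
    """Strip the annotation markup, leaving the sentence the user would type."""
    return ENTITY.sub(r"\1", example)


def cross_intent_duplicates(examples):
    """Find sentences that are listed under more than one intent."""
    seen = defaultdict(set)
    for intent, sentences in examples.items():
        for sentence in sentences:
            seen[plain_text(sentence)].add(intent)
    return {text: sorted(names) for text, names in seen.items() if len(names) > 1}


# Illustrative sample with one sentence deliberately placed under two intents.
sample = """
## intent:goodbye
- خدا حافظ

## intent:greet
- ہیلو
- خدا حافظ

## intent:query_knowledge_base
- کوئی [چائنیز](cuisine) [ریسٹورنٹ](restaurant)؟
"""

data = parse_nlu(sample)
print(entities(data["query_knowledge_base"][0]))
print(cross_intent_duplicates(data))
```

Running the same functions over the full nlu.md file is a quick sanity check before training.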
And here are the contents of the stories.md file.
## Happy path 1
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 2
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 3
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 4
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 5
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 6
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Happy path 7
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## Hello
* greet
  - utter_greet

## Query Knowledge Base
* query_knowledge_base
  - action_query_database

## Bye
* goodbye
  - utter_goodbye

## bot challenge
* bot_challenge
  - utter_iamabot
Your config.yml should look like this.
language: ur_model

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector

policies:
- name: MemoizationPolicy
  max_history: 5
- name: TEDPolicy
  epochs: 100
- name: MappingPolicy
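The second CountVectorsFeaturizer entry uses the "char_wb" analyzer with n-grams of length 1 to 4, so related word forms such as ہوٹل and ہوٹلز share many features even when the plural was never seen in training. The sketch below approximates how scikit-learn's "char_wb" analyzer (which Rasa's CountVectorsFeaturizer builds on) produces character n-grams inside word boundaries. It is a simplified illustration, not the library's actual code, and `char_wb_ngrams` is a hypothetical helper name.

```python
def char_wb_ngrams(text, min_n=1, max_n=4):
    """Character n-grams taken inside each word, with the word padded by spaces."""
    ngrams = []
    for word in text.split():
        padded = " " + word + " "
        for n in range(min_n, max_n + 1):
            if n >= len(padded):
                # A word shorter than n contributes its padded form only once.
                ngrams.append(padded)
                break
            ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams


singular = set(char_wb_ngrams("ہوٹل"))
plural = set(char_wb_ngrams("ہوٹلز"))
shared = singular & plural
print("shared n-grams:", len(shared), "of", len(singular | plural))
```

The padding spaces mark word boundaries, so a prefix such as " ہوٹ" is a different feature from the same letters in the middle of a word.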
Add these lines to your credentials.yml.
socketio:
  user_message_evt: user_uttered
  bot_message_evt: bot_uttered
  session_persistence: true  # true keeps the session across page reloads; otherwise use false
Last but most important, your domain.yml should contain data like this.

session_config:
  session_expiration_time: 0.0
  carry_over_slots_to_new_session: true
intents:
- greet
- goodbye
- query_knowledge_base
- bot_challenge
entities:
- hotel
- restaurant
- cuisine
slots:
  cuisine:
    type: text
  hotel:
    type: text
  restaurant:
    type: text
  results:
    type: list
responses:
  utter_greet:
  - text: جی فرمائیں
  - text: جی سر
  - text: میں آپ کی کیا مدد کر سکتا ہوں
  utter_goodbye:
  - text: خدا حافظ
  - text: دوبارہ ضرور آیئں
  utter_ask_rephrase:
  - text: میں آپکی بات نہیں سمجھ سکا،آپ دوبارہ بتائیں
  - text: سوری میں آپکی بات نہیں سمجھ پایا؟ دوبارہ بتائیں
  utter_iamabot:
  - text: میں ایک BOT ہوں
actions:
- utter_greet
- utter_goodbye
- utter_ask_rephrase
- utter_iamabot
- action_query_database
- action_query_cuisine
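The domain has to stay in sync with the stories: every intent and action a story mentions must be declared here. Conversely, a declared action that no story uses is worth a second look. For instance, `action_query_cuisine` is listed above but does not appear in the stories shown earlier. The following is a rough, standard-library-only sketch of such a check. The function names are hypothetical, the domain lists are typed in by hand rather than parsed from YAML, and story lines that carry entity payloads (such as `* greet{...}`) are not handled.

```python
def names_used_in_stories(stories_md):
    """Collect intents (lines starting with '*') and actions (lines starting with '-')."""
    intents, actions = set(), set()
    for line in stories_md.splitlines():
        line = line.strip()
        if line.startswith("* "):
            intents.add(line[2:].strip())
        elif line.startswith("- "):
            actions.add(line[2:].strip())
    return intents, actions


def check_domain(stories_md, domain_intents, domain_actions):
    """Report names used in stories but not declared, and declared actions never used."""
    intents, actions = names_used_in_stories(stories_md)
    return {
        "undeclared_intents": sorted(intents - set(domain_intents)),
        "undeclared_actions": sorted(actions - set(domain_actions)),
        "unused_actions": sorted(set(domain_actions) - actions),
    }


# A shortened copy of the stories, enough to illustrate the check.
stories = """
## Happy path 1
* greet
  - utter_greet
* query_knowledge_base
  - action_query_database
* goodbye
  - utter_goodbye

## bot challenge
* bot_challenge
  - utter_iamabot
"""

report = check_domain(
    stories,
    domain_intents=["greet", "goodbye", "query_knowledge_base", "bot_challenge"],
    domain_actions=[
        "utter_greet", "utter_goodbye", "utter_ask_rephrase",
        "utter_iamabot", "action_query_database", "action_query_cuisine",
    ],
)
print(report)
```

An unused entry is not necessarily an error (a fallback response may be triggered outside the stories), but an undeclared one should be fixed before training.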
Train the Model:
Run this command to train your model; it prints the training logs while the model is being built.
python -m rasa train
If you encounter any errors, fix them before moving to the next step.
Test the Model:
Use the following command to open a Rasa X session in the browser and chat with the bot.
python -m rasa x

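If you want to connect your chatbot to a website, here is the sample code of a Flask frontbot to test it. Because the chat widget in the browser is served from a different origin than the Rasa server, you need to start Rasa with the API enabled and a CORS policy that allows your frontend. The "*" value in the command below allows requests from any origin, which is fine for local testing. The frontbot app will look like this.

[Screenshot of the frontbot app]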
python -m rasa run --enable-api --cors "*"
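Once the server is running, you can also talk to the bot from a script instead of a chat widget. The sketch below posts a message to Rasa's REST channel using only the standard library. It assumes the server listens on the default port 5005 and that the REST channel is enabled with a `rest:` entry in credentials.yml. The helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumed defaults: `rasa run` on port 5005 with the REST channel enabled.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_request(message, sender="test_user", url=RASA_URL):
    """Build (but do not send) a POST request for Rasa's REST channel."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message, sender="test_user"):
    """Send one message and return the bot's text replies as a list."""
    with urllib.request.urlopen(build_request(message, sender)) as response:
        replies = json.loads(response.read().decode("utf-8"))
    return [reply.get("text", "") for reply in replies]
```

With the Rasa server running, calling `ask("سلام")` should return the greeting chosen by the bot. Keeping the same `sender` value across calls keeps the messages in one conversation.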
My actions.py contains the custom actions for querying the custom knowledge base.

from rasa_sdk import Action


def get_restaurants(restaurant=None, cuisine=None):
    restaurants = [
        {
            "id": 0,
            "name": "سپائس بازار",
            "cuisine": "دیسی",
            "price-range": "mid-range"
        },
        {
            "id": 1,
            "name": "دی لکھنوی",
            "cuisine": "دیسی",
            "price-range": "cheap"
        },
        {
            "id": 2,
            "name": "انداز ریسٹورنٹ",
            "cuisine": "English",
            "price-range": "mid-range"
        },
        {
            "id": 3,
            "name": "آر کیڈین کیفے",
            "cuisine": "اٹالین",
            "price-range": "cheap"
        },
        {
            "id": 4,
            "name": "پاستا لا وستا",
            "cuisine": "اٹالین",
            "price-range": "mid-range"
        },
        {
            "id": 5,
            "name": "لاہور چٹخارہ",
            "cuisine": "دیسی",
            "price-range": "mid-range"
        },
        {
            "id": 6,
            "name": "واسابی",
            "cuisine": "English",
            "price-range": "cheap"
        }
    ]
    # Filter by cuisine when one was extracted; otherwise return everything.
    if cuisine:
        matches = [r for r in restaurants if r["cuisine"] == cuisine]
        return matches, "یہ ہیں کچھ مشہور {} ریسٹورنٹس".format(cuisine)
    return restaurants, "یہ ہیں کچھ مشہور ریسٹورنٹس"


def get_hotels():
    hotels = [
        {
            "id": 0,
            "name": "پی سی ہوٹل",
            "price-range": "expensive",
            "city": "Lahore",
            "star-rating": 5,
        },
        {
            "id": 1,
            "name": "دی نشاط ہوٹل",
            "price-range": "expensive",
            "city": "Lahore",
            "star-rating": 4,
        },
        {
            "id": 2,
            "name": "فیملی ہوٹل",
            "price-range": "mid-range",
            "city": "Lahore",
            "star-rating": 4,
        },
        {
            "id": 3,
            "name": "سٹپ ان ہوٹل",
            "price-range": "mid-range",
            "city": "Lahore",
            "star-rating": 4,
        },
        {
            "id": 4,
            "name": "لاہور پیلس ہوٹل",
            "price-range": "expensive",
            "city": "Lahore",
            "star-rating": 4,
        }
    ]
    return hotels, "یہ ہیں کچھ مشہور ہوٹلز"


class ActionQueryDatabase(Action):
    def name(self):
        return "action_query_database"

    def run(self, dispatcher, tracker, domain):
        # Slots are filled automatically from entities with the same name.
        restaurant = tracker.get_slot("restaurant")
        hotel = tracker.get_slot("hotel")
        cuisine = tracker.get_slot("cuisine")
        if restaurant or cuisine:
            results, message = get_restaurants(restaurant=restaurant, cuisine=cuisine)
        elif hotel:
            results, message = get_hotels()
        else:
            # Nothing was extracted, so ask the user to rephrase.
            dispatcher.utter_message(template="utter_ask_rephrase")
            return []
        dispatcher.utter_message(text=message)
        # Show at most five results.
        for i, obj in enumerate(results[:5]):
            dispatcher.utter_message(text="{} - {}".format(i + 1, obj["name"]))
        return []
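To run the custom actions, start the action server with the command below. Also make sure the action_endpoint entry in endpoints.yml is enabled so that Rasa can reach the action server.
python -m rasa run actions
And that's it. You now have your own conversational Urdu chatbot. If you have any questions, feel free to ask.
Here is the GitHub link to the tutorial code: Urdu Bot. If you want to experiment, feel free to create a PR.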

Comments

  1. Which version of Python did you use?

  2. How to do it in Roman Urdu?

    Replies
    1. Build dataset in Roman Urdu.

    2. Very helpful blog. Could you please also share a solution for Roman Urdu?
