Urdu Baby Name Generation Using AI

Common Urdu Names.

Text generation is an active area of AI. A model is trained on a text corpus and then used to produce new text in the same style. You can generate books, poems, songs, and even research papers with this technique.

What about generating short text such as names? Well, you are in the right place. By following this tutorial you can create unique baby names in Urdu. The first step is to get a list of baby names; I've written a separate tutorial on scraping baby names from a website, see Baby Names.
I've also created a Git repository, urdu-baby-names, with the baby names; check it out.

Let's start.

First, import the libraries we are going to use:

import numpy as np
import pandas as pd
from keras.layers import LSTM, Dense
from keras.models import Sequential

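Read the names file, collect every character that appears in the names, and build two dictionaries: one mapping each character to an index and one mapping each index back to its character. An end marker, "۔", is appended to every name so the model can learn where a name stops. I'm using boys_names.csv for this tutorial. You can use girls_names.csv for girl names.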

names = pd.read_csv("../data/boys_names.csv")
full_names = names["boys_names"].tolist()
# Append the end-of-name marker "۔" to every name.
full_names = [s + "۔" for s in full_names]
# Vocabulary: every distinct character, plus the space used to join the names.
chars = sorted(set(" ".join(full_names)))
print("total chars:", len(chars))
char_to_index = {c: i for i, c in enumerate(chars)}
index_to_char = {i: c for i, c in enumerate(chars)}
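
To make the two dictionaries concrete, here is a small sketch on two sample names instead of the CSV file. The sample list and the toy_ variable names are my own stand-ins, not part of the dataset or the original code.

```python
# Two sample names standing in for the CSV column (hypothetical data).
sample_names = ["علی", "عمر"]
END = "۔"

# Same steps as above: add the end marker, then collect the characters.
marked = [n + END for n in sample_names]
toy_chars = sorted(set(" ".join(marked)))
toy_char_to_index = {c: i for i, c in enumerate(toy_chars)}
toy_index_to_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the first name as indices and decode it back.
encoded = [toy_char_to_index[c] for c in marked[0]]
decoded = "".join(toy_index_to_char[i] for i in encoded)
print(encoded, decoded)
```

Encoding and then decoding gives back the original name, which is all the model needs: it only ever sees indices, and the second dictionary turns its predictions back into characters.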

Now get the length of the longest name, the number of names, and the number of distinct characters.

max_char = len(max(full_names, key=len))

m = len(full_names)
char_dim = len(char_to_index)

Create the X and Y arrays, which will hold the one-hot encoded dataset for training the model.

X = np.zeros((m, max_char, char_dim))
Y = np.zeros((m, max_char, char_dim))

Fill X and Y with values. Y is X shifted by one character, so at each position the target is the next character of the name.

for i in range(m):
    name = list(full_names[i])
    for j in range(len(name)):
        X[i, j, char_to_index[name[j]]] = 1
        if j < len(name) - 1:
            Y[i, j, char_to_index[name[j + 1]]] = 1
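
Here is the same encoding loop on a single toy name, so you can see the one-character shift between input and target. The Latin name "ab." and the toy_ names are assumptions for illustration only; "." plays the role of the "۔" end marker.

```python
import numpy as np

toy_name = "ab."                       # hypothetical name; "." is the end marker
toy_chars = sorted(set(toy_name))      # ['.', 'a', 'b']
toy_index = {c: i for i, c in enumerate(toy_chars)}
toy_max = 5                            # pretend the longest name has 5 characters

x = np.zeros((toy_max, len(toy_chars)))
y = np.zeros((toy_max, len(toy_chars)))
for j, c in enumerate(toy_name):
    x[j, toy_index[c]] = 1             # input: the character at position j
    if j < len(toy_name) - 1:
        y[j, toy_index[toy_name[j + 1]]] = 1   # target: the character at j + 1

# Read the one-hot rows back as indices.
x_ids = x.argmax(axis=1)[: len(toy_name)]
y_ids = y.argmax(axis=1)[: len(toy_name) - 1]
print(x_ids, y_ids)
```

The rows after the end of the name stay all zeros in both arrays. That is the padding that lets names of different lengths share one array shape.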

Build and train the model.

model = Sequential()
model.add(LSTM(128, input_shape=(max_char, char_dim), return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(Dense(char_dim, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
model.fit(X, Y, batch_size=32, epochs=100, verbose=0)
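
If you are wondering what the loss does with the padded positions, the NumPy sketch below writes out the textbook categorical cross-entropy formula for each timestep. It is my own illustration, not Keras code, and Keras may differ in details such as clipping and how it averages over timesteps.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-timestep loss: -sum(y_true * log(y_pred)) over the character axis."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Three timesteps, three characters (made-up softmax outputs).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
# One-hot targets; the last row is all zeros, like a padded position in Y.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 0]])

step_loss = categorical_crossentropy(y_true, y_pred)
print(step_loss)
```

Under this formula an all-zero target row contributes zero loss, so the padded positions do not push the predictions in any direction.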

After training the model, you can generate names with the following function.

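def generate_name(model):
    name = []
    # Input buffer for one name, filled in one character at a time.
    x = np.zeros((1, max_char, char_dim))
    i = 0

    while True:
        # Predicted distribution over the next character at position i.
        probs = model.predict(x, verbose=0)[0, i].astype("float64")
        probs = probs / np.sum(probs)
        index = np.random.choice(char_dim, p=probs)
        # Force the end marker when the buffer is about to run out.
        if i == max_char - 2:
            character = "۔"
        else:
            character = index_to_char[index]
        name.append(character)
        if character == "۔":
            break
        # Feed the sampled character back in as the next input.
        x[0, i + 1, index] = 1
        i += 1

    print("".join(name))

for _ in range(5):
    generate_name(model)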
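
The function samples the next character from the predicted probabilities instead of always taking the most likely one, which is why it gives a different name on each call. The sketch below shows the difference on a made-up probability vector; the characters, the probabilities, and the fixed seed are all assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the demo is repeatable
toy_chars = ["a", "b", "."]

# Model outputs are usually float32; cast and renormalise before sampling.
probs = np.array([0.5, 0.3, 0.2], dtype="float32").astype("float64")
probs /= probs.sum()

# Sampling: each character is drawn in proportion to its probability.
samples = [toy_chars[rng.choice(len(toy_chars), p=probs)] for _ in range(1000)]
counts = {c: samples.count(c) for c in toy_chars}

# Greedy choice: always the single most likely character.
greedy = toy_chars[int(np.argmax(probs))]
print(counts, greedy)
```

The greedy choice would pick the same character at a given step on every run, while sampling spreads the picks across all characters roughly in line with their probabilities.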

It will generate new names as well as names that already appear in the file. Choose the name you like best for your baby, or suggest it to someone else.
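
If you only want the new names, you can compare the output against the training list. This is a minimal sketch: split_novel is a hypothetical helper of mine, and it assumes you collect the generated names as strings (for example by returning the name from generate_name instead of printing it).

```python
def split_novel(generated, training_names, end_marker="۔"):
    """Split generated names into ones not in the training list and ones that are."""
    # Compare without the end marker or surrounding spaces.
    known = {n.rstrip(end_marker).strip() for n in training_names}
    novel, seen = [], []
    for name in generated:
        clean = name.rstrip(end_marker).strip()
        (seen if clean in known else novel).append(clean)
    return novel, seen

# Hypothetical example: one name from the file, one new name.
training = ["علی", "عمر"]
generated = ["علی۔", "زید۔"]
novel, seen = split_novel(generated, training)
print("new:", novel)
print("already in file:", seen)
```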
