While researching Indeed job postings to classify job types, I found that some jobs carry more than one label, e.g. the same job is listed as both temporary and part-time. I used the following code to remove the job descriptions that appear under both labels.
import pandas as pd
part_time = pd.read_csv("part_time.csv", index_col=0)
temporary = pd.read_csv("temporary.csv", index_col=0)
# find common jobs using description column with isin() function.
# A intersection B
A = part_time[part_time.description.isin(temporary.description)]
# remove common elements from both part_time and temporary jobs.
# temporary - A
# part_time - A
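# isin() needs the description values, so pass A.description rather than the DataFrame A.
temporary = temporary[~temporary.description.isin(A.description)]
part_time = part_time[~part_time.description.isin(A.description)]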
# now concat these two data frames and save.
total = pd.concat([temporary, part_time])
total.to_csv("indeed_jobs.csv", index=False)
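To check the logic without the CSV files, here is a minimal sketch of the same idea on two small in-memory data frames. The helper name `drop_shared_descriptions` and the sample rows are made up for illustration; only the `description` column name comes from the code above.

```python
import pandas as pd


def drop_shared_descriptions(first, second, column="description"):
    """Return both frames without the rows whose `column` value occurs in both.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Descriptions present in both frames (A intersection B).
    shared = set(first[column]) & set(second[column])
    # Keep only the rows whose description is not shared.
    first_only = first[~first[column].isin(shared)]
    second_only = second[~second[column].isin(shared)]
    return first_only, second_only


# Made-up sample data: "cashier" appears under both labels.
part_time = pd.DataFrame(
    {"description": ["barista", "cashier", "tutor"], "label": "part_time"}
)
temporary = pd.DataFrame(
    {"description": ["cashier", "event staff"], "label": "temporary"}
)

part_time_only, temporary_only = drop_shared_descriptions(part_time, temporary)

# Same order as above: temporary rows first, then part-time rows.
total = pd.concat([temporary_only, part_time_only], ignore_index=True)
print(total)
```

Here the "cashier" row is dropped from both frames, so the combined frame keeps only jobs with a single, unambiguous label.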
I hope this helps anyone who runs into the same issue.