The purpose of this project was to automate the manual task of standardizing the format of each individual address entries in a table based on a list of fully capitalized words found in an excel file, without compromising the naming of a few address exceptions.
1. capitalized_words.csv: a list of words that should be uppercase
2. switch_words.csv: a list of specific words that should be switched from "word" to "new"
∙ Checks if any word in string contains substring in capitalized_words list
∙ Checks position of word in string
∙ Removes word from source_str at pos
∙ Uppercase insert_str, then insert insert_str at pos of source_str
∙ Converts case for words containing multiple capitalized words &
∙ Converts case for words containing single capitalized words
This output a list of cleaned up or standardized words in a CSV file.
For more examples on these functions, check out the jupyter notebook here.
This python script reads in the table with pymysql and retrieves the target column for cleaning up. For each of the rows in this target column, we feed the value into the function from the clean up Python script. The new list of words is read in with Pandas and appended to the original CSV file. A new CSV file where the original words and new words differ is saved. To facilitate a quicker update with MySQL queries, each entry in the new words column is fed into a python file to automatically generate the `UPDATE` queries.