Data Wrangling is the process of making raw data usable by removing flawed or unnecessary elements and putting it together according to the required format or structure to get it ready for use and further analysis. This is exactly what was needed in this project as the imported raw data was structured randomly in one column and needed to be transformed based on the requirements.
The raw data was sourced from a database where it was not possible to set exporting criteria and filter it down only for the necessary information. Additionally, the data was organized in one column, where the crucial rows were randomly located under labels such as PRICE, among other rows. Therefore Excel functions were not sufficient to turn this raw data set containing 120247 rows into predefined, organized tables. Furthermore, the project required a simple, one-click solution, where this process can be easily repeated on similar datasets.
To provide the quickest and easiest solution, without using additional software, Excel VBA was utilized for this project. With VBA, it was possible to loop through the dataset looking for criterias and only extract data that was necessary based on the requirements. The code analyzed the dataset row-by-row and extracted the data, organizing it into rows and columns into tables and separated spreadsheets. Turning 120247 rows into organized tables took only a few seconds for the code, saving time for the user.
The product numbers needed to be placed into rows with the product specifications in columns as the requirements for the organized tables. Additionally, it was required to leave the cells empty where the products did not have information and change the format according to the nature of the data whether it was string or number.
It was possible to turn a big dataset of raw data into tables and extract only the required data in a few seconds by utilizing VBA.