Text and Excel files play an important role in the management of organizations. They find their applications from inventory management to Day to day record-keeping activities. One of the most popular processes would be text parsing and record maintenance. Let’s consider a use-case where we use the Excel application to record and make sense of the data in a machine-generated Text file.
In this post, our input file is a text file that consists of around 2595 lines, we can assume that there are at least 250 records in the text file along with some unnecessary data.
Our task here is to identify each entry along with its relevant data and record it to the excel file. The manual execution of such a task for an employee could take at least 5 - 6 hours to finish the whole process. We also need to consider the inaccuracies, human error, the missed records, the time spent by a reviewer to check the data and the in-efficacies it causes on its dependent processes. To overcome these drawbacks, we can move to automation.
While we document how to execute this process, let’s benchmark the performance of UiPath automation. Of course, we realize there is more than one way of benchmarking. In this use case, we pick up two processes and compare their execution and also, benchmark their performance with different system configurations.
In the first process, let us take a file that contains 2595 lines where we have to extract 250 records. The primary challenge we need to deal with is unnecessary data which is present with the records.
These are computed generated data.
In the above image we must extract only account number, credit card number, CVV but ignore phone number, savings and unnecessary dashes in the file.
The extracted records should then be parsed into an excel file in their designated column. In order to do this, we will use regex to extract the text from each record and write it one by one into the cells present in the excel file. Using the same process, we will execute the file which contains 100k records.
In the second process, we will use UiPath datatables feature and parse the whole table in the excel sheet. We will execute this process with the 250 records and also with 100k records.
In iteration 1, first, we have to read the whole text from the file, iterate through each line and find the required words using regex. Generally, regex expressions are used to match and get only the required text.
For example, let us consider the string “phone number: 987654321”. In this string, we need only the integer after the text ”phone number ”. We can make use of (?<=phone number: )(.+) this expression to extract only the number.
Likewise, we can get the required text from a text file and parse it to excel. If the necessary word is found, write it in the specified cell. After extracting and writing the final word from a record, increment the row by assigning a counter. The counter gets incremented every time at the end of each record and the next record will be written in the next row. This process took around 11 minutes to execute completely.
Now let us create a file that has to contain 100k records. We can do that by appending the 250 records in the same file about 400times in a loop. This process takes only 25s in UiPath. Below there is a screenshot that shows the execution time and number in which specifies how many times the loop has been executed.
Now let us parse this file into an excel sheet. When we try to process the file with 100k records with the same method it will take about 63 hours 38 minutes. This process consumes a huge amount of time so we will see how to speed up this process in iteration 2.
In this iteration, we will be handling the execution time of the previous iteration with a different approach using a datatable feature. After reading the text from the text file, iterate through each line to check if the line is the start of the record. If the condition is true then set the counter to 1 and use regex to extract the necessary words. After extracting the required words from the record, we can use add row activity to create a row in the datatable along with the extracted data. Finally, it writes the whole data table into an excel sheet. This approach took only 10s to finish the whole process for 250 records, wherein the iteration 1 approach took 11 minutes to complete.
Let’s do the same approach with 100k records in a text file. Using this approach, the execution of the whole process took only around 3 minutes 7 seconds. Below is the screenshot of the execution process.
Instead of excel, we tried to write data into a CSV file for reducing the time. But CSV and Excel took the same time to write the data so we moved to excel for better supporting operations in the future.
Comparing Both Iterations With Their Execution Time and Ram Configurations
Graphs have been plotted based on the above two iterations with different RAM configurations:
The above graph shows the execution time with 250 records in 2gb RAM for both the iterations. Iteration1 approach took 867s whereas the iteration 2 approach took only 26s.
The above graph shows the execution time with 100k records in 2gb RAM for both the iterations. Iteration1 approach took 5780 minutes whereas the iteration 2 approach took only 7 minutes.
The above graph shows the execution time with 250 records in 4gb RAM for both the iterations. Iteration1 approach took 660s whereas the iteration 2 approach took only 10s.
The above graph shows the execution time with 100k records in 4gb RAM for both the iterations. The iteration1 approach took 3983 minutes whereas the iteration 2 approach took only 3 minutes.
The above graph shows the execution time with 250 records in 8gb RAM for both the iterations. The iteration1 approach took 467s whereas the iteration 2 approach took only 9s.
The above graph shows the execution time with 100k records in 8gb RAM for both the iterations. The iteration1 approach took 1577 minutes whereas the iteration 2 approach took only 1minute.
Performance Comparison Among 2gb, 4gb and 8gb Ram for Iteration 1, Iteration 2 and Manual Process
Before we saw the comparison between iteration 1 and iteration 2. Now we will compare the two iterations along with the time taken by the manual process.
For 2GB RAM
For 4GB RAM
For 8GB RAM
We can also reduce execution time even further by implementing multiple robots in orchestrator with the help of queues to process large files. We will see that in the next case study.
Happy automation :)