Can Excel or Google spreadsheets handle really large amounts of data?


by

Pexels

Spreadsheet software is often used to compile accounting data, build large data sets by machine learning, and categorize a large number of numbers. A research team led by computer scientists Aditya Parameswaran and others announced that Microsoft Excel, Google spreadsheets, and open source free spreadsheet software LibreOffice Calc have benchmarked. Computer research scientist Adrian Colyer explains the results of this research in an easy-to-understand manner.

Benchmarking Spreadsheet Systems
https://people.eecs.berkeley.edu/~adityagp/papers/spreadsheet_bench.pdf


Benchmarking spreadsheet systems – the morning paper
https://blog.acolyer.org/2019/12/06/benchmarking-spreadsheet-systems/


As technology advances, spreadsheets such as Google Spread Seed and Excel are said to support more than 5 million cells. However, Mr. Colyer points out that 'actually, spreadsheet software becomes quite heavy when tens of thousands of lines are read, and freezes occur frequently and cannot be put into practical use.'

The research team of Parameswaran showed this in research. The researchers conducted a benchmark test using a 50,000 row x 17 column spreadsheet file containing weather data as a sample.

First, the research team prepared spreadsheets with various numbers of rows from 150 to 490,000 from the sample, and performed various operations to check the load time for loading, updating, and querying.

The graph below on the left summarizes Excel and LibreOffice Calc, and the graph on the right summarizes Google spreadsheet delays. The horizontal axis is the number of sample rows, and the vertical axis is the number of seconds it takes to read. F is for reading formula cells and value cells, and V is for reading only value cells. Excel works relatively stably, but reading time is almost doubled when cells are read.



In the case of LibreOffice Calc, the reading time did not change depending on the cell, but you can see from the graph above that the operation is considerably heavy compared to Excel. Also, although Google spreadsheets vary, reading formula cells takes a lot of time.

Colyer evaluates that “Excel is 6000 lines, LibreOffice Calc and Google spreadsheets are over 150 lines, and it is delayed at a level that can not be put to practical use.”

And the graph when narrowing down with the filter function is as follows. When you use Excel to narrow down a sheet that contains formula cells and value cells, you can see that the process took an unusually long time. In addition, when Google Sheet exceeded 20,000 lines, processing took 1.5 seconds, and when it exceeded 50,000 lines, it was found that it waited nearly 5 seconds.



Also, when using the VLOOKUP function, LibreOffice Calc and Google Spreadsheet scan the entire spreadsheet, and it seems that there was no action to stop when a value is found. As a result, Colyer says that working with LibreOffice Calc and Google spreadsheets can be quite inefficient when the data is sorted.

Overall, Colyer argues that the various types of spreadsheet software are not able to process large data sets efficiently, “The benchmark results are the starting point for contributing to the optimization of spreadsheet software. It will be. '

in Software, Posted by log1i_yk