This is how data analysis will change in the 2020s.
I started my career as a management consultant in 2005. Back then, spreadsheets were the default software for data analysis. A decade and a half later, spreadsheets are still the default. The reason isn’t difficult to understand—they just work.
But data analysis has changed in multiple ways since the mid-2000s. Most prominently:
Data sets are large, very large. Whether you run an e-commerce store, a marketing agency, or a recruiting company, you likely handle tens of thousands of data points. If your business is in the top 30 percent by data volume, you likely handle millions of data points in your analysis.
Data is dispersed. There was a time when your data sat in a few Excel files and you could simply run VLOOKUPs to stitch them together. In 2019, your data lives in different systems (because, millions!), and correlating data across those sources is no longer a simple VLOOKUP.
Data is no longer just numbers. Data today includes images, music files, videos, and more. This adds to the work of processing and preparing data before it can be analyzed.
Data is large, data is dispersed, and data is multi-format. And yet the majority of our daily business still runs on spreadsheets. That is because spreadsheets are easy to learn, do the job well, and have a healthy support system (searching Google for "How to <do something> in Excel" will usually solve the problem). Which brings me to my next point.
Spreadsheets are not going away.
Google Sheets and Microsoft Excel are not going away. I will go a step further and say that new spreadsheet software will not easily replace them either. I predict that existing spreadsheets will become more intelligent in order to handle the three complications above, whether or not Google and Microsoft intend for it to happen.
Data processing will be automated.
Data cleaning and preprocessing are a painful reality of working with spreadsheets. It is ridiculous that in 2019, analysts still spend most of their time downloading CSVs and fixing data and formatting errors. Machine learning is on its way to solving this problem and, although it isn't there yet, we can expect a comprehensive solution to arrive in the 2020s.
Some spreadsheet analysis will move to the cloud.
As data sets get larger, it will become increasingly difficult to run massive calculations on your local machine. This will open up the market for services that simply make things faster, much like Suhail Doshi's MightyApp promises a faster Chrome.
Spreadsheets will increasingly be used for prediction, not just decision-making.
There is no way around this: decision-making is no longer the prerogative of highly paid business(wo)men in suits. Today, most decisions are made thousands of times over, in the background: what product to show a customer, which order to ship first, which stock to buy, and which currency to hedge against. You can't possibly use external software for every decision you make. Some spreadsheet-adjacent services, such as BigML, are already trying to bring predictive power to spreadsheets.