Reporting & Analysis
Jan 22, 2020
TLDR
Spreadsheets and databases serve different purposes. Spreadsheets are for quick data analysis, while databases handle large-scale data storage and connections between different data sources. Spreadsheets are easier for data entry but lack data integrity controls. Databases ensure data adheres to a standard schema. The choice depends on your needs, with databases suitable for scalability and spreadsheets for quick analysis. Airboxr helps with data sourcing and analysis in Google spreadsheets for users dealing with multiple databases.
Most people use the words "spreadsheet" and "database" interchangeablyβbut there are some pretty massive differences between the two.
If you have arrived on Airboxr, you likely know very well what a spreadsheet is: it's the table with a cool green ribbon on top we all love. Spreadsheets help you arrange your data in rows and columns, and run interesting analysis on them. A database, for the most part, is a collection of data organised in a standardised manner to help us retrieve and analyse them in a scalable fashion. It may seem like I just said the same thing twice: but think of databases as a way to organise the data for future analysis (which may or may not be done on a spreadsheet). Here we look at key aspects of the two to help identify which works best for you.
Analysis
Spreadsheets win for quick analytics.
Databases themselvesβwith a few exceptionsβdon't help you analyse data. Typically, you would use a programming language such as SQL to query the data stored in the database. SQL can be used to query multiple databases and combine output for decision-making. If you don't want to learn SQL, you can also use tools such as Metabase and Tableau to create your queries through a simple builder.
Most databases have a software application layer to enable user access to the data.
Spreadsheets are for quick analysis of data from a table (or multiple tables). You could run AVERAGEs, SUMs, VLOOKUPs etc. to manipulate the data on a spreadsheet for your needs.
Tldr; databases are typically for data storage at scale while spreadsheets are for quick analysis. Databases can hold immense amounts of data (even hundreds of millions of data points) without slowing down or breaking, while spreadsheets are more appropriate for analysis a few thousand datapoints.
Relational Data Management
Databases win for their ability to connect wide variety of data sources.
You often need to connect disparate pieces of data to make decisions. For instance, to look for a list of past orders from a specific customer, you may look at all the order numbers tagged to that customer's email address. Now, if you were to paste all the information about customers (customer names, order numbers, email addresses, IP addresses, phone numbers, mother's maiden name, and father's favourite colour) on to a spreadsheet, you will never be able to get anything done!
And that is why you have databases, which are relational by nature. A database will contain multiple tables with discreet data connected through a specific field (oftentimes called a key field). If you wish to find out the address for every customer who ordered product X, you can use a unique Customer ID as a key field to query your address table and your orders table to find the answer.
Data entry
Spreadsheets win for ease of entry... with some major associated risks.
You don't typically make changes right in the database, most databases are connected to a frontend (usually an ERP or CRM system) that helps users enter data through a structured frontend. While database managers or developers may still log into the database and make changes, those instances are few and far between.
You can define the type of data for every cell on a database.
On the other hand, entering or editing data on a spreadsheet is pretty straightforwardβjust position your cursor on a cell and start typing away. In theory (and unfortunately, much too often in practice), you could position your cursor on a date column and enter an email address leading to a lot of heartache down the road.
Improve your DTC game. Sign up for weekly tips.
Data integrity
Databases win by ensuring data meets a standard schema.
Remember the time your colleague misspelled a name on your spreadsheet and your VLOOKUPs wouldn't work anymore? Or the time you downloaded a CSV which broke the formatting for all dates? If data integrity is an issue, databases are for you. Databases ensure that every records of every table adheres to the right schema. For instance, if you want all your dates in MMDDYYYY format, databases will ensure that users are not able to enter DDMMYY dates. That way, you can be sure that the data meets a certain standard quality.
Users can define the schema for every record in a database.
In the end, you may simply need to decide whether to use spreadsheets or build a database based on budget and time constraints. If the trade-off in time and effort is not in favour of databases, you should simply make the best of spreadsheet features. If you are trying to build a scalable data system and anticipate dealing with tens of thousands of data points, you should put in the effort to build a database for your data. If you are new to creating databases, we suggest looking at Posticoβit has a simple GUI to help you through the process.
The customary plug: if you already use multiple databases (after all, every ERP has a database behind it), you should use Airboxr to source and analyse data on your Google spreadsheets.