What is SQL?
If you are aspiring to become a data analyst or just interested in data analytics, with the amount of data being stored nowadays you will want to start looking beyond Excel and Google Sheets and will come across SQL (short for Structured Query Language, you can say “sequel” or spell the letters out “S-Q-L”). SQL is a programming language that was designed and developed specifically for managing data stored in relational database management systems (RDBMS for short), which are used for managing databases and related tables of data. SQL is easy to learn and fun to use. Here I will give you an intro to the language and how you can start using it to query data in your own projects.
Why SQL?
For one, while Excel and Google Sheets can technically handle one million rows, it can slow down in the thousands. That’s all good for small projects and analyses with small amounts of data. But nowadays businesses are having to store large volumes of data, too large and complex for traditional data-processing software. Now we have to be able to work with data sets/tables of gigabytes, terabytes, and even more of data. Aside from the volume, another factor is speed and availability. Data in databases can be updated continuously and you may want to analyze current data, think streaming.
When you work with a spreadsheet in Excel or Sheets you are working with the underlying data meaning any changes you make to it are changing the data. Working with the data directly, it is easy to introduce errors and your analysis will eventually also have errors. SQL separates your analysis from the data. The SQL interface you use will grab the data that you ask it for and will present you with a view of it, it is not changing the data. And because SQL is a script in plain text, you can save this as a file that you can reuse for other analyses or share with other analysts in your team so they can review what you did. Not having to change the data and being able to share your SQL scripts makes this workflow faster and safer for the business’ data.
Writing SQL
In SQL, the code you write is known as a query which is a set of instructions you use to retrieve a result and can save as a SQL file. Writing SQL is simple because it has an English-like syntax structure, and you can quickly get used to it. SQL queries require you to use keywords known as a “clause” to communicate what you want to do. Here are the basic clauses:
SELECT
– Tells SQL to start a query and which columns from the table to retrieve. You can even use this standalone to do simple math and SQL will return the result. So if you select columns and can perform simple math using select, you can imagine you can use SELECT
to create a column that doesn’t exist in the original table. For example, you can return a column that is the result of adding two columns from the data. The resulting column is known as a calculated field.
FROM
– Specifies the table you want to work with.
WHERE
– This keyword is your filter and it works on the row level, like creating a filter in Excel or Sheets but without changing the data.
LIMIT
– Another clause to begin practicing with SQL is the LIMIT clause which limits your result to a number that you specify. This is important and good practice to use because when you start with your analysis you might just want to have a quick look at the data and get acquainted with it. In most cases, when you first load your data you will want to only look at the first few rows to make sure your data is there and what it looks like without loading your entire table and using unnecessary computing power.
Example query:
Conclusion
SQL is user-friendly, easy to learn, and faster when you need to analyze large datasets. Once you get some practice with the syntax and functions it’s easy to shift to SQL for more complex analysis, and you can still use Excel and Sheets for small, simple analysis and for your visuals. You can simply copy the results from your SQL queries, paste them into Excel or Sheets, and create your charts to share with others.