How to get started with SQL: First 4 clauses


What is SQL?

If you are aspiring to become a data analyst or just interested in data analytics, with the amount of data being stored nowadays you will want to start looking beyond Excel and Google Sheets and will come across SQL (short for Structured Query Language, you can say “sequel” or spell the letters out “S-Q-L”). SQL is a programming language that was designed and developed specifically for managing data stored in relational database management systems (RDBMS for short), which are used for managing databases and related tables of data. SQL is easy to learn and fun to use. Here I will give you an intro to the language and how you can start using it to query data in your own projects.

Why SQL?

For one, while Excel and Google Sheets can technically handle one million rows, it can slow down in the thousands. That’s all good for small projects and analyses with small amounts of data. But nowadays businesses are having to store large volumes of data, too large and complex for traditional data-processing software. Now we have to be able to work with data sets/tables of gigabytes, terabytes, and even more of data. Aside from the volume, another factor is speed and availability. Data in databases can be updated continuously and you may want to analyze current data, think streaming.

When you work with a spreadsheet in Excel or Sheets you are working with the underlying data meaning any changes you make to it are changing the data. Working with the data directly, it is easy to introduce errors and your analysis will eventually also have errors. SQL separates your analysis from the data. The SQL interface you use will grab the data that you ask it for and will present you with a view of it, it is not changing the data. And because SQL is a script in plain text, you can save this as a file that you can reuse for other analyses or share with other analysts in your team so they can review what you did. Not having to change the data and being able to share your SQL scripts makes this workflow faster and safer for the business’ data.

Writing SQL

In SQL, the code you write is known as a query which is a set of instructions you use to retrieve a result and can save as a SQL file. Writing SQL is simple because it has an English-like syntax structure, and you can quickly get used to it. SQL queries require you to use keywords known as a “clause” to communicate what you want to do. Here are the basic clauses:

SELECT – Tells SQL to start a query and which columns from the table to retrieve. You can even use this standalone to do simple math and SQL will return the result. So if you select columns and can perform simple math using select, you can imagine you can use SELECT to create a column that doesn’t exist in the original table. For example, you can return a column that is the result of adding two columns from the data. The resulting column is known as a calculated field.

You can perform mathematical operations using just the SELECT clause
The operations above result in four columns with our results. Note the default column names because we didn’t provide a name. Later on we’ll learn how to name our columns.
You can use an asterisk with SELECT when you want all columns included in your result.

FROM – Specifies the table you want to work with.

The FROM clause tells your SQL interpreter which table you are using for your query. Once you specify the table, you can specify the columns in the SELECT clause you want to include in your result.

WHERE – This keyword is your filter and it works on the row level, like creating a filter in Excel or Sheets but without changing the data.

Use the WHERE clause as a row filter. When you run this script, SQL will check for the condition in the table and then return the columns specified in SELECT. You can specifiy more than one condition using the keywords AND and/or OR.
Using AND and OR for more than one condition with the WHERE clause.

LIMITAnother clause to begin practicing with SQL is the LIMIT clause which limits your result to a number that you specify. This is important and good practice to use because when you start with your analysis you might just want to have a quick look at the data and get acquainted with it. In most cases, when you first load your data you will want to only look at the first few rows to make sure your data is there and what it looks like without loading your entire table and using unnecessary computing power.

LIMIT 5 will return the first 5 rows from your table. You can specify whatever number you want here. It is good practice to use LIMIT when you are just exploring the data with SELECT * to avoid returning unnecessary large amounts of data.

Example query:

Conclusion

SQL is user-friendly, easy to learn, and faster when you need to analyze large datasets. Once you get some practice with the syntax and functions it’s easy to shift to SQL for more complex analysis, and you can still use Excel and Sheets for small, simple analysis and for your visuals. You can simply copy the results from your SQL queries, paste them into Excel or Sheets, and create your charts to share with others.