5 SQL queries every Data Engineer & Scientist should know!

Data Engineering Academy
6 min readMar 13, 2021

Over the years, I’ve worked on dozens of data engineering projects and these 5 queries have queries have gotten me out of few jams! Hope you find them useful

1. Using the APPLY Operator

Ever find yourself trying to find the last transaction, a set of customers made? Struggled with trying to use combinations of the row_number(), top 1, along with a group and order by? There is an easier way with the APPLY operator.

APPLY is a table operator used in the FROM clause and can be either a CROSS APPLY or an OUTER APPLY. The OUTER APPLY is similar to a LEFT OUTER JOIN except that the right table expression will get executed once for each row in the left table.

Going back to the original question of how to get the latest transactions for a set of customers, Using the AdventureWorks DB, we have a Customer and a SalesOrderHeader table. Below is a sample of the data present in each:

For each customer record in the first result we would like to know the 5 latest orders based on the second result set:

Here’s the desired output:

Conceptually, you can think of this as looping through the customer table and running the inner select query for the current CustomerID.

2. Recursive CTE

Recursive CTEs (Common Table Expressions) come into their own when dealing with hierarchical data such as company org structures. Recursive CTEs are CTEs that reference themselves and contain at least two CTE query definitions, an anchor member and a recursive member.

For example, lets say you have an employee org structure as the below where DimEmployee table is self referencing and you want to get a list of each manager and their direct reports.

--

--