Have you ever wondered about the order in which SQL Server query engine executes a SQL query? Most programming languages follow the fundamental of processing a program line by line, but SQL server has a defined logical query processing order to execute queries.
Following are the main query clauses specified in the order that you are supposed to type them (known as “keyed-in order”):
-
SELECT
-
FROM
-
WHERE
-
GROUP BY
-
HAVING
-
ORDER BY
Let’s consider below example. We have a Customer table which have information about customers from around the world. We need to write a query to fetch number of Male Customers. But include countries with more than 1 customers.
SELECT COUNT(*) AS CountOfCustomers, Country
FROM Customer
WHERE Gender = 'M'
GROUP BY Country
HAVING COUNT(*) > 1
ORDER BY CountOfCustomers ASC;
But the logical query processing order, which is the conceptual interpretation order, is different. It starts with the FROM clause. Here is the logical query processing order of the seven main query clauses:
-
FROM
-
WHERE
-
GROUP BY
-
HAVING
-
SELECT
-
ORDER BY
-
TOP and OFFSET-FETCH filters
Each phase operates on one or more tables as inputs and returns a virtual table as output. The output table of one phase is considered the input to the next phase.
Let’s come back to our example and see how that query is actually getting executed.
-
EVALUATE THE FROM CLAUSE
In the first phase, the FROM clause is evaluated. That’s where you indicate the tables you want to query and table operators like joins if applicable. If you need to query just one table, you indicate the table name as the input table in this clause. Then, the output of this phase is a table result with all rows from the input table.
In our example, the input is the Customer table and the output is a table result with all the rows from that table. Here’s the result of this phase. Please note that I am just displaying a subset of results
-
FILTER ROWS BASED ON THE WHERE CLAUSE
The second phase filters rows based on the predicate in the WHERE clause. Only rows for which the condition evaluates to true are returned.
In this query, the WHERE filtering phase filters only rows for Customers whose Gender is ‘M’. Here’s the result of this phase:
A lot of people make the mistake of referring to column alias defined in the SELECT clause in the WHERE clause. Since the WHERE clause is evaluated before the SELECT clause, you can’t use those column alias in Where clause. Alias created by the SELECT phase isn’t even visible to other expressions that appear in the same SELECT list.
-
GROUP ROWS BASED ON THE GROUP BY CLAUSE
This phase defines a group for each distinct combination of values in the grouped elements from the input virtual table. It then associates each input row to its respective group. In our example, the query groups the rows by country. If you consider the result set after Where Clause, within the six rows in the input table, this step identifies four groups. Here are the groups and the detail rows that are associated with them.
Please note that above table is just for a representation. In reality, if you want to refer to an element that is not part of your GROUP BY list (for example, FirstName, LastName, Gender), it must be contained within an aggregate function like MAX, SUM, etc. You can read more about GROUP BY clause here
-
FILTER GROUPS BASED ON THE HAVING CLAUSE
This phase is also responsible for filtering out rows the filter condition is evaluated after the data has been grouped. That means this phase is evaluated per group and filters groups as a whole. As is usual in T-SQL, the filtering predicate can evaluate to true, false, or unknown. Only groups for which the predicate evaluates to true are returned from this phase.
In our example, the HAVING clause uses the predicate COUNT(*) > 1, meaning filter only country that have more than one Customer. If you look at the number of rows that were associated with each group in the previous step, you will notice that only the groups Australia and United States have more than one Customer. Hence, the result of this phase has the following result.
It’s important to understand the difference between WHERE and HAVING. The WHERE clause is evaluated before rows are grouped, and therefore is evaluated per row. The HAVING clause is evaluated after rows are grouped, and therefore is evaluated per group.
-
PROCESS THE SELECT CLAUSE
The SELECT clause will be processed in this phase. If you look at our query, SELECT is the first clause we used but it is interesting to see that SELECT clause id being processed only at this stage.
This phase includes two main steps. The first step is evaluating the expressions in the SELECT list and producing the result attributes. The second step in this phase is applicable if you indicate the DISTINCT clause, in which case this step removes duplicates. Here’s the result of this phase in the query:
This phase assigns column aliases (in our example CountOfCustomers). This phase returns a relational result and therefore, the order of the rows is not guaranteed. If we need the results in a particular order, we need to specify ORDER BY clause which we will see in next phase.
- HANDLE ORDERING CLAUSE
This phase is applicable if the query has an ORDER BY clause. This phase is responsible for returning the result in a specific order according to the expressions that appear in the ORDER BY list. In our example we want to sort based on the number of Customers in Ascending order. So our result will be as below
Please note that we have used the column alias CountOfCustomers in the ORDER BY clause. ORDER BY clause is the first and only clause that is allowed to refer to column aliases defined in the SELECT clause. That’s because the ORDER BY clause is the only one to be evaluated after the SELECT clause.