Navicat Blog

Some Top SQL Query Mistakes - Part 3 May 6, 2022 by Robert Gravelle

Outer Joins and Cartesian Products

In this series on Top SQL Query Mistakes, we've been exploring how seemingly intuitive ways of constructing SQL queries can result in anti-patterns that lead to erroneous results and/or performance degradation. Last week, we took a break from the series to talk about Predicates in SQL. In this installment, we'll be learning how their placement can adversely affect query execution - particularly in outer joins.

What Are Outer Joins?

There are four basic join types employed in linking related tables and views: inner, left, right, and outer. With an inner join, rows from either table that are unmatched in the other table are not returned. In an outer join, unmatched rows in one or both tables can be returned. The last three join types are all instances of outer joins, whereby:

  • LEFT JOIN returns only unmatched rows from the left table.
  • RIGHT JOIN returns only unmatched rows from the right table.
  • FULL OUTER JOIN returns unmatched rows from both tables.

How Outer Joins Go Wrong

While outer joins certainly have their place in the database practitioner's arsenal, developers have a tendency of using them even in situations where they are not needed. Moreover, an outer join query can produce completely different results depending on how you construct it, and where you place predicates in your query. To illustrate, let's look at an example.

We would like to retrieve a list of ALL customers (whether they placed any orders or not), along with the total number of orders that they placed since the beginning of June, 2005. Do do so, we would employ an outer join to link the customers and orders tables as follows:

SELECT C.customerName, count(O.customerNumber) AS 2005_orders
FROM customers AS C
LEFT OUTER JOIN orders AS O
  ON C.customerNumber = O.customerNumber
WHERE O.orderDate >= '2005-05-01'
GROUP BY C.customerName
ORDER BY 2005_orders DESC;

The result should contain every possible combination of rows from the first and second table, also known as a Cartesian product. Unfortunately, when we run the query in Navicat Premium 16, only 13 rows are returned, even though there are 122 unique customers in the table (not shown):

customer_orders_bad (74K)

To understand where we went wrong, let's rebuild the query one step at a time, starting with only the columns and outer join:

outer_join_without_where_clause (121K)

Now we are getting all of the customers. Those who have not placed any orders have NULL customerNumbers, since they are coming from the orders table.

Now, let's apply the WHERE clause predicate:

outer_join_with_where_clause (90K)

All of sudden, we've lost many customers! The problem is that the predicate in the WHERE clause turned the outer join into an inner join.

To correct the issue, we need to add the WHERE predicate to the join condition:

outer_join_with_date (114K)

We can now adjust our original query to fetch all customers:

customer_orders_good (89K)

The Moral of the Story

Always be careful of where you filter out rows. In the above example, the WHERE clause was the issue; in a more complex example, where multiple joins occur, the incorrect filtering may happen on a subsequent table operator (like join to another table) instead in the WHERE clause.

Navicat Blogs
Feed Entries
Blog Archives
Share