Posts with tag: join

TPC-H Query 4 - Semi Join and Uniqueness

September 29, 2025

Today we are looking at a Q04 — which on the surface is similar to Q17. Like Q17, Q04 has a correlated subquery that can be de-correlated using a join.

But sometimes, a regular INNER JOIN is not enough to de-correlate a query. You need special join types that don't appear as part of the standard SQL syntax.

It is time to learn about the SEMI JOIN.

Joins are NOT Expensive! - Raw Reading

July 23, 2025

When talking about Data Lakes and how people access them - we must address some of the misconceptions that made them popular in the first place.

One of the largest misconceptions is this: "Joins are expensive". It is apparently believed (as I discovered from discussions on LinkedIn) that it costs less CPU to turn your data model into a flat table and serve that to users - than it does to join everytime you access the table.

It is certainly true that object stores allow nearly infinite I/O (although at a very high CPU cost to handle the HTTP overhead) and nearly infinite storage. But, is there really such a thing as "sacrificing disk space to save the CPU cost of joins?"

Today, let us put this to the test.

Databases are just Loops - Joins

May 24, 2024

I decided to write this post in response to my recent discussions with Matt Martin. Matt and I have been sparring lately over software performance. During one of these discussions, Matt said the magic words in the title of this post. I hypothesise that this quote carries of lot of explaining power for those who are not fully familiar with databases.

Let us play with this idea a bit! I'm going to use Python to show examples of "database loops". Obviously, no one in their sane mind would implement a database in Python. But Python has the great advantage of being readable to nearly everyone. Let's go...

Join Elimination in Query Planners

November 17, 2021

Most relational databases are both by human SQL query writers and tools that automatically generate queries. Query tools frequently add joins into the queries, even when those joins are not needed. This results in unnecessary query slowdowns and increase CPU usage.

For example, consider the classic Star Schema model with a fact table in the middle and dimension tables surrounding it. From a reporting tool perspective, it is useful to have a view on top of that schema which joins all the tables and presents the user with the illusion of a single table. This approach of creating large join views has some drawbacks that can only be offset by query optimisers if the database designer has correctly declared key constraints in the data model.