After the wonderful feedback on the previous blog about Iceberg - it is now time to switch
gears.
Databases are more than row storage engines. They are algorithm machines, helping programmers solve tricky
scalability problems that would otherwise not surface until the data is in production. They do this by
using their big bag of heuristics and tuning methods - a form of distilled computer science sprinkled onto your
application.
Benchmarks, particularly those published by the TPC (Transaction Processing Performance Council), co-evolve with the database industry. As new optimisations
are discovered, benchmarks are updated to test for them and to see just how far databases can be pushed. This
kind of benchmarking allows database engine developers to track how they are doing against the state of the art.
Just like in biology, we can learn a lot about the species of today by studying their fossils and ancestry. Database
benchmarks are the predators of the savannah - the environment that databases evolved to survive in. Each little
detail of a benchmark gives us a clue about the genetic makeup of databases - even the ones alive today.
Our first visit to history is the TPC-H benchmark - come with me on a journey to discover the origins of our data DNA.
These days, columnar storage formats are getting a lot more attention in relational databases. Parquet, with its superior compression, is quickly taking over from CSV formats. SAP Hana, Vertica, Yellowbrick, Databricks, SQL Azure and many others all promote columnar representations as the best option for analytical workloads.
It is not hard to see why. But there are tradeoffs.
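Before we get to the tradeoffs, it helps to picture what "columnar" actually means. The sketch below is a plain-Python toy - no Parquet, no database engine, and the table and column names are made up for illustration. It stores the same records row-wise and column-wise, and includes a tiny run-length encoder to hint at why repetitive columns compress so well:

```python
# Toy illustration: the same table stored row-wise and column-wise.
# Plain Python, no database engine involved.

rows = [
    {"order_id": i, "region": "EU" if i % 2 else "US", "amount": i * 1.5}
    for i in range(100_000)
]

# Row layout: to sum one column we still walk every full row.
total_rowwise = sum(r["amount"] for r in rows)

# Column layout: each column is a contiguous sequence of a single type.
columns = {
    "order_id": [r["order_id"] for r in rows],
    "region":   [r["region"] for r in rows],
    "amount":   [r["amount"] for r in rows],
}

# The analytical query only touches the one column it needs.
total_columnwise = sum(columns["amount"])
assert total_rowwise == total_columnwise

# Repetitive columns also compress well, e.g. with run-length encoding.
def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

# A sorted, low-cardinality column shrinks to a handful of pairs.
print(run_length_encode(sorted(columns["region"])))  # [['EU', 50000], ['US', 50000]]
```

The aggregate only reads the bytes of the column it scans, while a row store drags every full record through memory - that, in a nutshell, is the appeal for analytical workloads.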