Database Doctor
Writing on databases, performance, and engineering.

Posts with tag: aggregate


TPC-H Query 10 - Histograms and Functional Dependency

Welcome back to the TPC-H series, dear reader. And happy holidays to those of you who've already shut down.

In today's educational blog post, I'm going to teach you about:

  • The importance of histograms
  • When not to do bushy joins
  • Functional dependencies and how they speed up queries
  • Bloom filters

This is a lot of ground to cover in the 5-15 minutes or so that I have your attention. Every deep dive starts at the surface, so let us jump into the deep sea.



TPC-H Query 9 - Composite Key Joins and Pre-aggregation

Today's query gives us new insight into query optimisers, because one of the joins contains a little extra surprise: composite key joins. We will also learn about a strong optimisation that we haven't seen before: aggregating before joining.

This is also the first time we do some serious work on partsupp and its strange relationship to lineitem.
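The teaser above names "aggregating before joining" without showing it. As a hedged illustration only, here is a minimal Python sketch of the idea on toy tables. The data, the function names, and the simplified aggregate (total `quantity * supplycost` per supplier) are all hypothetical and are not the actual Query 9 SQL. The join is on the composite key `(partkey, suppkey)`, and the second function collapses lineitem rows to one row per key before probing partsupp.

```python
from collections import defaultdict

# Toy tables (hypothetical data, not produced by the TPC-H generator).
# lineitem rows: (partkey, suppkey, quantity)
lineitem = [
    (1, 10, 5), (1, 10, 3), (1, 20, 2),
    (2, 10, 7), (2, 10, 1), (2, 10, 4),
]
# partsupp rows: (partkey, suppkey, supplycost); (partkey, suppkey) is the key
partsupp = [(1, 10, 2.0), (1, 20, 3.0), (2, 10, 1.5)]


def join_then_aggregate(lineitem, partsupp):
    """Join every lineitem row to partsupp on the composite key, then sum per supplier."""
    cost = {(p, s): c for p, s, c in partsupp}
    totals = defaultdict(float)
    for p, s, q in lineitem:  # one hash probe per lineitem row
        totals[s] += q * cost[(p, s)]
    return dict(totals)


def aggregate_then_join(lineitem, partsupp):
    """Pre-aggregate lineitem by the composite key, then join the smaller result."""
    qty = defaultdict(int)
    for p, s, q in lineitem:
        qty[(p, s)] += q  # many lineitem rows collapse into one per (partkey, suppkey)
    cost = {(p, s): c for p, s, c in partsupp}
    totals = defaultdict(float)
    for (p, s), q in qty.items():  # one hash probe per distinct key
        totals[s] += q * cost[(p, s)]
    return dict(totals)


print(join_then_aggregate(lineitem, partsupp))
print(aggregate_then_join(lineitem, partsupp))
```

Both functions return `{10: 34.0, 20: 6.0}`, but the second probes the join three times instead of six. The rewrite is only valid here because each `(partkey, suppkey)` group matches exactly one partsupp row, so summing `q * c` row by row equals `c` times the summed quantity.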

Let us proceed in the familiar way.



Introducing the TPC series - TPC-H Query 1: Column Storage and Local Aggregation

After the wonderful feedback on the previous blog post about Iceberg, it is now time to switch gears.

Databases are more than row storage engines. They are algorithm machines, helping programmers solve tricky problems at scale: problems that would otherwise not be discovered until the data is in production. They do this by drawing on their big bag of heuristics and tuning methods, a form of distilled computer science sprinkled onto your application.

Benchmarks, particularly those made by the TPC council, co-evolve with the database industry. As new optimisations are discovered, benchmarks are updated to test for them and to see just how far databases can be pushed. This kind of benchmarking lets database engine developers track how they are doing against the state of the art.

Just like in biology, we can learn a lot about the species of today by studying their fossils and ancestry. Database benchmarks are the predators of the savannah: the environment that databases evolved to survive in. Each little detail of a benchmark gives us a clue about the genetic makeup of databases, even the ones we use today.

Our first visit to history is the TPC-H benchmark. Come with me on a journey to discover the origins of our data DNA.
