Database Doctor
Writing on databases, performance, and engineering.

Posts with tag: aggregation

Cover

TPC-H Query 12 and Query 14 - The limits of Estimation

Today, we are going to talk about estimating filter selectivity and how to fail at it. Two queries are similar enough that we will analyse them together: Q12 and Q14.

Like listening to a song from the '80s, it is good to be back to TPC-H. I have been busy doing work with Floe. If you are a regular reader of this blog, I think you will like this article I wrote recently:

  • Joining in Spark and Databricks

Read More...

Cover

Databases are just Loops - GROUP BY

In my previous post - I introduced the idea that you can think of database queries as a series of loops. Let me take this ideas even further - introducing more complex database concepts in terms more familiar to computer programmers.

Databases can do this cool thing called "aggregation". You may think of it like GROUP BY. You will have seen queries like this:

SELECT x, SUM(v) AS agg_y
FROM a
GROUP BY x

What is going on here? And how can we think of this like a loop?

Aggregate or GROUP BY?

To establish some terminology: You will hear me refer to GROUP BY as "aggregation" in the following passages.

We will call SUM(v) above an "aggregate function" and we will call x a "grouping column". We will refer to the entire loop as the "aggregation loop"

Read More...

Cover

Finding things in Data Models with Property Bags

When designing database schemas, you may find yourself needing to create a property bag table that is in a 1-n relationship with an entity table. It isn't always viable to just add more columns to a table because you may not know all the properties you want in advance. The problem with these property bags is that they can be hard to query.

In this blog entry, we will explore the property bag data modelling use case and provide some tricks that will make it very simple to query those bags.

Read More...