The Database Doctor
Musing about Databases

Tag: performance

TPC series - TPC-H Query 7 - Optimiser Reasoning

It is time to resume the TPC-H series and look at Query 7. We will learn about how query optimisers can decompose filters and reason about the structure of expressions to reduce join This...

TPC series - TPC-H Query 6 - Expression Optimisation

And now, for something completely different. This week on TPC-H query analysis - we are not going to look at join ordering. Today's query does not have any joins. But as we shall see, is to...

TPC series - TPC-H Query 5 - Transitive Closure and Join Order Dependencies

Welcome back to the TPC-H analysis. If this is your first time, I highly recommend that you visit the previous blogs in the series first. They're here (and I look forward to seeing you in a...

TPC series - TPC-H Query 4 - Semi Join and Uniqueness

Today we are looking at Q04, which on the surface is similar to Q17. Like Q17, Q04 has a correlated subquery that can be de-correlated using a join. But sometimes, a regular INNER JOIN is...

TPC-H series - TPC-H Query 3 - Join Ordering and Heap Sorting

I want to teach you an important skill that will serve you well as a database specialist. One blog entry is not going to be enough, but here is my goal: When you look at an SQL query in the you...

TPC series - TPC-H Query 2 and 17 - De-correlation

The great promise databases make to programmers is: "Tell me what you want and I will figure out the fastest way to do it." A database is a computer science engine — it knows and...

Joins are NOT Expensive! - Raw Reading

When talking about Data Lakes and how people access them - we must address some of the misconceptions that made them popular in the first place. One of the largest misconceptions is are I...

Introducing the TPC series - TPC-H Query 1: Column Storage and Local Aggregation

After the wonderful feedback on the previous blog about Iceberg - it is now time to switch gears. Databases are more than row storage engines. They are algorithm machines, helping that...

TPC series - TPC-H Query 20 - Nested De-correlation

TPC series - TPC-H Query 9 - Composite Key Joins

TPC series - TPC-H Query 16 - Anti Joins

TPC series - TPC-H Query 6 and Query 14 - Expression Optimisation

TODO TODO...

Testing is Hard and we often use the wrong Incentives

I have been spending a lot of time thinking about testing and reviewing testing lately. At a superficial level - testing looks simple: Write test matrix, code tests, run tests, learn we...

Why are Databases so Hard to Make? - Logging to Disk

Transaction logs. Why are they so important and why are they so hard to make?

Why are Databases so Hard to Make? - CPU usage

In our previous blogs, we have visited the idea that "databases are just loops". At this point, my dear readers may rightfully ask: "if those databases are indeed just -...

Databases are Just Loops - Row and Batch execution

Our database journey makes a brief stop. We need to appreciate an important design decision every database must make: Should I use row or batch execution? Depending on the database - or...

Databases are just Loops - GROUP BY

In my previous post - I introduced the idea that you can think of database queries as a series of loops. Let me take this idea even further - introducing more complex database concepts in...