The great promise databases make to programmers is: "Tell me what you want and I will figure out the
fastest way to do it."
A database is a computer science engine: it knows tricks and optimisations that the average programmer has
not heard about...
Sometimes...
Some queries look "easy" to programmers, but databases often need to apply a method called de-correlation to
make them run efficiently.
Even back in the 90s, the great minds of the TPC council knew how to design queries that test for this optimisation.
Today we will learn how to spot these cases and what happens when databases fail at optimising them.
After the wonderful feedback on the previous blog about Iceberg, it is now time to switch
gears.
Databases are more than row storage engines. They are algorithm machines, helping programmers solve tricky
scalability problems that would otherwise not be discovered until the data is in production. They do this by
using their big bag of heuristics and tuning methods: a form of distilled computer science sprinkled onto your
application.
Benchmarks, particularly those made by the TPC council, co-evolve with the database industry. As new optimisations
are discovered, benchmarks are updated to test for them and to see just how far databases can be pushed. This
kind of benchmarking allows database engine developers to track how they are doing against the state of the art.
Just as in biology, we can learn a lot about the species of today by studying their fossils and ancestry. Database
benchmarks are the predators of the savannah, part of the environment that databases evolved to survive in. Each little
detail of a benchmark gives us a clue about the genetic makeup of databases, including the ones in use today.
Our first visit to history is the TPC-H benchmark - come with me on a journey to discover the origins of our data DNA.