Brandan Lennox's Articles (Page 2)

I took last night off from PT. Grapevines felt better on my right leg tonight.

Also removed Google Analytics from my site, since I haven't looked at it in years, and even my own browser blocks it.

Oh man. I'm really excited about this Night Owl pin! 🌚 Makes me feel young again.

I finally set up SSL on bclennox.com. I had Micro.blog in a redirect loop trying to discover the Micropub endpoint, and the whole site was inaccessible at various points. But it's all straight now! I think! Probably!

For the first time in years, I can comfortably rest my forehead on the floor in pigeon pose! Feels like progress, although my actual strain is still just as painful as it has been for the past month.

I'm still having to learn how to stand up straight, but it seems to be the key to avoiding pain in my legs.

Since I'm bad at posting to my blog and bad at doing my physical therapy exercises, I'm going to post each time I exercise. 💎➡️🐥🐥

The Postgres CLUSTER Command

Ever heard of clustering in Postgres? No, not the kind where you use a bunch of machines to serve a single database. Nor the sense in which Postgres itself uses the word, where a single server instance managing multiple databases is called a cluster. I mean the kind where you rewrite table data in a particular order to reduce disk seeks. Yeah, that one!

We have a database that models a star schema, and a couple of our fact tables were reaching sizes of 70M rows and 20 GB on disk. Query times started to increase, so I decided to partition the tables to improve performance.

Even after using pg_partman to partition the tables, query times hardly improved at all. I tried composite indexes, ordered indexes, various partition sizes, even tweaking a couple of settings like random_page_cost to convince the planner to do more index scanning.
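
Roughly the kind of thing I mean (the column order of the index and the exact setting value here are illustrative, not a record of what I settled on):

-- Composite index on the columns every query filters by
CREATE INDEX group_date_index ON r4_1 (grouping_id, date);

-- Lower the planner's assumed cost of random I/O (the default is 4.0)
SET random_page_cost = 2.0;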

Eventually, one particular query plan revealed the bottleneck:

=> EXPLAIN (ANALYZE, BUFFERS) SELECT [a bunch of stuff] ;
...
Bitmap Heap Scan on r4_1  (cost=992.06..113171.21 rows=1192 width=44) (actual time=26.017..6814.430 rows=1667 loops=1)
   Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
   Filter: (some other conditions)
   Rows Removed by Filter: 41822
   Heap Blocks: exact=27156
   Buffers: shared read=27278
   ->  Bitmap Index Scan on group_date_index  (cost=0.00..991.76 rows=42186 width=0) (actual time=18.119..18.119 rows=43489 loops=1)
         Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
         Buffers: shared read=122
...

I could see that I had a very efficient index scan (virtually instantaneous) that was resulting in a very inefficient heap scan (almost 7 seconds). The index scan was returning 43,489 rows that matched the grouping_id and date conditions, and then the planner decided to load those rows from disk to do the additional filtering (down to 1,667 rows).

That heap scan loaded ~27,000 blocks (also called pages) from disk in order to capture those 43,000 rows. A block in Postgres is 8 KB, and based on the data types of the columns in the table (integers and timestamps), I would expect each row to be less than 300 B. So it was loading well over 200 MB worth of pages for 12 MB worth of relevant rows¹, which is an abysmal hit rate. And most likely, many of these were non-sequential seeks, which means death if you’re on spinning disks like we are.
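
If you want to sanity-check that back-of-the-envelope math, pg_class keeps the relevant statistics (a sketch; r4_1 is the partition from the plan above):

-- relpages: number of 8 KB blocks the table occupies on disk
-- reltuples: the planner's estimate of the table's row count
SELECT relpages,
       reltuples,
       reltuples / NULLIF(relpages, 0) AS rows_per_block
FROM pg_class
WHERE relname = 'r4_1';

At ~300 B per row, each 8 KB block should hold on the order of 25 rows, but the scan above read ~27,000 blocks to find ~43,000 matching rows: fewer than two useful rows per block touched.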

This led me to the CLUSTER command. CLUSTER allows you to reorder table data at rest on disk according to one (and only one) index of the table. In our case, since we’re modeling a data cube, every single query will filter on grouping_id and date, so clustering on that index was an obvious choice:

=> CLUSTER r4_1 USING group_date_index;
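
Each partition is its own table, so each one has to be clustered individually. Here’s a sketch of how that could be scripted, assuming a parent table named r4 and an index named group_date_index on every child (the real names will differ):

DO $$
DECLARE
  child regclass;
BEGIN
  -- Walk every child partition of the (assumed) parent table r4
  FOR child IN
    SELECT inhrelid::regclass
    FROM pg_inherits
    WHERE inhparent = 'r4'::regclass
  LOOP
    -- Rewrite the partition in the order of its (grouping_id, date) index
    EXECUTE format('CLUSTER %s USING group_date_index', child);
  END LOOP;
END;
$$;

Keep in mind that CLUSTER holds an exclusive lock while it rewrites each table, and the ordering isn’t maintained as new rows arrive, so older, mostly read-only partitions are the safest candidates.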

After clustering all of the partitions, the structure of the query plan didn’t change at all, but the number of blocks read by that heap scan was reduced by about 95%:

=> EXPLAIN (ANALYZE, BUFFERS) SELECT [the same bunch of stuff] ;
...
Bitmap Heap Scan on r4_1  (cost=992.06..113335.17 rows=1192 width=44) (actual time=40.724..280.699 rows=1667 loops=1)
   Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
   Filter: (some other conditions)
   Rows Removed by Filter: 41822
   Heap Blocks: exact=1404
   Buffers: shared read=1526
   ->  Bitmap Index Scan on group_date_index  (cost=0.00..991.76 rows=42186 width=0) (actual time=18.097..18.097 rows=43489 loops=1)
         Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
         Buffers: shared read=122
...

Execution time of that query went from 7 s to around 300 ms after a full reboot (i.e., nothing in any caches). In production, our response times dropped by an order of magnitude.

Clustering certainly addresses a specific need, but if you have that need, it’s stunningly effective.

Footnotes

  1. You may have noticed that this query was only for 10 days of data. Many of these queries are for 1–6 months of data, so these heap scans could grow to be much larger.

Moons

I probably would tweet this if I were a Tweeter. After reading this morning’s XKCD, I had two songs recommended to me by Apple Music with depictions of the moon in their album art, evidently during a solar eclipse.

Entropy by My Aim:

[Album art: Entropy by My Aim]

and Direction by Mat Kerekes:

[Album art: Luna and the Wild Blue Everything by Mat Kerekes]

And I would tag this post “confirmation bias” and “Baader-Meinhof” if I supported tags on this blog.