Brandan Lennox's Articles (Page 3)

I finally set up SSL on this site. I got stuck in a redirect loop trying to discover the Micropub endpoint, and the whole site was inaccessible at various points. But it's all straight now! I think! Probably!

For the first time in years, I can comfortably rest my forehead on the floor in pigeon pose! Feels like progress, although my actual strain is still just as painful as it has been for the past month.

I'm still having to learn how to stand up straight, but it seems to be the key to avoiding pain in my legs.

Since I'm bad at posting to my blog and bad at doing my physical therapy exercises, I'm going to post each time I exercise. 💎➡️🐥🐥

The Postgres CLUSTER Command

Ever heard of clustering in Postgres? No, not where you use a bunch of machines to serve a single database. Or how Postgres actually refers to a single machine serving multiple databases as a cluster. I mean the one where you rewrite table data in a particular order to reduce disk seeks. Yeah, that one!

We have a database that models a star schema, and a couple of our fact tables were reaching sizes of 70M rows and 20 GB on disk. Query times started to increase, so I decided to partition the tables to improve performance.

After using pg_partman to partition the tables, query times hardly improved at all. I had tried composite indexes, ordered indexes, various partition sizes, even tweaking a couple of settings like random_page_cost to convince the planner to do more index scanning.

Eventually, one particular query plan revealed the bottleneck:

=> EXPLAIN (ANALYZE, BUFFERS) SELECT [a bunch of stuff] ;
Bitmap Heap Scan on r4_1  (cost=992.06..113171.21 rows=1192 width=44) (actual time=26.017..6814.430 rows=1667 loops=1)
   Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
   Filter: (some other conditions)
   Rows Removed by Filter: 41822
   Heap Blocks: exact=27156
   Buffers: shared read=27278
   ->  Bitmap Index Scan on group_date_index  (cost=0.00..991.76 rows=42186 width=0) (actual time=18.119..18.119 rows=43489 loops=1)
         Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
         Buffers: shared read=122

I could see that I had a very efficient index scan (virtually instantaneous) that was resulting in a very inefficient heap scan (almost 7 seconds). The index scan was returning 43,489 rows that matched the grouping_id and date conditions, and then the planner decided to load those rows from disk to do the additional filtering (down to 1,667 rows).

That heap scan loaded ~27,000 blocks (also called pages) from disk in order to capture those 43,000 rows. A block in Postgres is 8 KB, and based on the data types of the columns in the table (integers and timestamps), I would expect each row to be less than 300 B. So it was loading well over 200 MB worth of pages for 12 MB worth of relevant rows¹, which is an abysmal hit rate. And most likely, many of these were non-sequential seeks, which means death if you’re on spinning disks like we are.
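For the curious, here’s the back-of-the-envelope math (the block and row counts come from the EXPLAIN output; the 300 B per-row figure is my estimate, as above):

```ruby
# Rough math on the heap scan's efficiency.
BLOCK_SIZE    = 8 * 1024  # Postgres default page size: 8 KB
blocks_read   = 27_156    # "Heap Blocks: exact=27156"
rows_matched  = 43_489    # rows returned by the index scan
bytes_per_row = 300       # generous estimate for this table

bytes_loaded   = blocks_read * BLOCK_SIZE
bytes_relevant = rows_matched * bytes_per_row

puts "loaded:   #{bytes_loaded / 1024 / 1024} MB"    # ~212 MB
puts "relevant: #{bytes_relevant / 1024 / 1024} MB"  # ~12 MB
puts "hit rate: #{(100.0 * bytes_relevant / bytes_loaded).round(1)}%"
```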

This led me to the CLUSTER command. CLUSTER allows you to reorder table data at rest on disk according to one (and only one) index of the table. In our case, since we’re modeling a data cube, every single query will filter on grouping_id and date, so clustering on that index was an obvious choice:

=> CLUSTER r4_1 USING group_date_index;

After clustering all of the partitions, the structure of the query plan didn’t change at all, but the number of blocks read by that heap scan was reduced by about 95%:

=> EXPLAIN (ANALYZE, BUFFERS) SELECT [the same bunch of stuff] ;
Bitmap Heap Scan on r4_1  (cost=992.06..113335.17 rows=1192 width=44) (actual time=40.724..280.699 rows=1667 loops=1)
   Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
   Filter: (some other conditions)
   Rows Removed by Filter: 41822
   Heap Blocks: exact=1404
   Buffers: shared read=1526
   ->  Bitmap Index Scan on group_date_index  (cost=0.00..991.76 rows=42186 width=0) (actual time=18.097..18.097 rows=43489 loops=1)
         Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date))
         Buffers: shared read=122

Execution time of that query went from 7s to around 300ms after a full reboot (i.e., nothing in any caches). In production, our response times dropped by an order of magnitude.

Clustering certainly addresses a specific need, but if you have that need, it’s stunningly effective.


  1. You may have noticed that this query was only for 10 days of data. Many of these queries are for 1–6 months of data, so these heap scans could grow to be much larger.


I probably would tweet this if I were a Tweeter. After reading this morning’s XKCD, I had two songs recommended to me by Apple Music with depictions of the moon in their album art, evidently during a solar eclipse.

Entropy by My Aim:

Entropy by My Aim

and Direction by Mat Kerekes:

Luna and the Wild Blue Everything by Mat Kerekes

And I would tag this post “confirmation bias” and “Baader-Meinhof” if I supported tags on this blog.

Namespaces in Rails Applications

In general, Rails doesn’t encourage developers to factor code into namespaces. Most apps end up with hundreds of files and very few directories in app/models and similarly flat hierarchies in other app directories. Rails Engines are a great way to pull your code into isolated, reusable modules, but there’s a significant amount of overhead associated with an Engine that might be too expensive for a small feature, or you might just have a few conceptually similar models that you want to “live near each other”.

Rails has support for namespacing in its MVC architecture, but there are a few things that might cause confusion if you haven’t used it before.

Pretend your app has a user-configurable dashboard. Users can add widgets — like charts and tables — to the dashboard, and they can lay the widgets out on the dashboard in rows and columns. You might model it like this:

  • Dashboard::Dashboard, which contains Rows
  • Dashboard::Row, which contains Widgets
  • Dashboard::Widget, which has a position within its Row

I’ll use these models in the examples below.


Most of the gnarly bits of namespacing happen in the models, so I’ll start there.

First, you’ll need to create a module for your namespace and define the class method table_name_prefix (see Migrations):

# app/models/dashboard.rb
module Dashboard
  def self.table_name_prefix
    'dashboard_'
  end
end
Once you’re “inside” the namespace, you generally won’t refer to it when declaring associations. For example, Widget will belong to Row and Row will belong to Dashboard:

# app/models/dashboard/widget.rb
module Dashboard
  class Widget < ApplicationRecord
    belongs_to :row
  end
end

# app/models/dashboard/row.rb
module Dashboard
  class Row < ApplicationRecord
    belongs_to :dashboard
    has_many :widgets
  end
end

# app/models/dashboard/dashboard.rb
module Dashboard
  class Dashboard < ApplicationRecord
    has_many :rows
  end
end

Outside the namespace, though, you’ll have to tell Rails the class name of any association that uses your namespaced model. Otherwise, it will guess incorrectly:

# app/models/user.rb
class User < ApplicationRecord

  # WRONG => NameError: uninitialized constant User::DashboardDashboard
  # has_many :dashboard_dashboards

  # correct, but clunky:
  # has_many :dashboard_dashboards, class_name: 'Dashboard::Dashboard'

  # correct and readable:
  has_many :dashboards, class_name: 'Dashboard::Dashboard'
end


Migrations

Since most databases don’t support namespaces for tables, Rails expects you to prefix your table names with the “underscored” name of your namespace. You also won’t be able to pass foreign_key: true to the references helper method, so you’ll have to create the constraints yourself.

To create the migration for Dashboard::Widget:

$ rails generate migration create_dashboard_widgets

Edit the generated file:

# db/migrate/YYYYMMDDHHMMSS_create_dashboard_widgets.rb
class CreateDashboardWidgets < ActiveRecord::Migration
  def change
    create_table :dashboard_widgets do |t|
      # adding "foreign_key: true" here will cause an error like this:
      #   PG::UndefinedTable: ERROR:  relation "rows" does not exist
      t.references :row, index: true
    end

    add_foreign_key :dashboard_widgets, :dashboard_rows, column: :row_id
  end
end


Simply wrap your routes in a call to namespace:

# config/routes.rb
Rails.application.routes.draw do
  namespace :dashboard do
    resources :dashboards
    # ...
  end
end

Your URLs will look like /dashboard/dashboards/1, and path helpers will look like edit_dashboard_dashboard_path(dashboard).


Controllers are also straightforward:

# app/controllers/dashboard/dashboards_controller.rb
module Dashboard
  class DashboardsController < ApplicationController
    def index
      @dashboards = current_user.dashboards
    end

    # ...
  end
end


You may get a confusing error if you try to render a partial of a model from outside the namespace.

Say one of your widgets renders a list of recently active users:

# app/views/dashboard/widgets/_active_user_widget.html.erb

<%# This can result in an error:
      Missing partial dashboard/users/_user %>
<%= render @widget.active_users %>

Rails is attempting to render a different partial than you might expect since you’re inside a namespace. The idea is that you might want a different _user partial in, say, an admin area than you do in the public area.

Since I’m generally using namespaces to isolate features rather than create distinct layouts, I turn this behavior off:

# config/application.rb
config.action_view.prefix_partial_path_with_controller_namespace = false

Unfortunately, it’s a global setting, so you can’t change it for each call to render. You can always explicitly render a template if you need the default behavior:

<%= render partial: 'admin/users/user', collection: @users %>


If you’re using FactoryGirl for testing, you’ll need to tell her the name of the factory to use for your namespaced associations:

# spec/factories/dashboard/widget.rb
FactoryGirl.define do
  factory :dashboard_widget, class: Dashboard::Widget do
    association :row, factory: :dashboard_row
  end
end

After many years of cramming everything directly into app/models, I’ve had great luck with namespacing. It keeps related concepts together and provides a nice level of isolation without the overhead of full-fledged Engines or microservices.

I hope your experience is as pleasant as mine!

Moving to Linode

Here’s an embarrassing story.

I used to host a friend’s WordPress blog on a subdomain of mine, since I was already paying for a shared host at Site5 and he didn’t care about the domain name. Site5 recently warned me that the WordPress installation was outdated, so I asked him if he still needed it at all, since he hadn’t written anything in a few years¹. He said he didn’t need it, so he’d export his data and I could shut down the subdomain.

But then he wrote back and said he couldn’t get to WordPress. Actually, he couldn’t get to anything on the domain at all.

It turned out my DNS records were wrong. I don’t remember changing them, and I don’t understand why they had the values they did, but most offensively, I don’t even know how long my site was unreachable. Days? Months? No clue. I fixed the DNS entries, and a little while later, my domain was resolving again…

…to no avail, because the Rails app wouldn’t boot.

As a shared hosting environment, the default Ruby version at Site5 was 1.8.7 for an absurdly long time. At some point in the past couple of years, they started offering Ruby 2.0, but only through some arcane configuration in your .bashrc or RUBY_HOME or something. I clearly don’t remember how it works — hence the unbootable app — and their documentation server was down when I tried searching for clues.

I’ve been standing up Linux servers quite a lot over the past few years. Bringing up a Rails stack to host my tiny, untrafficked blog was not a daunting task to me. Inertia (and advance payment) kept me at Site5, but this was the final kick in the ass I needed to move my site to a proper hosting environment.

To VPS, and Beyond!

After putting the baby to sleep last Tuesday, and before bedtime, I:

  • signed up for a Linode account,
  • installed CentOS 7.2, nginx/passenger, rbenv, Postgres, and monit,
  • migrated my existing SQLite database² to Postgres,
  • deployed the code, and
  • updated my DNS records to point to Linode.

Literally an evening’s worth of work, without reading a word of Linode’s documentation, and I was on a stable, secure, up-to-date environment that I control from top to (almost) bottom. I should’ve done this years ago!

The next night, I moved over a couple of other services I used to host at Site5. Now I wish I had the time to work on some more side projects, but I guess I can wait until Rowan is old enough to want to work on them with me.


  1. For shame!
  2. That’s right. Production site hosted on SQLite.

My Backup Strategy

A friend recently asked me to help make her backup process more sane. I wrote her a long e-mail, and now I’m rewriting it as a post here for future reference (because I recently forgot what I was doing and had to find that e-mail).


  • Backups of 3 classes of assets:
    • Irreplaceable and frequently used (a.k.a., FU) — current projects, photos
    • Replaceable and FU — stock art, downloaded or ripped media
    • Irreplaceable and not FU — archived projects, other external systems (game consoles and such)
  • Few total drives
  • Always-on incremental backups for IFU data
  • Occasional snapshot backups for RFU and INFU data
  • Bootable clone of primary desktop
  • No NAS
  • No cloud
  • No DVDs

The last three are controversial, but that’s what we had discussed, so I’m sticking to it for now.

My strategy requires three drives (in addition to the drive in the computer).

Drive 0 — Primary Drive

The one in the computer.

Drive 1 — Incremental Backup of Drive 0

Time Machine backups that are always running for IFU data. Also storage for RFU data and any other disk-based caches you may need, since it’s always connected. I’m also adding INFU here as a simplifying constraint for Drive 3.

You can either partition the drive and tell Time Machine to use one of the partitions, or leave the whole drive as a single partition and just use folders to manage your other storage (which is what I do). Time Machine just needs its Backups.backupdb folder, and it doesn’t care what else is on the drive.

One partition is simpler to manage, but it allows Time Machine to fill the whole drive. You might want to limit the size of the Time Machine backup (do you really need weekly backups from 6 years ago?), which you can only do by limiting the size of the partition it’s on, but you have to guess the right sizes for your partitions. I don’t see tremendous benefit to either scheme. The platter is spinning just as much either way.

It shouldn’t be a problem to leave this drive connected all the time so that Time Machine can run. I’ve been buying Seagate drives for a while. They’re fast, and you can’t hear them over the presumed fan noise from your computer.

Frequency: continuous
Capacity: 2× Drive 0, plus whatever you need for asset storage

Drive 2 — Bootable Clone of Drive 0

This SuperDuper! clone will be the same IFU data as your Time Machine backups (if they both ran at the same moment), except you can boot from it in emergencies, or if you just want to boot your OS on a different piece of hardware for the hell of it.

You can actually run SuperDuper! and Time Machine on the same drive. I considered having two bit-for-bit identical drives that you would leave plugged in all the time that both had Time Machine and SuperDuper! backups, sort of like a crappy RAID 1, so that if one drive fails you have another identical drive. But then one virus that takes out both drives leaves you with no backups.

Frequency: every sometimes
Capacity: equivalent to Drive 0, since it’s a clone

Drive 3 — Non-Bootable Clone of Drive 1

I keep this drive off-site and only back it up about once a month. Here’s my logic:

  • Your IFU data exists on Drive 0, Drive 1, and Drive 2. Makes sense, since it’s the most important.
  • Your RFU data exists only on Drive 1. But it’s R, so it’s just inconvenient, not catastrophic, if you lose that drive.
  • Your INFU data exists only on Drive 1. It’s I, so you don’t want to lose it, but it’s not changing anymore, so occasional snapshots for backup are good enough.

So snapshots of Drive 1 are a good enough backup for your RFU and INFU data, and you would have to lose 3 drives before you needed to get IFU data from here.

Frequency: occasional sometimes
Capacity: equivalent to Drive 1, since it’s a clone

That’s It

So far, I’ve never had to recover from anything catastrophic, but I’ve used my backup drives for convenience many times. This strategy has worked well for me, and it’s ultimately only cost me a few hundred dollars over the course of about a decade.

Extremely Large File Uploads with nginx, Passenger, Rails, and jQuery

We have to handle some really frackin’ huge uploads (approaching 2 TB) in our Rails-Passenger-nginx application at work. This results in some interesting requirements:

  1. Murphy’s Law guarantees that uploads this big will get interrupted, so we need to support resumable uploads.
  2. Even if the upload doesn’t get interrupted, we have to report progress to the user since it’s such a long feedback cycle.
  3. Luckily, we can restrict the browsers we support, so we can use some of the advanced W3C APIs (like File) and avoid Flash.
  4. Only one partition in our appliance is large enough to contain a file that size, and it’s not /tmp.

For the first three requirements, it seemed like the jQuery File Upload plugin was a perfect fit. For the last, we just needed to tweak Passenger to change the temporary location of uploaded files…


Many Googles later, I realized that option is only supported in Apache and my best bet was the third-party nginx upload module. But its documentation is fairly sparse, and getting it to work with the jQuery plugin was a lot more work than I anticipated.

Below is my solution.

nginx and the Upload Module

The first step was recompiling nginx with the upload module. In our case, this meant modifying an RPM spec and rebuilding it, but in general, you just need to extract the upload module’s tarball to your filesystem and reference it in the ./configure command when building nginx:

./configure --add-module=/path/to/nginx_upload_module ...

Once that was built and installed, I added the following section to our nginx.conf:

# See
location = /upload-restore-archive {

  # if resumable uploads are on, then the $upload_field_name variable
  # won't be set because the Content-Type isn't (and isn't allowed to be)
  # multipart/form-data, which is where the field name would normally be
  # defined, so this *must* correspond to the field name in the Rails view
  set $upload_field_name "archive";

  # location to forward to once the upload completes
  upload_pass /backups/archives/restore.json;

  # filesystem location where we store uploads
  # The second argument is the level of "hashing" that nginx will perform
  # on the filenames before storing them to the filesystem. I can't find
  # any documentation online, so as an example, say we were using this
  # configuration:
  #   upload_store /tmp/uploads 2 1;
  # A file named '43829042' would be written to this path:
  #   /tmp/uploads/42/0/43829042
  # I hope that's clear enough. The argument is required and must be
  # greater than 0. You can see the implementation here:
  upload_store /backup/upload 1;

  # whether uploads are resumable
  upload_resumable on;

  # access mode for storing uploads
  upload_store_access user:r;

  # maximum upload size (0 for unlimited)
  upload_max_file_size 0;

  # form fields to be passed to Rails
  upload_set_form_field $upload_field_name[filename] "$upload_file_name";
  upload_set_form_field $upload_field_name[path] "$upload_tmp_path";
  upload_set_form_field $upload_field_name[content_type] "$upload_content_type";
  upload_aggregate_form_field $upload_field_name[size] "$upload_file_size";

  # hashes are not supported for resumable uploads
  #upload_aggregate_form_field $upload_field_name[signature] "$upload_file_sha1";
}

That’s a literal copy-and-paste from the config. I’m including the comments here because the documentation wasn’t as explicit as I apparently needed it to be.

Some important points:

  • Valery Kholodkov, the author of the upload module, has written a protocol defining how resumable uploads work. You should definitely read it and understand the Content-Range and Session-Id headers.
  • I can’t find any documentation on “nginx directory hashes”. That comment is the best I could do to explain it.
  • Once the upload is completely finished, the module sends a request to a given URL with a given set of parameters. That’s what upload_set_form_field and upload_aggregate_form_field are for, so you can make the request look like a multipart form submission to your application.
  • The module supports automatic calculation of a SHA1 (or MD5) hash of uploaded files, presumably implemented as a filter during the upload to save time. I would’ve liked to have that hash passed to Rails for verification of the file, but it’s unsupported for resumable uploads. I’m leaving that setting commented out for future developers’ sakes.

At this point, I was able to use curl to upload files and observe what was happening on the filesystem. The next step was configuring the jQuery plugin.

The jQuery File Upload Plugin

This plugin is extremely full-featured and comprehensively documented, which was exactly the problem I had with it. I needed something in between the basic example and the kitchen sink example, and the docs were spread over a series of wiki pages that I personally had trouble following. A curse of plenty.

Here’s the essence of what I came up with (in CoffeeScript):

# We need a simple hashing function to turn the filename into a
# numeric value for the nginx session ID. See:
hash = (s, tableSize) ->
  b = 27183
  h = 0
  a = 31415

  for i in [0...s.length]
    h = (a * h + s[i].charCodeAt()) % tableSize
    a = ((a % tableSize) * (b % tableSize)) % tableSize

  h

sessionId = (filename) ->
  hash(filename, 16384)
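If the server ever needs to recompute a Session-Id (say, to locate a partial upload on disk), a Ruby port of the hash above might look like this. This is my addition, not part of the original setup:

```ruby
# Ruby port of the CoffeeScript hashing function above. For ASCII
# filenames, String#ord matches JavaScript's charCodeAt(), so both
# sides compute the same session ID.
def session_id(filename, table_size = 16_384)
  a = 31_415
  b = 27_183
  h = 0

  filename.each_char do |c|
    h = (a * h + c.ord) % table_size
    a = (a % table_size) * (b % table_size) % table_size
  end

  h
end
```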


$('#restore-archive').fileupload
  # nginx's upload module responds to these requests with a simple
  # byte range value (like "0-2097152/3892384590"), so we shouldn't
  # try to parse that response as the default JSON dataType
  dataType: 'text',

  # upload 8 MB at a time
  maxChunkSize: 8 * 1024 * 1024,

  # very importantly, the nginx upload module *does not allow*
  # resumable uploads for a Content-Type of "multipart/form-data"
  multipart: false,

  # add the Session-Id header to the request when the user adds the
  # file and we know its filename
  add: (e, data) ->
    data.headers or= {}
    data.headers['Session-Id'] = sessionId(data.files[0].name)

  # update the progress bar on the page during upload
  progress: (e, data) ->
    # ...

Unlike the nginx config above, this example leaves out a lot of application-specific settings that aren’t relevant to getting the plugin to work with nginx.

Some important points:

  • I decided to use a simple JavaScript hashing function to hash the filename for the Session-Id. It might not need to be numeric, but all the nginx examples I read used numeric filenames, and the Session-Id is used directly by nginx as the filename on disk.
  • As noted in the comment, the response to an individual upload request is a plain-text byte range, which is also present in the Content-Range header. The plugin uses this value to determine the next chunk of the file to upload.
  • This means that in order to resume an upload, the first chunk of the file must be re-uploaded. Then nginx responds with the last successful byte range, and the plugin will start from there on the next request. This can be momentarily disconcerting, since it looks like the upload has started over. Set your chunk size accordingly.
  • You must set multipart: false for resumable uploads to work. I missed that note in the protocol, and I wasted a lot of time trying to figure out why my uploads weren’t resuming.
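The byte-range bookkeeping behind that resume behavior is simple enough to sketch. This is a hypothetical helper (mine, not the plugin’s) showing how the plain-text response maps to the next chunk’s offset:

```ruby
# Parses the plain-text range nginx returns after each chunk, e.g.
# "0-2097152/3892384590". Returns [offset of the next byte to send,
# total file size]: the next chunk starts one byte past the last
# byte the server reports having received.
def next_chunk_offset(range_response)
  range, total = range_response.split('/')
  _first, last = range.split('-').map(&:to_i)
  [last + 1, total.to_i]
end

offset, total = next_chunk_offset('0-2097152/3892384590')
# resume the upload at byte `offset` of `total`
```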

At this point, I could interrupt an upload, resume it by simply uploading the same file again, and I had a lovely progress bar to boot. The last step was making sure Rails worked.


All the hard work has been done by the time Rails even realizes somebody’s uploading something. The controller action looks exactly like you’d expect it to:

class ArchivesController < ApplicationController
  def restore
    archive =[:archive]) # class name elided in the original; Archive is a stand-in

    if archive.valid? && archive.perform!
      head :ok # success response elided in the original
    else
      render json: { errors: archive.errors.full_messages }, status: :unprocessable_entity
    end
  end
end

The view suffers a bit, since the jQuery plugin wants to own the form and nginx has its configuration hard-coded:

<!-- The fileupload plugin takes care of all the normal form options -->
<input id="restore-archive" type="file" data-url="/upload-restore-archive">
<%= button_tag 'Upload and Restore', id: 'restore-upload-button', type: 'button' %>

That’s about it.


It was pretty sweet once it worked, but the journey was arduous. Hope this helps some people.