Also removed Google Analytics from my site, since I haven't looked at it in years, and even my own browser blocks it.
Articles (Page 3)
Oh man. I'm really excited about this Night Owl pin! 🌚 Makes me feel young again.
I finally set up SSL on bclennox.com. I had Micro.blog in a redirect loop trying to discover the Micropub endpoint, and the whole site was inaccessible at various points. But it's all straight now! I think! Probably!
For the first time in years, I can comfortably rest my forehead on the floor in pigeon pose! Feels like progress, although my actual strain is still just as painful as it has been for the past month.
I'm still having to learn how to stand up straight, but it seems to be the key to avoiding pain in my legs.
Since I'm bad at posting to my blog and bad at doing my physical therapy exercises, I'm going to post each time I exercise. 💎➡️🐥🐥
Ever heard of clustering in Postgres? No, not where you use a bunch of machines to serve a single database. Or how Postgres actually refers to a single machine serving multiple databases as a cluster. I mean the one where you rewrite table data in a particular order to reduce disk seeks. Yeah, that one!
We have a database that models a star schema, and a couple of our fact tables were reaching sizes of 70M rows and 20 GB on disk. Query times started to increase, so I decided to partition the tables to improve performance.
After using pg_partman to partition the tables, query times hardly improved at all. I had tried composite indexes, ordered indexes, various partition sizes, even tweaking a couple of settings like
random_page_cost to convince the planner to do more index scanning.
Eventually, one particular query plan revealed the bottleneck:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT [a bunch of stuff] ; ... Bitmap Heap Scan on r4_1 (cost=992.06..113171.21 rows=1192 width=44) (actual time=26.017..6814.430 rows=1667 loops=1) Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date)) Filter: (some other conditions) Rows Removed by Filter: 41822 Heap Blocks: exact=27156 Buffers: shared read=27278 -> Bitmap Index Scan on group_date_index (cost=0.00..991.76 rows=42186 width=0) (actual time=18.119..18.119 rows=43489 loops=1) Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date)) Buffers: shared read=122 ...
I could see that I had a very efficient index scan (virtually instantaneous) that was resulting in a very inefficient heap scan (almost 7 seconds). The index scan was returning 43,489 rows that matched the grouping_id and date conditions, and then the planner decided to load those rows from disk to do the additional filtering (down to 1,667 rows).
That heap scan loaded ~27,000 blocks (also called pages) from disk in order to capture those 43,000 rows. A block in Postgres is 8 KB, and based on the data types of the columns in the table (integers and timestamps), I would expect each row to be less than 300 B. So it was loading well over 200 MB worth of pages for 12 MB worth of relevant rows1, which is an abysmal hit rate. And most likely, many of these were non-sequential seeks, which means death if you’re on spinning disks like we are.
This led me to the
CLUSTER allows you to reorder table data at rest on disk according to one (and only one) index of the table. In our case, since we’re modeling a data cube, every single query will filter on grouping_id and date, so clustering on that index was an obvious choice:
=> CLUSTER r4_1 USING group_date_index;
After clustering all of the partitions, the structure of the query plan didn’t change at all, but the number of blocks read by that heap scan was reduced by about 95%:
=> EXPLAIN (ANALYZE, BUFFERS) SELECT [the same bunch of stuff] ; ... Bitmap Heap Scan on r4_1 (cost=992.06..113335.17 rows=1192 width=44) (actual time=40.724..280.699 rows=1667 loops=1) Recheck Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date)) Filter: (some other conditions) Rows Removed by Filter: 41822 Heap Blocks: exact=1404 Buffers: shared read=1526 -> Bitmap Index Scan on group_date_index (cost=0.00..991.76 rows=42186 width=0) (actual time=18.097..18.097 rows=43489 loops=1) Index Cond: ((grouping_id = 185) AND (date >= '2017-06-21'::date) AND (date < '2017-06-30'::date)) Buffers: shared read=122 ...
Execution time of that query went from 7s to around 300ms after a full reboot (i.e., nothing in any caches). In production, our response times dropped by an order of magnitude.
Clustering certainly addresses a specific need, but if you have that need, it’s stunningly effective.
I probably would tweet this if I were a Tweeter. After reading this morning’s XKCD, I had two songs recommended to me by Apple Music with depictions of the moon in their album art, evidently during a solar eclipse.
And I would tag this post “confirmation bias” and “Baader-Meinhof” if I supported tags on this blog.
In general, Rails doesn’t encourage developers to factor code into namespaces. Most apps end up with hundreds of files and very few directories in
app/models and similarly flat hierarchies in other
app directories. Rails Engines are a great way to pull your code into isolated, reusable modules, but there’s a significant amount of overhead associated with an Engine that might be too expensive for a small feature, or you might just have a few conceptually similar models that you want to “live near each other”.
Rails has support for namespacing in its MVC architecture, but there are a few things that might cause confusion if you haven’t used it before.
Pretend your app has a user-configurable dashboard. Users can add widgets — like charts and tables — to the dashboard, and they can lay the widgets out on the dashboard in rows and columns. You might model it like this:
Dashboard::Dashboard, which contains
Dashboard::Row, which contains
Dashboard::Widget, which has a position within its
I’ll use these models in the examples below.
Most of the gnarly bits of namespacing happen in the models, so I’ll start there.
First, you’ll need to create a module for your namespace and define the class method
table_name_prefix (see Migrations):
# app/models/dashboard.rb module Dashboard def self.table_name_prefix 'dashboard_' end end
Once you’re “inside” the namespace, you generally won’t refer to it when declaring associations. For example,
Widget will belong to
Row will belong to
# app/models/dashboard/widget.rb module Dashboard class Widget < ApplicationRecord belongs_to :row end end # app/models/dashboard/row.rb module Dashboard class Row < ApplicationRecord belongs_to :dashboard has_many :widgets end end # app/models/dashboard/dashboard.rb module Dashboard class Dashboard < ApplicationRecord has_many :rows end end
Outside the namespace, though, you’ll have to tell Rails the class name of any association that uses your namespaced model. Otherwise, it will guess incorrectly:
# app/models/user.rb class User < ApplicationRecord # WRONG => NameError: uninitialized constant User::DashboardDashboard has_many :dashboard_dashboards # RIGHT has_many :dashboard_dashboards, class_name: 'Dashboard::Dashboard' # ALSO RIGHT has_many :dashboards, class_name: 'Dashboard::Dashboard' end
Since most databases don’t support namespaces for tables, Rails expects you to prefix your table names with the “underscored” name of your namespace. You also won’t be able to pass
foreign_key: true to the
references helper method, so you’ll have to create constraints yourself.
To create the migration for
$ rails generate migration create_dashboard_widgets
Edit the generated file:
# db/migrate/YYYYMMDDHHMMSS_create_dashboard_widgets.rb class CreateDashboardWidgets < ActiveRecord::Migration create_table :dashboard_widgets do |t| # adding "foreign_key: true" here will cause an error like this: # PG::UndefinedTable: ERROR: relation "rows" does not exist t.references :row, index: true end add_foreign_key :dashboard_widgets, :dashboard_rows, column: :row_id end
Simply wrap your routes in a call to
# config/routes.rb namespace :dashboard do resources :dashboards # ... end
Your URLs will look like
/dashboard/dashboards/1, and path helpers will look like
Controllers are also straightforward:
# app/controllers/dashboard/dashboards_controller.rb module Dashboard class DashboardsController < ApplicationController def index @dashboards = current_user.dashboards end # ... end end
You may get a confusing error if you try to render a partial of a model from outside the namespace.
Say one of your widgets renders a list of recently active users:
# app/views/dashboard/widgets/_active_user_widget.html.erb <%# This can result in an error: Missing partial dashboard/users/_user %> <%= render @widget.active_users %>
Rails is attempting to render a different partial than you might expect since you’re inside a namespace. The idea is that you might want a different
_user partial in, say, an admin area than you do in the public area.
Since I’m generally using namespaces to isolate features rather than create distinct layouts, I turn this behavior off:
# config/application.rb config.action_view.prefix_partial_path_with_controller_namespace = false
Unfortunately, it’s a global setting, so you can’t change it for each call to
render. You can always explicitly render a template if you need the default behavior:
<%= render partial: 'admin/users/user', collection: @users %>
If you’re using FactoryGirl for testing, you’ll need to tell her the name of the factory to use for your namespaced associations:
# spec/factories/dashboard/widget.rb FactoryGirl.define do factory :dashboard_widget, class: Dashboard::Widget do association :row, factory: :dashboard_row end end
After many years of cramming everything directly into
app/models, I’ve had great luck with namespacing. It keeps related concepts together and provides a nice level of isolation without the overhead of full-fledged Engines or microservices.
I hope your experience is as pleasant as mine!
Here’s an embarrassing story.
I used to host a friend’s WordPress blog on a subdomain of bclennox.com, since I was already paying for a shared host at Site5 and he didn’t care about the domain name. Site5 recently warned me that the WordPress installation was outdated, so I asked him if he still needed it at all, since he hadn’t written anything in a few years1. He said he didn’t need it, so he’d export his data and I could shut down the subdomain.
But then he wrote back and said he couldn’t get to WordPress. Actually, he couldn’t get to anything on bclennox.com at all.
Actually, my DNS records were wrong. I don’t remember changing them, and I don’t understand why they had the values that they did, but most offensively, I don’t even know how long my site was unreachable. Days? Months? No clue. I fixed the DNS entries, and a little while later, my domain was resolving again…
…to no avail, because the Rails app wouldn’t boot.
As a shared hosting environment, the default Ruby version at Site5 was 1.8.7 for an absurdly long time. At some point in the past couple of years, they started offering Ruby 2.0, but only through some arcane configuration in your
RUBY_HOME or something. I clearly don’t remember how it works — hence the unbootable app — and their documentation server was down when I tried searching for clues.
I’ve been standing up Linux servers quite a lot over the past few years. Bringing up a Rails stack to host my tiny, untrafficked blog was not a daunting task to me. Inertia (and advance payment) kept me at Site5, but this was the final kick the ass I needed to move my site to a proper hosting environment.
To VPS, and Beyond!
After putting the baby to sleep last Tuesday, and before bedtime, I:
- signed up for a Linode account,
- installed CentOS 7.2, nginx/passenger, rbenv, Postgres, and monit,
- migrated my existing SQLite database2 to Postgres,
- deployed the code, and
- updated my DNS records to point to Linode.
Literally an evening’s worth of work, without reading a word of Linode’s documentation, and I was on a stable, secure, up-to-date environment that I control from top to (almost) bottom. I should’ve done this years ago!
The next night, I moved over a couple of other services I used to host at Site5. Now I wish I had the time to work on some more side projects, but I guess I can wait until Rowan is old enough to want to work on them with me.