This site provides the following access keys:

Brandan Lennox's

Recent Articles

Searching JSON Columns in Postgres with Ransack

(Somewhat related to my previous post. This came out of the same feature request.)

I hadn’t explored Postgres’s JSON support until recently, and I was, as usual, impressed.

We have an STI model (call it Bucket) in our application with a polymorphic association (call it contents). In the UI, we have an index page that lists Buckets by type with some Bucket-generic information, like volume and material. The new request was to add to that index page data about the contents of the buckets, and to make it searchable and sortable and whatnot. The contents of the bucket can be basically anything, though constant for a given Bucket type, so the UI needed to be flexible enough to render arbitrary columns of data in table and provide some way of searching them. We already use Ransack around the app, so that seemed like the way to go.

Data Model

Here’s some pseudocode modeling what I described above:

class Bucket < ApplicationRecord
  belongs_to :contents, polymorphic: true

  # attribute :type
  # attribute :contents_type
  # attribute :contents_id

  # attribute :volume
  # attribute :material
end

class SolidsBucket < Bucket; end
class LiquidsBucket < Bucket; end

class Solid < ApplicationRecord
  # attribute :name
  # attribute :unique_key

  # attribute :density
  # attribute :hardness
  # attribute :classification
end

class Liquid < ApplicationRecord
  # attribute :name
  # attribute :unique_key

  # attribute :temperature
  # attribute :fluidity
end

When we render the index page for Buckets of type SolidsBucket, we want to include volume and material of the Bucket and the density and hardness of the Solids that are inside of it.

We have a lot of Buckets, and they reference a lot of contents. We already had a materialized view representing union of all different types of contents, so I first tried adding a new JSON column to that matview.1

Here’s what that view looked like before:

SELECT
  s.id,
  s.name,
  s.unique_key,
  'Solid' AS contents_type
FROM solids

UNION ALL

SELECT
  l.id,
  l.name,
  l.unique_key,
  'Liquid' AS contents_type
FROM liquids

-- more contents types...

I added a new line to each SELECT statement in the view definition:

-- e.g., SELECT ... FROM solids
jsonb_build_object(
  'density', s.density,
  'hardness', s.hardness,
  'classification', s.classification
) AS metadata

This uses Postgres’s jsonb_build_object function to build a JSON object of the “metadata” attributes that I want. Since each type of contents is already represented as a separate SELECT statement in the UNION, each call to jsonb_build_object can also return a differently structured JSON object.2

To make it fast to search, I also built an index on that column:

CREATE INDEX index_contents_on_metadata ON contents USING GIN (metadata);

This technique is certainly not limited to materialized views. It should work on any relation with a JSON column, even if it’s not indexed.

Custom Ransacker

It was straightforward to query the new metadata column directly from SQL using the ->> operator:3

SELECT cast(metadata ->> 'density') AS float FROM contents WHERE contents_type = 'Solid';

But getting Ransack to cooperate was less straightforward. Ransack does support custom "Ransackers", which let you return any Arel node as the “left-hand side” of a predicate. This is quite powerful, especially since the ransacker can take arguments, but the tricky part was getting the arguments from a web form back to the Ransacker (see below).

Here’s the Ransacker, in essence:

ransacker :contents_metadata, args: [:parent, :ransacker_args] do |parent, (attribute, type)|
  Arel::Nodes::NamedFunction('cast', [
    Arel::Nodes::InfixOperation.new('->>', parent.table[:metadata], Arel::Nodes.build_quoted(attribute)).as(type)
  ])
end

This allowed me to send Ransack a structure like this:

contents.ransack(
  contents_type_eq: 'Solid',
  c: [{
    a: {
      '0' => {
        name: 'contents_metadata',
        ransacker_args: ['density', 'float']
      }
    },
    p: 'gt',
    v: ['23.2']
  }],
  s: {
    '0' => { name: 'contents_metadata', dir: 'desc', ransacker_args: ['hardness', 'float'] },
    '1' => 'sortable_contents_metadata_(hardness)_(float) desc'
  }
)

to search for Solids with a density greater than 23.2 and sort by hardness descending.

I should explain that sorting. Since I need the attribute name and type to pass to the Ransacker, and since the sort attribute is sent via query params, I decided to embed the name and type in the query param itself. This required me to munge together a “fake” name for the sort query param, then de-munge it in the controller, while still leaving the original munged name in the sort params so that Ransack would correctly render the link.

Yes, it’s gross. I buried most of the code for integrating with Ransack in a couple of well-tested modules, so it’s at least contained.

Managing Attributes

This is quite specific to our app, but may still be useful to you. Since the metadata attributes themselves had their own metadata (type, valid values, etc.), I wrote a DSL to allow each Bucket type to declare its supported attributes:

module SolidsBucketMetadata
  include Attribute::DSL

  attribute :density, type: :float
  attribute :hardness, type: :float, format: :percentage
  attribute :classification, type: :text, values: %w(Wood Metal Glass Plastic)
end

This additional metadata was used in various other places, but I reference type throughout this article, so I wanted to show where it was declared.

Form Fields

I won’t go into much detail here, except to show how to format HTML input names so that they end up in the right Ransack structure.

As an example, this is how I render the search field for the first searchable column in the table:

<input type="hidden" name="q[c][0][a][0][name]" value="contents_metadata">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="density">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="float">
<input type="hidden" name="q[c][0][p]" value="gt">
<input type="search" name="q[c][0][v][]">

For subsequent fields, change q[c][0] to q[c][1], etc.

As I mentioned above, I had to use a custom sort key in order to make sorting work, but I still used Ransack’s sort_link helper to render the actual links.

Conclusion

It took a lot of digging through Ransack’s source to figure out exactly how to glue all this together, but it’s flexible enough to support everything that Ransack and Arel support, even within a JSON structure inside of a single column. It’s a trivial amount of code to add this metadata to any type of contents, and you get a bunch of stuff for free. Pretty frickin’ sweet.

Footnotes

  1. Spoiler: it worked, so it was also the last thing I tried.
  2. We also had contents with no interesting metadata, in which case I returned null::jsonb instead of calling jsonb_build_object.
  3. One gotcha with the ->> operator: it always returns text, even if the value at that key has a proper JSON type (numeric, boolean, etc.). Hence the cast.

Refactoring with (Maybe) Null Object

I’m not sure exactly which design pattern this is, but I found it useful today.

I had a class that performed a query and returned the results in JSON format for a charting library. The query grouped by one column (column), calculated the average of another column ('metric'), counted the number of records in each group, and did a little formatting of the values before returning the JSON. This is an approximation of what it looked like (lots of other logic omitted for brevity):

ChartData = Struct.new(:scope, :column) do
  def as_json(options = nil)
    values = results.map do |record|
      { x: "#{record.name} (#{record.count})", y: record.score }
    end

    [{ key: column, values: values }]
  end
  
  private

  def results
    scope
      .group(group)
      .select(group.as('name'), count, score)
  end
  
  def group
    scope.arel_table[column]
  end

  def count
    Arel::Nodes::NamedFunction.new('count', [Arel.sql('*')]).as('count')
  end

  def score
    Arel::Nodes::NamedFunction.new('avg', ['metric']).as('score')
  end
end

This worked great. Pretty easy to test, and flexible enough to work with any table that had the appropriate columns.

Then we got a new requirement. We needed to be able to perform the “count” portion of the query on a table that didn’t have the appropriate columns (i.e., it lacked a “metric”), and use the count rather than the score as the Y value of the chart.

At first, it seemed like a matter of reformatting the JSON, but that wasn’t sufficient. The query itself had to change, and that was buried in a private method. It was a big enough change to be painful, but small enough that it didn’t feel like a whole new class.

I first tried passing a new include_scores boolean to the class:

ChartData = Struct.new(:scope, :column, :include_scores) do
  def as_json(options = nil)
    values = results.map do |record|
      if include_scores
        { x: "#{record.name} (#{record.count})", y: record.score }
      else
        { x: record.name, y: record.count }
      end
    end
    
    [{ key: column, values: values }]
  end
  
  private
  
  def results
    if include_scores
      scope
        .group(group)
        .select(group.as('name'), count, score)
    else
      scope
        .group(group)
        .select(group.as('name'), count)
    end
  end
  
  # ...

end

That worked, but it was ugly. The real query was more complex than this, and I didn’t like duplicating all that code. Likewise, the JSON wasn’t two slightly different representations of the same data. It was really two different datas being represented as JSON.

After some more refactoring, I ended up with this:

ChartData = Struct.new(:scope, :column) do
  def scores
    data_points_with_score(average_score) do |record|
      { x: "#{record.name} (#{record.count})", y: record.score }
    end
  end
  
  def counts
    data_points_with_score(null_score) do |record|
      { x: record.name, y: record.count }
    end
  end

  private
  
  def data_points_with_score(score)
    values = scope
      .group(group)
      .select(group.as('name'), count, score)
      .map do |record|
        yield record
      end
    
    [{ key: column, values: values }]
  end
  
  def average_score
    Arel::Nodes::NamedFunction.new('avg', ['metric']).as('score')
  end
  
  def null_score
    Arel.sql('NULL').as('score')
  end
  
  # ...
end

I replaced the single as_json method with two methods, one for each JSON format that I needed to support. I also replaced the single score method with average_score (the actual score calculation) and null_score (effectively a no-op). Since the caller was now responsible for doing the “switching” — i.e., either calling scores or counts — I could declare the scoring method and the JSON format in the same place, and nothing else in the class needed to know about it.

I can see an argument that this is still violating Single Responsibility Principle. Maybe if there were one more case to handle, I would split this into multiple classes. For now, this remains easy to test, and the exceptional behavior is grouped together, so I call it a win.

She’s got it figured out.

AVGN Goes Jazz

I just ran across something from the archives of my digital life:

Back in 2010, I recorded a cover of the Angry Video Game Nerd theme song in GarageBand, because that’s what people did way back then. I decided to go jazz, since I had never heard a jazz cover before, and Christmas was coming up, so it felt gentle and snowy to me:

Every note was painstakingly struck using the touchpad on my MacBook Pro (i.e., I’m not a real musician). I’m actually kind of proud of this. It’s the closest I’ve ever come to composing music.

But the timing! This was the year that James managed to get Bear McCreary to compose the music for the Christmas episode, which was a take on Seuss’s Grinch story. So needless to say, my cover…mmm, fell by the wayside.

Fun to go back and listen to this again. I wonder if I could learn some of the parts on an actual (musical) keyboard.

Nice to be home again. One of my favorite things about traveling is that it makes me appreciate home.

Got the new plants in the garden yesterday, just in time for 2” of rain last night! They all seemed happy this morning.

(Not pictured: blueberry bush to my left.)

Falcon

@smokey Apologies for the absolutely-not-a-bird-lens photo.

I’m always most impressed by @manton’s humility in spite of his prolific-ility. Shipping software is just pouring cereal for him, like it’s so simple and natural that he wouldn’t even think to be arrogant about it. Congrats on building another great thing, @manton!

Bubbles

No, it's cool! I can stop whenever I want! Seriously!

More articles

Featured Project

HSLider: A Color Widget for Mac OS X Dashboard

Targeted at developers using CSS, this Dashboard widget uses HSL sliders and plain ol’ text fields to tweak colors. Copy your new color as valid CSS (hex, RGB, or HSL) with a single click.

HSLider screenshot

More about HSLider