This site provides the following access keys:

Brandan Lennox's

Searching JSON Columns in Postgres with Ransack

(Somewhat related to my previous post. This came out of the same feature request.)

I hadn’t explored Postgres’s JSON support until recently, and I was, as usual, impressed.

We have an STI model (call it Bucket) in our application with a polymorphic association (call it contents). In the UI, we have an index page that lists Buckets by type with some Bucket-generic information, like volume and material. The new request was to add to that index page data about the contents of the buckets, and to make it searchable and sortable and whatnot. The contents of the bucket can be basically anything, though constant for a given Bucket type, so the UI needed to be flexible enough to render arbitrary columns of data in table and provide some way of searching them. We already use Ransack around the app, so that seemed like the way to go.

Data Model

Here’s some pseudocode modeling what I described above:

class Bucket < ApplicationRecord
  belongs_to :contents, polymorphic: true

  # attribute :type
  # attribute :contents_type
  # attribute :contents_id

  # attribute :volume
  # attribute :material
end

class SolidsBucket < Bucket; end
class LiquidsBucket < Bucket; end

class Solid < ApplicationRecord
  # attribute :name
  # attribute :unique_key

  # attribute :density
  # attribute :hardness
  # attribute :classification
end

class Liquid < ApplicationRecord
  # attribute :name
  # attribute :unique_key

  # attribute :temperature
  # attribute :fluidity
end

When we render the index page for Buckets of type SolidsBucket, we want to include volume and material of the Bucket and the density and hardness of the Solids that are inside of it.

We have a lot of Buckets, and they reference a lot of contents. We already had a materialized view representing union of all different types of contents, so I first tried adding a new JSON column to that matview.1

Here’s what that view looked like before:

SELECT
  s.id,
  s.name,
  s.unique_key,
  'Solid' AS contents_type
FROM solids

UNION ALL

SELECT
  l.id,
  l.name,
  l.unique_key,
  'Liquid' AS contents_type
FROM liquids

-- more contents types...

I added a new line to each SELECT statement in the view definition:

-- e.g., SELECT ... FROM solids
jsonb_build_object(
  'density', s.density,
  'hardness', s.hardness,
  'classification', s.classification
) AS metadata

This uses Postgres’s jsonb_build_object function to build a JSON object of the “metadata” attributes that I want. Since each type of contents is already represented as a separate SELECT statement in the UNION, each call to jsonb_build_object can also return a differently structured JSON object.2

To make it fast to search, I also built an index on that column:

CREATE INDEX index_contents_on_metadata ON contents USING GIN (metadata);

This technique is certainly not limited to materialized views. It should work on any relation with a JSON column, even if it’s not indexed.

Custom Ransacker

It was straightforward to query the new metadata column directly from SQL using the ->> operator:3

SELECT cast(metadata ->> 'density') AS float FROM contents WHERE contents_type = 'Solid';

But getting Ransack to cooperate was less straightforward. Ransack does support custom "Ransackers", which let you return any Arel node as the “left-hand side” of a predicate. This is quite powerful, especially since the ransacker can take arguments, but the tricky part was getting the arguments from a web form back to the Ransacker (see below).

Here’s the Ransacker, in essence:

ransacker :contents_metadata, args: [:parent, :ransacker_args] do |parent, (attribute, type)|
  Arel::Nodes::NamedFunction('cast', [
    Arel::Nodes::InfixOperation.new('->>', parent.table[:metadata], Arel::Nodes.build_quoted(attribute)).as(type)
  ])
end

This allowed me to send Ransack a structure like this:

contents.ransack(
  contents_type_eq: 'Solid',
  c: [{
    a: {
      '0' => {
        name: 'contents_metadata',
        ransacker_args: ['density', 'float']
      }
    },
    p: 'gt',
    v: ['23.2']
  }],
  s: {
    '0' => { name: 'contents_metadata', dir: 'desc', ransacker_args: ['hardness', 'float'] },
    '1' => 'sortable_contents_metadata_(hardness)_(float) desc'
  }
)

to search for Solids with a density greater than 23.2 and sort by hardness descending.

I should explain that sorting. Since I need the attribute name and type to pass to the Ransacker, and since the sort attribute is sent via query params, I decided to embed the name and type in the query param itself. This required me to munge together a “fake” name for the sort query param, then de-munge it in the controller, while still leaving the original munged name in the sort params so that Ransack would correctly render the link.

Yes, it’s gross. I buried most of the code for integrating with Ransack in a couple of well-tested modules, so it’s at least contained.

Managing Attributes

This is quite specific to our app, but may still be useful to you. Since the metadata attributes themselves had their own metadata (type, valid values, etc.), I wrote a DSL to allow each Bucket type to declare its supported attributes:

module SolidsBucketMetadata
  include Attribute::DSL

  attribute :density, type: :float
  attribute :hardness, type: :float, format: :percentage
  attribute :classification, type: :text, values: %w(Wood Metal Glass Plastic)
end

This additional metadata was used in various other places, but I reference type throughout this article, so I wanted to show where it was declared.

Form Fields

I won’t go into much detail here, except to show how to format HTML input names so that they end up in the right Ransack structure.

As an example, this is how I render the search field for the first searchable column in the table:

<input type="hidden" name="q[c][0][a][0][name]" value="contents_metadata">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="density">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="float">
<input type="hidden" name="q[c][0][p]" value="gt">
<input type="search" name="q[c][0][v][]">

For subsequent fields, change q[c][0] to q[c][1], etc.

As I mentioned above, I had to use a custom sort key in order to make sorting work, but I still used Ransack’s sort_link helper to render the actual links.

Conclusion

It took a lot of digging through Ransack’s source to figure out exactly how to glue all this together, but it’s flexible enough to support everything that Ransack and Arel support, even within a JSON structure inside of a single column. It’s a trivial amount of code to add this metadata to any type of contents, and you get a bunch of stuff for free. Pretty frickin’ sweet.

Footnotes

  1. Spoiler: it worked, so it was also the last thing I tried.
  2. We also had contents with no interesting metadata, in which case I returned null::jsonb instead of calling jsonb_build_object.
  3. One gotcha with the ->> operator: it always returns text, even if the value at that key has a proper JSON type (numeric, boolean, etc.). Hence the cast.