Searching JSON Columns in Postgres with Ransack
(Somewhat related to my previous post. This came out of the same feature request.)
I hadn’t explored Postgres’s JSON support until recently, and I was, as usual, impressed.
We have an STI model (call it Bucket
) in our application with a polymorphic association (call it contents
). In the UI, we have an index page that lists Buckets by type with some Bucket-generic information, like volume and material. The new request was to add to that index page data about the contents of the buckets, and to make it searchable and sortable and whatnot. The contents of the bucket can be basically anything, though constant for a given Bucket type, so the UI needed to be flexible enough to render arbitrary columns of data in table and provide some way of searching them. We already use Ransack around the app, so that seemed like the way to go.
Data Model
Here’s some pseudocode modeling what I described above:
class Bucket < ApplicationRecord
belongs_to :contents, polymorphic: true
# attribute :type
# attribute :contents_type
# attribute :contents_id
# attribute :volume
# attribute :material
end
class SolidsBucket < Bucket; end
class LiquidsBucket < Bucket; end
class Solid < ApplicationRecord
# attribute :name
# attribute :unique_key
# attribute :density
# attribute :hardness
# attribute :classification
end
class Liquid < ApplicationRecord
# attribute :name
# attribute :unique_key
# attribute :temperature
# attribute :fluidity
end
When we render the index page for Buckets of type SolidsBucket, we want to include volume and material of the Bucket and the density and hardness of the Solids that are inside of it.
We have a lot of Buckets, and they reference a lot of contents. We already had a materialized view representing union of all different types of contents, so I first tried adding a new JSON column to that matview.1
Here’s what that view looked like before:
SELECT
s.id,
s.name,
s.unique_key,
'Solid' AS contents_type
FROM solids
UNION ALL
SELECT
l.id,
l.name,
l.unique_key,
'Liquid' AS contents_type
FROM liquids
-- more contents types...
I added a new line to each SELECT
statement in the view definition:
-- e.g., SELECT ... FROM solids
jsonb_build_object(
'density', s.density,
'hardness', s.hardness,
'classification', s.classification
) AS metadata
This uses Postgres’s jsonb_build_object
function to build a JSON object of the “metadata” attributes that I want. Since each type of contents is already represented as a separate SELECT
statement in the UNION
, each call to jsonb_build_object
can also return a differently structured JSON object.2
To make it fast to search, I also built an index on that column:
CREATE INDEX index_contents_on_metadata ON contents USING GIN (metadata);
This technique is certainly not limited to materialized views. It should work on any relation with a JSON column, even if it’s not indexed.
Custom Ransacker
It was straightforward to query the new metadata column directly from SQL using the ->>
operator:3
SELECT cast(metadata ->> 'density') AS float FROM contents WHERE contents_type = 'Solid';
But getting Ransack to cooperate was less straightforward. Ransack does support custom "Ransackers", which let you return any Arel node as the “left-hand side” of a predicate. This is quite powerful, especially since the ransacker can take arguments, but the tricky part was getting the arguments from a web form back to the Ransacker (see below).
Here’s the Ransacker, in essence:
ransacker :contents_metadata, args: [:parent, :ransacker_args] do |parent, (attribute, type)|
Arel::Nodes::NamedFunction('cast', [
Arel::Nodes::InfixOperation.new('->>', parent.table[:metadata], Arel::Nodes.build_quoted(attribute)).as(type)
])
end
This allowed me to send Ransack a structure like this:
contents.ransack(
contents_type_eq: 'Solid',
c: [{
a: {
'0' => {
name: 'contents_metadata',
ransacker_args: ['density', 'float']
}
},
p: 'gt',
v: ['23.2']
}],
s: {
'0' => { name: 'contents_metadata', dir: 'desc', ransacker_args: ['hardness', 'float'] },
'1' => 'sortable_contents_metadata_(hardness)_(float) desc'
}
)
to search for Solids with a density greater than 23.2 and sort by hardness descending.
I should explain that sorting. Since I need the attribute name and type to pass to the Ransacker, and since the sort attribute is sent via query params, I decided to embed the name and type in the query param itself. This required me to munge together a “fake” name for the sort query param, then de-munge it in the controller, while still leaving the original munged name in the sort params so that Ransack would correctly render the link.
Yes, it’s gross. I buried most of the code for integrating with Ransack in a couple of well-tested modules, so it’s at least contained.
Managing Attributes
This is quite specific to our app, but may still be useful to you. Since the metadata attributes themselves had their own metadata (type, valid values, etc.), I wrote a DSL to allow each Bucket type to declare its supported attributes:
module SolidsBucketMetadata
include Attribute::DSL
attribute :density, type: :float
attribute :hardness, type: :float, format: :percentage
attribute :classification, type: :text, values: %w(Wood Metal Glass Plastic)
end
This additional metadata was used in various other places, but I reference type
throughout this article, so I wanted to show where it was declared.
Form Fields
I won’t go into much detail here, except to show how to format HTML input names so that they end up in the right Ransack structure.
As an example, this is how I render the search field for the first searchable column in the table:
<input type="hidden" name="q[c][0][a][0][name]" value="contents_metadata">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="density">
<input type="hidden" name="q[c][0][a][0][ransacker_args][]" value="float">
<input type="hidden" name="q[c][0][p]" value="gt">
<input type="search" name="q[c][0][v][]">
For subsequent fields, change q[c][0]
to q[c][1]
, etc.
As I mentioned above, I had to use a custom sort key in order to make sorting work, but I still used Ransack’s sort_link
helper to render the actual links.
Conclusion
It took a lot of digging through Ransack’s source to figure out exactly how to glue all this together, but it’s flexible enough to support everything that Ransack and Arel support, even within a JSON structure inside of a single column. It’s a trivial amount of code to add this metadata to any type of contents, and you get a bunch of stuff for free. Pretty frickin’ sweet.