Conversation

richardTowers (Contributor) commented:

This is an illustration of an approach to link expansion using PostgreSQL WITH RECURSIVE queries.

This is similar to the approach taken in https://github.com/alphagov/govuk-graphql, but uses plain old ActiveRecord instead of introducing the Sequel ORM.

Potentially, this approach could be used as an alternative to the current approach that uses LinkGraph, NodeCollectionFactory, LinkReference and friends. It could also be an alternative to Dataloader for our main GraphQL implementation.

The general idea is to start with an edition (the BaseEdition) and a JSON array describing the link types to follow (the LinkExpansionRules).
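For illustration, the lookahead JSON might look something like this (the link types here are invented; the shape matches the lookahead(type varchar, reverse boolean, links jsonb) record definition used in the query):

```json
[
  {
    "type": "organisations",
    "reverse": false,
    "links": [
      { "type": "people", "reverse": false, "links": [] }
    ]
  },
  { "type": "documents", "reverse": true, "links": [] }
]
```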

We can then find linked editions in one of four ways: forward link set links, forward edition links, reverse link set links, or reverse edition links. Each of these has its own query, and the four are joined together with a UNION ALL (ActiveRecord does this when you pass an array to .with(), as we do with .with(..., all_links: [...])).
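Roughly, the CTE that .with() builds has this shape (a sketch only; the column lists and join conditions are elided, and all_links is the name from the snippet above, not necessarily the full real query):

```sql
WITH all_links AS (
  SELECT ... -- forward link set links (links in the source's link set)
  UNION ALL
  SELECT ... -- forward edition links (links rows hanging off the source edition)
  UNION ALL
  SELECT ... -- reverse link set links (link sets whose links target the source's content id)
  UNION ALL
  SELECT ... -- reverse edition links (edition links targeting the source's content id)
)
SELECT ...
```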

The newly found editions are joined with the next level of link types in the lookahead, and the query recurses until we reach the leaves of the JSON structure, or until we can't find any linked editions for a particular branch.
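The recursion the CTE performs can be modelled in a few lines of plain Ruby. This is purely an illustration of the traversal semantics, not the actual implementation: the link graph is an in-memory hash, and reverse links are omitted for brevity.

```ruby
# Plain-Ruby model of what the recursive CTE computes. Each content id
# maps to its forward links, keyed by link type. The lookahead mirrors
# the JSON passed to the query: an array of { type:, reverse:, links: }.
LINKS = {
  "org-1"    => { "people" => ["person-1", "person-2"] },
  "person-1" => { "role" => ["role-1"] },
  "person-2" => {},
  "role-1"   => {}
}

# Recurse until the lookahead is empty (the leaves of the JSON
# structure), or until a branch has no linked editions.
def expand(content_id, lookahead, found = [])
  lookahead.each do |rule|
    targets = LINKS.fetch(content_id, {}).fetch(rule[:type], [])
    targets.each do |target|
      found << target
      expand(target, rule[:links], found)
    end
  end
  found
end

rules = [
  { type: "people", reverse: false, links: [
    { type: "role", reverse: false, links: [] }
  ] }
]

p expand("org-1", rules) # => ["person-1", "role-1", "person-2"]
```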

There are two workarounds in this code which feel a bit ugly to me, but otherwise I actually think it's not too bad (particularly when compared to the existing link expansion code).

Workaround 1: we can't call jsonb_to_recordset() directly, sadly, because the query planner assumes it will always return 100 rows, which leads to terrible query plans. To avoid this, we have to wrap it in a subquery and limit the result to max_links_count. Ugly, but not show-stoppingly bad.

Workaround 2: The recursive case uses a second .with() clause for a few reasons (which I'll skip here, for brevity's sake). Unfortunately ActiveRecord will pass this to UNION ALL without wrapping it in parentheses, which the PostgreSQL parser rejects. To force it to be wrapped in parentheses, I've wrapped it in Arel.sql(recursive_case.to_sql). Ugly, but not show-stoppingly bad.
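The parser issue is easy to reproduce in isolation: a WITH clause is only valid as a branch of a set operation when it's parenthesised. A minimal sketch (the CTE name and elided column lists are made up):

```sql
-- Rejected by the PostgreSQL parser: a bare WITH on one side of UNION ALL
SELECT id FROM base_case
UNION ALL
WITH recursive_case AS (SELECT ...) SELECT id FROM recursive_case;

-- Accepted: the same recursive case wrapped in parentheses
SELECT id FROM base_case
UNION ALL
(WITH recursive_case AS (SELECT ...) SELECT id FROM recursive_case);
```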

There's a bit more work we'd need to do to actually use this. It only returns edition ids / content ids, so we'd need to look up the actual editions. We'd also need to handle cases like "there's both a draft and a published edition for this content id", "there's both an English and a Welsh document for this content id", and "there are both edition and link set links for this link type / edition / content_id combination". I also haven't handled withdrawn editions here.

The big potential motivation for this is performance: the hypothesis is that this recursive CTE is faster than the Dataloader approach, which may or may not hold up. It is very fast, though: loading the 843 linked edition ids we need to render the ministers index page takes 12ms on my laptop.

```sql
FROM linked_editions
CROSS JOIN LATERAL (
  SELECT * FROM jsonb_to_recordset(linked_editions.links)
    AS lookahead(type varchar, reverse boolean, links jsonb)
  LIMIT #{max_links_count(@links)}
) AS lookahead
```

Check warning (Code scanning / Brakeman): Possible SQL injection.