Skip to content

Conversation

yndajas
Copy link
Member

@yndajas yndajas commented Sep 10, 2025

Unfortunately the compacting we introduced via the GraphqlContentItemService results in content items that are invalid against the schema in at least some cases. This removes the compacting and adds a note about the limitations of the live content validation test

Closes #3587 (which went further down the line of removing empty fields)

@yndajas yndajas force-pushed the dont-compact-graphql-content-items branch 2 times, most recently from 874f1cb to f643a49 Compare September 10, 2025 13:07
This was the original motivation for creating this service, but we've
since found that it results in a content item that's not valid against
the schema for at least some content types

For instance, the world index requires a description. Its actual
description is nil, but to be valid against the schema we still need to
include the property. This service was removing the property altogether
because of the top-level compacting

In practice the frontend apps are unlikely to be relying on the
existence of keys whose values are nil, but we think that producing a
schema-valid response is preferable to a minimal one, barring any
performance issues

This issue was surfaced when trying to expand the compacting to links
and all nested objects. This led to more validation errors before closer
inspection revealed the issue with the existing processing

NOTE: this is a red commit. The live_content_spec won't pass with these
changes. Another change to this file is necessary to produce a valid
content item for news articles based on their current query - it feels
preferable to break that into a separate commit and update the
tests after

```rb
def edition
 deep_transform_hash(query_result.dig("data", "edition"))
end

def deep_transform_hash(hash)
 hash.map { deep_transform(_1.to_s, _2) }.compact.to_h
end

def deep_transform(key, value)
 case [key, value]
 in [String, Hash]
   new_value = deep_transform_hash(value)
   [key, new_value] unless new_value == {}
 in [String, [Hash, *]]
   new_value = value.map(&method(:deep_transform_hash))
   [key, new_value] unless new_value == []
 in [String, nil | "" | [] | {}]
   nil
 else
   [key, value]
 end
end
```
@yndajas yndajas force-pushed the dont-compact-graphql-content-items branch from f643a49 to 5dbfc92 Compare September 10, 2025 13:10
This didn't surface validation issues introduced when adding in some
compacting via the GraphqlContentItemService. On the world index,
For example:
- the schema requires a description property even if it's nil
- the service would remove the property from the real content item
  because the description is nil
- the test would include a description via the factory-generated
  edition, so we weren't testing the behaviour when the description is
  nil

I tried using `GovukSchemas::RandomSchemaGenerator` to generate a more
realistic content item, but it's only vaguely realistic. I tried adding
a new strategy to the generator to produce a minimal valid content item,
but it required significant processing to be in the right shape to pass
into a FactoryBot `create` method. I tried PUTting it, but that requires
some stubbing

We still think this provides some value insofar as telling us if we
produce all the expected fields when they are present in the edition,
but we could introduce some other kind of processing in the future with
which this might not catch issues, hence the note

A more thorough approach when introducing new content types is to
validate against a replicated database (real data) using
`script/live_content/validate` (a bulk version is incoming)
This mimics Content Store behaviour and addresses a schema validation
issue. When validating against the schema:
- the property being absent is fine (which is what happened when
  compacting the top-level fields) - presumably it's not required
- the property being an empty hash is fine, since neither of its
  properties are required
- the property being nil is not allowed

We could leave the property out when not selected by a query and
therefore not present in the result, but mimicking Content Store's
behaviour seems preferable and doing so keeps the logic simple

We could update the `edition_type` to return an empty hash when there's
no withdrawn_notice instead of nil, but then when field selections are
made on the withdrawn_notice they get set to nil, and you just end up
with a more complex invalid object (withdrawn_notice's fields can't be
nil)

This was surfaced by news articles in the live_content_spec - news
articles are the only content type whose query currently selects
withdrawn_notice
Removing the compacting has surfaced some more requirements for the
input in the live_content_spec. All of these should be required in order
to publish an edition of the given type, so hopefully we aren't just
filtering out realistic scenarios...

... but this is starting to require quite a bit of test setup overhead
@yndajas yndajas force-pushed the dont-compact-graphql-content-items branch from 5dbfc92 to 40f8ce8 Compare September 10, 2025 13:12
@yndajas yndajas changed the title [PP-7327] Dont compact graphql content items [PP-7327] Dont compact GraphQL content items Sep 10, 2025
@yndajas yndajas changed the title [PP-7327] Dont compact GraphQL content items [PP-7327] Don't compact GraphQL content items Sep 10, 2025
Comment on lines -3 to +16
news_article: { details: { body: "" } },
ministers_index: {
details: { body: "", reshuffle: {} },
},
news_article: {
details: {
body: "",
change_history: [],
emphasised_organisations: [],
first_public_at: "2016-04-11T10:52:00.000+01:00",
image: { url: "" },
political: false,
},
},
role: { details: { body: "", supports_historical_accounts: false } },
Copy link
Member Author

@yndajas yndajas Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: the approach I've taken to extending this is to check the validation errors and add the data into the edition we create. However, we get some of the same validation errors with real data (checked using script/live_content/bulk_validate), so we shouldn't assume these are all/necessarily issues with the shape of the test data 🤔

Interestingly, Content Store responses aren't necessarily valid: there's an error when validating https://www.gov.uk/api/content/government/news/foreign-secretary-welcomes-second-ipcc-report-on-climate-change. But if we do want to aim for fully valid content items, we should probably be relying more on the bulk validation script. I'm losing faith in this spec being useful given the amount of setup combined with its limitations

The errors reported by RSpec are truncated by default. It's useful to
see all of them when there are any so you don't need to rerun the tests
after addressing each one
@yndajas yndajas force-pushed the dont-compact-graphql-content-items branch from 40f8ce8 to bb5e9df Compare September 10, 2025 13:31
@yndajas yndajas marked this pull request as draft September 11, 2025 14:12
@yndajas
Copy link
Member Author

yndajas commented Sep 11, 2025

We probably won't do this since it just leads to other kinds of validation errors. Per the schemas, some fields are required and can be nil, some are optional but can't be nil, some are required and can't be nil, and neither the current nor this approach address all scenarios. We have a plan to do an 'intelligent' schema-aware compact instead

@yndajas
Copy link
Member Author

yndajas commented Sep 20, 2025

Some of these commits were merged separately. The main purpose of this PR has been fulfilled by the new schema-aware compaction

@yndajas yndajas closed this Sep 20, 2025
@yndajas yndajas deleted the dont-compact-graphql-content-items branch September 20, 2025 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants