eilmiv
Contributor

@eilmiv eilmiv commented Sep 9, 2025

Summary of changes

  • Implement an OAI-PMH 2.0 endpoint under /oai-pmh that allows harvesting of visible training materials in the Dublin Core and RDF (Bioschemas) metadata formats
  • Render the OAI-PMH XML in the browser with XSLT to provide an interactive, clickable UI (a common approach for OAI-PMH endpoints)

Motivation and context

This is a relevant step in the mTeSS-X project.

Screenshots

image

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree
    to license it to the TeSS codebase under the
    BSD license.

@eilmiv
Contributor Author

eilmiv commented Sep 9, 2025

Remaining TODOs for this pull request:

  • Remove accidentally committed PaN-Services code
  • Improve comments and layout of Dublin Core transformation code
  • Add note about modifications made to oai2xhtml.xsl

@eilmiv eilmiv marked this pull request as ready for review September 10, 2025 08:24
@eilmiv eilmiv marked this pull request as draft September 10, 2025 13:40
@eilmiv eilmiv marked this pull request as ready for review September 10, 2025 14:15
Comment on lines 21 to 28
class PublicMaterial < Material
  default_scope { where(visible: true) }

  # Pretend to be a regular Material (for URLs in RDF serialization)
  def self.model_name
    Material.model_name
  end
end
Member

According to the docs, we can pass a scoped relation to OAI::Provider::ActiveRecordWrapper.new instead of needing to make a new class:
https://github.com/code4lib/ruby-oai/blob/master/lib/oai/provider.rb#L249-L253
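
A rough sketch of that suggestion, not the code from this PR: the provider class name and the repository settings below are hypothetical placeholders, and the scope assumes the visible column used in the snippet under review. Because the wrapper receives a relation over Material itself, the records stay regular Material instances and no model_name override should be needed.

```ruby
# Sketch only: MaterialProvider and the repository settings are hypothetical.
class MaterialProvider < OAI::Provider::Base
  repository_name 'Example training materials'
  repository_url  'https://example.org/oai-pmh'
  record_prefix   'oai:example.org'
  admin_email     'admin@example.org'

  # Pass a scoped relation rather than a model class, so that only
  # visible materials are exposed to harvesters.
  source_model OAI::Provider::ActiveRecordWrapper.new(Material.where(visible: true))
end
```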

Contributor Author

Good catch. Changed in 7c2a9e7.

Comment on lines 12 to 19
class OAIRDF < OAI::Provider::Metadata::Format
  def initialize
    @prefix = 'rdf'
    @schema = 'http://www.openarchives.org/OAI/2.0/rdf.xsd'
    @namespace = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    @element_namespace = 'rdf'
  end
end
Member

Put each class into its own file. We can use lib/oai/... or similar for all the OAI-PMH related code.

If Rails cannot find the classes/modules because they are named OAI rather than Oai, see the inflections documentation:
https://api.rubyonrails.org/classes/ActiveSupport/Inflector/Inflections.html
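
For reference, an acronym rule of the kind described in that documentation might look like the sketch below. The file location follows the usual Rails convention and is an assumption here; nothing in this PR adds such a rule.

```ruby
# config/initializers/inflections.rb (sketch; not part of this PR)
ActiveSupport::Inflector.inflections(:en) do |inflect|
  # Treat "OAI" as an acronym so that, for example, 'oai'.camelize yields 'OAI'.
  inflect.acronym 'OAI'
end
```

An acronym rule applies application-wide, so a file such as oai_controller.rb would then be expected to define OAIController rather than OaiController. That side effect is worth checking before adopting it.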

Contributor Author

Moved to initializers in a71a345.

Member

Good tests, but it might be nice to parse the responses with an XML parser and check the parsed result.

We already have a gem (nokogiri) that can do this. For the (slightly awkward) syntax, see this example in another repo:
https://github.com/seek4science/seek/blob/ee17e7b6caa0733be1f922ec7eb98c78379ee67c/test/unit/datacite_metadata_test.rb#L90-L96

Contributor Author

Parsed everything using nokogiri in 323abaf.

Member

I think only the oai gem is needed here. Those other gems are already included via the linkeddata gem we are using.

Contributor Author

Removed other gems in c757a91.

Comment on lines 206 to 250
# Dublin Core mappings for OAI-PMH
# no mapping needed for contributor, description and title
# coverage and source not mappable
alias_attribute :creators, :authors

def dates
  [date_published, date_created, date_modified].compact.map(&:iso8601)
end

def format = 'text/html'

def identifier
  if !doi.nil? && !doi.empty?
    doi_iri = doi.start_with?('http://', 'https://') ? doi : "https://doi.org/#{doi}"
  else
    url
  end
end

def language = 'en'

def publishers
  if content_provider
    [content_provider.title]
  else
    []
  end
end

# currently only url of tess resource, content provider url
def relations
  [
    "#{TeSS::Config.base_url}#{Rails.application.routes.url_helpers.material_path(self)}"
  ] + (content_provider ? [content_provider.url] : [])
end

alias_attribute :rights, :licence

def subjects
  keywords + scientific_topics.map(&:uri) + operations.map(&:uri)
end

def types
  ['http://purl.org/dc/dcmitype/Text', 'https://schema.org/LearningResource'] + resource_type
end
Member

I think it would be cleaner to move these methods into another class, if that's possible.
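
One possible shape for such a class, as a sketch only: DublinCoreMapping and FakeMaterial are hypothetical names that do not appear in this PR, and only two of the mappings are shown. The struct stands in for the ActiveRecord model so the example runs without Rails.

```ruby
require 'date'

# Hypothetical sketch: DublinCoreMapping is not a class from this PR.
class DublinCoreMapping
  def initialize(material)
    @material = material
  end

  # Prefer the DOI as an IRI and fall back to the material's URL.
  def identifier
    doi = @material.doi
    return @material.url if doi.nil? || doi.empty?

    doi.start_with?('http://', 'https://') ? doi : "https://doi.org/#{doi}"
  end

  # ISO 8601 strings for whichever dates are set.
  def dates
    [@material.date_published, @material.date_created, @material.date_modified]
      .compact.map(&:iso8601)
  end
end

# Stand-in for the ActiveRecord model, so the sketch runs without Rails.
FakeMaterial = Struct.new(:doi, :url, :date_published, :date_created, :date_modified,
                          keyword_init: true)

material = FakeMaterial.new(doi: '10.1234/example',
                            url: 'https://example.org/materials/1',
                            date_published: Date.new(2025, 9, 9))
mapping = DublinCoreMapping.new(material)
mapping.identifier
mapping.dates
```

Keeping the mapping in a wrapper like this would leave the model free of Dublin Core specific method names such as format and language.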

Contributor Author

Moved to one method in 8b54db2.

@eilmiv eilmiv marked this pull request as draft September 12, 2025 07:12
@@ -0,0 +1,21 @@
class OaiController < ApplicationController
  # This view only returns static public content and CSRF token authentication causes problems with OAI-PMH POST requests
  skip_before_action :verify_authenticity_token

Check failure

Code scanning / CodeQL

CSRF protection weakened or disabled (severity: High)

Potential CSRF vulnerability due to forgery protection being disabled or weakened.
@eilmiv eilmiv marked this pull request as ready for review September 30, 2025 10:50