I've run a few tests on this locally over the years, with some pretty great results. I'll start with a few statements and work from there:
- It is possible to use cryptographic hashes to represent URLs.
- The Blake2 and Kangaroo12 algorithms support variable-length outputs, so the output size can be chosen to match the desired collision resistance.
- A JSON-LD Context specifies how to interpret the semantics of a document. JSON-LD Contexts are identified by URLs, and terms expand to URLs as well.
- It's possible to use integers as CBOR keys and values.
- It is possible to create a 16-bit lookup table (up to 65,536 entries) that stores all well-known JSON-LD Contexts associated with standards.
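A minimal Python sketch of three of these statements, under stated assumptions: the shortest CBOR unsigned-integer encoding from RFC 8949 shows that any ID from a 16-bit table fits in at most 3 bytes (and IDs below 24 fit in a single byte), and Python's `hashlib.blake2b` shows a digest whose size is chosen by the caller. The context registry, its IDs, and the `example.com` URL are hypothetical; Kangaroo12 is not in the Python standard library, so only Blake2 is shown.

```python
import hashlib

def cbor_uint(n: int) -> bytes:
    """Encode a non-negative integer as a CBOR unsigned integer (major type 0),
    using the shortest form allowed by RFC 8949."""
    if n < 0:
        raise ValueError("this sketch only handles unsigned integers")
    if n < 24:
        return bytes([n])                              # value fits in the initial byte
    if n < 1 << 8:
        return bytes([0x18]) + n.to_bytes(1, "big")    # 1-byte argument follows
    if n < 1 << 16:
        return bytes([0x19]) + n.to_bytes(2, "big")    # 2-byte argument follows
    if n < 1 << 32:
        return bytes([0x1A]) + n.to_bytes(4, "big")    # 4-byte argument follows
    return bytes([0x1B]) + n.to_bytes(8, "big")        # 8-byte argument follows

def context_digest(url: str, nbytes: int) -> bytes:
    """Blake2b digest of a context URL with a caller-chosen size (1 to 64 bytes)."""
    return hashlib.blake2b(url.encode("utf-8"), digest_size=nbytes).digest()

# Hypothetical 16-bit registry of well-known contexts; the IDs are illustrative only.
WELL_KNOWN_CONTEXTS = {
    "https://www.w3.org/ns/did/v1": 0x0001,
    "https://www.w3.org/2018/credentials/v1": 0x0002,
}

if __name__ == "__main__":
    for url, ctx_id in WELL_KNOWN_CONTEXTS.items():
        print(url, cbor_uint(ctx_id).hex())            # IDs below 24 encode in one byte
    print(len(cbor_uint(0xFFFF)))                      # largest 16-bit ID: 3 bytes
    for size in (4, 8, 16):
        print(size, context_digest("https://example.com/my-context/v1", size).hex())
```

Larger digests buy more collision resistance at the cost of payload size, which is the trade-off the variable-length output makes tunable.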
Taken together, these statements mean that, in certain cases, we can:
- Compress all JSON-LD Contexts used in a document down to a variable-length cryptographic hash (that is, down to a few bytes) and use that hash as a "base URL" for all terms used in a CBOR-LD document.
- Compress all expanded terms and RDF Class URLs used in a document down to a few bytes using the same hashing approach, but with even fewer bytes, because the JSON-LD Context hash gives us a global identification mechanism. That is, because a JSON-LD Context definition hash sits at the start of the CBOR payload, each term hash only has to be distinguishable within that context, so URLs can be compressed to fewer bytes than they normally could be.
- Tag these documents as "compressed CBOR-LD" documents.
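The steps above can be sketched in Python. This is a minimal illustration under assumptions the issue does not state: the context registry, its IDs, and the tag number are hypothetical; term hashes are Blake2b digests keyed with the context identifier (one plausible way to scope them to a context); and a decoder is assumed to know the term IRIs of the context so it can rebuild the hash-to-IRI table. Serializing the result to actual CBOR bytes is omitted, and the term IRIs in the usage example are illustrative.

```python
import hashlib
from typing import Dict, Iterable, Tuple, Union

ContextId = Union[int, bytes]

# Hypothetical 16-bit registry of well-known contexts (IDs are illustrative only).
WELL_KNOWN_CONTEXTS: Dict[str, int] = {
    "https://www.w3.org/ns/did/v1": 1,
    "https://www.w3.org/2018/credentials/v1": 2,
}

# Arbitrary placeholder tag number for "compressed CBOR-LD" (hypothetical).
COMPRESSED_CBORLD_TAG = 0xC0DE

def compress_context(url: str, hash_bytes: int = 8) -> ContextId:
    """Use the 16-bit registry ID when the context is well known,
    otherwise fall back to a short Blake2b digest of its URL."""
    if url in WELL_KNOWN_CONTEXTS:
        return WELL_KNOWN_CONTEXTS[url]
    return hashlib.blake2b(url.encode("utf-8"), digest_size=hash_bytes).digest()

def compress_term(iri: str, ctx: ContextId, term_bytes: int = 2) -> bytes:
    """Hash an expanded term or class IRI to a few bytes, keyed by the context
    identifier so that the short hash is scoped to that context."""
    key = ctx.to_bytes(2, "big") if isinstance(ctx, int) else ctx
    return hashlib.blake2b(iri.encode("utf-8"), digest_size=term_bytes, key=key).digest()

def build_term_table(ctx: ContextId, iris: Iterable[str],
                     term_bytes: int = 2) -> Dict[bytes, str]:
    """Decoder-side table from short hash back to IRI; rejects collisions."""
    table: Dict[bytes, str] = {}
    for iri in iris:
        h = compress_term(iri, ctx, term_bytes)
        if h in table and table[h] != iri:
            raise ValueError(f"collision at {term_bytes} bytes; use a longer digest")
        table[h] = iri
    return table

def compress_document(context_url: str, doc: Dict[str, object],
                      term_bytes: int = 2) -> Tuple[int, list]:
    """Return (tag, [context id, {short term hash: value}])."""
    ctx = compress_context(context_url)
    body = {compress_term(iri, ctx, term_bytes): value for iri, value in doc.items()}
    return COMPRESSED_CBORLD_TAG, [ctx, body]

if __name__ == "__main__":
    ctx_url = "https://www.w3.org/2018/credentials/v1"
    doc = {
        "https://www.w3.org/2018/credentials#issuer": "did:example:123",
        "https://www.w3.org/2018/credentials#issuanceDate": "2020-01-01T00:00:00Z",
    }
    tag, (ctx, body) = compress_document(ctx_url, doc)
    table = build_term_table(ctx, doc.keys())
    for short, value in body.items():
        print(short.hex(), "->", table[short], value)
```

Keying the term hash with the context identifier is what lets the term digest stay short in this sketch: a collision only matters among terms of the same context, and `build_term_table` detects it so the digest size can be raised.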
If we do all of these things, in certain cases we get:
- Single-byte to sub-byte values for terms and classes in a CBOR-LD document.
- Global uniqueness (read: excellent collision resistance) for all terms in a CBOR-LD document, without sacrificing storage size.
- An efficient, semantically meaningful normalization mechanism based on byte comparisons (similar to JCS, but without tons of string comparisons); in certain scenarios, this could replace RDF Dataset Normalization.
- An efficient, semantically meaningful binary template format.
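The byte-compare normalization idea can be illustrated with the core deterministic encoding from RFC 8949, in which every integer and length uses its shortest form and map keys are sorted by the bytewise order of their encodings. This is a swapped-in stand-in, not the design proposed here: it canonicalizes a tree exactly as given and does nothing about blank-node relabeling, which is one reason RDF Dataset Normalization could only be replaced in certain scenarios. The sketch covers only integers, booleans, byte strings, text, arrays, and maps.

```python
def _head(major: int, n: int) -> bytes:
    """CBOR initial byte plus argument, always in the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([(major << 5) | additional_info]) + n.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")

def encode(value) -> bytes:
    """Deterministically encode a small subset of CBOR."""
    if isinstance(value, bool):                    # check before int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the bytes of the encoded key (bytewise lexicographic order).
        entries = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(entries)) + b"".join(k + v for k, v in entries)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def same_document(a, b) -> bool:
    """Compare two documents by comparing their deterministic encodings byte for byte."""
    return encode(a) == encode(b)

if __name__ == "__main__":
    doc1 = {b"\x01\x02": "did:example:123", b"\x00\x09": 2020}
    doc2 = {b"\x00\x09": 2020, b"\x01\x02": "did:example:123"}
    print(same_document(doc1, doc2))               # True: insertion order does not matter
    print(encode({"a": 1, "b": [2, 3]}).hex())     # a26161016162820203 (RFC 8949 example)
```

Once terms are already short byte strings, sorting and comparing keys is a cheap byte operation rather than a long string comparison, which is the property the normalization bullet relies on.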
In short, we could achieve compression rates up to 75% for small documents.
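The issue does not define "compression rate"; reading it as the fraction of bytes saved relative to the uncompressed document (an assumption), the arithmetic works out as follows. The 400-byte and 100-byte sizes are illustrative, not measurements.

```latex
r = 1 - \frac{S_{\text{compressed}}}{S_{\text{original}}},
\qquad
S_{\text{original}} = 400\ \text{bytes},\ S_{\text{compressed}} = 100\ \text{bytes}
\;\Rightarrow\; r = 1 - \tfrac{100}{400} = 0.75
```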