I've run a few tests on this locally over the years, resulting in some pretty great outcomes. I'll start with a few statements and work from there:
It is possible to use cryptographic hashes to represent URLs.
The BLAKE2 and KangarooTwelve algorithms support variable-length outputs, depending on the desired collision resistance.
A JSON-LD Context specifies how to interpret the semantics of a document, and JSON-LD Contexts are identified by URLs, as are the terms they define.
It's possible to use integers as CBOR keys and values.
It is possible to create a 16-bit lookup table that stores all well-known JSON-LD Contexts associated with standards.
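Taken together, those statements can be sketched in a few lines. The following is a rough illustration only, assuming Python's standard hashlib; the registry values and term IRIs are placeholders, not a real CBOR-LD registry or encoding.

```python
import hashlib

def url_hash(url: str, length: int = 4) -> bytes:
    """Hash a URL with BLAKE2b, truncated to `length` bytes.

    BLAKE2b supports variable-length digests via digest_size, so the
    collision-resistance / size trade-off is tunable per use case.
    """
    return hashlib.blake2b(url.encode("utf-8"), digest_size=length).digest()

# Hypothetical 16-bit registry of well-known, standards-track contexts
# (the integer codes and entries are placeholders, not a real registry).
WELL_KNOWN_CONTEXTS = {
    0x0001: "https://www.w3.org/ns/credentials/v2",
    0x0002: "https://w3id.org/security/data-integrity/v2",
}

# CBOR permits integers as map keys and values, so a document's term
# table can be keyed by small integers instead of repeated URL strings.
term_table = {
    1: url_hash("https://www.w3.org/2018/credentials#issuer"),
    2: url_hash("https://www.w3.org/2018/credentials#VerifiableCredential"),
}

print(url_hash("https://www.w3.org/ns/credentials/v2").hex())  # 4-byte digest
```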
What this means is that we can:
In certain cases, we can compress all JSON-LD Contexts used down to a variable-length cryptographic hash... that is, down to a few bytes, and use that hash as a "base URL" for all terms used in a CBOR-LD document.
In certain cases, we can compress all expanded terms and RDF Class URLs used in a document down to a few bytes using the same algorithm as in the previous step, but utilizing even fewer bytes, because the JSON-LD Context cryptographic hash gives us a global identification mechanism. That is, we can compress URLs to fewer bytes than we normally could because a JSON-LD Context definition hash sits at the start of the CBOR payload (see the sketch after this list).
We can tag these documents as "compressed CBOR-LD" documents.
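A rough sketch of those three steps, assuming the third-party cbor2 package. The tag number 0x0500 is a placeholder rather than a registered CBOR-LD tag, and keying the term hash with the context hash is just one possible way to read the "context hash as base URL" idea.

```python
import hashlib
import cbor2  # third-party: pip install cbor2

def url_hash(url: str, length: int = 4) -> bytes:
    return hashlib.blake2b(url.encode("utf-8"), digest_size=length).digest()

def term_hash(ctx_hash: bytes, term_iri: str, length: int = 2) -> bytes:
    # Keyed hash: term digests only need to avoid collisions within the
    # context identified by ctx_hash, so they can be shorter.
    return hashlib.blake2b(term_iri.encode("utf-8"),
                           digest_size=length, key=ctx_hash).digest()

context = "https://www.w3.org/ns/credentials/v2"
ctx_hash = url_hash(context)

compressed = {
    0: ctx_hash,  # step 1: the context hash acts as the document's "base"
    # step 2: expanded term IRIs collapse to 2-byte map keys
    term_hash(ctx_hash, "https://www.w3.org/2018/credentials#issuer"):
        "https://example.edu/issuers/14",
}

# step 3: tag the payload as "compressed CBOR-LD" (0x0500 is a placeholder)
payload = cbor2.dumps(cbor2.CBORTag(0x0500, compressed))
print(len(payload), "bytes")
```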
If we do all of those things, in certain cases, we get:
Single-byte to sub-byte values for terms and classes in a CBOR-LD document
Global uniqueness (read: excellent collision resistance) for all terms in a CBOR-LD document without sacrificing storage size
An efficient, semantically meaningful normalization mechanism that depends on byte compares (similar to JCS, but without having to do tons of string comparisons) -- we could replace RDF Dataset Normalization in certain scenarios (sketched below)
An efficient, semantically meaningful binary template format.
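To illustrate the byte-compare normalization point, here is a minimal sketch, again assuming cbor2; it is a stand-in ordering rule, not the RDF Dataset Normalization algorithm.

```python
import cbor2  # third-party: pip install cbor2

def canonicalize(doc: dict) -> bytes:
    """Produce a deterministic encoding by ordering map entries on their
    encoded key bytes; equality then reduces to comparing raw bytes."""
    ordered = dict(sorted(doc.items(), key=lambda kv: cbor2.dumps(kv[0])))
    return cbor2.dumps(ordered)

# Two maps with the same entries in different insertion orders encode to
# byte-identical payloads, with no per-term string comparisons involved.
a = {b"\x9c\x41": "issuer", 0: b"\xaa\xbb\xcc\xdd"}
b = {0: b"\xaa\xbb\xcc\xdd", b"\x9c\x41": "issuer"}
assert canonicalize(a) == canonicalize(b)
```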
In short, we could achieve compression rates up to 75% for small documents.