feat: add GitHub Actions workflow for Hugging Face Hub publishing and create space configuration file for ISCC-LAB project

Showing 2 changed files with 61 additions and 0 deletions.

@@ -0,0 +1,17 @@

```yaml
name: Publish on Hugging Face Hub
on:
  push:
    branches:
      - huggingface
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Sync with Hugging Face
        uses: nateraw/huggingface-sync-action@v0.0.4
        with:
          github_repo_id: iscc/iscc-sct
          huggingface_repo_id: iscc/iscc-sct
          repo_type: space
          space_sdk: gradio
          hf_token: ${{ secrets.HF_TOKEN }}
```
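
For readers who want to reproduce the sync by hand, the sketch below approximates what the workflow asks the action to do: make sure a Gradio Space named `iscc/iscc-sct` exists and push the repository contents to it. Whether the action works exactly this way internally is an assumption. The sketch uses the `create_repo` and `upload_folder` functions of the `huggingface_hub` package and reads a write token from an `HF_TOKEN` environment variable, mirroring the workflow's `HF_TOKEN` secret; `sync_to_space` is a hypothetical helper name.

```python
import os


def sync_to_space(folder: str = ".", repo_id: str = "iscc/iscc-sct") -> None:
    """Hypothetical manual equivalent of the "Sync with Hugging Face" step.

    Assumes `huggingface_hub` is installed and HF_TOKEN holds a write token.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import create_repo, upload_folder

    token = os.environ["HF_TOKEN"]
    # Mirrors `repo_type: space` and `space_sdk: gradio` from the workflow inputs;
    # exist_ok=True makes repeated runs safe once the Space exists.
    create_repo(repo_id, token=token, repo_type="space", space_sdk="gradio", exist_ok=True)
    # Upload the local working tree to the Space repository as one commit.
    upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        repo_type="space",
        token=token,
        commit_message="Sync from GitHub",
    )
```

Calling `sync_to_space()` from the repository root would then publish the checkout, much like a push to the `huggingface` branch does through the workflow.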

@@ -0,0 +1,44 @@

```yaml
---
title: ISCC-LAB - Semantic-Code Text
emoji: ▶️
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.41.0
app_file: iscc_sct/demo.py
pinned: true
license: CC-BY-NC-SA-4.0
short_description: Cross Lingual Similarity Preserving Text Simprints
---
```
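
Hugging Face Spaces read their settings (SDK, SDK version, entry-point file, and so on) from this YAML front matter at the top of the Space's `README.md`. As a rough illustration of how the header is separated from the body, the sketch below extracts flat `key: value` pairs with the standard library only. It is a simplification that assumes no nested YAML, lists, or quoted values, and `parse_front_matter` is a hypothetical helper, not part of any Hugging Face tool; a real YAML parser is needed for general front matter.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Extract flat `key: value` pairs from a ---delimited header (simplified)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the header
            break
        # Split on the first colon only, so values may contain colons or dashes.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


README_HEAD = """---
title: ISCC-LAB - Semantic-Code Text
sdk: gradio
sdk_version: 4.41.0
app_file: iscc_sct/demo.py
pinned: true
---
# ISCC-LAB - Semantic-Code Text
"""

meta = parse_front_matter(README_HEAD)
print(meta["sdk"], meta["app_file"])  # gradio iscc_sct/demo.py
```

All values come back as strings, so `sdk_version` stays `"4.41.0"` rather than being coerced to a number.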

# ISCC-LAB - Semantic-Code Text

`iscc-sct` is a **proof of concept implementation** of a semantic Text-Code for the
[ISCC](https://core.iscc.codes) (*International Standard Content Code*). Semantic Text-Codes are
short identifiers derived from text documents. Semantically similar texts, including texts in
different languages, yield codes that are close in Hamming distance.
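
Hamming distance is the number of bit positions in which two codes differ, so a small distance means similar content. The sketch below shows that arithmetic on two 64-bit values; the sample values are invented for illustration and are not real `iscc-sct` output, and the normalized similarity score is one common convention rather than something this project defines.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bit positions between two equal-length codes."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, nbits: int = 64) -> float:
    """Map Hamming distance to [0, 1], where 1.0 means identical codes."""
    return 1.0 - hamming_distance(a, b) / nbits


# Hypothetical 64-bit codes: code_b differs from code_a in the lowest 4 bits.
code_a = 0xF0F0_F0F0_F0F0_F0F0
code_b = 0xF0F0_F0F0_F0F0_F0FF
print(hamming_distance(code_a, code_b))  # 4
print(similarity(code_a, code_b))        # 0.9375
```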

## What is the ISCC

The ISCC is a combination of various similarity-preserving fingerprints and an identifier for
digital media content.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However,
instead of relying on a single cryptographic hash function, which only matches identical data, the
ISCC uses various algorithms to create a composite identifier that exhibits similarity-preserving
properties (soft hash or Simprint).
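
The contrast between a cryptographic hash and a similarity-preserving hash can be made concrete. The sketch below compares SHA-256 with a classic word-level SimHash (Charikar's technique). SimHash is used here only as a generic, well-known example of a soft hash; it is an assumption for illustration and not the algorithm the ISCC or `iscc-sct` uses. The helper names are hypothetical.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Cryptographic hash of the text as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def simhash64(text: str) -> int:
    """Classic 64-bit SimHash over lowercase whitespace-separated tokens."""
    counts = [0] * 64
    for token in text.lower().split():
        # 64-bit hash of each token; each of its bits votes +1 (set) or -1 (clear).
        h = int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")
        for i in range(64):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Output bit i is set when the votes for bit i are positive overall.
    return sum(1 << i for i in range(64) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bit positions."""
    return bin(a ^ b).count("1")


a = "the quick brown fox jumps over the lazy dog"
b = "the quick brown fox jumps over the lazy cat"
print("sha256 bits differing:", hamming(sha256_int(a), sha256_int(b)))
print("simhash bits differing:", hamming(simhash64(a), simhash64(b)))
```

Changing one word should flip roughly half of the 256 SHA-256 bits, while the SimHash values should differ in only a small fraction of their 64 bits, because each token contributes just one vote per bit position.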

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each
component is self-describing, modular, and can be used separately or with others to aid in various
content identification tasks. The algorithmic design supports content deduplication, database
synchronization, indexing, integrity verification, timestamping, versioning, data provenance,
similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking, and
general digital asset management use cases.
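
To illustrate how a component-based identifier can be used piecewise, the sketch below models a composite as a mapping from component names to 64-bit values and compares two composites only on the components they share. The names loosely follow the ISCC's unit types (Meta, Semantic, Content, Data, Instance), but the representation is a deliberate simplification: real ISCCs carry binary headers and are base32-encoded, which this sketch ignores, and all values are invented. `compare` is a hypothetical helper.

```python
# Hypothetical composite identifier: component name -> 64-bit value.
Composite = dict[str, int]


def hamming(a: int, b: int) -> int:
    """Number of differing bit positions."""
    return bin(a ^ b).count("1")


def compare(x: Composite, y: Composite) -> dict[str, int]:
    """Hamming distance for every component present in both identifiers."""
    return {name: hamming(x[name], y[name]) for name in x.keys() & y.keys()}


doc_en: Composite = {"meta": 0x1234_5678_9ABC_DEF0, "semantic": 0xAAAA_0000_FFFF_1111,
                     "data": 0x0F0F_0F0F_0F0F_0F0F, "instance": 0x1111_2222_3333_4444}
doc_de: Composite = {"semantic": 0xAAAA_0000_FFFF_1113, "data": 0x7F0F_0F0F_0F0F_0F0F,
                     "instance": 0x5555_6666_7777_8888}

dist = compare(doc_en, doc_de)
print(sorted(dist.items()))  # [('data', 3), ('instance', 20), ('semantic', 1)]
# Exact identity is judged on the instance component alone (assumed cryptographic).
print("identical bytes:", dist["instance"] == 0)  # identical bytes: False
```

Because `meta` is missing from `doc_de`, it is simply skipped; each remaining component answers its own question (semantic closeness, data similarity, exact identity) independently of the others.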

## ISCC Status

The [ISCC](https://iscc.codes) is an ISO standard, published as
[ISO 24138:2024](https://www.iso.org/standard/77899.html) (International Standard Content Code) and
developed within [ISO/TC 46/SC 9/WG 18](https://www.iso.org/committee/48836.html).

The algorithms of `iscc-sct` are experimental and not (yet) part of the official standard.