CLOCKSS

Vincent W.J. van Gerven Oei edited this page Apr 2, 2021 · 3 revisions

Link:

https://clockss.org/

Summary:

CLOCKSS provides a sustainable dark archive to ensure the long-term survival of web-based scholarly content. CLOCKSS (Controlled LOCKSS) builds on the LOCKSS (Lots of Copies Keep Stuff Safe) approach to archiving, which was initiated by Stanford University librarians in 1999. Digital content is stored in the CLOCKSS archive with no user access unless a “trigger” event occurs. The LOCKSS technology regularly checks the validity of the stored data and preserves it for the long term. CLOCKSS operates 12 archive nodes at institutions worldwide, preserving 200,000 book titles and a growing collection of supplementary materials and metadata. As of March 2020, 64 titles had been triggered and made available open access. CLOCKSS participants include 300 libraries and 286 publishers.

Format types:

Deliveries should include:

- Both content and all related metadata materials as discrete files: in a single directory, in a directory tree, or packaged in a non-proprietary package format (e.g., TAR, ZIP).
- Consistent metadata-to-content relationships: one metadata file per content file (1:1) or one metadata file per multiple content files (1:M), with metadata file names that include a timestamp or other unique identifier.
- Final content only, with no pre-publication content; non-proprietary formats only (e.g., PDF, HTML); a maximum of one non-text format for each item (e.g., PDF, EPUB, MOBI).
- Metadata in text formats (e.g., XML, RIS), with standard metadata schemas preferred (e.g., JATS, ONIX, PubMed, Crossref).
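
The delivery requirements above can be checked before upload. The following is a minimal sketch, not a CLOCKSS tool: it assumes a hypothetical naming convention in which each content file shares its file stem (for example an ISBN) with an XML metadata file, and the names `CONTENT_SUFFIXES`, `pair_content_with_metadata`, and `package_delivery` are invented for illustration.

```python
import tarfile
from pathlib import Path

# Hypothetical conventions for this sketch, not CLOCKSS requirements.
CONTENT_SUFFIXES = {".pdf", ".html", ".epub"}
METADATA_SUFFIX = ".xml"


def pair_content_with_metadata(delivery_dir: Path) -> dict[str, str]:
    """Map each content file to the metadata file that shares its stem."""
    pairs = {}
    for path in sorted(delivery_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in CONTENT_SUFFIXES:
            metadata = path.with_suffix(METADATA_SUFFIX)
            if not metadata.is_file():
                raise FileNotFoundError(f"no metadata file for {path.name}")
            content_name = str(path.relative_to(delivery_dir))
            pairs[content_name] = str(metadata.relative_to(delivery_dir))
    return pairs


def package_delivery(delivery_dir: Path, archive_path: Path) -> list[str]:
    """Verify the pairing, then write the directory tree to an uncompressed TAR."""
    pair_content_with_metadata(delivery_dir)  # raises if metadata is missing
    with tarfile.open(archive_path, "w") as tar:
        tar.add(delivery_dir, arcname=delivery_dir.name)
    # Re-open the archive and list its members as a quick sanity check.
    with tarfile.open(archive_path) as tar:
        return sorted(tar.getnames())
```

Calling `package_delivery(Path("delivery"), Path("delivery.tar"))` on a directory holding `book-001.pdf` and `book-001.xml` would produce a TAR containing both files. Two content files with the same stem (say a PDF and an EPUB) map to one metadata file, which corresponds to the 1:M case.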

Third-party content support:

In addition to the file types above, CLOCKSS also offers web harvesting.

Features:

To give CLOCKSS access to its source files, the publisher places them on a designated FTP site. CLOCKSS boxes located at Rice, Indiana, and Stanford Universities ingest the content the publisher has made available. The content is preserved through a system of audit and repair: the CLOCKSS boxes continually communicate over the internet to audit the content they are preserving. If the content in one CLOCKSS box is damaged or incomplete, that box receives repairs based on the other boxes’ holdings and/or the publisher’s original presentation files. This cooperation between the CLOCKSS boxes avoids the need to back them up individually. It also provides unambiguous reassurance that the system is performing its function and that the correct content is always available.
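
The audit-and-repair idea can be illustrated with a small sketch. This is a deliberately simplified majority vote over content hashes, not the actual LOCKSS polling protocol, which is more elaborate. The function names and the in-memory `boxes` dictionary are assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one box's copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(boxes: dict[str, bytes]) -> list[str]:
    """Overwrite copies that disagree with the majority; return repaired box names."""
    votes = Counter(digest(copy) for copy in boxes.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(boxes) / 2:
        # No clear majority: in this sketch we give up; a real system could
        # consult the publisher's original files instead.
        raise ValueError("no majority among boxes")
    good_copy = next(c for c in boxes.values() if digest(c) == winning_digest)
    repaired = []
    for name, copy in list(boxes.items()):
        if digest(copy) != winning_digest:
            boxes[name] = good_copy  # repair from an agreeing box
            repaired.append(name)
    return repaired
```

With three boxes where one holds a damaged copy, `audit_and_repair` replaces the damaged copy with the version the other two agree on, and a second audit finds nothing to repair. This is why the boxes do not need individual backups: each one is restored from its peers.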

Costs:

Supporting library fees start at $485 per year for a library with a materials budget under $1 million.

